-
Notifications
You must be signed in to change notification settings - Fork 15
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Cumulative ranges #96
Conversation
For better overview with respect to upcoming changes regarding #61, the range-construction-related functions are all clustered in the misc module now. Signed-off-by: Claus Hunsen <[email protected]>
With this patch, it is now possible to obtain also raw ranges, i.e., tuples of revisions in comparison to formatted range strings. Signed-off-by: Claus Hunsen <[email protected]>
When passing POSIXct object to 'construct.ranges' that denote midnight, the time was omitted by mistake. This is now corrected by calling 'get.date.string' now. Signed-off-by: Claus Hunsen <[email protected]>
As an request from issue #61, we add the function 'construct.overlapping.ranges' in this patch, with which reliably overlapping ranges can be constructed. the user needs to pass a start date, an end date, a time period for the ranges, and an overlap time period. The ranges can be requested in raw format as for 'construct.ranges' if needed. Signed-off-by: Claus Hunsen <[email protected]>
As an request from issue #61, we add the function 'construct.cumulative.ranges' in this patch, with which reliably cumulative ranges can be constructed. The user needs to pass a start date, an end date, and a time period for the ranges. Each range will then start at the start date, but end at different dates, in a cumulative manner. The ranges can be requested in raw format as for 'construct.ranges' if needed. Signed-off-by: Claus Hunsen <[email protected]>
As an request from issue #61, we add the function 'aggregate.ranges' in this patch, with which ranges can be aggregated on various levels reliably. The functionality is already internally used in 'split.data.by.networks'. Fix documentation for 'split.data.by.networks'. Signed-off-by: Claus Hunsen <[email protected]>
In the function 'split.data.by.networks', we can now use the newly added function 'aggregate.ranges' to implement the functionality. Signed-off-by: Claus Hunsen <[email protected]>
In this patch, we add the function 'construct.consecutive.ranges', with which consecutive ranges can be constructed (with no overlap). The user needs to pass a start date, an end date, and a time period for the ranges. The ranges can be requested in raw format as for 'construct.ranges' if needed. Signed-off-by: Claus Hunsen <[email protected]>
Using lubridate functionality, the function 'generate.date.sequence' generates a date sequence based on the given parameters. Signed-off-by: Claus Hunsen <[email protected]>
Signed-off-by: Claus Hunsen <[email protected]>
In the range construction, the last range's ending bound is *exclusive*, thus, we need to add a second (1s) to the end date 'end'. The documentation is improved to this end and fixed regarding a copy- paste mistake. Signed-off-by: Claus Hunsen <[email protected]>
For interoperability and reliability, we now use the function 'generate.date.sequence' to generate the bin bounds that are needed to split any data in a time-based manner in 'split.get.bins.time.based'. As, with lubridate, a year is now exactly 1 year and 6 hours to mitigate leap years, the tests are adapted accordingly. Same thing holds for months, which are 30 days, 10 hours, and 30 minutes long, on overage. Signed-off-by: Claus Hunsen <[email protected]>
With the functions 'split.data.time.based.by.ranges' and 'split.network.time.based.by.ranges', it is now possible to split data and networks, respectively, by a list of given ranges. The ranges are applied This fixes #61. Signed-off-by: Claus Hunsen <[email protected]>
To increase code re-use and code maintainability, the function 'split.data.time.based.by.ranges' is now used to implement the functionality of 'split.data.by.networks'. Signed-off-by: Claus Hunsen <[email protected]>
Rearrange tests and introduce section headers. Signed-off-by: Claus Hunsen <[email protected]>
Update the test suite for the splitting functionality with tests for splitting by ranges. Signed-off-by: Claus Hunsen <[email protected]>
We just forgot to handle the end date properly when generating overlapping ranges. Signed-off-by: Claus Hunsen <[email protected]>
The overlap for overlapping ranges must not exceed the time period that each range should span. Signed-off-by: Claus Hunsen <[email protected]>
Signed-off-by: Claus Hunsen <[email protected]>
We just forgot to handle the project end date properly when aggregating ranges with aggregation level "complete". In de752f8, the same fix is applied for the construction of overlapping ranges. Signed-off-by: Claus Hunsen <[email protected]>
Signed-off-by: Claus Hunsen <[email protected]>
As the project end date may come from the timestamp data in a ProjectData instance, but the end date in any range is *exclusive*, we need to add one second (1s) when using the aggregation level "complete". The function was implemented correctly already, but the test was not adapted to it. Signed-off-by: Claus Hunsen <[email protected]>
When passing exclicit bins to the functions 'split.*.time.based', the bins are taken as is. I.e., the last bin and, thus, the last range bound is *exclusive*. Signed-off-by: Claus Hunsen <[email protected]>
Before now, the function 'generate.date.sequence' was not usable with date-time strings. Now, all given parameters are parsed as POSIXct objects first, before using them in the function. Signed-off-by: Claus Hunsen <[email protected]>
Signed-off-by: Claus Hunsen <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general, the PR looks good! Thanks for your effort in implementing all that stuff.
Nevertheless, there are some issues regarding documentation and also regarding inclusiveness resp. exclusiveness of last ranges' ends (see individual comments).
util-misc.R
Outdated
#' | ||
#' @param start The start time as string or POSIXct object | ||
#' @param end The start time as string or POSIXct object | ||
#' @param time.period The time period describing the length of time between dates, a character |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The parameter time.period
in the documentation does not exist, as the corresponding parameter is called by
here.
util-split.R
Outdated
@@ -410,6 +410,36 @@ split.data.by.networks = function(list.of.networks, project.data, | |||
return(net.to.range.list) | |||
} | |||
|
|||
#' Split the given data to the given ranges and return the resulting list. | |||
#' | |||
#' Note: You may want to use any function \code{generate.*.ranges} to obtain |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The functions are called construct.*.ranges
and not generate.*.ranges
.
util-split.R
Outdated
@@ -675,6 +705,36 @@ split.network.activity.based = function(network, number.edges = 5000, number.win | |||
return(networks) | |||
} | |||
|
|||
#' Split the given network to the given ranges and return the resulting list. | |||
#' | |||
#' Note: You may want to use any function \code{generate.*.ranges} to obtain |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as above.
util-misc.R
Outdated
@@ -155,14 +155,17 @@ get.date.string = function(input) { | |||
## / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / | |||
## Range construction and handling ----------------------------------------- | |||
|
|||
#' Construct the range strings. | |||
#' Construct ranges from the given list/vector of revisions. If \code{raw} is *not* | |||
#' \code{FALSE}, the ranges are construction in the format "rev[n]-rev[n+1]". |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The documentation here is hard to understand and -- as I understand it -- is wrong.
Instead of If \code{raw} is *not* \code{FALSE},
I would suggest If \code{raw} is \code{TRUE},
.
So, if raw
is TRUE
, what happens then? I guess we do not get the specified format then...
util-misc.R
Outdated
#' @param project.end the project end time as string or POSIXct object | ||
#' @param aggregation.level One of \code{"range"}, \code{"cumulative"}, \code{"all.ranges"}, | ||
#' \code{"project.cumulative"}, \code{"project.all.ranges"}, and | ||
#' \code{"complete"}. S7ee above for more details. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
S7ee
;)
#' @param bins the date objects defining the start of ranges (the last date defines the end of the last range). | ||
#' If set, the 'time.period' parameter is ignored. | ||
#' @param bins the date objects defining the start of ranges (the last date defines the end of the last range, in an | ||
#' *exclusive* manner). If set, the 'time.period' parameter is ignored. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@clhunsen I am not enthusiastic about the new behaviour here.
The only case in which I currently use this function is to use bins extracted from previous splittings and apply it to other data resp. network splittings. In the end, I expect that both results use exactly the same ranges. As I cannot think of a use case in which one would deliberately use the end of the last range in an exclusive manner, I would suggest the following solution:
When using bins in those functions, automatically modify the last one (i.e., add 1 second) to achieve the expected ranges.
(If one would really need the last range in an exclusive manner, we should state in the documentation how to achieve that.)
As we have inclusive ranges everywhere, I would like to have that also here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not enthusiastic about the new behaviour here.
This is not new behaviour, I just fixed the documentation on that (afc96f4). The last bin was always used to be the end of the last range. And as such, it is exclusive.
As we have inclusive ranges everywhere, I would like to have that also here.
As stated above, the ranges/intervals have always been constructed like this: [range start, range end)
, i.e., including range start
, but excluding range end
. Has never been different.
The only case in which I currently use this function is to use bins extracted from previous splittings and apply it to other data resp. network splittings. In the end, I expect that both results use exactly the same ranges.
So, the bins reported as an attribute on a splitting result do respect that. This means, when you split a data object in its entirety, the last range is always like this: [some date in the middle, last activity in the data + 1 second)
and, thus, the last bin reported in the attribute is last activity in the data + 1 second
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, thanks for the clarification. I just had a look at the documentation changes here, which seemed to describe a new behaviour due to the changes in several commits before. As the behaviour has not changed, everything is fine here. Sorry for my unwarranted complaint.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No worry. 😉
util-misc.R
Outdated
## Range construction and handling ----------------------------------------- | ||
|
||
#' Construct ranges from the given list/vector of revisions. If \code{raw} is *not* | ||
#' \code{FALSE}, the ranges are construction in the format "rev[n]-rev[n+1]". |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As the location of this code snippet has changed during the commits of the PR, see the corresponding comment here:
#96 (comment)
This patch introduces only some minor fixes and inprovements to the documentation for the changes in PR #96. Props to @bockthom for being that precise. Signed-off-by: Claus Hunsen <[email protected]>
With the list of patches in this PR, we add the possibility to construct cumulative ranges as proposed by @bockthom and @ecklbarb in issue #61.
To construct ranges properly, we introduce a bunch of functions
construct.*.ranges
in themisc
module to construct ranges from a start date and an end date. This includes overlapping ranges and cumulative ranges. Additionally, theaggregate.ranges
function can be used to generate aggregated ranges from a list of existing ranges.For splitting, we add the function
split.*.time.based.by.ranges
, which is able to split data or networks by a list of ranges (rather than bins as insplit.*.time.based
).Existing functions are (partly) refactored to use the new functions. The test suite has been extended accordingly.
Important: The time periods for splitting change slightly, as we changed completely to
lubridate
for all range and bin constructions. In detail, "1 month" now means "30 days, 10 hours, and 30 minutes" (which is the average month) and "1 year" means "365 days and 6 hours" (as the average year).We made this change as it improves the bin and range construction more reliable, comprehensible, and more uniform. Although, this change make the ranges less readable, but that is an easy prize to pay.