-
Notifications
You must be signed in to change notification settings - Fork 15
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Wrong bins attribute after sliding-window-based network splitting & inconsistent behavior of construct.ranges
#256
Comments
@MaLoefUDS This issue should have highest priority now. Please address this issue before working on the other issues that we have assigned to you. Thanks! |
Oh that is something that slipped through PR #249. From my investigations the problem originates here, where we reconstruct the bins attribute from ranges, to minimize the locations of code where we pass bins as an attribute down the splitting functions: ( Lines 817 to 822 in 1d3d1a3
The applied algorithm fails, since it does not account for the fact that the last sliding range does have a different ending time than the last regular range. The end time of the last sliding range is also not taken as the start of any range, since no sliding range follows the last sliding range obviously. Therefore, we lose the end date here. I have not found a fix for the algorithm yet, however, I will think about it. Thank you for noticing this issue of mine! |
I have just tried this simple "fix": ## convert ranges to bins
bins.starts = sapply(ranges.bounds, function(range) range[1])
bins.ends = sapply(ranges.bounds, function(range) range[2])
bins.date = get.date.from.unix.timestamp(unique(c(bins.starts, bins.ends))) It only breaks 1 test and manual debugging that test showed that it should probably break, i.e., the test follows the same scenario that you described in the MWE. I think, in conclusion, this problem really is showing a lack of clear definition of what the For now, I did not have a look at problem 2) you described, in the hope that it is directly correlated to 1). However, I will have a look probably tomorrow morning! |
Okay now I have also had a look at problem 2) and I can calm down many of your worries, I believe. Yes the problem in [1] "2018-10-05 16:58:32-2019-04-06 07:58:32" We also do not cause any further complications for the remainder of your MWE code. Everything just looks to be doing good! After this conclusion, I will prepare a commit including a fix for Cheers! |
Great! I've tried your fix in the script which breaks without this fix. Fortunately, using your fix, the script does not end up in an error any more. So, your fix looks good to me and seems to work in the tested scenario with real-world data.
True. I never did communicate clearly enough what the |
Great, thanks for looking at this problem!
Sounds good. Please add the commit hash of the fix and the issue id of this issue (#256) to the corresponding and already existing "Added" item in the NEWS.md. As this is something that is still not released, the commit still belongs to the "Added" category. Thanks! |
Occasionally, the 'bins' attribute did not include the end-date of the second-last range, as described in Issue se-sic#256. In addition, fix one faulty test.
Occasionally, the 'bins' attribute did not include the end-date of the second-last range, as described in Issue se-sic#256. In addition, fix one faulty test. Signed-off-by: Maximilian Löffler <[email protected]>
Occasionally, the 'bins' attribute did not include the end-date of the second-last range, as described in Issue se-sic#256. In addition, fix one faulty test. Signed-off-by: Maximilian Löffler <[email protected]>
Unfortunately, we have to postpone the planned release of coronet until we have fixed the following bug(s) related to sliding-window splitting, which I spotted during pre-release testing of analysis scripts with real empirical data:
Description
(1) The network attribute
"bins"
is incorrect aftersplitting.networks.time.based
usingsliding.window = TRUE
.(2)
construct.ranges
from given bins withsliding.window = TRUE
is not in line with the recent changes of sliding-window splitting. I am still not sure whether this is an actual bug or whether we need additional workarounds for the specific case in which I noticed the reported problem, or whether (1) and (2) are just mutually dependent.Steps to Reproduce or Minimal Working Example (MWE)
Consider the following sequence of statements:
(1) Split networks time based:
Assume that
global.net.list
is a list that contains exactly one author network that covers the whole time period.When having data from 2018-10-05 to 2020-02-24, the we will get 5 ranges. However, the
"bins"
attribute that is attached to the resulting list of networks (see variablesplit.networks.bins
in the above example) is incorrect:[1] "2018-10-05 16:58:32 UTC" "2019-01-05 00:28:32 UTC"
[3] "2019-04-06 07:58:32 UTC" "2019-07-06 15:28:32 UTC"
[5] "2019-10-05 22:58:32 UTC" "2020-02-24 09:02:51 UTC"
The fourth range starts at 2019-07-06 and ends at 2020-01-05. However, 2020-01-05 is not part of the returned bins.
The fifth ranges starts at 2019-10-05 and ends at 2020-02-24. While the networks are built correctly, the bins attribute misses the end of the second last range in this case (as the range from 2020-01-05 to 2020-02-24 is dropped since it is fully contained in 2019-10-05 to 2020-02-24.)
(2) The above behavior leads to another problem: Range names cannot be constructed anymore from the bins attribute, so
construct.ranges
produces weird results:Then,
ranges
is:[1] "2018-10-05 16:58:32-2019-04-06 07:58:32"
[2] "2019-01-05 00:28:32-2019-07-06 15:28:32"
[3] "2019-04-06 07:58:32-2019-10-05 22:58:32"
[4] "2019-07-06 15:28:32-2020-02-24 09:02:51"
And
names(split.networks[[1]])
is:[1] "2018-10-05 16:58:32-2019-04-06 07:58:32"
[2] "2019-01-05 00:28:32-2019-07-06 15:28:32"
[3] "2019-04-06 07:58:32-2019-10-05 22:58:32"
[4] "2019-07-06 15:28:32-2020-02-24 09:02:51"
[5] NA
What we can clearly see is: Although we have 5 range networks, we can construct only the names of 4 ranges.
On the one hand, this comes from the bug above: Due to 2020-01-05 being missing in the
bins
attribute, the fourth range is too long and, thus, no fifth range is computed.On the other hand, even if 2020-01-05 would be present, I assume that
construct.ranges
would return 6 ranges then (but I have not tested this assumption yet), because then it will also construct the range 2020-01-05-2020-02-24 which is fully contained in the previous range. Thus, range construction inconstruct.ranges
leads to different results than the ranges that result from network splitting. I think, both should somehow be consistent.For this second problem, I have two ideas:
construct.ranges
should result in different results than splitting, we could just fixconstruct.ranges
accordingly.construct.ranges
is intended? What should happen if I construct these ranges by hand and I want to keep the 6th range in the example that is fully contained in the data. In such a case, one would need to differentiate between "keeping the fully contained range" and "removing the fully contained range" withinconstruct.ranges
. If we want to support both options,construct.ranges
would need an additional boolean flag (e.g.,remove.fully.contained.sliding.range
or similar). If so, what would be the default value then? Should the default value support the splitting functionality? Or should the default value support user demands sinceconstruct.ranges
does not belong to the splitting module. To be honest, I don't know yet which of both arguments does sound more convincing to me...Any thoughts on this?
Versions
This bug affects the current unversioned
dev
branch, which is scheduled to become version 4.4 soon. Thus, this bug needs to be fixed before we can release version 4.4.Additional Information
Regarding bug (1): I just used
split.networks.time.based
, but potentially the incomplete bins attribute could also be produced via similar splitting functions. Thus, we need to check all network splitting and data splitting functions.Regarding bug (2): We should also check all variants of
construct.ranges
and related functions.The text was updated successfully, but these errors were encountered: