-
Notifications
You must be signed in to change notification settings - Fork 15
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
When to do timestamp extraction for issue data? Before or after event filtering? #185
Comments
After talking about this issue with @hechtlC, we came up with a new solution, which circumvents all the disadvantages of the options stated above. Hence, this is our favorite option: Option 4: Adjust dates of the
|
I just want to share the thoughts that I had while revisiting this issue. In the end, I see a mixture of Options 1 and 4 as a resolution for this issue. And here is why and how:
Not sure whether I miss something here. Just my thoughts on this anyway. What are yours? |
Thanks for your thoughts @clhunsen! I completely agree with you.
That's true. We just stumbled over this inconsistency a few days ago in PR #194, as we need to be able to access both filtered and unfiltered issue data there. We agreed that issue filtering should be implemented similarly to commit filtering and @JoJoDeveloping is currently working on that for PR #194. Our goal is to adjust the issue filtering such that it behaves like commit filtering (and make sure that the configuration option |
I thought about that for a couple of minutes and still agree with you that we need a consistent behavior across all data sources. However, I am not sure if we really need to perform the timestamp extracting after filtering. The purpose of the timestamp extraction is to find out for which time periods we have data at all. Filtering just excludes certain events/files/etc. which are out of interest. But the data is, though, available for the complete, unfiltered period, even if there is no event of interest in this unfiltered period, there could potentially be one. Therefore, one could also argue that timestamp extraction before filtering is sufficient. Nevertheless, I would like to keep the behavior as it is currently implemented for commit timestamp extraction (to keep backwards compatibility) but adjust the issue timestamp extraction to be consistent. In the end, it should make (almost) no difference whether timestamps are extracted before or after filtering as filtering should have no effect on the timestamps any more (when we go with option 4 to adjust the weirdly shaped pull-request data.) |
…t.issues to make it similar to get.commits[.filtered] Close se-sic#185.
…t.issues to make it similar to get.commits[.filtered] Close se-sic#185. Signed-off-by: Johannes Hostert <[email protected]>
…t.issues to make it similar to get.commits[.filtered] Close se-sic#185. Signed-off-by: Johannes Hostert <[email protected]>
Description
During various debugging activities in the last days, I stumbled over the following issue:
When performing data cutting (
get.data.cut.to.same.date
) on issue data, the outcome sometimes looks weird. Then I found out why: The timestamps are extracted before the filtering on"issues.only.comments"
takes place -- see the function callextract.timestamps
in the following code excerpt:https://github.com/se-passau/coronet/blob/91fc448bad64dd17fa96a720322fe7d46337cc43/util-data.R#L900-L915
Hence, the data cutting function uses timestamps that may be outside the data-cutting range if they are related to an issue event which is not a comment. This might be confusing if a user is not aware of that. But then I thought about it: How can this happen? Can it happen that there are timestamps in the issue data which date back to before the first comment? -- Each issue / PR starts with a comment - so how can this happen? After doing some research, I found answer for these questions: If we have a pull request, all the commits added to this pull-request are an event themselves, the date of those events is equal to the author date of the commit. Actually adding this commit to the PR is a separate event having the timestamp of the time at which the commit was pushed. So, everything is ok, but we should think about how we want to handle this -- there are several options:
Option 1: Extract timestamps after the event-filtering on
"issues.only.comments"
has taken place.Advantage: We only get the timestamps we actually use (but then it would be necessary to also write the filtered data to the data object, reducing the needs for filtering again and again whenever calling
get.issues
).Disadvantage: The timestamps won't be updatable anymore. That is, as soon we have filtered the comment-events once, there is no way to return to the original data. However, we could change the implementation of the
"issues.only.comments"
configuration option in a way that it re-reads the data completely whenever changing this configuration option.Option 2: Filter events that may pre-date the actual issue data (like
"commit_added"
events which rather belong to commit data than to issue-data)Here we would also have two sub-options: Either remove them completely while reading the issue data, or just remove them from timestamp extraction.
Advantage: Comparably easy to implement, not many changes necessary.
Disadvantage: We may loose information on those events (depending on the sub-option) -- but this may be wanted in some kind.
Option 3: Don't change anything.
Advantage: Nothing to do, stop discussing it.
Disadvantage: Confusing behavior when there are very old commits referenced in a pull request and the pull request was initiated many months later than the commit was authored. This somehow circumvents the data cutting procedure as such old commits lead us to believe that the issue data started months ago, but actually it was just an old commit that was added to a pull request yesterday and the actual issue data started also yesterday.
What do you folks think about that? Which option would you prefer? I am also open for new suggestions.
The text was updated successfully, but these errors were encountered: