Fix for inserting duplicates in groupby result cache #9508
Codecov Report
@@ Coverage Diff @@
## branch-21.12 #9508 +/- ##
================================================
- Coverage 10.79% 10.67% -0.12%
================================================
Files 116 117 +1
Lines 18869 19714 +845
================================================
+ Hits 2036 2104 +68
- Misses 16833 17610 +777
Continue to review full report at Codecov.
I had to read the issue description to understand why and how this fix works, so please add some comments in the code to clarify the problem.
Agreed. The fix looks good, but the test needs comments. I would add comments that specifically indicate which calls should exit early and not modify the cache. Is the design of using a reference to an aggregation in the map's key (which requires storing a copy of the aggregation in the value for lifetime reasons) motivated primarily by lookup performance or by convenience? I saw the comment saying it was designed to allow lookup by reference, but I was surprised this isn't possible in some other, less awkward way that avoids the object-lifetime gymnastics. I have to think about it some more, but it seems like a different design might have avoided this bug in the first place.
Added comments to the test and code. The reference wrapper design was existing code; I'm not sure why that design decision was made. I just changed old code to add
Co-authored-by: Bradley Dice <[email protected]>
Thanks @karthikeyann! This should be ready to merge.
@gpucibot merge
Fixes #9507
Prevents inserting into the groupby result cache if a result for the <column, aggregation> pair is already present in the cache.
Added a unit test for this.
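As a rough illustration of the fix, the sketch below models the cache with simplified, hypothetical types (`Aggregation`, `ResultCache`, an `int` column id and an `int` result in place of cudf's `column_view` and `std::unique_ptr<column>`). It keeps the design discussed in this PR, where the map key holds a `std::reference_wrapper` to an aggregation copy owned by the mapped value, and shows `add_result` exiting early when the pair is already cached. It is a sketch under those assumptions, not cudf's actual implementation.

```cpp
#include <cstddef>
#include <functional>
#include <memory>
#include <unordered_map>
#include <utility>

// Hypothetical stand-in for cudf's aggregation; only what the sketch needs.
struct Aggregation {
  int kind;
  std::unique_ptr<Aggregation> clone() const { return std::make_unique<Aggregation>(*this); }
  bool operator==(Aggregation const& other) const { return kind == other.kind; }
};

// Key: (column id, reference to an aggregation copy owned by the mapped value).
using CacheKey = std::pair<int, std::reference_wrapper<Aggregation const>>;

struct KeyHash {
  std::size_t operator()(CacheKey const& k) const {
    return std::hash<int>{}(k.first) * 31 + std::hash<int>{}(k.second.get().kind);
  }
};

struct KeyEqual {
  // Compares the referenced aggregations, so the referenced object must be alive.
  bool operator()(CacheKey const& a, CacheKey const& b) const {
    return a.first == b.first && a.second.get() == b.second.get();
  }
};

class ResultCache {
 public:
  bool has_result(int column, Aggregation const& agg) const {
    return _cache.count({column, std::cref(agg)}) > 0;
  }

  void add_result(int column, Aggregation const& agg, int result) {
    // Exit early: overwriting an existing entry would destroy the aggregation
    // copy that the already-stored key still refers to.
    if (has_result(column, agg)) { return; }
    auto owned = agg.clone();
    CacheKey key{column, std::cref(*owned)};  // key refers to the copy we own
    _cache.emplace(key, std::make_pair(std::move(owned), result));
  }

  int get_result(int column, Aggregation const& agg) const {
    return _cache.at({column, std::cref(agg)}).second;
  }

 private:
  std::unordered_map<CacheKey, std::pair<std::unique_ptr<Aggregation>, int>, KeyHash, KeyEqual>
    _cache;
};
```

With this guard, calling `add_result(0, sum, 10)` and then `add_result(0, sum, 20)` leaves the first entry untouched, so `get_result(0, sum)` still returns `10` and the stored key keeps referring to a live aggregation copy.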
Details:
When `add_result(col1, agg1, column1); add_result(col1, agg1, column2);` is called (i.e., `add_result` is called twice for the same pair), `_cache` no longer contains any value for `{col1, agg1}`. The issue is that `_cache` is a `std::unordered_map` with a `std::reference_wrapper<aggregation const>` in its key. When `_cache[{input, key}] = std::move(value);` executes the second time, the old value is replaced, which destroys the previous aggregation copy (the object the existing map key refers to). The reference held in the map's key never changes, so it still points to the destroyed object. When keys are compared again, `pair_column_aggregation_equal_to` fails because it compares against a destroyed object (whose memory may have been overwritten). See #9507 (comment).
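The root cause relies on a documented property of `std::unordered_map::operator[]`: when an equivalent key already exists, the map keeps the stored key object and only assigns the mapped value. The sketch below demonstrates this with a hypothetical `Key` whose `tag` field is ignored by hashing and equality; `tag` stands in for "which aggregation copy the key refers to". A plain string is used instead of a real `std::reference_wrapper` so the example has no undefined behavior. `stored_tag_after_two_assignments` is a hypothetical helper for illustration only.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>

// Hypothetical key: `id` takes part in hashing/equality, `tag` does not.
// `tag` models which aggregation copy a cache key refers to.
struct Key {
  int id;
  std::string tag;
};

struct KeyHash {
  std::size_t operator()(Key const& k) const { return std::hash<int>{}(k.id); }
};

struct KeyEqual {
  bool operator()(Key const& a, Key const& b) const { return a.id == b.id; }
};

using Map = std::unordered_map<Key, std::string, KeyHash, KeyEqual>;

// Assigns twice through operator[] with equivalent keys that carry different
// tags, then returns the tag of the key object the map actually stores.
std::string stored_tag_after_two_assignments(Map& m) {
  m[Key{1, "first copy"}]  = "value from first call";
  m[Key{1, "second copy"}] = "value from second call";  // key kept, value replaced
  return m.begin()->first.tag;
}
```

After the second assignment the map still stores the key tagged `"first copy"`, while the mapped value comes from the second call. In the real cache, the stored key's reference points to the aggregation copy held in the first value, and that copy is destroyed when the value is replaced.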