Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use basic GroupsAccumulator with GroupsAccumulatorFlatAdapter for sum and average accumulation #174

Merged
merged 1 commit into from
Nov 7, 2024

Conversation

srh
Copy link
Member

@srh srh commented Nov 4, 2024

For small groups, this makes accumulation much faster, as, instead of creating and updating a bunch of Box<dyn Accumulator>, we have a Vec<Accumulator>. Some of this code is taken from upstream datafusion's GroupsAccumulator and GroupsAccumulatorAdapter.

Some problems (presently) with this PR:

  • it only does Sum and Avg (I should definitely push a trivial change to make the rest use GroupsAccumulator)
  • it keeps some vestigial code around, retaining parallel arrays with the SpottyAccumulatorSet and the groups accumulator set. At run time we avoid iterating the group keys and looking up the hash table one more time when all_groups_accumulators is true. This is important for performance. We should just make every Accumulator use either their own supplied GroupsAccumulator, or GroupsAccumulatorAdapter, I think.
  • Possibly some of the comments from upstream are just inaccurate.

This also needs an update to arrow-rs: cube-js/arrow-rs#44

…ng API

This is based on upstream GroupsAccumulator and
GroupsAccumulatorAdapter, but extends the API so that existing hash
aggregation works with it.  We basically don't really use the upstream
interface (at this time).

We still use basic Accumulator for types that do not implement
GroupsAccumulator, and hash aggregation code handles this poorly.
@srh srh force-pushed the groups-accumulator branch from 18b3fae to f86c41c Compare November 7, 2024 17:22
@srh srh merged commit a8f045a into cube Nov 7, 2024
13 of 15 checks passed
@srh srh deleted the groups-accumulator branch November 7, 2024 19:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants