Add sparse sum helper to util #2976

Merged
merged 1 commit into from
Feb 15, 2024

stress-tess
Member

Add a sparse sum helper to util. This strips out the extra work done by the sum aggregation, but it still uses a radix sort to merge the two sorted index lists, which is definitely overkill. The next step is to optimize the merge.
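For illustration, here is a minimal plain-Python sketch of the kind of linear merge that could replace the sort, assuming both index lists are sorted and contain no duplicates. The name `merge_sparse_sum` is hypothetical and this is not the Chapel code in this PR; the argument order simply mirrors how the helper is called in the test below.

```python
def merge_sparse_sum(idx1, idx2, vals1, vals2):
    """Sum two sparse vectors given as (sorted unique indices, values).

    Hypothetical sketch: a two-pointer merge that walks both index
    lists once instead of sorting their concatenation.
    """
    out_idx, out_vals = [], []
    i = j = 0
    while i < len(idx1) and j < len(idx2):
        if idx1[i] == idx2[j]:
            # Index present on both sides: add the two values.
            out_idx.append(idx1[i])
            out_vals.append(vals1[i] + vals2[j])
            i += 1
            j += 1
        elif idx1[i] < idx2[j]:
            # Index only on the left side so far: copy it through.
            out_idx.append(idx1[i])
            out_vals.append(vals1[i])
            i += 1
        else:
            # Index only on the right side so far: copy it through.
            out_idx.append(idx2[j])
            out_vals.append(vals2[j])
            j += 1
    # At most one of these tails is non-empty.
    out_idx.extend(idx1[i:])
    out_vals.extend(vals1[i:])
    out_idx.extend(idx2[j:])
    out_vals.extend(vals2[j:])
    return out_idx, out_vals


print(merge_sparse_sum([0, 2, 5], [2, 3], [10, 20, 30], [1, 2]))
# → ([0, 2, 3, 5], [10, 21, 2, 30])
```

This runs in time linear in the combined length of the two lists. A distributed version would need more care than this serial loop suggests.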

@stress-tess
Member Author

Here is the code I used to test that this matches what we get from a groupby over the indices followed by a sum aggregation:

import pandas as pd
import arkouda as ak  # assumes a session already connected via ak.connect()

def tester(n=5):
    # Two random subsets of 0..10**n - 1; each is sorted and has no duplicates
    select_from = ak.arange(10**n)
    inds1 = select_from[ak.randint(0, 10, 10**n) % 3 == 0]
    inds2 = select_from[ak.randint(0, 10, 10**n) % 3 == 0]
    # inds1 doubles as the first value array; vals2 is offset by 10**n
    vals2 = ak.arange(10**n, 2*10**n)[inds2]
    print("started sumHelp way")
    timer1 = pd.Timestamp.now()
    helper_idx, helper_vals = ak.util.sparse_sum_help(inds1, inds2, inds1, vals2)
    timer2 = pd.Timestamp.now()
    print("finished sumHelp way, took:")
    print(timer2 - timer1)
    vals = ak.concatenate((inds1, vals2))
    print("started gb way")
    timer1 = pd.Timestamp.now()
    gb_idx, gb_vals = ak.GroupBy(ak.concatenate([inds1, inds2])).sum(vals)
    timer2 = pd.Timestamp.now()
    print("finished gb way, took:")
    print(timer2 - timer1)
    print()
    # Both checks should print True if the helper matches the groupby result
    print((gb_idx == helper_idx).all())
    print((gb_vals == helper_vals).all())
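For readers without an Arkouda server handy, the groupby baseline that the test compares against can be sketched in plain Python. This is only a conceptual stand-in for `ak.GroupBy(...).sum(...)`, not how Arkouda implements it; the function name `groupby_sum` and the tiny inputs are made up for illustration.

```python
from itertools import groupby
from operator import itemgetter


def groupby_sum(keys, vals):
    """Group vals by key and sum each group; returns (unique keys, sums)."""
    # Sort (key, value) pairs by key so equal keys become adjacent runs.
    pairs = sorted(zip(keys, vals), key=itemgetter(0))
    out_keys, out_sums = [], []
    for key, run in groupby(pairs, key=itemgetter(0)):
        out_keys.append(key)
        out_sums.append(sum(v for _, v in run))
    return out_keys, out_sums


# Hypothetical small inputs: two sparse vectors as (indices, values).
idx1, vals1 = [0, 2, 5], [10, 20, 30]
idx2, vals2 = [2, 3], [1, 2]

# Concatenate indices and values, then group and sum, as the test does.
keys, sums = groupby_sum(idx1 + idx2, vals1 + vals2)
print(keys, sums)
# → [0, 2, 3, 5] [10, 21, 2, 30]
```

Index 2 appears in both inputs, so its values are added; every other index passes through unchanged.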

slight change to avoid uniqueFromSorted call

throw in some consts for the heck of it (chpl compiler might be able to figure it out, but let's make its job a bit easier)
Contributor

@jaketrookman jaketrookman left a comment

Looks good to me

@stress-tess stress-tess added this pull request to the merge queue Feb 15, 2024
Merged via the queue into Bears-R-Us:master with commit d3f0885 Feb 15, 2024
13 checks passed