Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add copy aggregation to uniformStreamPerElem #3364

Closed
stress-tess opened this issue Jun 26, 2024 · 0 comments · Fixed by #3310
Closed

Add copy aggregation to uniformStreamPerElem #3364

stress-tess opened this issue Jun 26, 2024 · 0 comments · Fixed by #3310
Assignees

Comments

@stress-tess
Copy link
Member

I think uniformStreamPerElem would benefit from copy aggregation. Especially in the case where a chunk a random stream is assigned to bleeds over to the next locale bc we solved this by having the current locale "steal" those indices. This at the very least would result in some PUTs to write those results to the correct locale. This may be the only communication we would expect, in which case we'd only need to create an aggregator for the last chunk on a locale, but I'd need to do some digging before I'm convinced of that.

I'd also definitely need feedback on this section from someone who understands aggregation a bit better than me since the only example I see that uses it with coforalls is radixSortLSD. But that creates an aggregator per task and we'd be creating one per set number of elements (same as when we create a random stream) so idk how this would change the worthwhileness of it

@stress-tess stress-tess self-assigned this Jun 26, 2024
stress-tess added a commit to stress-tess/arkouda that referenced this issue Jun 26, 2024
…on and aggregation to random generation loop

This PR (closes Bears-R-Us#3183) adds `exponential` and `standard_exponential` to random number generators implementing both the inverse cdf and ziggurat methods

Added copy aggregation to `uniformStreamPerElem`. When a chunk a random stream is assigned to overflows onto the next locale, we have the current locale "steal" those indices. This results in some PUTs to write those results to the correct locale after they are generated.

This seems to be the only communication based on my local testing. At first, I created a copy aggregator every time I created a randomStream (every `elemsPerStream` number of elements). I then compared these communication numbers to creating a copy aggregator for only the last chunk on a locale and they didn't change. So I left that implementation, this will hopefully reduce start up costs since we only create `numLocales` aggregators instead of `total_size / elemsPerStream` aggregators

I'd also definitely like feedback from someone who understands aggregation a bit better than me to sanity check that I'm not doing anything dumb since I normally only use them with `forall` loops. I based this off radixSortLSD since it also uses aggregations with coforalls. But that creates an aggregator per task so idk how this would change things
stress-tess added a commit that referenced this issue Jun 26, 2024
…o random generator loop (#3310)

* Closes #3183, #3364: Add `exponential` distribution and aggregation to random generation loop

This PR (closes #3183) adds `exponential` and `standard_exponential` to random number generators implementing both the inverse cdf and ziggurat methods

Added copy aggregation to `uniformStreamPerElem`. When a chunk a random stream is assigned to overflows onto the next locale, we have the current locale "steal" those indices. This results in some PUTs to write those results to the correct locale after they are generated.

This seems to be the only communication based on my local testing. At first, I created a copy aggregator every time I created a randomStream (every `elemsPerStream` number of elements). I then compared these communication numbers to creating a copy aggregator for only the last chunk on a locale and they didn't change. So I left that implementation, this will hopefully reduce start up costs since we only create `numLocales` aggregators instead of `total_size / elemsPerStream` aggregators

I'd also definitely like feedback from someone who understands aggregation a bit better than me to sanity check that I'm not doing anything dumb since I normally only use them with `forall` loops. I based this off radixSortLSD since it also uses aggregations with coforalls. But that creates an aggregator per task so idk how this would change things

* make a few consts params instead

---------

Co-authored-by: Tess Hayes <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
1 participant