-
Notifications
You must be signed in to change notification settings - Fork 93
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add copy aggregation to uniformStreamPerElem
#3364
Comments
stress-tess
added a commit
to stress-tess/arkouda
that referenced
this issue
Jun 26, 2024
…on and aggregation to random generation loop This PR (closes Bears-R-Us#3183) adds `exponential` and `standard_exponential` to random number generators implementing both the inverse cdf and ziggurat methods Added copy aggregation to `uniformStreamPerElem`. When a chunk a random stream is assigned to overflows onto the next locale, we have the current locale "steal" those indices. This results in some PUTs to write those results to the correct locale after they are generated. This seems to be the only communication based on my local testing. At first, I created a copy aggregator every time I created a randomStream (every `elemsPerStream` number of elements). I then compared these communication numbers to creating a copy aggregator for only the last chunk on a locale and they didn't change. So I left that implementation, this will hopefully reduce start up costs since we only create `numLocales` aggregators instead of `total_size / elemsPerStream` aggregators I'd also definitely like feedback from someone who understands aggregation a bit better than me to sanity check that I'm not doing anything dumb since I normally only use them with `forall` loops. I based this off radixSortLSD since it also uses aggregations with coforalls. But that creates an aggregator per task so idk how this would change things
stress-tess
added a commit
that referenced
this issue
Jun 26, 2024
…o random generator loop (#3310) * Closes #3183, #3364: Add `exponential` distribution and aggregation to random generation loop This PR (closes #3183) adds `exponential` and `standard_exponential` to random number generators implementing both the inverse cdf and ziggurat methods Added copy aggregation to `uniformStreamPerElem`. When a chunk a random stream is assigned to overflows onto the next locale, we have the current locale "steal" those indices. This results in some PUTs to write those results to the correct locale after they are generated. This seems to be the only communication based on my local testing. At first, I created a copy aggregator every time I created a randomStream (every `elemsPerStream` number of elements). I then compared these communication numbers to creating a copy aggregator for only the last chunk on a locale and they didn't change. So I left that implementation, this will hopefully reduce start up costs since we only create `numLocales` aggregators instead of `total_size / elemsPerStream` aggregators I'd also definitely like feedback from someone who understands aggregation a bit better than me to sanity check that I'm not doing anything dumb since I normally only use them with `forall` loops. I based this off radixSortLSD since it also uses aggregations with coforalls. But that creates an aggregator per task so idk how this would change things * make a few consts params instead --------- Co-authored-by: Tess Hayes <[email protected]>
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
I think
uniformStreamPerElem
would benefit from copy aggregation. Especially in the case where a chunk a random stream is assigned to bleeds over to the next locale bc we solved this by having the current locale "steal" those indices. This at the very least would result in some PUTs to write those results to the correct locale. This may be the only communication we would expect, in which case we'd only need to create an aggregator for the last chunk on a locale, but I'd need to do some digging before I'm convinced of that.I'd also definitely need feedback on this section from someone who understands aggregation a bit better than me since the only example I see that uses it with coforalls is radixSortLSD. But that creates an aggregator per task and we'd be creating one per set number of elements (same as when we create a random stream) so idk how this would change the worthwhileness of it
The text was updated successfully, but these errors were encountered: