Random stream per task structure and same seed reproducibility #3322

Closed · Fixed by #3331
stress-tess opened this issue Jun 12, 2024 · 8 comments
Labels: bug

stress-tess (Member) commented Jun 12, 2024:

@brandon-neth encountered some failures with test_poisson_hypothesis_testing. My first thought was that I must have missed fixing the seed, but I double-checked and that wasn't the case.

Then I realized the way I structured the algorithm (where we have a random stream per task) could give different results on different machines despite using the same seed. Using the same seed on a machine with the same number of locales and tasks per locale should give the same results, but different machines with a different number of total tasks wouldn't match. In fact, even running the same code on the same HPC but with a different number of nodes wouldn't give the same answer.

The same seed not giving the same results kinda takes away the usefulness of having a seed. I'm not sure yet how to handle this, since random streams aren't thread safe, but I think I need to revisit the implementation of poisson and exponential with the zig method.

@Bears-R-Us/arkouda-core-dev, @jeremiah-corrado, @bmcdonald3, @ShreyasKhandekar: let me know if y'all have any ideas on how to address this.

EDIT:
I was able to verify this locally by running the following line of code with different numbers of locales:

```python
ak.random.default_rng(10).poisson(size=10)
```

The output is:

```
# with 2 locales
array([2 0 3 2 1 1 2 0 1 0])

# with 3 locales
array([2 0 3 2 1 2 0 2 0 1])

# with 4 locales
array([2 0 3 1 2 2 0 1 2 0])
```
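As a point of reference, here's a minimal client-side sketch of the property a seed is expected to provide (assuming a running arkouda server reachable on the default host/port); the calls are the same ones used above:

```python
import arkouda as ak

ak.connect()  # assumes a server is already running on the default host/port

# Within a single server configuration, the seed does pin down the output:
a = ak.random.default_rng(10).poisson(size=10)
b = ak.random.default_rng(10).poisson(size=10)
print(a)
print(b)  # identical to a

# The bug: restart the server with a different number of locales and the same
# two lines produce a different sequence, as shown in the outputs above.
ak.disconnect()
```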
stress-tess self-assigned this Jun 12, 2024
stress-tess added the bug label Jun 12, 2024
stress-tess (Member Author) commented:

My first thought is to go back to having the same initial seed for all the task-private random streams and fast-forwarding their state by how many elements came before them, but we don't know how many iterations the inner algorithm will take. So this gets us back to the potentially-repeated-values / non-independence problem.

stress-tess (Member Author) commented:

I think making a random stream per element would work, since this would only depend on the size of the return array, which would be the same across different machines. But, as we've discussed previously, this would take up a significant amount of memory.

jeremiah-corrado (Contributor) commented:

What about making a random stream per n elements, where n is a fixed constant (e.g., 4096)?

This way, you could have a consistent number of seeds for a given array size without having to generate a unique stream for each array element (which sounds like a lot of overhead).
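A rough host-side model of this suggestion (NumPy here just to illustrate; the names `N` and `chunked_poisson` are made up, and this is not the arkouda/Chapel implementation):

```python
import numpy as np

N = 4096  # the fixed chunk size "n" from the suggestion (illustrative value)

def chunked_poisson(seed, size, lam=1.0):
    """Element i is drawn from the stream for chunk i // N, so the
    seed-to-output mapping depends only on `size`, never on how many
    locales or tasks the work happens to be split across."""
    out = np.empty(size, dtype=np.int64)
    for chunk in range(-(-size // N)):              # ceil(size / N) streams
        lo, hi = chunk * N, min((chunk + 1) * N, size)
        rng = np.random.default_rng([seed, chunk])  # one stream per chunk
        out[lo:hi] = rng.poisson(lam, hi - lo)
    return out
```

Whichever locale ends up computing chunk c, it uses the stream derived from (seed, c), which is what makes the result reproducible.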

stress-tess (Member Author) commented Jun 13, 2024:

> My first thought is to go back to having the same initial seed for all the task-private random streams and fast-forwarding their state by how many elements came before them, but we don't know how many iterations the inner algorithm will take. So this gets us back to the potentially-repeated-values / non-independence problem.

@ajpotts and I discussed this a bit a while back: if we had some guarantee that the inner loop completes within k iterations with probability p, then we could give each value k iterations' worth of non-overlapping state by fast-forwarding each task's state by `(here.id * nTasksPerLoc + tid) * k`. This would allow us to make a probabilistic statement about the values being independent.

EDIT:
Wait, never mind, this doesn't fix the problem at all lol, because the state of the generators would still depend on the number of tasks... ignore me 😅

stress-tess (Member Author) commented Jun 13, 2024:

> What about making a random stream per n elements, where n is a fixed constant (e.g., 4096)?
>
> This way, you could have a consistent number of seeds for a given array size without having to generate a unique stream for each array element (which sounds like a lot of overhead).

I thought about this a bit. I didn't rule it out, but I was afraid it could really limit performance on large machines if it leaves lots of threads idle. It also seems much harder to take advantage of the way the data is distributed, because we can't factor the number of locales into how many seeds we use. But something along these lines might just be the only way, even if it's not ideal.

Also, there are edge cases to think about, like a small number of elements.

EDIT:
Once again I commented before really thinking things through, haha. Okay, so we just loop over each locale's local subdomain and chunk it up by n.

The only thing is we'd need to keep track of the remainder from previous locales. Say the local subdomain of locale 0 isn't divisible by n; then the first generator on locale 1 needs to have the same seed and be fast-forwarded to pick up where locale 0 left off, right? But we won't know how much to fast-forward until locale 0 finishes.

So if that's true, we might have to process the locales in order (using a blocking loop with atomics or something) while still doing the computation within each locale in parallel, which does still seem much better.

stress-tess (Member Author) commented:

Also, if the seed isn't set, we can still use the more optimal stream-per-task approach, so the perf tradeoff only applies when reproducibility is requested. Okay, I'm really warming up to this idea.

jeremiah-corrado (Contributor) commented Jun 13, 2024:

> The only thing is we'd need to keep track of the remainder from previous locales. Say the local subdomain of locale 0 isn't divisible by n...

Instead, I think you could have the last generator on locale 0 handle the last `remainder` elements on locale 0 and the first `n - remainder` elements on locale 1. This would incur some communication cost for those boundary elements, but would avoid making the outer loop sequential as you mentioned.
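A small sketch of that ownership rule (hypothetical names and inclusive per-locale index ranges; not the actual Chapel code):

```python
def chunk_owner(chunk_idx, n, locale_subdomains):
    """A chunk of n consecutive global indices is handled entirely by the
    locale that owns the chunk's first element, even if the chunk spills
    onto the next locale (some communication for those boundary elements)."""
    start = chunk_idx * n
    for loc, (lo, hi) in enumerate(locale_subdomains):  # inclusive (lo, hi)
        if lo <= start <= hi:
            return loc
    raise ValueError("chunk start is outside the global domain")

# 10 elements block-distributed over 2 locales (indices 0..4 and 5..9), n = 4:
# chunk 0 starts at 0 (locale 0); chunk 1 starts at 4 (locale 0, spilling onto
# locale 1 for indices 5..7); chunk 2 starts at 8 (locale 1)
print([chunk_owner(c, 4, [(0, 4), (5, 9)]) for c in range(3)])  # [0, 0, 1]
```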

stress-tess (Member Author) commented:

ooooo great idea!! Thanks for all the insights @jeremiah-corrado 🎉 :)

stress-tess added a commit to stress-tess/arkouda that referenced this issue Jun 14, 2024
This PR fixes Bears-R-Us#3322. The previous implementation of poisson created a random stream per task, but this led to different results on different machines or with a different number of locales. To address this, @jeremiah-corrado suggested a fixed number of elements per random stream. Since this depends only on the total number of elements to be generated (which, unlike the number of locales or tasks per locale, won't change), it should always give the same results for the same seed.

I added tests with pre-generated data from a 2-locale run at multiple orders of magnitude of size, to cover both the case where all the data is pulled down to locale 0 (total elements < `elementsPerRandomStream`) and the case where each locale is responsible for multiple chunks that are not evenly divisible by our chunk size.
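To make the reproducibility claim concrete, here's a small NumPy model (again not the Chapel code; `N` and the function name are placeholders) showing that with one stream per fixed-size chunk the result can't depend on the order or grouping in which the chunks are processed, in both of the regimes the tests cover:

```python
import numpy as np

N = 4096  # stands in for `elementsPerRandomStream`; the real value lives in the arkouda source

def chunked_poisson_in_order(seed, size, order):
    """Processes chunks in the given order, mimicking different locale/task
    decompositions. Chunk c always uses the stream seeded from (seed, c),
    so the processing order cannot change the result."""
    out = np.empty(size, dtype=np.int64)
    for c in order:
        lo, hi = c * N, min((c + 1) * N, size)
        out[lo:hi] = np.random.default_rng([seed, c]).poisson(1.0, hi - lo)
    return out

# Several chunks, with a size that doesn't divide evenly by N:
size = 3 * N + 17
chunks = list(range(-(-size // N)))
assert np.array_equal(chunked_poisson_in_order(10, size, chunks),
                      chunked_poisson_in_order(10, size, chunks[::-1]))

# Everything fits in a single chunk (size < N):
assert np.array_equal(chunked_poisson_in_order(10, 100, [0]),
                      chunked_poisson_in_order(10, 100, [0]))
```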