Audit use of Utils random generators and refactor if necessary. #6112
Comments
Actually, this happens in production code as well.
@samuelklee Can you point out the instances you found in production code?
Actually, I guess the global random generator is never reset in production code. This is probably fine and will result in overall deterministic behavior, but you can imagine instances in which changing the behavior or order of successive modules that use the generator could lead to debugging headaches. In CNV code (notably, the MCMC/sampling utilities and KernelSegmenter), classes either instantiate or reset private generators when appropriate or their methods take a generator as a parameter. Eliminating the dependence on global state has definitely saved me some headaches during development, but this may be less true when developing methods that don't depend as heavily on randomization. Will leave it up to whoever tackles this issue to decide what is appropriate.
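To make the two patterns mentioned above concrete, here is a minimal sketch, assuming a hypothetical `UniformSampler` class (not taken from the CNV code). It shows a class that owns and can reset a private generator, and a static method that takes the generator as a parameter.

```java
import java.util.Random;

// Hypothetical sampler for illustration only; not taken from the CNV code.
final class UniformSampler {
    private final long seed;
    private final Random rng; // private generator: no other module can advance it

    UniformSampler(final long seed) {
        this.seed = seed;
        this.rng = new Random(seed);
    }

    /** Restores the private generator to its initial state. */
    void reset() {
        rng.setSeed(seed);
    }

    /** Draws n values in [0, 1) from the private generator. */
    double[] sample(final int n) {
        return sample(n, rng);
    }

    /** Alternative pattern: the caller supplies the generator explicitly. */
    static double[] sample(final int n, final Random rng) {
        final double[] draws = new double[n];
        for (int i = 0; i < n; i++) {
            draws[i] = rng.nextDouble();
        }
        return draws;
    }
}
```

In both variants the sequence of draws depends only on the seed and on calls made through this class, so reordering unrelated modules cannot change the result.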
Yeah, it's currently a bit gross in that it should be deterministic, but it's not stable if you make any changes to the code. Resetting the random generator at every use, though, is obviously problematic, and deciding on something in between needs more thought than either extreme.
In any case, fixing the bad behavior in test code should be done soon. See #6107, which prompted this discussion, for an example.
To chime in, there has been discussion of changing our usage of the random generator before. To achieve parity between HaplotypeCaller and HaplotypeCallerSpark, we need to be able to reset the random generator for downsampling by site. Currently, not only is the downsampling inconsistent with the non-Spark version, but it might also be internally inconsistent between active region determination and the actual calling code. There is a related PR: #5448.
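One way to picture "resetting by site" is to derive the seed from the site itself, so the downsampling result no longer depends on what was processed earlier. The sketch below is an assumption for illustration, not what #5448 or HaplotypeCaller does; `PerSiteDownsampler`, `seedFor`, and `baseSeed` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical helper, not GATK code: the seed is a function of the site only.
final class PerSiteDownsampler {
    private PerSiteDownsampler() {}

    /** Combines a base seed with the contig name and position (illustrative mixing only). */
    static long seedFor(final String contig, final int position, final long baseSeed) {
        // String.hashCode() is specified by the Java API, so it is stable across JVMs.
        return baseSeed ^ (31L * contig.hashCode() + position);
    }

    /**
     * Returns at most max items chosen with a generator seeded from the site.
     * The retained items come back in shuffled order, not in input order.
     */
    static <T> List<T> downsample(final List<T> reads, final int max,
                                  final String contig, final int position,
                                  final long baseSeed) {
        if (reads.size() <= max) {
            return new ArrayList<>(reads);
        }
        final List<T> copy = new ArrayList<>(reads);
        Collections.shuffle(copy, new Random(seedFor(contig, position, baseSeed)));
        return new ArrayList<>(copy.subList(0, max));
    }
}
```

With a scheme like this, two code paths (or two Spark partitions) that see the same reads at the same site would keep the same subset, regardless of the order in which sites are visited.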
It appears that several tests do not appropriately reset the seeds of the Utils random generators, which leads to non-deterministic behavior when new tests are introduced or tests are run in a different order. Although this effectively increases test coverage, it may make things difficult to debug... I think it is probably safer to have private generators as needed.
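The order dependence described here can be sketched as follows, assuming a hypothetical `SharedGeneratorDemo` class with a static generator standing in for the global one in Utils. The seed value and method names are illustrative only.

```java
import java.util.Random;

// Hypothetical stand-in for tests sharing one global generator; not GATK code.
final class SharedGeneratorDemo {
    static final long SEED = 12345L; // arbitrary illustrative seed
    private static final Random SHARED = new Random(SEED);

    private SharedGeneratorDemo() {}

    /** Puts the shared generator back into its initial state. */
    static void resetShared() {
        SHARED.setSeed(SEED);
    }

    /** A "test" that consumes one draw from the shared generator. */
    static long testA() {
        return SHARED.nextLong();
    }

    /** Another "test": without a reset, its draw depends on whether testA ran first. */
    static long testB() {
        return SHARED.nextLong();
    }

    /** Same test with a private generator: independent of test order by construction. */
    static long testBPrivate() {
        return new Random(SEED).nextLong();
    }
}
```

If `resetShared()` is called before each test, `testB()` sees the same draw whether or not `testA()` ran before it. The private-generator variant gives the same guarantee without relying on every test remembering to reset.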
@droazen or @lbergelson can you assign?