Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Audit use of Utils random generators and refactor if necessary. #6112

Open
samuelklee opened this issue Aug 23, 2019 · 6 comments
Open

Audit use of Utils random generators and refactor if necessary. #6112

samuelklee opened this issue Aug 23, 2019 · 6 comments
Assignees

Comments

@samuelklee
Copy link
Contributor

It appears that several tests do not appropriately reset the seeds of the Utils random generators, which leads to non-deterministic behavior when new tests are introduced or tests are run in a different order. Although this effectively increases test coverage, it may make things difficult to debug... I think it is probably safer to have private generators as needed.

@droazen or @lbergelson can you assign?

@samuelklee
Copy link
Contributor Author

Actually, this happens in production code as well.

@samuelklee samuelklee changed the title Audit use of Utils random generators in tests and refactor if necessary. Audit use of Utils random generators and refactor if necessary. Aug 23, 2019
@lbergelson
Copy link
Member

@samuelklee Can you point out the instances you found in production code?

@samuelklee
Copy link
Contributor Author

samuelklee commented Aug 23, 2019

Actually, I guess the global random generator is never reset in production code. This is probably fine and will result in overall deterministic behavior, but you can imagine instances in which changing the behavior or order of successive modules that use the generator could lead to debugging headaches.

In CNV code (notably, the MCMC/sampling utilities and KernelSegmenter), classes either instantiate or reset private generators when appropriate or their methods take a generator as a parameter. Eliminating the dependence on global state has definitely saved me some headaches during development, but this may be less true when developing methods that don't depend as heavily on randomization. Will leave it up to whoever tackles this issue to decide what is appropriate.

@lbergelson
Copy link
Member

Yeah, it's currently a bit gross in that it should be deterministic, but it's not stable if you make any changes to the code. Resetting the random generator at every use though is obviously problematic, and deciding on something in between needs more thought than either extreme.

@samuelklee
Copy link
Contributor Author

In any case, fixing the bad behavior in test code should be done soon. See #6107, which prompted this discussion, for an example.

@jamesemery
Copy link
Collaborator

jamesemery commented Aug 26, 2019

To chime in, there has been discussion of changing our usage of the random generator before. In order to achieve parity between HaplotypeCaller and HaplotypeCallerSpark we need to be able to reset the random generator for downsampling by site. Currently not only is it not consistent with the non-spark version but it might be internally inconsistent in downsampling between active region determination and the actual calling code. There is a PR related to this #5448

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants