Audit use of Utils random generators and refactor if necessary. #6112

samuelklee · 2019-08-23T14:23:54Z

It appears that several tests do not reset the seeds of the Utils random generators appropriately, so their results are order-dependent (and effectively non-deterministic) when new tests are introduced or tests are run in a different order. Although this effectively increases test coverage, it can make failures difficult to debug. I think it is probably safer to use private generators as needed.

@droazen or @lbergelson can you assign?
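To illustrate the order dependence described above, here is a minimal sketch. `SharedGeneratorDemo`, `SHARED`, `resetShared`, `testA`, and `testB` are hypothetical stand-ins for a globally shared generator and two tests that draw from it; this is not the actual Utils code, and the seed is arbitrary.

```java
import java.util.Random;

public class SharedGeneratorDemo {
    // Hypothetical stand-in for a globally shared generator (assumption, not GATK's Utils).
    static final long SEED = 42L;
    static final Random SHARED = new Random(SEED);

    // Restores the shared generator to its initial state.
    static void resetShared() {
        SHARED.setSeed(SEED);
    }

    // Two hypothetical "tests" that each consume one draw from the shared generator.
    static long testA() {
        return SHARED.nextLong();
    }

    static long testB() {
        return SHARED.nextLong();
    }

    public static void main(String[] args) {
        // testB run on a freshly reset generator.
        resetShared();
        long bAlone = testB();

        // testB run after testA without a reset in between:
        // it now sees the second draw of the stream instead of the first.
        resetShared();
        testA();
        long bAfterA = testB();

        System.out.println("testB alone:       " + bAlone);
        System.out.println("testB after testA: " + bAfterA);
    }
}
```

Each run of this program is reproducible, but the value `testB` sees depends on what ran before it, which is the kind of instability that shows up when tests are added or reordered.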

samuelklee · 2019-08-23T14:34:44Z

Actually, this happens in production code as well.

lbergelson · 2019-08-23T17:40:29Z

@samuelklee Can you point out the instances you found in production code?

samuelklee · 2019-08-23T18:45:37Z

Actually, I guess the global random generator is never reset in production code. This is probably fine and will result in overall deterministic behavior, but you can imagine instances in which changing the behavior or order of successive modules that use the generator could lead to debugging headaches.

In CNV code (notably, the MCMC/sampling utilities and KernelSegmenter), classes either instantiate or reset private generators when appropriate, or their methods take a generator as a parameter. Eliminating the dependence on global state has definitely saved me some headaches during development, but this may be less true when developing methods that don't depend as heavily on randomization. I will leave it up to whoever tackles this issue to decide what is appropriate.
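The two patterns mentioned above (a class that owns a private generator, and a method that takes a generator as a parameter) might look roughly like the following. This is a minimal sketch: `InjectedGeneratorSketch`, `Sampler`, and `jitter` are hypothetical names for illustration and are not taken from the CNV code.

```java
import java.util.Random;

public class InjectedGeneratorSketch {

    // Pattern 1: the class owns a private generator, seeded at construction,
    // so its draws cannot be affected by any other code.
    static final class Sampler {
        private final Random rng;

        Sampler(long seed) {
            this.rng = new Random(seed);
        }

        double draw() {
            return rng.nextGaussian();
        }
    }

    // Pattern 2: the method takes the generator as a parameter,
    // so the caller decides which random stream is consumed.
    static double jitter(double value, Random rng) {
        return value + rng.nextGaussian();
    }

    public static void main(String[] args) {
        Sampler first = new Sampler(1L);
        Sampler second = new Sampler(1L);
        // Same seed, independent instances: both produce the same first draw.
        System.out.println(first.draw() == second.draw());

        // Same seed passed in explicitly: the result is reproducible.
        System.out.println(jitter(10.0, new Random(7L)) == jitter(10.0, new Random(7L)));
    }
}
```

In both patterns the random stream is determined by the seed alone, regardless of what other code has run, at the cost of threading seeds or generators through the call sites.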

lbergelson · 2019-08-23T18:56:03Z

Yeah, it's currently a bit gross: it should be deterministic, but the results aren't stable if you make any changes to the code. Resetting the random generator at every use is obviously problematic, though, and deciding on something in between needs more thought than either extreme.

samuelklee · 2019-08-23T19:07:32Z

In any case, fixing the bad behavior in test code should be done soon. See #6107, which prompted this discussion, for an example.

jamesemery · 2019-08-26T14:39:44Z

To chime in, there has been discussion of changing our usage of the random generator before. To achieve parity between HaplotypeCaller and HaplotypeCallerSpark, we need to be able to reset the random generator for downsampling by site. Currently, not only is the Spark version inconsistent with the non-Spark version, but it might also be internally inconsistent in downsampling between active region determination and the actual calling code. There is a related PR: #5448.
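As a rough illustration of what resetting the generator per site could mean, the sketch below derives a seed from the site itself, so the same site gets the same random stream regardless of processing order. This is only one possible approach under stated assumptions: `PerSiteSeedSketch`, `BASE_SEED`, `generatorForSite`, and `keepRead` are hypothetical, and nothing here is taken from #5448 or from the HaplotypeCaller code.

```java
import java.util.Random;

public class PerSiteSeedSketch {
    // Arbitrary base seed (assumption for illustration).
    static final long BASE_SEED = 42L;

    // Builds a fresh generator whose seed depends only on the site,
    // not on how many draws were made for earlier sites.
    static Random generatorForSite(String contig, int position) {
        long siteSeed = BASE_SEED ^ (31L * contig.hashCode() + position);
        return new Random(siteSeed);
    }

    // Hypothetical downsampling decision: keep a read with probability `fraction`,
    // consuming draws from the generator supplied by the caller.
    static boolean keepRead(Random siteRng, double fraction) {
        return siteRng.nextDouble() < fraction;
    }

    public static void main(String[] args) {
        // Visiting the same site twice yields the same sequence of decisions.
        Random firstVisit = generatorForSite("chr1", 1000);
        Random secondVisit = generatorForSite("chr1", 1000);
        for (int i = 0; i < 5; i++) {
            System.out.println(keepRead(firstVisit, 0.5) == keepRead(secondVisit, 0.5));
        }
    }
}
```

With a scheme like this, two code paths that visit the same site would make the same downsampling decisions even if they processed other sites in a different order beforehand.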

samuelklee assigned droazen and lbergelson Aug 23, 2019

samuelklee changed the title ~~Audit use of Utils random generators in tests and refactor if necessary.~~ Audit use of Utils random generators and refactor if necessary. Aug 23, 2019

samuelklee mentioned this issue Mar 9, 2022 in merged PR: Added regularization to covariance in GMM maximization step to fix convergence issues in VQSR. #7709
