Framework for statistical tests #3883
Comments
Both of these tests have introduced unmaintainable CMake workarounds that will have to be removed once we decide on a framework for statistical tests.
Builds are now timing out on ICP workstations. This causes CI to fail in PRs (#3960) and nightly builds (#3951, #3963). I have removed …
The criterion for "deterministic" here is not clear-cut: there are quite a few tests using random numbers which nevertheless have to pass always. IMO, the non-deterministic parts of the affected tests should be moved into separate files. As suggested, they can then be excluded from slow CI runs. The following tests are deterministic:
- Langevin:
- Brownian: fully
- dpd:
- integrator_npt: fully
- lb:
- lb_pressure_tensor_acf: fully
- lb_stokes_sphere: fully, but for speed, not because it is statistical
- virtual_sites_tracers: fully, but for speed, not because it is statistical
- mass-and-rinertia-per-particle: fully
- rotational-diffusion-aniso: fully
- coulomb_tuning: fully
- wang_landau_reaction_ensemble: fully
Separate C++ coverage from python coverage. This way, statistical tests (#3883) can appear in the coverage report even though they are excluded from C++ coverage builds, and tutorial/sample tests can appear there as well.
Fixes #3883
Description of changes:
- write more thorough RNG checks for thermostats
- skip slow/statistical tests in coverage and sanitizer builds
- fix broken ifdefs for Brownian Dynamics
- clean up IBM code
Currently we have frameworks for unit testing, integration testing and sample testing. They are all deterministic, either because the functionality being tested is deterministic, or because RNG seeds (both in numpy and espresso) have been set such that the functionality becomes deterministic. We currently do not have a suitable framework for testing functionality that is stochastic in nature and for which a statistic (e.g. the mean) is known to converge only for long simulation times. This forces us to use workarounds, e.g. disabling tests in CMake for specific build types, increasing the test timeout in CI, or reducing the number of integration loops at the expense of accuracy.
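To make the distinction concrete, here is a minimal sketch of the two kinds of checks, using plain numpy rather than espressomd and with made-up tolerances and sample sizes: a seeded check can compare results exactly, whereas a statistical check has to budget its tolerance against the standard error of the converging statistic.

```python
import unittest

import numpy as np


class DeterministicVsStatistical(unittest.TestCase):
    """Sketch of the two kinds of checks (plain numpy, no espressomd;
    tolerances and sample sizes are illustrative only)."""

    def test_deterministic_with_fixed_seed(self):
        # With a fixed seed the sequence is reproducible bit-for-bit,
        # so an exact comparison is possible.
        rng1 = np.random.default_rng(42)
        rng2 = np.random.default_rng(42)
        np.testing.assert_array_equal(rng1.normal(size=100),
                                      rng2.normal(size=100))

    def test_statistical_mean_converges(self):
        # A statistical check: the sample mean only converges for long
        # runs, so the tolerance must track the standard error of the
        # mean (~ sigma / sqrt(N)).
        rng = np.random.default_rng()  # deliberately unseeded
        n_samples = 10000
        samples = rng.normal(loc=1.0, scale=0.5, size=n_samples)
        standard_error = 0.5 / np.sqrt(n_samples)
        # a 6-sigma tolerance keeps the false-failure rate negligible,
        # at the cost of a weaker check
        self.assertAlmostEqual(np.mean(samples), 1.0,
                               delta=6 * standard_error)


if __name__ == "__main__":
    unittest.main()
```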
Problem statement
Several python integration tests are actually not integration tests, but stochastic tests. They do not contribute to code coverage and do not benefit from C++ static assertions and sanitizers, since the underlying functionality already has integration tests. Yet they suffer from significant slowdowns (2 to 5 times longer runtime) in coverage and sanitizer builds. The Clang sanitizer CI job, for example, has crossed the 1-hour runtime threshold on multiple occasions over the last few months, prompting us to refactor slow tests to run faster at the expense of larger tolerance values in the checks, and to refactor unrelated python tests to gain a few minutes and compensate for the slow tests. Statistical tests also tend to show large runtime fluctuations on coyote runners that are under load from multiple CI jobs.
Although we should always prefer deterministic tests over statistical tests, there is a need for statistical tests that will not go away. Here are a few candidates for a statistical test framework (times in seconds; `RelWithAssert` builds are compiled with `-O3 -g`):
- `langevin_thermostat`
- `brownian_dynamics`
- `dpd`
- `integrator_npt`
- `lb`
- `lb_pressure_tensor_acf`
- `lb_stokes_sphere`
- `virtual_sites_tracers`
- `mass-and-rinertia_per_particle`
- `rotational-diffusion-aniso`
- `coulomb_tuning`
- `p3m_tuning_exceptions`
- `wang_landau_reaction_ensemble`
Prior work
The sample and tutorial tests were stochastic from January 2019 to July 2020, using randomized numpy seeds and espresso seeds (in 4.1). These tests were too unstable and were eventually converted to deterministic tests due to their high failure rate in CI. Generating reliable tolerance intervals was also problematic due to the variety of probability distributions underlying the statistics being tested, forcing us to run new tests in a loop to bootstrap the tolerance interval. This strategy doesn't work for us, so we cannot simply convert statistical tests to pseudo-sample tests to get improved runtimes from `Release` builds.
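For context, the bootstrapping loop mentioned above amounted to something like the following sketch; `measure_statistic` is a hypothetical stand-in for running one short simulation and extracting the statistic under test, and the sample counts are made up.

```python
import numpy as np


def measure_statistic(seed):
    # Hypothetical stand-in: a real run would set up an espresso system
    # with this seed, integrate, and return the observable under test.
    rng = np.random.default_rng(seed)
    return np.mean(rng.normal(loc=1.0, scale=0.5, size=2000))


# Run the test's measurement in a loop with different seeds and derive an
# empirical tolerance interval from the spread of the observed statistic.
observations = np.array([measure_statistic(seed) for seed in range(200)])
lower, upper = np.percentile(observations, [0.5, 99.5])
print(f"empirical 99% tolerance interval: [{lower:.4f}, {upper:.4f}]")
```

The catch is that this loop has to be redone for every new statistic, since the underlying probability distributions differ from test to test.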
Proposed solution

Re-purpose the unused (and no longer up-to-date) `check_python_skip_long` CMake target to skip the tests mentioned above in coverage and sanitizer builds. When a statistical test provides coverage, move the relevant part into a self-contained integration test. This should remove 11 min (= 22 min * cores) from the Clang job and provide a framework for statistical tests.
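The skip itself would live at the CMake level via the `check_python_skip_long` target. Purely as an illustration of the intended effect on an individual test, a Python-side equivalent might look like the sketch below, where the environment variable name and the test class are hypothetical.

```python
import os
import unittest

# Hypothetical environment variable; the actual proposal skips these tests
# at the CMake level, but the effect on one statistical test would be
# roughly this.
SKIP_STATISTICAL = os.environ.get("ESPRESSO_SKIP_STATISTICAL_TESTS") == "1"


@unittest.skipIf(SKIP_STATISTICAL,
                 "statistical tests disabled for this build type")
class LangevinThermostatStatistics(unittest.TestCase):
    def test_velocity_distribution(self):
        # long, seed-independent sampling of the velocity distribution
        # would go here
        pass


if __name__ == "__main__":
    unittest.main()
```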
Shortcomings

One drawback is that integration tests and statistical tests will still run together under the umbrella term of `python` tests. Moving statistical tests to their own CMake target would offer better separation and prevent statistical tests from running if the underlying features raised e.g. ASAN or UBSAN errors, at the expense of more CMake code and a bottleneck in CTest (right now CTest schedules GPU tests to run in serial and CPU tests to take the remaining available cores; introducing a synchronization barrier for statistical tests would make resource allocation less efficient).