Not reproducible across restarts #281
Comments
This indicates to me that the model state is not fully captured/restored by the restart files. This is a separate issue from #266, which is the occasional non-determinism of runs from the same restart.
It should be stressed that this doesn't mean the model as it stands isn't reproducible, simply that stopping and restarting the model at different points in time will not give consistent results. An experiment can be reproduced as long as the same run lengths were used at all points in an experiment. We're in the process of assessing the performance impact of adding the
Yes. Note that this is only the MOM5 model that is changed by the use of the The relevant extra compiler options used when https://github.com/ACCESS-NRI/MOM5/blob/master/bin/mkmf.template.nci#L44 My understanding is that the https://www.nccs.nasa.gov/images/FloatingPoint_consistency.pdf
I hope this old issue is irrelevant #23
Hi @aekiss It's so long ago that I can't remember whether it was acted on, but it was certainly passed on. I am wondering if it's something to do with forward and leapfrog time steps in the ocean, and whether that was fully captured in the i2o.nc type files. It's so long since I have looked at any of this. Both MOM and CICE run on their own should be OK.
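To illustrate the kind of mechanism being wondered about here, below is a toy sketch (not MOM5's actual time stepping, and not a diagnosis of this issue). It assumes a hypothetical scalar model integrated with a leapfrog scheme whose very first step is a forward (Euler) step. The function names `tendency` and `run` are made up for the example. If a restart saves only the current time level, the restarted run has to begin with a forward step again, and the answer drifts from the uninterrupted run.

```python
def tendency(x):
    """Toy tendency dx/dt = -x (an assumption for illustration only)."""
    return -x


def run(x_curr, nsteps, dt=0.1, x_prev=None):
    """Integrate nsteps with leapfrog; use a forward step if x_prev is unknown.

    Returns the two time levels (x_prev, x_curr) needed for an exact restart.
    """
    for _ in range(nsteps):
        if x_prev is None:
            # No previous time level available: forward (Euler) start-up step.
            x_next = x_curr + dt * tendency(x_curr)
        else:
            # Leapfrog step, which needs both the previous and current levels.
            x_next = x_prev + 2.0 * dt * tendency(x_curr)
        x_prev, x_curr = x_curr, x_next
    return x_prev, x_curr


# One uninterrupted 4-step run.
continuous = run(1.0, 4)

# Two 2-step runs, restart carrying both time levels.
prev, curr = run(1.0, 2)
full_restart = run(curr, 2, x_prev=prev)

# Two 2-step runs, restart carrying only the current time level.
partial_restart = run(curr, 2)

print("continuous      :", continuous[1])
print("full restart    :", full_restart[1])
print("partial restart :", partial_restart[1])
```

In this toy, the restart that carries both time levels matches the uninterrupted run exactly, while the one that drops the previous level does not. Whether anything analogous applies to the coupled model's restart or coupling files is exactly the open question in the comment above.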
As far as I can tell it is reproducible across restarts with the
The ACCESS-NRI release of ACCESS-OM2 will include a variant with reproducibility across restarts - see ACCESS-NRI/ACCESS-OM2#53
However, the restart-reproducible variant will be unable to reproduce historical runs - see https://forum.access-hive.org.au/t/access-om2-bit-repro-testing/1960
That release is now available https://github.com/ACCESS-NRI/ACCESS-OM2/releases/tag/2024.03.0
copying a Slack DM discussion here
@aidanheerdegen and Utkarsh discovered that ACCESS-OM2 is not reproducible across restarts, i.e. two consecutive 1-day runs give a different result from a single 2-day run. The non-reproducibility was detected via this test in this PR.
I've done some test runs in ~aek156/payu/om2-restart-repro and confirmed this problem occurs even when comparing 2x2-timestep vs 1x4-timestep runs (the shortest possible - can't run for one timestep) - see ~aek156/payu/om2-restart-repro, e.g. see all the md5 differences in
Aidan found some old TWG notes suggesting we used to have reproducibility across restarts https://cosima.org.au/index.php/2018/12/13/technical-working-group-meeting-december-2018/
Nic's COSIMA repro tests use MOM built with a --repro flag https://github.com/COSIMA/access-om2/blob/master/test/exp_test_helper.py#L217-L218 but it was apparently never turned on for production builds https://github.com/COSIMA/access-om2/blame/master/install.sh#L49
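For anyone wanting to repeat the md5 comparison between a split run and a single run, here is a minimal sketch. It is not the test from the PR mentioned above. It assumes you have two directories of final restart files, one from each run (the directory arguments and the `compare_restarts` helper are hypothetical), and it simply hashes every file in both.

```python
import hashlib
from pathlib import Path


def md5sum(path, chunk_size=1 << 20):
    """Return the md5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_restarts(dir_a, dir_b):
    """Compare two restart directories file by file.

    Returns a dict mapping each relative file path to "identical",
    "differs", or "missing" (present in only one directory).
    """
    dir_a, dir_b = Path(dir_a), Path(dir_b)
    files_a = {p.relative_to(dir_a): p for p in dir_a.rglob("*") if p.is_file()}
    files_b = {p.relative_to(dir_b): p for p in dir_b.rglob("*") if p.is_file()}

    report = {}
    for rel in sorted(set(files_a) | set(files_b)):
        if rel not in files_a or rel not in files_b:
            status = "missing"
        elif md5sum(files_a[rel]) == md5sum(files_b[rel]):
            status = "identical"
        else:
            status = "differs"
        report[rel.as_posix()] = status
    return report
```

Calling `compare_restarts` on the final restart directories of, say, the 2x2-timestep and 1x4-timestep runs gives a per-file summary. A byte-level hash can also differ because of metadata embedded in a file (e.g. timestamps), so a "differs" entry is a prompt to compare the field values rather than proof that the model state diverged.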