Model does not reproduce with different blocksizes #1198

Closed
lisa-bengtsson opened this issue Apr 28, 2022 · 20 comments
Labels
bug Something isn't working

Comments

@lisa-bengtsson
Contributor

lisa-bengtsson commented Apr 28, 2022

Description

The model does not give bitwise-identical results when the blocksize is changed. I tried blocksizes 32, 16, and 5; they do not reproduce. Both the FV3_RAP suite and the FV3_GFS_v17_p8 suite show this problem. I tried the compiler flag change from the decomposition issue, but it did not solve the problem. Setting do_ca = false did not solve it either.

To Reproduce:

What compilers/machines are you seeing this with?
On Hera, with the Intel compiler.

  1. Compiled and ran control_p8
  2. Copied control_p8 to control_p8_blocksize. Changed blocksize from 32 to 5. Ran, compared output with control_p8. Does not reproduce.

/scratch2/BMC/rem/Lisa.Bengtsson/stmp2/Lisa.Bengtsson/FV3_RT/SAVE_FOR_CA_BLOCKSIZE/control_p8_blocksize - blocksize 5
/scratch2/BMC/rem/Lisa.Bengtsson/stmp2/Lisa.Bengtsson/FV3_RT/SAVE_FOR_CA_BLOCKSIZE/control_p8 - blocksize 32
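
The comparison in step 2 could be scripted roughly as follows. This is a minimal sketch, not the tool used in this thread: it assumes the run directories hold output files matched by `*.nc` and that byte-identical files are the pass criterion. `file_digest` and `differing_files` are hypothetical helper names. A byte-level check is stricter than a field-level one, so files that embed timestamps or other run metadata may be reported even when the data agree.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def differing_files(dir_a: Path, dir_b: Path, pattern: str = "*.nc") -> list[str]:
    """Names of files that exist in only one directory or differ byte for byte.

    The "*.nc" default pattern is an assumption about the output naming.
    """
    names_a = {p.name for p in dir_a.glob(pattern)}
    names_b = {p.name for p in dir_b.glob(pattern)}
    bad = set(names_a ^ names_b)  # present in only one of the two runs
    for name in names_a & names_b:
        if file_digest(dir_a / name) != file_digest(dir_b / name):
            bad.add(name)
    return sorted(bad)
```

Calling `differing_files(Path(".../control_p8"), Path(".../control_p8_blocksize"))` with the two directories listed above would return an empty list only if every matched file is identical.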

@pjpegion will provide additional testing below.

@junwang-noaa @bensonr @JessicaMeixner-NOAA @yangfanglin

@lisa-bengtsson lisa-bengtsson added the bug Something isn't working label Apr 28, 2022
@pjpegion
Collaborator

In my test, I used the FV3_RAP suite.
I ran, outputting every time-step and saving the physics tendencies. The difference arises from differences in deep convection tendencies in the 2nd time-step. Surface fluxes, latent heat flux, dynamics tendencies are identical at this point.
I also tested with debug on, and it gave different results for different block sizes, although I did not save the tendencies in that run.
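
The per-time-step comparison described above can be sketched as a search for the first bitwise difference. The in-memory layout (a list of time steps, each mapping a field name to a flat list of values) and the field names `dt_deepcnv` and `dt_dyn` are illustrative assumptions, not the model's actual diagnostic names.

```python
import math
import struct


def bits(x: float) -> bytes:
    """Raw IEEE-754 double bytes, so the comparison is bitwise, not numeric."""
    return struct.pack("<d", x)


def first_divergence(run_a, run_b):
    """Return (step, field, index) of the first bitwise difference, or None.

    run_a and run_b are lists indexed by time step; each entry maps a field
    name to a flat list of floats (a hypothetical layout for illustration).
    """
    for step, (fields_a, fields_b) in enumerate(zip(run_a, run_b), start=1):
        for name in sorted(fields_a):
            for i, (a, b) in enumerate(zip(fields_a[name], fields_b[name])):
                if bits(a) != bits(b):
                    return step, name, i
    return None


# Made-up values: the runs agree in step 1 and differ by one ulp in step 2.
run_32 = [
    {"dt_deepcnv": [0.1, 0.2], "dt_dyn": [1.0, 2.0]},
    {"dt_deepcnv": [0.3, 0.4], "dt_dyn": [1.5, 2.5]},
]
run_5 = [
    {"dt_deepcnv": [0.1, 0.2], "dt_dyn": [1.0, 2.0]},
    {"dt_deepcnv": [0.3, math.nextafter(0.4, 1.0)], "dt_dyn": [1.5, 2.5]},
]
print(first_divergence(run_32, run_5))  # (2, 'dt_deepcnv', 1)
```

Comparing raw bytes rather than using `==` keeps the check strict: a difference of a single unit in the last place is reported.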

@lisa-bengtsson
Contributor Author

lisa-bengtsson commented Apr 28, 2022

It is interesting, since the FV3_RAP suite and the FV3_GFS_v17_p8 suite use different convection schemes. @grantfirl could it be related to some generic convection routine in CCPP?

@bensonr
Contributor

bensonr commented Apr 28, 2022

The last time this happened was early in the transition to GFSv15. At that time it was traced to a specific parameterization where a variable was being conditionally set. In that case, the variable was not given a default value, but was set within a complex if-structure and then used outside of that if-structure. If any of the parameterizations are Fortran 90 modules, it would also be good to understand how global variables are being set and used. These are just a few places to look.
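
The failure mode described here can be mimicked in a few lines. This is a toy sketch under stated assumptions, not code from any parameterization: `run_blocks` is a hypothetical function, and the sentinel `-999.0` stands in for whatever an uninitialized Fortran local happens to contain at the start of a block.

```python
def run_blocks(values, blocksize, set_default):
    """Process `values` block by block with a conditionally set local."""
    out = []
    for start in range(0, len(values), blocksize):
        block = values[start:start + blocksize]
        # Stand-in for uninitialized memory at the start of each block call.
        factor = -999.0
        for v in block:
            if set_default:
                factor = 1.0      # the fix: always assign a default first
            if v > 0.0:
                factor = 2.0      # assigned on one branch only
            # Used outside the if-structure: when the branch was not taken,
            # this picks up the value left by the previous column, or the
            # sentinel if the column is the first one in its block.
            out.append(v * factor)
    return out


values = [1.0, -1.0, -3.0, 2.0]
print(run_blocks(values, 4, set_default=False))  # [2.0, -2.0, -6.0, 4.0]
print(run_blocks(values, 2, set_default=False))  # [2.0, -2.0, 2997.0, 4.0]
print(run_blocks(values, 4, set_default=True))   # [2.0, -1.0, -3.0, 4.0]
print(run_blocks(values, 2, set_default=True))   # [2.0, -1.0, -3.0, 4.0]
```

Without the default, the result for a column depends on which columns share its block, so changing the blocksize changes the answer. With the default, the output is independent of the blocking.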

@dustinswales
Collaborator

@lisa-bengtsson
Did you try any other SDFs?
I ask because the commonality between these two is that they have Thompson MP enabled. I wonder if this issue is present in v16 physics?

@lisa-bengtsson
Contributor Author

Thanks @bensonr. It could perhaps be a good idea to have a blocksize test in the ORTs in the future, @DeniseWorthen?

@dustinswales I will try v16. Good suggestion.

@lisa-bengtsson
Contributor Author

lisa-bengtsson commented Apr 28, 2022

@pjpegion and I found that the control_p8 test does reproduce with blocksizes of 32 and 16, but not if you choose blocksize 5.
Blocksizes 5 and 32 do, however, reproduce in control_debug_p8.

Both of these tests use do_ca = False, because there is a call to mpp_error when non-uniform blocksizes are used with the do_ca namelist flag set to true (#1193). If anyone would like to test reproducibility with a non-uniform blocksize in control_p8, I recommend setting do_ca to false until issue 1193 is resolved.

@pjpegion
Collaborator

I also checked control (GFS_v16), and it also does not reproduce when changing the blocksize from 32 to 5.
But it also passed the blocksize test in debug mode.

Looking closely, the differences in both control and control_p8 occur in the 1st time step.
For control, I see a difference in the deep convection heating tendency, and in control_p8 I see a difference in the MP heating tendency, and also in the snow and water vapor mixing ratios at 1 gridpoint.

This reproducibility issue seems to be unrelated to the issue related to the GF convection scheme.

@SMoorthi-emc
Contributor

SMoorthi-emc commented May 5, 2022 via email

@DeniseWorthen
Collaborator

@SMoorthi-emc Has there already been a fix for this issue? Or do we need a PR to fix it?

@lisa-bengtsson
Contributor Author

I wonder if @DomHeinzeller has an idea about this? I'm not so familiar with the horizontal loop index for odd blocksizes within ccpp.

@climbfuji
Collaborator

I sent this to @junwang-noaa a few days after Moorthi's comment:

CCPP works with non-divisible blocksizes. But: for CCPP the following is true, and I thought this was the same for IPD (because in my memory IPD also allocated the last block to the actual block length im, not imx - we didn’t change anything to the allocation of the GFS DDTs GFS_sfcprop, ...):

  • Results are the same across all runs that modify the blocksize as long as the blocksize is uniform, for example:
    - run the model with a blocksize of 24, then change it to 32 - as long as the blocksize is uniform (e.g. if the total number of gridpoints was imx = 96), results are the same
  • Results are the same from run to run if the blocksize is non-uniform, but the results differ from (all the) uniform runs. For example:
    - run the model with a blocksize of 7, let’s say the blocksize is non-uniform with the last block being of size 5 (imx = 96)
    - results will remain the same as long as the blocksize doesn’t change (i.e. restart runs, omp threading, …)
    - results will be different from the runs with uniform blocksizes (24, 32 in the above example)
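
The distinction between uniform and non-uniform blocksizes in the examples above comes down to whether the blocksize divides imx. A small sketch, assuming (as the description of the last block suggests) that any remainder is placed in the final block; `block_sizes` and `is_uniform` are hypothetical helper names, not routines from atmos_model.F90 or CCPP_driver.F90.

```python
def block_sizes(imx: int, blocksize: int) -> list[int]:
    """Split imx gridpoints into blocks; the remainder, if any, goes last."""
    full, rest = divmod(imx, blocksize)
    return [blocksize] * full + ([rest] if rest else [])


def is_uniform(imx: int, blocksize: int) -> bool:
    """True when every block has the same length."""
    return imx % blocksize == 0


for bs in (24, 32, 7):
    sizes = block_sizes(96, bs)
    print(bs, is_uniform(96, bs), len(sizes), sizes[-1])
# 24 True 4 24
# 32 True 3 32
# 7 False 14 5
```

For imx = 96, blocksizes 24 and 32 give uniform blocks, while blocksize 7 gives thirteen blocks of 7 and a last block of 5, matching the example in the list.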

This has to do with AVX2 I believe … in debug mode it should be reproducible between uniform and non-uniform blocksizes.
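
One plausible way an instruction-set flag can change bits is sketched below; the thread does not confirm that this is the mechanism at work here. Processors with AVX2 generally also provide fused multiply-add, which rounds `a*b + c` once instead of twice. If the compiler uses a fused form in one code path (for example a vectorized loop body) and an unfused form in another (for example a remainder loop for a short block), the same expression can yield different results. The `fused` function emulates a single rounding with exact rational arithmetic and is an illustration only.

```python
from fractions import Fraction


def unfused(a: float, b: float, c: float) -> float:
    """a*b + c with two roundings: the product is rounded, then the sum."""
    return a * b + c


def fused(a: float, b: float, c: float) -> float:
    """a*b + c with a single rounding, emulating a fused multiply-add.

    The arithmetic is exact in rationals; float() rounds once at the end.
    """
    return float(Fraction(a) * Fraction(b) + Fraction(c))


a = 1.0 + 2.0**-30
b = 1.0 - 2.0**-30
c = -1.0

print(unfused(a, b, c))  # 0.0
print(fused(a, b, c))    # -8.673617379884035e-19
```

The exact product is 1 - 2^-60, which rounds to 1.0 in double precision, so the unfused result is 0.0 while the fused result keeps the -2^-60 term. This would also be consistent with debug builds reproducing, since those typically disable such optimizations.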

Look for logic around “non_uniform_blocks” in atmos_model.F90 and CCPP_driver.F90.

Didn't you remove the AVX2 flags in the last several months? If so, then maybe it's all good now.

@lisa-bengtsson
Contributor Author

@climbfuji I will redo the tests and see if we can close the issue

@lisa-bengtsson
Contributor Author

@climbfuji @DeniseWorthen I tested the "control" test (GFSv16), and now blocksize 32 and blocksize 5 reproduce. Is this perhaps enough to close the issue? Or do you want me to also test coupled prototype 8 to be sure?

@lisa-bengtsson
Contributor Author

For good measure, I also tried the cpld_control_p8 test, which now also reproduces between blocksize 32 and blocksize 5. You can see the test directories here:

For GFSv16:
/scratch2/BMC/rem/Lisa.Bengtsson/stmp2/Lisa.Bengtsson/FV3_RT/TEST_BLOCKSIZE_GFSv16/control
/scratch2/BMC/rem/Lisa.Bengtsson/stmp2/Lisa.Bengtsson/FV3_RT/TEST_BLOCKSIZE_GFSv16/control_blocksize_5

For UFS coupled prototype 8:
/scratch2/BMC/rem/Lisa.Bengtsson/stmp2/Lisa.Bengtsson/FV3_RT/TEST_BLOCKSIZE_CPLD_P8/cpld_control_p8
/scratch2/BMC/rem/Lisa.Bengtsson/stmp2/Lisa.Bengtsson/FV3_RT/TEST_BLOCKSIZE_CPLD_P8/cpld_control_p8_blocksize5

@DeniseWorthen @junwang-noaa @JessicaMeixner-NOAA we can close this issue and remove it from the UFS coupled prototype Wednesday tag-up notes.

@DeniseWorthen
Collaborator

@lisa-bengtsson Thanks for the extra effort of testing this in cpld_control_p8.

@SMoorthi-emc
Contributor

SMoorthi-emc commented Oct 11, 2022 via email

@lisa-bengtsson
Contributor Author

If your runs reproduce in debug mode, it may be related to the "AVX2" compiler flag that Dom described above?

@SMoorthi-emc
Contributor

SMoorthi-emc commented Oct 11, 2022 via email

@lisa-bengtsson
Contributor Author

Ok, yes, I noticed some strange updates in my inbox as well. Hopefully that gets solved quickly.

@pjpegion
Collaborator

pjpegion commented Oct 11, 2022 via email
