Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bug fix: disable concurrency in GFS_phys_time_vary_init NetCDF calls #735

Merged
merged 6 commits into from
Dec 19, 2023

Conversation

SamuelTrahanNOAA
Copy link
Contributor

Description

Occasionally, the model will fail in GFS_phys_time_vary.fv3.F90 during the first physics timestep. This is caused by heap corruption earlier, during gfs_phys_time_vary_init. After much debugging, I tracked down the problem.

The NetCDF library is not thread-safe:

The C-based libraries are not thread-safe. C-based libraries are those that depend on the C library, which currently include all language interfaces except for the Java interface

We're calling a non-thread-safe library in a threaded region. There are multiple NetCDF calls going concurrently. The fix is to read the NetCDF files one at a time.

Issue(s) addressed

Testing

How were these changes tested?

A failing C768 test case on Hera.
Also, running the regression tests on Hera.

What compilers / HPCs was it tested with?

Intel Hera.

GNU tests on Hera led to network errors at large node counts. This is an unrelated issue. Most likely, it is caused by choosing buggy libraries in Spack Stack.

Are the changes covered by regression tests? (If not, why? Do new tests need to be added?)

Testing this requires running a gigantic test. A hundred fifty nodes at least. It isn't feasible to put that in the regression tests.

Have the ufs-weather-model regression test been run? On what platform?

NOAA Hera.

Will the code updates change regression test baseline? If yes, why? Please show the baseline directory below.

No.

Please commit the regression test log files in your ufs-weather-model branch

Yes.

Dependencies

@SamuelTrahanNOAA
Copy link
Contributor Author

@DusanJovic-NOAA - Could you please confirm that I merged your PR into this branch correctly?

@DusanJovic-NOAA
Copy link
Collaborator

Looks correct. Thanks.

@zach1221
Copy link
Collaborator

@jkbk2004 testing is complete on UFS-WM #2041, and we have approval here. Can you please merge?

@SamuelTrahanNOAA
Copy link
Contributor Author

@jkbk2004 testing is complete on UFS-WM #2041, and we have approval here. Can you please merge?

No. We must merge ufs-community/ccpp-physics#138 first.

@zach1221
Copy link
Collaborator

@jkbk2004 testing is complete on UFS-WM #2041, and we have approval here. Can you please merge?

No. We must merge ufs-community/ccpp-physics#138 first.

Oh, ok. I'll add it to the sub-component PR list in 2041.

@SamuelTrahanNOAA
Copy link
Contributor Author

@jkbk2004 testing is complete on UFS-WM #2041, and we have approval here. Can you please merge?

No. We must merge ufs-community/ccpp-physics#138 first.

Oh, ok. I'll add it to the sub-component PR list in 2041.

It's already on the list, along with two other subcomponent pull requests.

@SamuelTrahanNOAA
Copy link
Contributor Author

SamuelTrahanNOAA commented Dec 19, 2023

This PR is ready for final review and merge. I have reverted .gitmodules and pointed ccpp/physics to the head of ufs/dev.
@jkbk2004

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Run-time errors associated with HR3 UGWPv1 non-stationary GWD code
4 participants