Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix requested chunk cache size when appending #8

Merged
merged 1 commit into from
Oct 13, 2023

Conversation

DavidHuber-NOAA
Copy link
Contributor

@DavidHuber-NOAA DavidHuber-NOAA commented Oct 13, 2023

The requested HDF5 chunk cache size was set to 16GB when appending to preexistent diagnostic netCDF files. This resulted in memory allocations > 5GB/PE in some cases, which could overallocate a Hera or Orion compute node and thus caused GSI regression tests to fail. This was not an issue with HDF5 v1.10.6 (not sure why) , but is with 1.14.0.

The default is 1MB. I assumed the intent was to set a size of 16MB. All GSI regression tests now pass their memthresh and maxmem tests.

Fixes #7

@DavidHuber-NOAA
Copy link
Contributor Author

@RussTreadon-NOAA @aerorahul @hu5970 Could you review this PR, please?

Copy link
Contributor

@RussTreadon-NOAA RussTreadon-NOAA left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes are straight forward. A few questions

  1. Does decreasing cache_nelems from 16777216 to 16384 impact the wall time of apps which use ncdiag? I'm thinking of gsi.x and ncdiag_cat_serial.x.
  2. While the change impacts appending files, will apps such as enkf.x see any impact on memory usage or wall time when reading netcdf diagnostic files created by the updated ncdiag?

Approve assuming proposed change has been tested and found to adequately address stated problem.

Copy link
Contributor

@aerorahul aerorahul left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @DavidHuber-NOAA for addressing GSI related issues with newer libraries and compilers.

@DavidHuber-NOAA
Copy link
Contributor Author

DavidHuber-NOAA commented Oct 13, 2023

Changes are straight forward. A few questions

  1. Does decreasing cache_nelems from 16777216 to 16384 impact the wall time of apps which use ncdiag? I'm thinking of gsi.x and ncdiag_cat_serial.x.

There may be some difference, but it seems small. The entire appending process occurs twice for the global_4denvar test case. The total runtime increased from 366s to 375s, but this could just be noise. It is possible that this is more noticeable at higher resolutions. I will run a C384 case and see if there is any marked difference.

  1. While the change impacts appending files, will apps such as enkf.x see any impact on memory usage or wall time when reading netcdf diagnostic files created by the updated ncdiag?

No, this parameter does not affect the output diagnostic files. It must be specified at runtime to change the way a file is processed. Since the ncdiag read functions do not specify cache_nelems, this will not affect enkf.x.

@RussTreadon-NOAA
Copy link
Contributor

Thank you @DavidHuber-NOAA for answering my questions.

@aerorahul aerorahul merged commit a78a71c into NOAA-EMC:develop Oct 13, 2023
@DavidHuber-NOAA
Copy link
Contributor Author

@aerorahul Could you please create a tag for this, version v1.1.2?

@DavidHuber-NOAA DavidHuber-NOAA deleted the fix/append_cache branch October 13, 2023 14:03
@aerorahul
Copy link
Contributor

@aerorahul Could you please create a tag for this, version v1.1.2?

Done.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Large memory footprint when appending to an existing netCDF file
3 participants