Direct netCDF data reads #324
Codecov Report
@@ Coverage Diff @@
## dev/gfdl #324 +/- ##
============================================
- Coverage 37.20% 37.16% -0.04%
============================================
Files 265 265
Lines 74432 74505 +73
Branches 13822 13837 +15
============================================
Hits 27691 27691
- Misses 41650 41720 +70
- Partials 5091 5094 +3
This patch introduces `read_netCDF_data`, a new method for reading netCDF datasets using the native netCDF I/O interface. It is designed to resemble the existing `MOM_read_data` function.

Motivation
----------

Legacy input files may contain content which is not supported by the newest framework I/O (FMS). In order to retain support for these input files, particularly over a wider range of compilers, this patch provides an alternative method for reading these files.

Interface
---------

As with `MOM_read_data`, the function is provided with a netCDF filepath and a variable name, and returns the values to a provided variable.

The `global_file` and `file_may_be_4d` flags have been dropped, since they are related to specific FMS2 compatibility issues. (Whether a read is global or domain-decomposed is controlled by the presence of a `MOM_domain`.)

Limited domain-decomposed I/O is supported, providing parallel I/O over a single file, to the extent supported by the filesystem. Parallelization over multiple files, as in FMS I/O, is not supported. Each FMS PE (MPI rank) reads its own segment, as defined by its `MOM_domain`.

Output can be saved to either compute or data domains; as in FMS, the appropriate placement is inferred from the size of the output array.

Support is currently limited to time-independent 2D arrays with center-cell indexing. That is, the `position` and `timelevel` arguments are not yet supported. The subroutines raise an error if either is provided; the arguments are retained to signal that they may be supported in the future.

Implementation
--------------

Internally, the function opens a `MOM_netcdf_file`, generates its field/axis manifest, and reads the field contents. As with `MOM_read_data`, an internal rotation may be applied. The file is closed upon completion.

(This behavior is designed to emulate the existing `MOM_read_data`; in a future implementation, we may want to use a persistent file handle, which would reduce the number of I/O operations.)
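
As an illustration of the domain-decomposed read described in the Interface section, the following is a minimal Python sketch, not the MOM6 Fortran implementation. It shows how a rank's index bounds could be turned into the start/count pair of a 2D hyperslab read, and how placement into a compute-domain or data-domain array could be inferred from the output array's shape. All names here (`Segment`, `hyperslab`, `placement_offset`, `read_segment`) are hypothetical, and a symmetric halo of uniform width is assumed.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """Hypothetical stand-in for one rank's bounds (0-based, end exclusive)."""
    i_start: int
    i_end: int
    j_start: int
    j_end: int
    halo: int  # assumed uniform halo width around the compute domain


def hyperslab(seg):
    """Return (start, count) for reading the rank's compute domain, ordered (j, i)."""
    start = (seg.j_start, seg.i_start)
    count = (seg.j_end - seg.j_start, seg.i_end - seg.i_start)
    return start, count


def placement_offset(seg, out_shape):
    """Infer compute-domain vs data-domain placement from the output shape."""
    nj, ni = hyperslab(seg)[1]
    if out_shape == (nj, ni):
        return 0  # compute-domain array: values start at the array origin
    if out_shape == (nj + 2 * seg.halo, ni + 2 * seg.halo):
        return seg.halo  # data-domain array: values start after the halo
    raise ValueError(f"output shape {out_shape} matches neither domain")


def read_segment(global_field, seg, out):
    """Copy this rank's segment of global_field into out, in place."""
    (j0, i0), (nj, ni) = hyperslab(seg)
    off = placement_offset(seg, (len(out), len(out[0])))
    for j in range(nj):
        for i in range(ni):
            out[off + j][off + i] = global_field[j0 + j][i0 + i]
    return out


# A 4x6 "file" whose values encode their own indices, read by the second
# of two ranks that split the i direction.
global_field = [[10 * j + i for i in range(6)] for j in range(4)]
seg = Segment(i_start=3, i_end=6, j_start=0, j_end=4, halo=1)
compute_out = read_segment(global_field, seg, [[None] * 3 for _ in range(4)])
data_out = read_segment(global_field, seg, [[None] * 5 for _ in range(6)])
```

In the data-domain case the halo cells are left untouched, which is why `data_out[0][0]` remains `None` while `data_out[1][1]` holds the first value of the segment.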

Opening a `MOM_netcdf_file` now supports a `MOM_domain` argument, which is used to determine the index bounds of its local segment of the global domain. These are used to compute appropriate bounds for the native netCDF I/O calls.

As part of these changes, the `get_file_fields` function has been split into `get_file_fields` and a new function, `update_file_contents`, which populates the internal axis and field metadata list.

Usage
-----

The following fields have been moved to the native netCDF I/O:

* `tideamp` (tidal mixing, FMS surface forcing)
* `gustiness` (solo and FMS surface forcing)
* `h2` (roughness in tidal mixing)

These are only the fields which must be handled natively in order for the GFDL regression suite to pass with the PGI compiler; more files could be moved to native I/O in the future.

Bugfixes
--------

Some bugfixes to the netCDF I/O are also included:

* The `filename` attribute is now only written when the file is in a writable state.

* Previously, `get_file_fields` (and now `update_file_contents`) assumed that every axis had an equivalent variable, which could lead to errors if an axis had no equivalent field, such as index bounds.

  We now count the number of variables with matching dimension names, tagged as axes, rather than assuming that every axis has a variable, and exclude them from the field list.

* Not a bugfix, but `hor_index_init` was modified so that `param_file` is now an optional input. This function is used in `MOM_netcdf_file`, where `param_file` is not available. The argument is only used to call `log_param`.

Previous usage of these functions was restricted to writing output with well-defined content, so it was unaffected by these issues.
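
The axis-counting fix described in the Bugfixes section can be pictured with a small Python sketch. This is only an illustration of the counting logic, not the Fortran code in `update_file_contents`. The function name `split_axes_and_fields` and the sample file layout are hypothetical, and "tagged as an axis" is assumed here to mean that a variable's name matches a dimension name.

```python
def split_axes_and_fields(dim_names, var_dims):
    """Separate axis variables from fields.

    dim_names: names of the dimensions defined in the file.
    var_dims:  mapping of variable name -> tuple of its dimension names.
    A variable is treated as an axis when its name matches a dimension name
    (an assumption made for this sketch).
    """
    axes = [name for name in var_dims if name in dim_names]
    fields = [name for name in var_dims if name not in dim_names]
    return axes, fields


# Hypothetical file: the "nv" dimension (used for bounds) has no variable.
dim_names = ["lon", "lat", "nv"]
var_dims = {
    "lon": ("lon",),
    "lat": ("lat",),
    "lon_bnds": ("lon", "nv"),
    "h2": ("lat", "lon"),
}

axes, fields = split_axes_and_fields(dim_names, var_dims)

# The earlier assumption (one variable per dimension) undercounts the fields.
old_field_count = len(var_dims) - len(dim_names)
```

With this layout, the earlier assumption yields one field, whereas counting matched axis variables yields two axes and two fields.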
056a9e8 to 62c6a1a
This PR was updated following some helpful suggestions from @Hallberg-NOAA to provide comments explaining why we have introduced a second way to read files, and to signpost the locations where this alternative method is used.
These changes make sense to me, and they are well described both in the comments in the code and in the very detailed commit message. This still has to pass the pipeline regression testing (which is not working very well at the moment), but once this does pass, this PR should be merged into dev/gfdl.
This PR has passed pipeline testing at https://gitlab.gfdl.noaa.gov/ogrp/MOM6/-/pipelines/18289.