Direct netCDF data reads #324

Merged
merged 1 commit into from
Feb 19, 2023


marshallward (Member)

This patch introduces read_netCDF_data, a new method for reading netCDF datasets using the native netCDF I/O interface. It is designed to resemble the existing MOM_read_data function.

Motivation

Legacy input files may contain content which is not supported by the newest framework I/O (FMS). In order to retain support for these input files, particularly over a wider range of compilers, this patch provides an alternative method for reading these files.

Interface

As with MOM_read_data, the function is given a netCDF filepath and a variable name, and stores the values in a caller-provided array.

The global_file and file_may_be_4d flags have been dropped, since they relate to specific FMS2 compatibility issues. (Whether a read is global or domain-decomposed is controlled by the presence of a MOM_domain argument.)
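A rough Python sketch of the interface rule above, assuming a hypothetical Domain record that holds only a rank's 1-based, inclusive compute bounds (MOM6 itself is Fortran, and its MOM_domain type carries far more than this). The only point it illustrates is that passing a domain selects a local segment, while omitting it reads the whole field.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Range = Tuple[int, int]  # 1-based, inclusive index range


@dataclass
class Domain:
    """Hypothetical stand-in for a MOM_domain: the 1-based, inclusive
    global index bounds of one rank's compute domain."""
    isc: int
    iec: int
    jsc: int
    jec: int


def read_region(global_shape: Tuple[int, int],
                domain: Optional[Domain] = None) -> Tuple[Range, Range]:
    """Return the (i, j) index ranges that a read should cover.

    Without a domain the whole global field is read; with a domain,
    only this rank's segment is read.
    """
    ni, nj = global_shape
    if domain is None:
        return (1, ni), (1, nj)
    if not (1 <= domain.isc <= domain.iec <= ni
            and 1 <= domain.jsc <= domain.jec <= nj):
        raise ValueError("domain bounds lie outside the global field")
    return (domain.isc, domain.iec), (domain.jsc, domain.jec)


# A global read versus one rank's segment of a 360 x 180 field.
print(read_region((360, 180)))                          # ((1, 360), (1, 180))
print(read_region((360, 180), Domain(91, 180, 1, 45)))  # ((91, 180), (1, 45))
```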

Limited domain-decomposed I/O is supported, providing parallel I/O over a single file, to the extent supported by the filesystem. Parallelization over multiple files, as in FMS I/O, is not supported. Each FMS PE (MPI rank) reads its own segment, as defined by its MOM_domain.
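As an illustration of each rank reading only its own segment, the sketch below splits a global grid over a px-by-py layout. The helper names are hypothetical, and the rule for distributing leftover points (earlier ranks get one extra) is an assumption; FMS's own layout logic may assign them differently.

```python
from typing import Dict, List, Tuple

Range = Tuple[int, int]  # 1-based, inclusive index range


def split_extent(n: int, parts: int) -> List[Range]:
    """Split 1..n into `parts` contiguous ranges whose sizes differ by at
    most one (a simple even split, assumed for illustration)."""
    base, extra = divmod(n, parts)
    ranges, start = [], 1
    for p in range(parts):
        size = base + (1 if p < extra else 0)
        ranges.append((start, start + size - 1))
        start += size
    return ranges


def rank_segments(ni: int, nj: int,
                  layout: Tuple[int, int]) -> Dict[int, Tuple[Range, Range]]:
    """Assign each rank of a px-by-py layout its (i, j) segment, numbering
    ranks with i varying fastest."""
    px, py = layout
    i_ranges = split_extent(ni, px)
    j_ranges = split_extent(nj, py)
    return {pj * px + pi: (i_ranges[pi], j_ranges[pj])
            for pj in range(py) for pi in range(px)}


segments = rank_segments(10, 4, (2, 2))
print(segments[3])  # ((6, 10), (3, 4))

# The segments tile the grid: their sizes sum to the global 10 x 4 = 40 cells.
cells = sum((i1 - i0 + 1) * (j1 - j0 + 1)
            for (i0, i1), (j0, j1) in segments.values())
print(cells)  # 40
```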

Output can be saved to either compute or data domains; as in FMS, the appropriate placement is inferred from the size of the output array.
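A minimal sketch of inferring the placement from the output array's size, assuming the data domain is the compute domain plus a uniform halo on every side (real MOM_domain halos and symmetric-memory layouts can differ). The function names are hypothetical, and plain nested lists stand in for Fortran arrays.

```python
from typing import List, Tuple

Grid = List[List[float]]


def placement_offset(out_shape: Tuple[int, int],
                     compute_shape: Tuple[int, int],
                     halo: Tuple[int, int]) -> Tuple[int, int]:
    """Return the (i, j) offset at which compute-domain values are stored.

    A compute-sized array is filled from its first element; a data-sized
    array (compute domain plus `halo` points on every side) is filled
    starting just inside the halo.
    """
    ci, cj = compute_shape
    hi, hj = halo
    if out_shape == (ci, cj):
        return (0, 0)
    if out_shape == (ci + 2 * hi, cj + 2 * hj):
        return (hi, hj)
    raise ValueError(f"output shape {out_shape} matches neither the "
                     "compute nor the data domain")


def store(local: Grid, out: Grid, halo: Tuple[int, int]) -> None:
    """Copy compute-domain values into `out`, inferring the placement."""
    oi, oj = placement_offset((len(out), len(out[0])),
                              (len(local), len(local[0])), halo)
    for i, row in enumerate(local):
        for j, value in enumerate(row):
            out[oi + i][oj + j] = value


local = [[1.0, 2.0], [3.0, 4.0]]
data = [[0.0] * 4 for _ in range(4)]  # 2 x 2 compute domain, halo of 1
store(local, data, halo=(1, 1))
print(data[1][1:3], data[2][1:3])  # [1.0, 2.0] [3.0, 4.0]
```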

Support is currently limited to time-independent 2D arrays with center-cell indexing. That is, the position and timelevel arguments are not yet supported. The subroutines raise an error if either argument is provided; the arguments are kept in the interface as an indication that they may be supported in the future.
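The error-on-unsupported-argument behavior might look like the following Python sketch. check_supported is a hypothetical helper, and the messages are illustrative rather than MOM6's actual wording.

```python
from typing import Any, Optional


def check_supported(position: Optional[Any] = None,
                    timelevel: Optional[int] = None) -> None:
    """Reject arguments that the reader accepts but does not yet implement.

    Keeping the arguments in the signature lets callers match the
    MOM_read_data interface, while the error makes the missing support
    explicit instead of silently ignoring the request.
    """
    if position is not None:
        raise NotImplementedError(
            "position is not yet supported; only center-cell fields can be read")
    if timelevel is not None:
        raise NotImplementedError(
            "timelevel is not yet supported; only time-independent fields can be read")


check_supported()  # a 2D, time-independent, center-cell read passes
try:
    check_supported(timelevel=3)
except NotImplementedError as err:
    print(err)  # timelevel is not yet supported; only time-independent fields can be read
```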

Implementation

Internally, the function opens a MOM_netcdf_file, generates its field/axis manifest, and reads the field contents. As with MOM_read_data, an internal rotation may be applied. The file is closed upon completion.
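The open, index, read, rotate, close sequence can be sketched as below. InMemoryFile is a hypothetical dict-backed stand-in for MOM_netcdf_file, and the counter-clockwise quarter-turn rotation is an assumption for illustration; the direction and index conventions of MOM6's own rotation are not specified here. The try/finally ensures the file is closed even if the field is missing.

```python
from typing import Dict, List

Grid = List[List[float]]


class InMemoryFile:
    """Hypothetical stand-in for a MOM_netcdf_file, backed by a dict."""

    def __init__(self, variables: Dict[str, Grid]):
        self._variables = variables
        self.manifest: List[str] = []
        self.is_open = False

    def open(self) -> None:
        self.is_open = True

    def update_manifest(self) -> None:
        # Record which names are available before any read is attempted.
        self.manifest = sorted(self._variables)

    def read(self, name: str) -> Grid:
        if name not in self.manifest:
            raise KeyError(f"{name} not found in file")
        return [row[:] for row in self._variables[name]]

    def close(self) -> None:
        self.is_open = False


def rotate_quarter_turns(a: Grid, turns: int) -> Grid:
    """Rotate a 2D list counter-clockwise by `turns` quarter turns."""
    for _ in range(turns % 4):
        a = [list(row) for row in zip(*a)][::-1]
    return a


def read_field(f: InMemoryFile, name: str, turns: int = 0) -> Grid:
    """Open, index, read, optionally rotate, and always close the file."""
    f.open()
    try:
        f.update_manifest()
        return rotate_quarter_turns(f.read(name), turns)
    finally:
        f.close()


f = InMemoryFile({"tideamp": [[1.0, 2.0], [3.0, 4.0]]})
print(read_field(f, "tideamp", turns=1))  # [[2.0, 4.0], [1.0, 3.0]]
print(f.is_open)                          # False
```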

(This behavior is designed to emulate the existing MOM_read_data; a future implementation may instead use a persistent file handle, which would reduce the number of I/O operations.)

Opening a MOM_netcdf_file now supports a MOM_domain argument, which is used to determine the index bounds of its local segment of the global domain. This is used to compute appropriate bounds for the native netCDF IO calls.
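As an illustration of turning a local segment's bounds into arguments for a native read, the sketch below converts 1-based, inclusive (i, j) bounds into the 0-based (start, count) pair used by the netCDF C interface. It assumes a 2D variable stored with dimensions (y, x), and the function name is hypothetical; a Fortran caller of nf90_get_var would instead pass 1-based starts in (x, y) order.

```python
from typing import Tuple


def nc_start_count(isc: int, iec: int, jsc: int, jec: int
                   ) -> Tuple[Tuple[int, int], Tuple[int, int]]:
    """Convert 1-based, inclusive (i, j) bounds of a local segment into
    C-API style (start, count) for a 2D variable with dimensions (y, x).

    The C interface (nc_get_vara_*) takes 0-based starts and lists the
    slowest-varying dimension first, so j precedes i.
    """
    start = (jsc - 1, isc - 1)
    count = (jec - jsc + 1, iec - isc + 1)
    return start, count


# A rank owning columns 91..180 and rows 1..45 of the global grid:
print(nc_start_count(91, 180, 1, 45))  # ((0, 90), (45, 90))
```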

As part of these changes, the get_file_fields function has been split in two: get_file_fields itself, and a new function, update_file_contents, which populates the internal axis and field metadata lists.
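One way to picture the split is a class where update_file_contents scans the file once to fill the axis and field lists, and get_file_fields returns a copy of the field list. This is a hypothetical Python sketch, not the MOM_io_file code; the rule that a variable is an axis when its name matches a dimension name is an assumption borrowed from the usual netCDF coordinate-variable convention.

```python
from typing import List, Optional


class NetcdfFileSketch:
    """Hypothetical stand-in for a MOM_netcdf_file's metadata handling."""

    def __init__(self, dims: List[str], variables: List[str]):
        self.dims = set(dims)             # dimension names in the file
        self.variables = list(variables)  # variable names in the file
        self.axes: Optional[List[str]] = None
        self.fields: Optional[List[str]] = None

    def update_file_contents(self) -> None:
        """Populate the internal axis and field metadata lists."""
        self.axes = [v for v in self.variables if v in self.dims]
        self.fields = [v for v in self.variables if v not in self.dims]

    def get_file_fields(self) -> List[str]:
        """Return a copy of the field list, scanning the file if needed."""
        if self.fields is None:
            self.update_file_contents()
        return list(self.fields)

    def find_field(self, name: str) -> int:
        """Reader-side lookup that reuses the populated metadata."""
        if self.fields is None:
            self.update_file_contents()
        if name not in self.fields:
            raise KeyError(name)
        return self.fields.index(name)


f = NetcdfFileSketch(dims=["lon", "lat"], variables=["lon", "lat", "tideamp"])
print(f.get_file_fields())  # ['tideamp']
print(f.axes)               # ['lon', 'lat']
```

Separating the scan from the lookup lets a reader fill the metadata once and then query it, without going through the field-list interface used by other callers.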

Usage

The following fields have been moved to the native netCDF IO:

  • tideamp (tidal mixing, FMS surface forcing)
  • gustiness (solo and FMS surface forcing)
  • h2 (roughness in tidal mixing)

This list comprises only the fields which must be handled natively in order for the GFDL regression suite to pass with the PGI compiler; more files could be moved to native I/O in the future.

Bugfixes

Some bugfixes to the netCDF I/O are also included:

  • The filename attribute is now only written when the file is in a writeable state.

  • Previously, get_file_fields (and now update_file_contents) assumed that every axis had an equivalent variable, which could lead to errors if an axis had no equivalent field, such as index bounds.

    We now count only the variables whose names match a dimension name, tag these as axes, and exclude them from the field list, rather than assuming that every axis has a variable.

  • Not a bugfix, but hor_index_init was modified so that param_file is now an optional input. This function is used in MOM_netcdf_file, where param_file is not available. The argument is only used to call log_param.
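The axis-counting bugfix can be illustrated with a file whose bounds dimension (nv, an assumed name) has no variable of its own. The Python functions below are hypothetical; they contrast the old assumption that every dimension has a matching variable with counting only the variables whose names match a dimension.

```python
from typing import List


def field_count_old(dims: List[str], variables: List[str]) -> int:
    # Old assumption: every dimension has a matching coordinate variable.
    return len(variables) - len(dims)


def field_count_new(dims: List[str], variables: List[str]) -> int:
    # Count only the variables whose names match a dimension name (axes).
    n_axes = sum(1 for v in variables if v in dims)
    return len(variables) - n_axes


def field_list(dims: List[str], variables: List[str]) -> List[str]:
    # Everything not tagged as an axis is a field.
    return [v for v in variables if v not in dims]


# "nv" indexes cell bounds and has no variable of its own.
dims = ["lon", "lat", "nv"]
variables = ["lon", "lat", "lon_bnds", "tideamp"]
print(field_count_old(dims, variables))  # 1 (miscounted)
print(field_count_new(dims, variables))  # 2
print(field_list(dims, variables))       # ['lon_bnds', 'tideamp']
```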

Previous usage of these functions was restricted to writing output with well-defined content, so that usage was unaffected by these issues.

codecov bot commented Feb 10, 2023

Codecov Report

Merging #324 (3545f19) into dev/gfdl (2fe2631) will decrease coverage by 0.04%.
The diff coverage is 2.06%.

❗ Current head 3545f19 differs from pull request most recent head 62c6a1a. Consider uploading reports for the commit 62c6a1a to get more accurate results

@@             Coverage Diff              @@
##           dev/gfdl     #324      +/-   ##
============================================
- Coverage     37.20%   37.16%   -0.04%     
============================================
  Files           265      265              
  Lines         74432    74505      +73     
  Branches      13822    13837      +15     
============================================
  Hits          27691    27691              
- Misses        41650    41720      +70     
- Partials       5091     5094       +3     
Impacted Files Coverage Δ
...ig_src/drivers/solo_driver/MOM_surface_forcing.F90 24.52% <0.00%> (ø)
src/framework/MOM_io.F90 30.62% <0.00%> (-0.52%) ⬇️
src/framework/MOM_io_file.F90 50.29% <0.00%> (-7.24%) ⬇️
...rc/parameterizations/vertical/MOM_tidal_mixing.F90 1.40% <0.00%> (ø)
src/framework/MOM_netcdf.F90 44.79% <4.54%> (-2.35%) ⬇️
src/framework/MOM_hor_index.F90 23.43% <50.00%> (-0.38%) ⬇️
src/framework/MOM_document.F90 73.36% <0.00%> (ø)


marshallward (Member, Author)

This PR was updated following some helpful suggestions from @Hallberg-NOAA to provide comments explaining why we have introduced a second way to read files, and to signpost the locations where this alternative method is used.

Hallberg-NOAA (Member) left a comment

These changes make sense to me, and they are well described both in the comments in the code and in the very detailed commit message. This still has to pass the pipeline regression testing (which is not working very well at the moment), but once this does pass, this PR should be merged into dev/gfdl.

Hallberg-NOAA (Member)

This PR has passed pipeline testing at https://gitlab.gfdl.noaa.gov/ogrp/MOM6/-/pipelines/18289.

Hallberg-NOAA merged commit 4c3b409 into NOAA-GFDL:dev/gfdl on Feb 19, 2023
marshallward deleted the netcdf_read_data branch on May 8, 2024 14:58