NetCDF preparation for BGFLOOD #63

Closed
rosepearson opened this issue Feb 27, 2022 · 5 comments · Fixed by #68

@rosepearson
Owner

rosepearson commented Feb 27, 2022

This is a bit of a catch-all issue for work relating to the requirements associated with the NetCDF output from GeoFabrics (and also input to BGFLOOD).

Requirements:

  • Store each geofabric dataset in one NetCDF file with each layer as different variables.
    • Could also do multiple resolutions with each elevation/roughness set in a different group
    • Could also include forcing information in the same netCDF file if we want all inputs in a single file
  • Include spatial_ref information like geotransform and crs - see section below.
    • Done using rioxarray with write_crs() and write_transform()
  • Include appropriate variable and coordinates attributes:
    • units information: See link for conventions.
    • long_name: A description of the variable/coordinate.
    • standard_name: Must be included in the NUG table.
    • vertical_datum: Non-standard attribute name to record the vertical datum of any elevation data.
  • Record the parameters of a run:
    • Data sources should be documented in the attributes (i.e. land layer x, revision y)
    • Capture the parameter information in a variable, or perhaps a json dump into a group attribute.
  • Different resolution geofabrics should be aligned and evenly divisible (to ensure that alignment)
  • DEMs and roughness should be defined over a full rectangular grid with no NaN values. (Update: NaNs have since been deemed ok!)
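
As a rough illustration of the requirements above, here is a minimal sketch of one xarray Dataset holding each layer as a variable, with the attributes listed and the run parameters stored as a JSON dump. The variable names, the standard_name choices, the NZVD2016 datum, and the parameter keys are all assumptions for illustration, not what GeoFabrics actually writes.

```python
import json

import numpy as np
import xarray as xr

# Hypothetical 10 m grid; coordinates are cell centres.
resolution = 10.0
x = (np.arange(4) + 0.5) * resolution
y = ((np.arange(3) + 0.5) * resolution)[::-1]

# Hypothetical run parameters, stored as a JSON string in a global attribute.
parameters = {"resolution": resolution, "land_layer": "x", "revision": "y"}

geofabric = xr.Dataset(
    data_vars={
        "dem": (
            ("y", "x"),
            np.zeros((3, 4), dtype="float32"),
            {
                "units": "m",
                "long_name": "ground elevation",
                "standard_name": "surface_altitude",
                # Non-standard attribute; the datum name is an assumption.
                "vertical_datum": "NZVD2016",
            },
        ),
        "roughness": (
            ("y", "x"),
            np.full((3, 4), 0.05, dtype="float32"),
            {
                "units": "m",
                "long_name": "hydraulic roughness length",
                "standard_name": "surface_roughness_length",
            },
        ),
    },
    coords={"x": x, "y": y},
    attrs={"parameters": json.dumps(parameters)},
)

# The parameters can be recovered from the attribute later.
recovered = json.loads(geofabric.attrs["parameters"])
```

Calling geofabric.to_netcdf(...) on this would give one file with both layers as variables.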

NetCDF conventions

There are standards defining the conventions for attributes in netCDF files.

  • cfconventions.org - The recommended standard
  • UCAR - A repository with links to all netCDF standards
    See CF conventions for some details about conventions around netCDF files.

image

The spatial_ref coordinate

This is where information associated with the coordinate system and projection (CRS and transform) is encoded.

image
image

Coordinate systems CF-1.6 <--> CRS

An optional grid mapping attribute called crs_wkt may be used to specify multiple coordinate system properties in so-called well-known text format (usually abbreviated to CRS WKT or OGC WKT), as detailed in the cfconventions.org page, with example mappings at the github page.

It looks like this information is sometimes encoded within a "spatial_ref" coordinate (see issue).
image
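
To make the crs_wkt idea concrete, here is a small sketch using pyproj (not mentioned above, so treat it as an assumption about tooling) to produce CF grid mapping attributes, including crs_wkt, from a CRS. EPSG:2193 is only an example CRS chosen for illustration.

```python
from pyproj import CRS

# Example CRS only (NZGD2000 / New Zealand Transverse Mercator 2000).
crs = CRS.from_epsg(2193)

# Dictionary of CF grid mapping attributes, which includes "crs_wkt".
grid_mapping_attrs = crs.to_cf()
crs_wkt = grid_mapping_attrs["crs_wkt"]

for key in sorted(grid_mapping_attrs):
    print(key)
```

These are the kind of attributes that end up on the "spatial_ref" coordinate.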

Python Libraries

There are various Python libraries for interacting with NetCDF files including netCDF4, xarray, and rioxarray. netCDF4 is an engine used by xarray to read and write netCDF files. xarray has some powerful constructs for building and interacting with data stored in netCDF files. rioxarray combines xarray with rasterio by providing access to the rasterio engine with the rio accessor.

xarray supports two main objects - DataArrays and Datasets. DataArrays work well for a single layer of data (possibly across many bands), and the Dataset class should be used for multiple variables that may have different dimensions (i.e. different resolutions, or x,y vs time).
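
A minimal sketch of the difference, assuming made-up variable names ("dem", "rain") and sizes: each DataArray is a single layer, and the Dataset holds both even though they do not share dimensions.

```python
import numpy as np
import xarray as xr

# A single spatial layer defined over (y, x).
dem = xr.DataArray(
    np.zeros((3, 4)),
    dims=("y", "x"),
    coords={"y": [25.0, 15.0, 5.0], "x": [5.0, 15.0, 25.0, 35.0]},
    name="dem",
)

# A hypothetical forcing series defined over time only.
rain = xr.DataArray(
    np.zeros(5),
    dims=("time",),
    coords={"time": np.arange(5)},
    name="rain",
)

# A Dataset can hold variables with different dimensions.
inputs = xr.Dataset({"dem": dem, "rain": rain})
print(dict(inputs.sizes))
```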

Other

There is potential for a translation layer between GeoFabrics and BG-FLOOD or also between the catchment generation code and either GeoFabrics and/or BG-FLOOD. This would be contained in a separate repository to either.

@rosepearson
Owner Author

rosepearson commented Apr 12, 2022

I've been exploring how to save netCDF files out if they are datasets. At this stage I can't figure out how to encode CRS information in a Dataset vs a DataArray such that it will be read in by rioxarray :(

I've also looked into grouped data. Waiting to hear back from @AliceHarang and @CyprienBosserelle on whether this is the expected form!

Just to note, I don't get the groups showing up in PyNcView, but I can open each group in Xarray.Info:

  • data_resolution_5.nc -
    image

    • View in QGIS
      image
  • data_resolution_10.nc -
    image

    • View in QGIS
      image
  • grouped_data.nc - this has two groups "/resolution_5" and "/resolution_10", with the data in data_resolution_5.nc and data_resolution_10.nc respectively. i.e. I was trying to give an example of a single netCDF file with multiple resolutions.

    • View in QGIS
      image

Open questions:

  • What do you, the modellers, think?
  • Do we want to go with grouped data, or keep them separate?
  • Should I add other attributes like ymin/ymax, xmin/xmax?
  • Is the _FillValue in the form you expect?

@CyprienBosserelle
Collaborator

I think we need to explore and discuss the different netcdf conventions that suit the data best, and stick to the selected conventions. This is valid for both BGFlood and geofabric.

@rosepearson
Owner Author

Yes - good point @CyprienBosserelle - and as you can see from some of my previous comments I've been having trouble establishing exactly what these are. I'm going to investigate a little more how exactly the python library rioxarray encodes CRS information - something I obviously should have done before advocating for its use. I'll remove the BG-FLOOD issue I've created if this investigation suggests we shouldn't be encoding CRS info within the netCDF file!

It'd be great to better understand what you've been doing for your netCDF files and where you're planning to go with your netCDF files so we can hopefully make geofabrics and BG-FLOOD more similar - especially while I'm making these changes to GeoFabrics. I'm curious to understand what you are doing with adaptive grids - separate netCDF files or one netCDF file with multiple resolutions in different groups.

I'm going to set up a 30min meeting to understand what's already being done/planned in BG-FLOOD so I can adopt where it makes sense in GeoFabrics. Let me know if I should shift the proposed meeting time.

@rosepearson
Owner Author

rosepearson commented Apr 13, 2022

Notes on using Xarray and rioxarray to read and write netCDF files.
rioxarray
This is xarray with the .rio accessor (i.e. xarray_data.rio.xx_engine_call()) added for accessing the rasterio engine. It supports the xarray call .to_netcdf() for writing netCDF files, and the rioxarray.open_rasterio() call for reading in netCDF (and other) files. rioxarray makes it easy to set and access a dataset's CRS and transform information. However, this information must be 'written' if it is to be encoded in a saved netCDF file in a standard form that can be accessed by rioxarray or other libraries or applications such as QGIS.

  • Writing CRS information - either with rioxarray.rio.write_crs(..) or rioxarray.rio.write_coordinate_system(..), depending on whether you just want to write the existing CRS or define a new one.
    • image
    • image
  • Writing transform information - with rioxarray.rio.write_transform()
    • image

Example of writing transform and CRS information in a GDAL and CF compliant manner
image
