Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Hazard classmethod for loading xarray Datasets #507

Conversation

peanutfun
Copy link
Member

@peanutfun peanutfun commented Jul 5, 2022

Changes proposed in this PR:

  • Add classmethod from_raster_xarray that loads a dataset from a file and reads the appropriate data into a Hazard instance.
  • Add appropriate tests.

This PR fixes issue #487

To-Do List

  • Add hazard type and unit (only to signature. They are not read from the data)
  • Read hazard frequency
  • Read event names
  • Add logger messages
  • Fix as many linter issues as possible

Open Questions

  • Do we want to read hazard type and units from dataset attributes or is passing them as parameters enough?

    Reading them from data is difficult because they should be supplied as xarray Attributes and we currently cannot handle those separately

  • Should a list of input files be allowed (using xarray.open_mfdataset internally)?

    Probably not, users can do that. There are too many possibilities for combining datasets

  • Is the interface sensible, especially w.r.t. the keyword-only arguments?

  • Should we rename this method to from_raster_xarray, since we technically can load anything xarray is able to load? This would mean we should also check if loading .grib files works

    Yes, we renamed it. But .grib files are not checked yet.

Pull Request Checklist

  • Correct target branch selected (if unsure, select develop)
  • Source branch up-to-date with target branch
  • Docs updated
  • Tests updated
  • Tests passing
  • No new linter issues

* Add classmethod `from_raster_netcdf` that loads a dataset from a
  NetCDF file and reads the appropriate data into a Hazard instance.
* Add a new test file for testing this method.

This is still WIP.
@timschmi95
Copy link
Collaborator

Great!
I think it would be good to also make the method work for data that is on a different grid, but has lat/lon coordinates as 2D arrays. Then, instead of creating a meshgrid, one can directly take the lat/lon coordinates and flatten them.
I could add this functionality, should I just push it to the branch or make a sub branch?

Also regarding the open_mfdatset, I guess we don't need to include it internally. The user can do it themselves and just pass the xr.Dataset into the climada function. Then he's sure that the different files are concatenated correctly.

climada/hazard/base.py Outdated Show resolved Hide resolved
climada/hazard/base.py Outdated Show resolved Hide resolved
climada/hazard/base.py Outdated Show resolved Hide resolved
climada/hazard/base.py Outdated Show resolved Hide resolved
@peanutfun
Copy link
Member Author

peanutfun commented Jul 6, 2022

@timschmi95 Thanks for the feedback! I agree that concatenating multiple data files within this method would be a bit of overkill, as we would probably need to handle too many possible cases.

I think it would be good to also make the method work for data that is on a different grid, but has lat/lon coordinates as 2D arrays. Then, instead of creating a meshgrid, one can directly take the lat/lon coordinates and flatten them.

How would such a dataset look like, exactly? Isn't that – in CLIMADA terms – a "vector" dataset, that one would read with a from_vector method?

peanutfun added 3 commits July 6, 2022 12:10
Make `Hazard.from_raster_netcdf` handle cases where coordinates with
other names than dimensions are supposed to be read, and where
coordinates are not flattened. The first is achieved by adding another
method parameter so that users may specify *dimensions* and
*coordinates* separately. The second is achieved by stacking the entire
dataset, which also applies to possibly multi-dimensional coordinates,
instead of flattening only the respective array.

Add a new test case to cover the new capabilities.
@peanutfun peanutfun force-pushed the 487-add-classmethod-to-hazard-for-reading-raster-like-data-from-netcdf-file branch from d05aa43 to b110a94 Compare July 7, 2022 10:18
Move more complicated test case down
np.testing.assert_array_equal already checks for matching array shapes.
Add capability of reading all "optional" Hazard data through the
`data_vars` parameter. "Optional" means that default values can be
provided and hence the data is not strictly necessary. Changes:

* Add possibility to read `date`, `event_id`, `event_name`, and
  `frequency` from data.
* Add possibility to supply `haz_type` and `unit` through method
  parameters.
* Provide defaults for all optionals.
* Update docstrings.
* Update tests.
* Make 'fraction' an optional argument and move it into `data_vars`.
* Update tests.
@peanutfun peanutfun changed the title Add Hazard classmethod for loading NetCDF file Add Hazard classmethod for loading xarray Datasets Jul 27, 2022
@peanutfun
Copy link
Member Author

I think this is coming along quite nicely. For now, I think the method is capable of the most important things. The examples and unit tests should make the capabilities somewhat clear. Would somebody like to review? What's missing are probably some integration tests based on real data.

@peanutfun peanutfun marked this pull request as ready for review July 27, 2022 15:21
@peanutfun peanutfun requested a review from chahank July 27, 2022 15:21
Copy link
Member

@chahank chahank left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Without going into details, I have a few suggestions. Mainly using Pathlib in the tests, and reflecting on the problem of sparse data one more time before finalizing. Otherwise, cool stuff!

climada/hazard/base.py Outdated Show resolved Hide resolved
climada/hazard/base.py Outdated Show resolved Hide resolved
climada/hazard/base.py Show resolved Hide resolved
climada/hazard/base.py Outdated Show resolved Hide resolved
"""By default, use the numpy array representation of data"""
return x.values

def strict_positive_int_accessor(x: xr.DataArray, report_key: str):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
def strict_positive_int_accessor(x: xr.DataArray, report_key: str):
def _strict_positive_int_accessor(x: xr.DataArray, report_key: str):

climada/hazard/test/test_base_xarray.py Outdated Show resolved Hide resolved
climada/hazard/test/test_base_xarray.py Show resolved Hide resolved
climada/hazard/test/test_base_xarray.py Outdated Show resolved Hide resolved
climada/hazard/test/test_base_xarray.py Outdated Show resolved Hide resolved
climada/hazard/test/test_base_xarray.py Outdated Show resolved Hide resolved
This hopefully decreases code complexity slightly.
(Hopefully) improve the readability of the method and its signature.

* Move default coordinate keys and attribute keys out of the `Hazard`
  class by defining constants.
* Rename `time` to `event` in `coordinate_vars` argument, as CLIMADA
  operates on events.
* Rename `load_data_or_default` to `load_from_xarray_or_return_default`.
* Rename `identifier` to `default_key`.
* Update docstrings and extend comments.
Skip test for now as it is unclear how this can be tested
Comment on lines 457 to 459
crs : str, optional
Identifier for the coordinate reference system to use. Defaults to
``EPSG:4326`` (WGS 84), defined by ``climada.util.constants.DEF_CRS``.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@chahank I implemented the crs argument in ccec87f, but I'm unsure how to test this argument properly. Do you have an idea or suggestion?

Copy link
Member

@chahank chahank Aug 9, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great addition!

I would just define two-three different .crs as input and apply them to the reader. Check then if the .crs is correct in the hazard object. Probably one can test:

  • The default: EPSG:4326
  • A projected one: "'+proj=cea +lat_0=52.112866 +lon_0=5.150162 +units=m'"
  • Mercator "EPSG:3857"

Aside from that: the docstring should probably make it clearer what format the identifier of the crs must be in.

Copy link
Member Author

@peanutfun peanutfun Oct 6, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See 84b0795 and a72112c

climada/hazard/base.py Outdated Show resolved Hide resolved
@peanutfun
Copy link
Member Author

@chahank @emanuel-schmid I think this is finished 🙌 Could you give it a final look?

climada/hazard/base.py Outdated Show resolved Hide resolved
In Hazard.from_raster_xarray, promote coordinates to dimensions if they
have a single value. This enables loading datasets with lower dimensions
as long as the "missing" dimensions are specified as coordinates.

Update docstring and tests.
…ata-from-netcdf-file' of https://github.com/CLIMADA-project/climada_python into 487-add-classmethod-to-hazard-for-reading-raster-like-data-from-netcdf-file
@peanutfun peanutfun merged commit e8b25f4 into develop Oct 14, 2022
@emanuel-schmid emanuel-schmid deleted the 487-add-classmethod-to-hazard-for-reading-raster-like-data-from-netcdf-file branch October 28, 2022 13:57
@peanutfun peanutfun mentioned this pull request Nov 7, 2022
11 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add classmethod to Hazard for reading raster-like data from NetCDF file
4 participants