Add Hazard classmethod for loading xarray Datasets #507
Conversation
* Add classmethod `from_raster_netcdf` that loads a dataset from a NetCDF file and reads the appropriate data into a Hazard instance.
* Add a new test file for testing this method.

This is still WIP.
Great! Also, regarding `open_mfdataset`: I guess we don't need to include it internally. Users can do it themselves and just pass the resulting `xr.Dataset` into the CLIMADA function. Then they can be sure that the different files are concatenated correctly.
@timschmi95 Thanks for the feedback! I agree that concatenating multiple data files within this method would be a bit of overkill, as we would probably need to handle too many possible cases.
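The division of labor discussed here can be sketched as follows: the user combines per-file datasets themselves (with `xr.open_mfdataset` for files on disk, or `xr.concat` for in-memory datasets, as below) and passes a single `xr.Dataset` to the reader. The variable and coordinate names are illustrative, not prescribed by CLIMADA.

```python
import numpy as np
import xarray as xr

def make_year(year):
    """Build a small per-year dataset (stand-in for one NetCDF file)."""
    return xr.Dataset(
        {"intensity": (("time", "latitude", "longitude"), np.random.rand(1, 2, 2))},
        coords={
            "time": [np.datetime64(f"{year}-01-01")],
            "latitude": [0.0, 1.0],
            "longitude": [0.0, 1.0],
        },
    )

# Concatenate along "time" explicitly, so the user controls how the
# files are combined; the result is one dataset to hand to the reader.
ds = xr.concat([make_year(2000), make_year(2001)], dim="time")
```

For actual files, `xr.open_mfdataset(["file_2000.nc", "file_2001.nc"], combine="by_coords")` would produce an equivalent combined dataset.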
What would such a dataset look like, exactly? Isn't that – in CLIMADA terms – a "vector" dataset, one that would be read with a
Make `Hazard.from_raster_netcdf` handle cases where coordinates with other names than dimensions are supposed to be read, and where coordinates are not flattened. The first is achieved by adding another method parameter so that users may specify *dimensions* and *coordinates* separately. The second is achieved by stacking the entire dataset, which also applies to possibly multi-dimensional coordinates, instead of flattening only the respective array. Add a new test case to cover the new capabilities.
This avoids relying on user input to determine which dimensions to use.
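The stacking idea described above can be illustrated in plain xarray: stacking the whole dataset flattens every variable and coordinate at once, including a two-dimensional coordinate, whereas flattening only a single array would leave such coordinates behind. The stacked dimension name is illustrative, not the one used in CLIMADA.

```python
import numpy as np
import xarray as xr

# A dataset whose "longitude" coordinate is two-dimensional, i.e. its
# name differs from the dimensions ("y", "x") it is defined on.
ds = xr.Dataset(
    {"intensity": (("time", "y", "x"), np.arange(8).reshape(1, 2, 4))},
    coords={
        "time": [0],
        "longitude": (("y", "x"), np.arange(8).reshape(2, 4) * 0.5),
    },
)

# Stacking the entire dataset flattens the data variable *and* the
# multi-dimensional coordinate along the new combined dimension.
stacked = ds.stack(event_pixel=("y", "x"))
```

After stacking, both `stacked["intensity"]` and `stacked["longitude"]` are one-dimensional along `event_pixel`.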
This fixes some linter complaints.
Move more complicated test case down
np.testing.assert_array_equal already checks for matching array shapes.
Add capability of reading all "optional" Hazard data through the `data_vars` parameter. "Optional" means that default values can be provided, and hence the data is not strictly necessary. Changes:
* Add possibility to read `date`, `event_id`, `event_name`, and `frequency` from data.
* Add possibility to supply `haz_type` and `unit` through method parameters.
* Provide defaults for all optionals.
* Update docstrings.
* Update tests.
* Make `fraction` an optional argument and move it into `data_vars`.
* Update tests.
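The "optional data with defaults" pattern behind these commits can be sketched in a few lines. This is an illustrative reimplementation, not the actual CLIMADA code: read a variable from the dataset if the user mapped a name for it, otherwise fall back to a default.

```python
import numpy as np
import xarray as xr

def load_from_xarray_or_return_default(ds, user_key, default):
    """Return the values for `user_key` if it exists in the dataset,
    otherwise return the supplied default (illustrative sketch)."""
    if user_key is not None and user_key in ds:
        return ds[user_key].values
    return default

ds = xr.Dataset({"freq": ("event", np.array([0.1, 0.2]))})

# User mapped "freq" to the frequency data -> read it from the dataset.
frequency = load_from_xarray_or_return_default(ds, "freq", default=np.ones(2))
# No mapping given for event names -> fall back to the default.
event_name = load_from_xarray_or_return_default(ds, None, default=["1", "2"])
```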
I think this is coming along quite nicely. For now, I think the method is capable of the most important things. The examples and unit tests should make the capabilities somewhat clear. Would somebody like to review? What's missing are probably some integration tests based on real data.
Without going into details, I have a few suggestions. Mainly using Pathlib in the tests, and reflecting on the problem of sparse data one more time before finalizing. Otherwise, cool stuff!
climada/hazard/base.py
Outdated
```python
    """By default, use the numpy array representation of data"""
    return x.values


def strict_positive_int_accessor(x: xr.DataArray, report_key: str):
```
Suggested change:

```diff
-def strict_positive_int_accessor(x: xr.DataArray, report_key: str):
+def _strict_positive_int_accessor(x: xr.DataArray, report_key: str):
```
This hopefully decreases code complexity slightly.
(Hopefully) improve the readability of the method and its signature.
* Move default coordinate keys and attribute keys out of the `Hazard` class by defining constants.
* Rename `time` to `event` in the `coordinate_vars` argument, as CLIMADA operates on events.
* Rename `load_data_or_default` to `load_from_xarray_or_return_default`.
* Rename `identifier` to `default_key`.
* Update docstrings and extend comments.
Skip test for now as it is unclear how this can be tested
climada/hazard/base.py
Outdated
```python
crs : str, optional
    Identifier for the coordinate reference system to use. Defaults to
    ``EPSG:4326`` (WGS 84), defined by ``climada.util.constants.DEF_CRS``.
```
Great addition!
I would just define two or three different `crs` inputs, apply them to the reader, and then check that the `crs` is correct in the resulting hazard object. One could probably test:
- The default: `EPSG:4326`
- A projected one: `+proj=cea +lat_0=52.112866 +lon_0=5.150162 +units=m`
- Web Mercator: `EPSG:3857`
Aside from that: the docstring should probably make it clearer what format the CRS identifier must be in.
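All three suggested identifiers are formats that `pyproj.CRS.from_user_input` accepts; assuming CLIMADA resolves `crs` strings via the pyproj/rasterio stack (an assumption, not confirmed in this thread), a test could parse each input and compare against the hazard's CRS:

```python
from pyproj import CRS

# The three CRS identifiers suggested above, in the formats that
# pyproj accepts: an EPSG code string, a PROJ string, and another
# EPSG code string (Web Mercator).
crs_inputs = [
    "EPSG:4326",                                            # the default, WGS 84
    "+proj=cea +lat_0=52.112866 +lon_0=5.150162 +units=m",  # a projected CRS
    "EPSG:3857",                                            # Web Mercator
]
parsed = [CRS.from_user_input(crs) for crs in crs_inputs]
```

A test would then feed each string to the reader and assert that the hazard's centroids carry the equivalent CRS.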
Typo: `Hazard.unit` was set instead of `Hazard.units`. This was not caught by the tests because the two were used consistently.
@chahank @emanuel-schmid I think this is finished 🙌 Could you give it a final look?
In `Hazard.from_raster_xarray`, promote coordinates to dimensions if they have a single value. This enables loading datasets with fewer dimensions, as long as the "missing" dimensions are specified as coordinates. Update docstring and tests.
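The promotion described in this commit corresponds to xarray's `expand_dims`: a dataset that lacks, say, a "time" dimension but carries a scalar "time" coordinate can be expanded so the reader sees the full layout. A minimal sketch with illustrative names:

```python
import numpy as np
import xarray as xr

# A purely spatial dataset: "time" is only a scalar coordinate,
# not a dimension of the data variable.
ds = xr.Dataset(
    {"intensity": (("latitude", "longitude"), np.random.rand(2, 2))},
    coords={
        "time": np.datetime64("2000-01-01"),
        "latitude": [0.0, 1.0],
        "longitude": [0.0, 1.0],
    },
)

# Promote the single-valued coordinate to a dimension of length one,
# restoring the (time, latitude, longitude) layout the reader expects.
ds_expanded = ds.expand_dims("time")
```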
Changes proposed in this PR:
* Add classmethod `from_raster_xarray` that loads a dataset from a file and reads the appropriate data into a Hazard instance.

This PR fixes issue #487

To-Do List

Open Questions
* Do we want to read hazard type and units from dataset attributes, or is passing them as parameters enough? Reading them from data is difficult because they would be supplied as xarray attributes, and we currently cannot handle those separately.
* Should a list of input files be allowed (using `xarray.open_mfdataset` internally)? Probably not; users can do that themselves. There are too many possibilities for combining datasets.
* Is the interface sensible, especially w.r.t. the keyword-only arguments?
* Should we rename this method to `from_raster_xarray`, since we technically can load anything `xarray` is able to load? This would mean we should also check if loading `.grib` files works. Yes, we renamed it. But `.grib` files are not checked yet.

Pull Request Checklist
* (develop)