Add Hazard classmethod for loading xarray Datasets #507
Conversation
* Add classmethod `from_raster_netcdf` that loads a dataset from a NetCDF file and reads the appropriate data into a Hazard instance.
* Add a new test file for testing this method.

This is still WIP.
Great! Also, regarding `open_mfdataset`: I guess we don't need to include it internally. Users can do it themselves and just pass the resulting `xr.Dataset` into the CLIMADA function. Then they can be sure that the different files are concatenated correctly.
@timschmi95 Thanks for the feedback! I agree that concatenating multiple data files within this method would be a bit of overkill, as we would probably need to handle too many possible cases.
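The division of labor discussed here can be sketched as follows: the user combines per-file datasets themselves (with `xr.open_mfdataset` for files on disk, or `xr.concat` for in-memory datasets, as below) and passes a single `xr.Dataset` to the reader. The variable and coordinate names are illustrative, not prescribed by CLIMADA.

```python
import numpy as np
import xarray as xr

def make_year(year):
    """Build a small per-year dataset (stand-in for one NetCDF file)."""
    return xr.Dataset(
        {"intensity": (("time", "latitude", "longitude"), np.random.rand(1, 2, 2))},
        coords={
            "time": [np.datetime64(f"{year}-01-01")],
            "latitude": [0.0, 1.0],
            "longitude": [0.0, 1.0],
        },
    )

# Concatenate along "time" explicitly, so the user controls how the
# files are combined; the result is one dataset to hand to the reader.
ds = xr.concat([make_year(2000), make_year(2001)], dim="time")
```

For actual files, `xr.open_mfdataset(["file_2000.nc", "file_2001.nc"], combine="by_coords")` would produce an equivalent combined dataset.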
What would such a dataset look like, exactly? Isn't that – in CLIMADA terms – a "vector" dataset, one that would be read with a
Make `Hazard.from_raster_netcdf` handle cases where coordinates with other names than dimensions are supposed to be read, and where coordinates are not flattened. The first is achieved by adding another method parameter so that users may specify *dimensions* and *coordinates* separately. The second is achieved by stacking the entire dataset, which also applies to possibly multi-dimensional coordinates, instead of flattening only the respective array. Add a new test case to cover the new capabilities.
This avoids relying on user input to determine which dimensions to use.
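The stacking idea described above can be illustrated in plain xarray: stacking the whole dataset flattens every variable and coordinate at once, including a two-dimensional coordinate, whereas flattening only a single array would leave such coordinates behind. The stacked dimension name is illustrative, not the one used in CLIMADA.

```python
import numpy as np
import xarray as xr

# A dataset whose "longitude" coordinate is two-dimensional, i.e. its
# name differs from the dimensions ("y", "x") it is defined on.
ds = xr.Dataset(
    {"intensity": (("time", "y", "x"), np.arange(8).reshape(1, 2, 4))},
    coords={
        "time": [0],
        "longitude": (("y", "x"), np.arange(8).reshape(2, 4) * 0.5),
    },
)

# Stacking the entire dataset flattens the data variable *and* the
# multi-dimensional coordinate along the new combined dimension.
stacked = ds.stack(event_pixel=("y", "x"))
```

After stacking, both `stacked["intensity"]` and `stacked["longitude"]` are one-dimensional along `event_pixel`.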
This fixes some linter complaints.
Move more complicated test case down
np.testing.assert_array_equal already checks for matching array shapes.
Add capability of reading all "optional" Hazard data through the `data_vars` parameter. "Optional" means that default values can be provided, and hence the data is not strictly necessary. Changes:
* Add possibility to read `date`, `event_id`, `event_name`, and `frequency` from data.
* Add possibility to supply `haz_type` and `unit` through method parameters.
* Provide defaults for all optionals.
* Update docstrings.
* Update tests.
* Make `fraction` an optional argument and move it into `data_vars`.
* Update tests.
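The "optional data with defaults" pattern behind these commits can be sketched in a few lines. This is an illustrative reimplementation, not the actual CLIMADA code: read a variable from the dataset if the user mapped a name for it, otherwise fall back to a default.

```python
import numpy as np
import xarray as xr

def load_from_xarray_or_return_default(ds, user_key, default):
    """Return the values for `user_key` if it exists in the dataset,
    otherwise return the supplied default (illustrative sketch)."""
    if user_key is not None and user_key in ds:
        return ds[user_key].values
    return default

ds = xr.Dataset({"freq": ("event", np.array([0.1, 0.2]))})

# User mapped "freq" to the frequency data -> read it from the dataset.
frequency = load_from_xarray_or_return_default(ds, "freq", default=np.ones(2))
# No mapping given for event names -> fall back to the default.
event_name = load_from_xarray_or_return_default(ds, None, default=["1", "2"])
```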
I think this is coming along quite nicely. For now, I think the method is capable of the most important things. The examples and unit tests should make the capabilities somewhat clear. Would somebody like to review? What's missing are probably some integration tests based on real data.
Without going into details, I have a few suggestions. Mainly using Pathlib in the tests, and reflecting on the problem of sparse data one more time before finalizing. Otherwise, cool stuff!
climada/hazard/base.py
Outdated
```python
    """By default, use the numpy array representation of data"""
    return x.values


def strict_positive_int_accessor(x: xr.DataArray, report_key: str):
```
Suggested change:

```diff
-def strict_positive_int_accessor(x: xr.DataArray, report_key: str):
+def _strict_positive_int_accessor(x: xr.DataArray, report_key: str):
```
This hopefully decreases code complexity slightly.
(Hopefully) improve the readability of the method and its signature.
* Move default coordinate keys and attribute keys out of the `Hazard` class by defining constants.
* Rename `time` to `event` in the `coordinate_vars` argument, as CLIMADA operates on events.
* Rename `load_data_or_default` to `load_from_xarray_or_return_default`.
* Rename `identifier` to `default_key`.
* Update docstrings and extend comments.
Skip test for now as it is unclear how this can be tested
climada/hazard/base.py
Outdated
```python
crs : str, optional
    Identifier for the coordinate reference system to use. Defaults to
    ``EPSG:4326`` (WGS 84), defined by ``climada.util.constants.DEF_CRS``.
```
Great addition!
I would just define two or three different `crs` inputs, apply them to the reader, and then check that the `crs` is correct in the resulting hazard object. One could probably test:
- The default: `EPSG:4326`
- A projected one: `+proj=cea +lat_0=52.112866 +lon_0=5.150162 +units=m`
- Web Mercator: `EPSG:3857`
Aside from that: the docstring should probably make it clearer what format the CRS identifier must be in.
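All three suggested identifiers are formats that `pyproj.CRS.from_user_input` accepts; assuming CLIMADA resolves `crs` strings via the pyproj/rasterio stack (an assumption, not confirmed in this thread), a test could parse each input and compare against the hazard's CRS:

```python
from pyproj import CRS

# The three CRS identifiers suggested above, in the formats that
# pyproj accepts: an EPSG code string, a PROJ string, and another
# EPSG code string (Web Mercator).
crs_inputs = [
    "EPSG:4326",                                            # the default, WGS 84
    "+proj=cea +lat_0=52.112866 +lon_0=5.150162 +units=m",  # a projected CRS
    "EPSG:3857",                                            # Web Mercator
]
parsed = [CRS.from_user_input(crs) for crs in crs_inputs]
```

A test would then feed each string to the reader and assert that the hazard's centroids carry the equivalent CRS.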
Typo: `Hazard.unit` was set instead of `Hazard.units`. This was not caught by the tests because the two were used consistently.
@chahank @emanuel-schmid I think this is finished 🙌 Could you give it a final look?
In `Hazard.from_raster_xarray`, promote coordinates to dimensions if they have a single value. This enables loading datasets with fewer dimensions, as long as the "missing" dimensions are specified as coordinates. Update docstring and tests.
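The promotion described in this commit corresponds to xarray's `expand_dims`: a dataset that lacks, say, a "time" dimension but carries a scalar "time" coordinate can be expanded so the reader sees the full layout. A minimal sketch with illustrative names:

```python
import numpy as np
import xarray as xr

# A purely spatial dataset: "time" is only a scalar coordinate,
# not a dimension of the data variable.
ds = xr.Dataset(
    {"intensity": (("latitude", "longitude"), np.random.rand(2, 2))},
    coords={
        "time": np.datetime64("2000-01-01"),
        "latitude": [0.0, 1.0],
        "longitude": [0.0, 1.0],
    },
)

# Promote the single-valued coordinate to a dimension of length one,
# restoring the (time, latitude, longitude) layout the reader expects.
ds_expanded = ds.expand_dims("time")
```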
Changes proposed in this PR:
* Add classmethod `from_raster_xarray` that loads a dataset from a file and reads the appropriate data into a Hazard instance.

This PR fixes issue #487

To-Do List

Open Questions
* Do we want to read hazard type and units from dataset attributes, or is passing them as parameters enough? Reading them from data is difficult because they would be supplied as xarray attributes, and we currently cannot handle those separately.
* Should a list of input files be allowed (using `xarray.open_mfdataset` internally)? Probably not; users can do that themselves. There are too many possibilities for combining datasets.
* Is the interface sensible, especially w.r.t. the keyword-only arguments?
* Should we rename this method to `from_raster_xarray`, since we technically can load anything `xarray` is able to load? This would mean we should also check if loading `.grib` files works. Yes, we renamed it. But `.grib` files are not checked yet.

Pull Request Checklist
* (develop)