Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Xarray support #23

Merged
merged 28 commits into from
Oct 20, 2020
Merged

Conversation

elbeejay
Copy link
Contributor

@elbeejay elbeejay commented Sep 6, 2020

Updated (9/28/2020)

This PR makes CubeVariable an xarray-accessor rather than a subclass of numpy.ndarray. So changes are all over the place, but are relatively minor and largely preserve the prior API.

The changes

  1. In io.py instead of reading the NetCDF4 as a dictionary, we load it as an xarray.DataSet. This retains the benefits of the original scheme ("lazy" loading without reading the file into memory). In this way we can keep the .read() function to give the user the option of fully reading the dataset into memory whenever they wish (presumably prior to running a serious computation). What is a bit clumsy right now is the definition of the "coords" which are the dimensions of the data arrays. This might be better addressed in pyDeltaRCM (related to Add ability to save sediment discharge DeltaRCM/pyDeltaRCM#99) as there we could just create a better netCDF file to read (and for other datasets encourage people to preprocess or prep their datasets per the expected standard).

  2. Turns out NetCDF4 and HDF5 are very similar (see this and this). The built-in xarray loading supports most NetCDF4 and HDF5 files so now we use that and have therefore added support for HDF5 files. A small HDF5 example file has been added to allow us to unit test this functionality (as well as generalize DeltaMetrics by incorporating example Landsat data).

  3. The developers of xarray have been pretty consistent (back from when it was xray even), that they do not support or encourage subclassing (their issues 728, 3959, and 3980 to name a few). Instead they suggest using an "accessor" to extend the functionality of xarray objects. So now CubeVariable is an accessor to xarray allowing us to add custom functions to CubeVariable, but retain an xarray object as CubeVariable.data.

  4. Rudimentary 3D plotting support added for show_cube() via PyVista. Returns an error if pyvista is not installed.

Conclusions

This passes the unit tests as well as the documentation tests which suggests that this change does not break everything. But I do suspect that some of the stratigraphy functions are a bit less stable than they were before due to changes in the way some of the dimensions (think self.x or self._y etc) are handled. Started tracking things down, but realized that we might want to convert components of plan.py and section.py to be xarray accessors too (such as SectionVariable). So stopped messing with things and have settled for the PR as it is now. Implementing this might just be an incremental step towards true xarray compatibility, details regarding coordinates and dimensions are incomplete as mentioned above.

@codecov-commenter
Copy link

codecov-commenter commented Sep 14, 2020

Codecov Report

Merging #23 into develop will increase coverage by 1.21%.
The diff coverage is 94.36%.

Impacted file tree graph

@@             Coverage Diff             @@
##           develop      #23      +/-   ##
===========================================
+ Coverage    87.25%   88.46%   +1.21%     
===========================================
  Files           10        7       -3     
  Lines         1106     1118      +12     
===========================================
+ Hits           965      989      +24     
+ Misses         141      129      -12     
Impacted Files Coverage Δ
deltametrics/utils.py 54.16% <ø> (+12.50%) ⬆️
deltametrics/io.py 86.58% <87.87%> (+3.97%) ⬆️
deltametrics/plot.py 84.90% <100.00%> (ø)
deltametrics/section.py 96.96% <100.00%> (ø)
deltametrics/strat.py 87.83% <100.00%> (+1.04%) ⬆️
deltametrics/sample_data/__init__.py
deltametrics/_version.py

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update ef0fe0f...2e7fdbc. Read the comment docs.

@amoodie amoodie mentioned this pull request Sep 15, 2020
5 tasks
@elbeejay elbeejay marked this pull request as ready for review September 29, 2020 00:58
@elbeejay elbeejay mentioned this pull request Oct 4, 2020
Copy link
Contributor

@amoodie amoodie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @elbeejay. This was a huge effort and you have pushed the project way forward with it, thank you.

My only major suggestion is adding a __getitem__ to the CubeVariable that points to the xarray data. This allows the API for accessing data to stay the same, and would remove a lot of the needed .data calls (for example in the mask codes).

deltametrics/cube.py Outdated Show resolved Hide resolved
deltametrics/strat.py Outdated Show resolved Hide resolved
deltametrics/section.py Show resolved Hide resolved
deltametrics/cube.py Show resolved Hide resolved
@codecov-io
Copy link

codecov-io commented Oct 16, 2020

Codecov Report

Merging #23 into develop will increase coverage by 0.98%.
The diff coverage is 93.10%.

Impacted file tree graph

@@             Coverage Diff             @@
##           develop      #23      +/-   ##
===========================================
+ Coverage    88.56%   89.55%   +0.98%     
===========================================
  Files           11        8       -3     
  Lines         1251     1254       +3     
===========================================
+ Hits          1108     1123      +15     
+ Misses         143      131      -12     
Impacted Files Coverage Δ
deltametrics/utils.py 68.57% <ø> (+8.57%) ⬆️
deltametrics/io.py 86.58% <87.87%> (+3.97%) ⬆️
deltametrics/plot.py 84.90% <100.00%> (ø)
deltametrics/section.py 96.96% <100.00%> (ø)
deltametrics/strat.py 87.22% <100.00%> (+0.44%) ⬆️
deltametrics/_version.py
deltametrics/sample_data/__init__.py

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update fe80c01...94a8a66. Read the comment docs.

@elbeejay
Copy link
Contributor Author

I think between your commits @amoodie and some of the smaller commits I made today we've made this into a better PR. In the future I'm sure there are many more xarray related methods/PRs that'll come up, but I think this is enough to get us started on the path of "easy" dimensioning for our plots/computations...

@amoodie
Copy link
Contributor

amoodie commented Oct 20, 2020

Cool, thanks for cleaning up. I agree, there will be more to do, but this is a huge first step. Thanks!

@amoodie amoodie merged commit 57da003 into sandpiper-toolchain:develop Oct 20, 2020
@elbeejay elbeejay deleted the xarray-support branch October 30, 2020 15:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants