-
Notifications
You must be signed in to change notification settings - Fork 9
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Xarray support #23
Xarray support #23
Conversation
Codecov Report
@@ Coverage Diff @@
## develop #23 +/- ##
===========================================
+ Coverage 87.25% 88.46% +1.21%
===========================================
Files 10 7 -3
Lines 1106 1118 +12
===========================================
+ Hits 965 989 +24
+ Misses 141 129 -12
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @elbeejay. This was a huge effort and you have pushed the project way forward with it, thank you.
My only major suggestion is adding a __getitem__
to the CubeVariable
that points to the xarray
data. This allows the API for accessing data to stay the same, and would remove a lot of the needed .data
calls (for example in the mask
codes).
Codecov Report
@@ Coverage Diff @@
## develop #23 +/- ##
===========================================
+ Coverage 88.56% 89.55% +0.98%
===========================================
Files 11 8 -3
Lines 1251 1254 +3
===========================================
+ Hits 1108 1123 +15
+ Misses 143 131 -12
Continue to review full report at Codecov.
|
21e5e02
to
3abb033
Compare
I think between your commits @amoodie and some of the smaller commits I made today we've made this into a better PR. In the future I'm sure there are many more |
Cool, thanks for cleaning up. I agree, there will be more to do, but this is a huge first step. Thanks! |
Updated (9/28/2020)
This PR makes
CubeVariable
anxarray
-accessor rather than a subclass ofnumpy.ndarray
. So changes are all over the place, but are relatively minor and largely preserve the prior API.The changes
In
io.py
instead of reading the NetCDF4 as a dictionary, we load it as anxarray.DataSet
. This retains the benefits of the original scheme ("lazy" loading without reading the file into memory). In this way we can keep the.read()
function to give the user the option of fully reading the dataset into memory whenever they wish (presumably prior to running a serious computation). What is a bit clumsy right now is the definition of the "coords" which are the dimensions of the data arrays. This might be better addressed inpyDeltaRCM
(related to Add ability to save sediment discharge DeltaRCM/pyDeltaRCM#99) as there we could just create a better netCDF file to read (and for other datasets encourage people to preprocess or prep their datasets per the expected standard).Turns out NetCDF4 and HDF5 are very similar (see this and this). The built-in
xarray
loading supports most NetCDF4 and HDF5 files so now we use that and have therefore added support for HDF5 files. A small HDF5 example file has been added to allow us to unit test this functionality (as well as generalize DeltaMetrics by incorporating example Landsat data).The developers of
xarray
have been pretty consistent (back from when it wasxray
even), that they do not support or encourage subclassing (their issues 728, 3959, and 3980 to name a few). Instead they suggest using an "accessor" to extend the functionality ofxarray
objects. So nowCubeVariable
is an accessor toxarray
allowing us to add custom functions toCubeVariable
, but retain anxarray
object asCubeVariable.data
.Rudimentary 3D plotting support added for
show_cube()
via PyVista. Returns an error ifpyvista
is not installed.Conclusions
This passes the unit tests as well as the documentation tests which suggests that this change does not break everything. But I do suspect that some of the stratigraphy functions are a bit less stable than they were before due to changes in the way some of the dimensions (think
self.x
orself._y
etc) are handled. Started tracking things down, but realized that we might want to convert components ofplan.py
andsection.py
to bexarray
accessors too (such asSectionVariable
). So stopped messing with things and have settled for the PR as it is now. Implementing this might just be an incremental step towards truexarray
compatibility, details regarding coordinates and dimensions are incomplete as mentioned above.