Cryo 84: Refactor tutorials to use earthaccess and xarray #91

Draft
wants to merge 21 commits into base: main

Conversation

andypbarrett
Collaborator

This PR updates the SMAP tutorials to use earthaccess and xarray

Methods are included to load SMAP L3 data into xarray.DataTree and add coordinates.

github-actions bot commented Jan 7, 2025

Binder 👈 Launch a binder notebook on this branch for commit 1715ba0

I will automatically update this comment whenever this PR is modified

Binder 👈 Launch a binder notebook on this branch for commit a095d54

Binder 👈 Launch a binder notebook on this branch for commit 1ceeb9b

review-notebook-app bot commented Jan 7, 2025

flamingbear commented on 2025-01-07T16:28:07Z
----------------------------------------------------------------

xarray names dimensions phony_dim0, phony_dim1, etc.

I actually think this is HDF5 and not xarray.


review-notebook-app bot commented Jan 7, 2025

flamingbear commented on 2025-01-07T16:28:08Z
----------------------------------------------------------------

Line #1.    dt = xr.open_datatree(filelist[0], phony_dims='sort')

In my notebook I don't need the phony_dims keyword, but I might have a different env and may not be using h5netcdf. I hope that doesn't really change how the datatrees behave.


asteiker commented on 2025-02-10T16:31:46Z
----------------------------------------------------------------

I also do not need phony_dims and in fact this gives me an error when I use it. I am working in the Openscapes 2i2c hub with xarray v2024.11.0

review-notebook-app bot commented Jan 7, 2025

flamingbear commented on 2025-01-07T16:28:08Z
----------------------------------------------------------------

I think here you can just access the dataset object and skip the update. see below

dt["Soil_Moisture_Retrieval_Data_AM"].ds = \
  dt["Soil_Moisture_Retrieval_Data_AM"].ds.rename(
      {
          'phony_dim_0': 'y',
          'phony_dim_1': 'x',
          'phony_dim_2': 'igbp_class'
      }
  )


asteiker commented on 2025-02-10T16:37:27Z
----------------------------------------------------------------

Confirmed this updated using Matt's code w/o dt.update()

review-notebook-app bot commented Jan 7, 2025

flamingbear commented on 2025-01-07T16:28:09Z
----------------------------------------------------------------

Of course, I just said that you didn't need to call update, but I don't actually see the Dimensions in the root when doing it my way, and I just verified that behavior. It's probably a bug. So if you want the dims recognized, I think you'll have to use the update function.


asteiker commented on 2025-02-10T16:40:39Z
----------------------------------------------------------------

I see the updated Dimensions in the root w/o using dt.update() so it works for me at least. Scratch that, I was looking at dimensions in those particular groups, not at the root level.

asteiker commented on 2025-02-10T16:43:48Z
----------------------------------------------------------------

Somehow I edited Matt's comment..?! And now I can't go back to re-edit. Confirming that it does look like dt.update() is needed to update dimensions at the root level.

@flamingbear
Member

@andypbarrett Unfortunately I think you are doing it properly for the moment, but this looks like the right direction. If I get a chance I'll see if I can create a stripped down example and file a bug on the repo.

Member

@flamingbear flamingbear left a comment

Oh. One last thing. I noticed the SMAP versions in the download data notebooks were 008, and I couldn't get any data without bumping them to 009.

@@ -0,0 +1,2724 @@
{
Member

@asteiker asteiker Feb 10, 2025

I received an error in this code block, pointing to line 6:

ValueError: Invalid location identifier (invalid identifier type to function)


Collaborator Author

This error results from using h5py. We can now use xarray.DataTree so this notebook is largely obsolete. I wonder if we should remove it and combine any relevant info into other notebooks.

@@ -0,0 +1,2724 @@
{
Member

@asteiker asteiker Feb 10, 2025

I get the same error, this time for root.visititems(get_vars):

ValueError: Invalid location identifier (invalid identifier type to function)


Collaborator Author

See comment above

@@ -0,0 +1,2339 @@
{
Member

@asteiker asteiker Feb 10, 2025

Collaborator Author

I wonder if we should leave this rasterio access pattern out of the tutorials for the moment. As I recall, I was playing around with regridding.

@@ -0,0 +1,2339 @@
{
Member

@asteiker asteiker Feb 10, 2025

I don't actually have this file in my smap_data directory. But I don't even see that directory in your branch (https://github.com/nsidc/NSIDC-Data-Tutorials/tree/CRYO-84/notebooks/SMAP). Maybe this is missing?



@@ -0,0 +1,2339 @@
{
Member

@asteiker asteiker Feb 10, 2025

After adding the file to my smap_data directory, I still get this error:

OSError: [Errno -101] NetCDF: HDF error: '/home/jovyan/other-repos/NSIDC-Data-Tutorials/notebooks/SMAP/smap_data/SMAP_L4_SM_gph_20150331T013000_Vv7032_001.h5'

I'm using xarray v2024.11.0 and rioxarray v0.17.0


review-notebook-app bot commented Feb 10, 2025

asteiker commented on 2025-02-10T21:23:57Z
----------------------------------------------------------------

I also see that h5py is not defined when I try to run this code block. I don't want to lose this work, but maybe this content should be moved out of this notebook and refined in a future iteration?


@asteiker
Member

@andypbarrett These are great additions to the existing SMAP notebooks, and I appreciate the updates to point to the cloud copies. One big-picture question after reviewing the notebooks: do these make sense to add as numbered tutorials that flow with the existing 01_download..., 02_read..., 03_smap_quality_flags tutorials? Potentially as iterations of the 02_read tutorial?

@andypbarrett
Collaborator Author

I was thinking the same thing about the numbered tutorials. I think the SMAP as DataTree tutorial can replace 01 and 02. However, the time series section of 02 requires open_dataset because datatree does not have a multi-file option.
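
Since open_datatree takes a single file, one way to build the time series mentioned above is to read the same group from each daily file and concatenate the results along a new time dimension. The sketch below shows only the concatenation step, on in-memory stand-ins; in a notebook each element would presumably come from something like xr.open_dataset(path, group=...). The helper names stack_along_time and fake_daily_group, the dates, and the array sizes are illustrative assumptions, not taken from the tutorials.

```python
import numpy as np
import pandas as pd
import xarray as xr


def stack_along_time(daily, dates):
    """Concatenate per-day datasets along a new 'time' dimension."""
    time = pd.DatetimeIndex(dates, name="time")
    return xr.concat(daily, dim=time)


def fake_daily_group(fill):
    """Stand-in for the dataset read from one group of one daily file."""
    return xr.Dataset({"soil_moisture": (("y", "x"), np.full((2, 3), fill))})


dates = ["2015-03-31", "2015-04-01", "2015-04-02"]
daily = [fake_daily_group(value) for value in (0.1, 0.2, 0.3)]

series = stack_along_time(daily, dates)
print(series["soil_moisture"].dims)
```

Passing a named pandas index as dim gives the stacked result a time coordinate in the same step, which keeps later selections such as series.sel(time="2015-04-01") simple.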

3 participants