Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support netCDF load+save on dataset-like objects as well as filepaths. #5214

Merged
merged 10 commits into from
May 19, 2023

Conversation

pp-mo
Copy link
Member

@pp-mo pp-mo commented Mar 27, 2023

Rebased version of #5024

To echo comment on #5024 :

some such feature will be important to the proposed solutions for #4994
But we should remain open to changing how this is implemented.

NOTE: now targets main branch, instead of previous feature-branch FEATURE_xarray_readwrite.
This is because we are no longer planning to host the majority of the 'bridge' code in the Iris repo, but in a separate 'ncdata' repo : see #4994

Todo:

@pp-mo pp-mo force-pushed the loadsave_ncfiles_2 branch from f503e9d to 6459591 Compare March 27, 2023 09:41
@codecov
Copy link

codecov bot commented Mar 27, 2023

Codecov Report

Patch coverage: 81.81% and project coverage change: +0.02 🎉

Comparison is base (b035094) 89.32% compared to head (ce94993) 89.34%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #5214      +/-   ##
==========================================
+ Coverage   89.32%   89.34%   +0.02%     
==========================================
  Files          89       89              
  Lines       22392    22413      +21     
  Branches     5375     5380       +5     
==========================================
+ Hits        20002    20026      +24     
+ Misses       1640     1639       -1     
+ Partials      750      748       -2     
Impacted Files Coverage Δ
lib/iris/fileformats/netcdf/saver.py 89.02% <67.85%> (+0.02%) ⬆️
lib/iris/fileformats/netcdf/_thread_safe_nc.py 83.55% <70.00%> (+1.03%) ⬆️
lib/iris/__init__.py 88.88% <83.33%> (+0.36%) ⬆️
lib/iris/fileformats/__init__.py 84.37% <100.00%> (-0.48%) ⬇️
lib/iris/fileformats/cf.py 85.65% <100.00%> (+0.11%) ⬆️
lib/iris/fileformats/netcdf/loader.py 80.44% <100.00%> (+0.07%) ⬆️
lib/iris/io/__init__.py 83.21% <100.00%> (+2.05%) ⬆️
lib/iris/io/format_picker.py 83.72% <100.00%> (+2.26%) ⬆️

... and 1 file with indirect coverage changes

☔ View full report in Codecov by Sentry.
📢 Do you have feedback about the report comment? Let us know in this issue.

@pp-mo
Copy link
Member Author

pp-mo commented May 10, 2023

Update

Rebased onto latest main
still WIP

@pp-mo pp-mo force-pushed the loadsave_ncfiles_2 branch from b7ac505 to 81dfd5b Compare May 10, 2023 16:24
@pp-mo
Copy link
Member Author

pp-mo commented May 15, 2023

Update

I discovered that this feature now has a problem with output to a already-open dataset (one owned by the caller) whencompute=True (which is the default!) :

  • since we added support for distributed schedulers in Lazy netcdf saves #5191, lazy streaming requires the original file to be closed -- because the dask workers doing the streaming may now be separate processes, not always threads, which means they must each (re-)open the dataset for writing -- which they can't if the original caller still has it open
  • this simply isn't compatible with saving to an open dataset, because iris.save doesn't have the right to close the dataset (since I don't think netCDF4 supports "re-opening" a dataset, and closing will invalidate any netCDF4 objects held by user).

So that means that, in most cases, you can't just iris.save(cubes, my_dataset).
You must instead saves = iris.save(cubes, my_dataset, compute=False); my_dataset.close(); saves.compute()

Following offline discussion with @bjlittle @trexfeathers

  • We can either

    • forbid this usage, or
    • error out except in those specific cases which happen to be OK -- e.g. when dataset is not a real file-based dataset (as intended for 4994 ), or if all data is real.
  • we decided that the usecase is so specialist that it will probably only be used for Xarray bridge #4994 support. Hence we should ban compute=True operation saving to a provided dataset, since the feature is still useable (just less convenient in a rare usecase) and it is the simplest solution, both to implement and document.

@pp-mo pp-mo marked this pull request as ready for review May 15, 2023 15:12
@pp-mo
Copy link
Member Author

pp-mo commented May 15, 2023

Ping @ESadek-MO
Where you going to review this (:crossed_fingers: :pray: ) ?
I think it is at last ready to go !

@pp-mo
Copy link
Member Author

pp-mo commented May 15, 2023

Ping @ESadek-MO Where you going to review this (crossed_fingers pray ) ? I think it is at last ready to go !

P.S. #5212 will be a follow-up, which should complete essential support for #4994.
But that should be a lot simpler than this.

Copy link
Contributor

@ESadek-MO ESadek-MO left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @pp-mo, looks good, just a couple soft suggestions!

lib/iris/io/__init__.py Outdated Show resolved Hide resolved
lib/iris/fileformats/__init__.py Outdated Show resolved Hide resolved
lib/iris/io/__init__.py Show resolved Hide resolved
lib/iris/tests/integration/netcdf/test_general.py Outdated Show resolved Hide resolved
@pp-mo
Copy link
Member Author

pp-mo commented May 17, 2023

Thanks @ESadek-MO
I think I addressed all those points now -- please re-review + see what you think.

Copy link
Contributor

@trexfeathers trexfeathers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My thoughts on the changes to _thread_safe_nc.py:

lib/iris/fileformats/netcdf/_thread_safe_nc.py Outdated Show resolved Hide resolved
lib/iris/fileformats/netcdf/_thread_safe_nc.py Outdated Show resolved Hide resolved
@pp-mo
Copy link
Member Author

pp-mo commented May 19, 2023

Thanks @trexfeathers
I think I've responded to all of those, now.
Please re-review !

Copy link
Contributor

@ESadek-MO ESadek-MO left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @pp-mo, all looks good now!
Thanks for the review assist @trexfeathers!

@ESadek-MO ESadek-MO merged commit 04dadb2 into SciTools:main May 19, 2023
@pp-mo
Copy link
Member Author

pp-mo commented May 19, 2023

Thanks so much, @ESadek-MO @trexfeathers !

tkknight added a commit to tkknight/iris that referenced this pull request May 21, 2023
* upstream/main:
  Updated environment lockfiles (SciTools#5332)
  Support netcdf variable emulation (SciTools#5212)
  Support netCDF load+save on dataset-like objects as well as filepaths. (SciTools#5214)
  minor refinement to release do-nothing script (SciTools#5326)
  update bibtex citation for v3.6.0 release (SciTools#5324)
  Whats new updates for v3.6.0 (SciTools#5323)
  Added whatsnew notes on netcdf delayed saving and Distributed support. (SciTools#5322)
  Updated environment lockfiles (SciTools#5320)
  Bugfix 4566 (SciTools#4569)
  Simpler/faster data aggregation code in `aggregated_by` (SciTools#4970)
tkknight added a commit to tkknight/iris that referenced this pull request Nov 9, 2023
* upstream/main:
  Updated environment lockfiles (SciTools#5334)
  Iris Docs: Dark theme ready (SciTools#5299)
  Updated environment lockfiles (SciTools#5332)
  Support netcdf variable emulation (SciTools#5212)
  Support netCDF load+save on dataset-like objects as well as filepaths. (SciTools#5214)
  minor refinement to release do-nothing script (SciTools#5326)
  update bibtex citation for v3.6.0 release (SciTools#5324)
  Whats new updates for v3.6.0 (SciTools#5323)
  Added whatsnew notes on netcdf delayed saving and Distributed support. (SciTools#5322)
  Updated environment lockfiles (SciTools#5320)
  Bugfix 4566 (SciTools#4569)
  Simpler/faster data aggregation code in `aggregated_by` (SciTools#4970)
  update release_do_nothing script (SciTools#5311)
  Restore latest Whats New files.
  Updated environment lockfiles (SciTools#5308)
  release_do_nothing script updates from v3.6.0rc0 (SciTools#5306)
  Whats new updates for v3.6.0rc0 (SciTools#5300)
  Bump scitools/workflows from 2023.04.3 to 2023.05.0 (SciTools#5303)
@pp-mo pp-mo deleted the loadsave_ncfiles_2 branch October 23, 2024 09:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
No open projects
Status: 🏁 Done
Development

Successfully merging this pull request may close these issues.

3 participants