Compatibility with dask 2021.02.0 #4884

Merged (3 commits) on Feb 11, 2021

crusaderky (Contributor) commented Feb 9, 2021

Closes #4860
Reverts #4873

Restore compatibility with dask 2021.02.0 by avoiding improper assumptions on the implementation details of da.Array.__dask_postpersist__().
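Below is a minimal sketch of the general idea, under stated assumptions. It does not use dask at all. `ToyArray`, `ToyDataset`, `toy_persist` and `key_name` are hypothetical names. The sketch only mimics the shape of the collection protocol, in which `__dask_postpersist__()` returns a rebuild callable plus extra arguments, later called as `rebuild(results, *extra_args)`. The container hands each variable only the keys that belong to it, rather than relying on how an array's own rebuild function treats keys it does not own.

```python
from typing import Any, Callable, Dict, Hashable, List, Tuple


def key_name(key: Hashable) -> Any:
    """Name part of a dask-style key: a plain string, or a tuple (name, *indices)."""
    return key[0] if isinstance(key, tuple) else key


class ToyArray:
    """Hypothetical stand-in for one dask array: all of its keys share self.name."""

    def __init__(self, name: str, data: Dict[Tuple, Any]):
        self.name = name
        self.data = data  # {key: chunk}

    def keys(self) -> List[Tuple]:
        return sorted(self.data)

    def postpersist(self) -> Tuple[Callable, tuple]:
        # Same shape as __dask_postpersist__: (rebuild, extra_args),
        # later called as rebuild(results, *extra_args).
        return ToyArray._rebuild, (self.name,)

    @staticmethod
    def _rebuild(results: Dict[Tuple, Any], name: str) -> "ToyArray":
        # Deliberately strict: refuse keys that belong to another collection.
        names = {key_name(k) for k in results}
        if names != {name}:
            raise ValueError(f"expected only keys named {name!r}, got {sorted(names)}")
        return ToyArray(name, dict(results))


class ToyDataset:
    """Hypothetical container of several ToyArrays with different names."""

    def __init__(self, variables: Dict[str, ToyArray]):
        self.variables = variables

    def keys(self) -> List[Tuple]:
        return [k for arr in self.variables.values() for k in arr.keys()]

    def postpersist(self) -> Tuple[Callable, tuple]:
        # For each variable remember: its name, its array name, rebuild, extra_args.
        info = [(vname, arr.name, *arr.postpersist()) for vname, arr in self.variables.items()]
        return ToyDataset._rebuild, (info,)

    @staticmethod
    def _rebuild(results: Dict[Tuple, Any], info: list) -> "ToyDataset":
        variables = {}
        for vname, arr_name, rebuild, args in info:
            # Hand each variable only its own keys, instead of assuming its
            # rebuild function will silently ignore the others.
            own = {k: v for k, v in results.items() if key_name(k) == arr_name}
            variables[vname] = rebuild(own, *args)
        return ToyDataset(variables)


def toy_persist(collection, compute_chunk: Callable[[Tuple], Any]):
    """Stand-in for a scheduler: compute every key, then rebuild from the results."""
    results = {k: compute_chunk(k) for k in collection.keys()}
    rebuild, extra_args = collection.postpersist()
    return rebuild(results, *extra_args)


ds = ToyDataset({
    "a": ToyArray("a-123", {("a-123", 0): None, ("a-123", 1): None}),
    "b": ToyArray("b-456", {("b-456", 0): None}),
})
persisted = toy_persist(ds, lambda key: key[1] * 10)
print(persisted.variables["a"].data)  # {('a-123', 0): 0, ('a-123', 1): 10}
```

Calling `ToyArray._rebuild` directly on the full results dict would raise `ValueError` here. A container that assumes each array's rebuild tolerates foreign keys invites exactly that kind of failure.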

This PR does not align xarray with the new dask collection spec (dask/dask#7093). I just realized that a Dataset containing more than one dask variable violates the spec's rule that all dask keys share the same name, and it cannot do otherwise. So I have to change the dask collection spec again to accommodate Datasets.
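The following sketch shows why a Dataset cannot satisfy a "one name per collection" rule. It assumes that keys are flattened into a plain list of `(name, *indices)` tuples. `key_name` and `has_single_name` are hypothetical helpers, not part of dask's API.

```python
from typing import Any, Hashable, Iterable


def key_name(key: Hashable) -> Any:
    """Name part of a dask-style key: a plain string, or a tuple (name, *indices)."""
    return key[0] if isinstance(key, tuple) else key


def has_single_name(keys: Iterable[Hashable]) -> bool:
    """Hypothetical check for the rule 'all keys of a collection share one name'."""
    return len({key_name(k) for k in keys}) == 1


# Flattened keys of one chunked array: every key starts with the same name.
array_keys = [("x-abc", 0), ("x-abc", 1)]
# A Dataset with two dask variables carries the keys of both arrays.
dataset_keys = array_keys + [("y-def", 0)]

print(has_single_name(array_keys))    # True
print(has_single_name(dataset_keys))  # False
```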

crusaderky requested a review from keewis February 9, 2021 17:43

keewis (Collaborator) left a comment

looks good to me. I don't know a lot about dask, though, so someone else might need to look at this.

# TODO We're wasting a lot of key-level work. We should write a fast
# variant of HighLevelGraph.cull() that works at layer level only.
dsk2 = dsk.cull(keys)
Contributor

should we add tests for this code path?

Contributor Author

I will once dask/dask#7203 is out. I can remove the code path for now if you prefer.

Contributor

maybe that's safer?

Contributor Author

I removed the HighLevelGraph (HLG) code path and reworked the whole thing.
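The TODO in the code excerpt above contrasts key-level culling with a layer-level variant of `HighLevelGraph.cull()`. The sketch below illustrates that trade-off using plain dicts, not dask's `HighLevelGraph`. The function names and graph layout are hypothetical. Layer-level culling never touches individual keys, but it keeps whole layers even when only some of their keys are needed.

```python
from typing import Dict, Hashable, Iterable, Set


def reachable(start: Iterable[Hashable], deps: Dict[Hashable, Set[Hashable]]) -> Set[Hashable]:
    """Depth-first walk returning every node reachable from start via deps."""
    seen: Set[Hashable] = set()
    stack = list(start)
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(deps.get(node, ()))
    return seen


def key_cull(tasks, key_deps, keys):
    """Key-level cull: walks individual keys (precise, but touches every needed key)."""
    keep = reachable(keys, key_deps)
    return {k: v for k, v in tasks.items() if k in keep}


def layer_cull(layers, layer_deps, output_layers):
    """Layer-level cull: walks whole layers only (coarser, but no per-key work)."""
    keep = reachable(output_layers, layer_deps)
    return {name: layer for name, layer in layers.items() if name in keep}


# Hypothetical graph: layer "z" depends on layer "x"; layer "y" is unused.
layers = {
    "x": {("x", 0): "load 0", ("x", 1): "load 1"},
    "y": {("y", 0): "load y"},
    "z": {("z", 0): "x0 + 1"},
}
layer_deps = {"x": set(), "y": set(), "z": {"x"}}
tasks = {k: v for layer in layers.values() for k, v in layer.items()}
key_deps = {("z", 0): {("x", 0)}}

print(key_cull(tasks, key_deps, [("z", 0)]))           # {('x', 0): 'load 0', ('z', 0): 'x0 + 1'}
print(sorted(layer_cull(layers, layer_deps, ["z"])))  # ['x', 'z']
```

Key-level culling drops the unused chunk `("x", 1)`, while layer-level culling keeps all of layer `"x"`. That coarser result is the price of skipping per-key work.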

dcherian mentioned this pull request Feb 11, 2021
crusaderky merged commit 2a34bfb into pydata:master Feb 11, 2021
crusaderky deleted the dask_postpersist branch February 11, 2021 18:33
keewis linked an issue Feb 11, 2021 that may be closed by this pull request
dcherian added a commit to dcherian/xarray that referenced this pull request Feb 12, 2021
* upstream/master: (24 commits)
  Compatibility with dask 2021.02.0 (pydata#4884)
  Ensure maximum accuracy when encoding and decoding cftime.datetime values (pydata#4758)
  Fix `bounds_error=True` ignored with 1D interpolation (pydata#4855)
  add a drop_conflicts strategy for merging attrs (pydata#4827)
  update pre-commit hooks (mypy) (pydata#4883)
  ensure warnings cannot become errors in assert_ (pydata#4864)
  update pre-commit hooks (pydata#4874)
  small fixes for the docstrings of swap_dims and integrate (pydata#4867)
  Modify _encode_datetime_with_cftime for compatibility with cftime > 1.4.0 (pydata#4871)
  vélin (pydata#4872)
  don't skip the doctests CI (pydata#4869)
  fix da.pad example for numpy 1.20 (pydata#4865)
  temporarily pin dask (pydata#4873)
  Add units if "unit" is in the attrs. (pydata#4850)
  speed up the repr for big MultiIndex objects (pydata#4846)
  dim -> coord in DataArray.integrate (pydata#3993)
  WIP: backend interface, now it uses subclassing  (pydata#4836)
  weighted: small improvements (pydata#4818)
  Update related-projects.rst (pydata#4844)
  iris update doc url (pydata#4845)
  ...
dcherian added a commit to dcherian/xarray that referenced this pull request Feb 17, 2021
* upstream/master:
  FIX: h5py>=3 string decoding (pydata#4893)
  Update matplotlib's canonical (pydata#4919)
  Adding vectorized indexing docs (pydata#4711)
  Allow fsspec URLs in open_(mf)dataset (pydata#4823)
  Fix typos in example notebooks (pydata#4908)
  pre-commit autoupdate CI (pydata#4906)
  replace the ci-trigger action with a external one (pydata#4905)
  Update area_weighted_temperature.ipynb (pydata#4903)
  hide the decorator from the test traceback (pydata#4900)
  Sort backends (pydata#4886)
  Compatibility with dask 2021.02.0 (pydata#4884)

Successfully merging this pull request may close these issues.

Persisting datasets containing arrays with a single chunk fails
⚠️ Nightly upstream-dev CI failed ⚠️
3 participants