Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix optimize for chunked DataArray #4432

Merged
merged 2 commits into from
Sep 17, 2020

Conversation

TomAugspurger
Copy link
Contributor

@TomAugspurger TomAugspurger commented Sep 17, 2020

Previously we generated in invalidate Dask task graph, becuase the lines
removed here dropped keys that were referenced elsewhere in the task
graph. The original implementation had a
comment indicating that this was to cull:

results = {k: v for k, v in results.items() if k[0] == name} # cull

Just spot-checking things, I think we're OK here though. Something like
dask.visualize(arr[[0]], optimize_graph=True) indicates that we're OK.

  • Closes dask.optimize on xarray objects #3698
  • Tests added
  • Passes isort . && black . && mypy . && flake8
  • User visible changes (including notable bug fixes) are documented in whats-new.rst
  • New functions/methods are listed in api.rst

TomAugspurger and others added 2 commits September 17, 2020 15:15
Previously we generated in invalidate Dask task graph, becuase the lines
removed here dropped keys that were referenced elsewhere in the task
graph. The original implementation had a
comment indicating that this was to cull:
https://github.com/pydata/xarray/blame/502a988ad5b87b9f3aeec3033bf55c71272e1053/xarray/core/variable.py#L384

Just spot-checking things, I think we're OK here though. Something like
`dask.visualize(arr[[0]], optimize_graph=True)` indicates that we're OK.

Closes pydata#3698
@max-sixty
Copy link
Collaborator

Thanks @TomAugspurger !
Merging so we can get this into 0.16.1

@max-sixty max-sixty merged commit 9a8a62b into pydata:master Sep 17, 2020
@keewis
Copy link
Collaborator

keewis commented Sep 18, 2020

with the merge we have a test failure:

_______________________ test_persist_Dataset[<lambda>1] ________________________

persist = <function <lambda> at 0x7fe8d1646048>

    @pytest.mark.parametrize(
        "persist", [lambda x: x.persist(), lambda x: dask.persist(x)[0]]
    )
    def test_persist_Dataset(persist):
        ds = Dataset({"foo": ("x", range(5)), "bar": ("x", range(5))}).chunk()
        ds = ds + 1
        n = len(ds.foo.data.dask)
    
        ds2 = persist(ds)
    
>       assert len(ds2.foo.data.dask) == 1
E       AssertionError: assert 2 == 1
E        +  where 2 = len(<dask.highlevelgraph.HighLevelGraph object at 0x7fe8c45f4518>)
E        +    where <dask.highlevelgraph.HighLevelGraph object at 0x7fe8c45f4518> = dask.array<add, shape=(5,), dtype=int64, chunksize=(5,), chunktype=numpy.ndarray>.dask
E        +      where dask.array<add, shape=(5,), dtype=int64, chunksize=(5,), chunktype=numpy.ndarray> = <xarray.DataArray 'foo' (x: 5)>\ndask.array<add, shape=(5,), dtype=int64, chunksize=(5,), chunktype=numpy.ndarray>\nDimensions without coordinates: x.data
E        +        where <xarray.DataArray 'foo' (x: 5)>\ndask.array<add, shape=(5,), dtype=int64, chunksize=(5,), chunktype=numpy.ndarray>\nDimensions without coordinates: x = <xarray.Dataset>\nDimensions:  (x: 5)\nDimensions without coordinates: x\nData variables:\n    foo      (x) int64 dask.array<chunksize=(5,), meta=np.ndarray>\n    bar      (x) int64 dask.array<chunksize=(5,), meta=np.ndarray>.foo

/home/vsts/work/1/s/xarray/tests/test_dask.py:943: AssertionError

does anyone know why that happens?

@max-sixty
Copy link
Collaborator

Hmmm. It did pass before the merge from master:

Uploading image.png…

max-sixty added a commit that referenced this pull request Sep 18, 2020
@TomAugspurger
Copy link
Contributor Author

Huh, I'm able to reproduce locally. Looking into it now.

@TomAugspurger
Copy link
Contributor Author

Might be best to proceed with #4434 for now. I'll need to give this a bit of thought.

@max-sixty
Copy link
Collaborator

Might be best to proceed with #4434 for now. I'll need to give this a bit of thought.

OK, as you wish, I'll merge if that passes.

But your change did pass before the merge. Could it be a conflict (in functionality, not git) with recent changes on master?

max-sixty added a commit that referenced this pull request Sep 18, 2020
@TomAugspurger
Copy link
Contributor Author

TomAugspurger commented Sep 18, 2020 via email

@keewis
Copy link
Collaborator

keewis commented Sep 18, 2020

it did, the first failing commit is 381aaf8

Edit: sorry, you're right, the first commit should have also failed. Not sure why that happened, and we can't really check because the build logs were already deleted.

dcherian added a commit to dcherian/xarray that referenced this pull request Oct 9, 2020
…pagate-attrs

* 'propagate-attrs' of github.com:dcherian/xarray: (22 commits)
  silence sphinx warnings about broken rst (pydata#4448)
  Xarray open_mfdataset with engine Zarr (pydata#4187)
  Fix release notes formatting (pydata#4443)
  fix typo in io.rst (pydata#4250)
  Fix typo (pydata#4181)
  Fix release notes typo
  New whatsnew section
  Add notes re doctests (pydata#4440)
  Fixed dask.optimize on datasets (pydata#4438)
  Release notes for 0.16.1 (pydata#4435)
  Small updates to How-to-release + lint (pydata#4436)
  Fix doctests (pydata#4439)
  add a ci for doctests (pydata#4437)
  preserve original dimension, coordinate and variable order in ``concat`` (pydata#4419)
  Fix for h5py deepcopy issues (pydata#4426)
  Keep the original ordering of the coordinates (pydata#4409)
  Clearer Vectorized Indexing example (pydata#4433)
  Revert "Fix optimize for chunked DataArray (pydata#4432)" (pydata#4434)
  Fix optimize for chunked DataArray (pydata#4432)
  fix doc dataarray to netcdf (pydata#4424)
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

dask.optimize on xarray objects
3 participants