Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Index variables loaded from dask can be computed twice #1533

Closed
shoyer opened this issue Aug 28, 2017 · 6 comments · Fixed by #7729
Closed

Index variables loaded from dask can be computed twice #1533

shoyer opened this issue Aug 28, 2017 · 6 comments · Fixed by #7729

Comments

@shoyer
Copy link
Member

shoyer commented Aug 28, 2017

as reported by @crusaderky in #1522

@stale
Copy link

stale bot commented Oct 2, 2019

In order to maintain a list of currently relevant issues, we mark issues as stale after a period of inactivity

If this issue remains relevant, please comment here or remove the stale label; otherwise it will be marked as closed automatically

@stale stale bot added the stale label Oct 2, 2019
@crusaderky
Copy link
Contributor

Still valid in xarray 0.13

import xarray
import numpy
import dask.array

def kernel():
    print("Kernel invoked!")
    return numpy.array([100, 200])

data = dask.array.Array(name='foo', dask={('foo', 0): (kernel, )}, chunks=((2,),), dtype=float)
ds = xarray.Dataset(coords={'z': ('z', data)})

output:

Kernel invoked!
Kernel invoked!

@stale stale bot removed the stale label Oct 2, 2019
@dcherian
Copy link
Contributor

dcherian commented Oct 2, 2019

Nice example!

I tracked this down to

  1. as_variable being called when extracting the index. This seems sensible.

    xarray/xarray/core/merge.py

    Lines 413 to 420 in 21705e6

    def merge_data_and_coords(data, coords, compat="broadcast_equals", join="outer"):
    """Used in Dataset.__init__."""
    objs = [data, coords]
    explicit_coords = coords.keys()
    indexes = dict(extract_indexes(coords))
    return merge_core(
    objs, compat, join, explicit_coords=explicit_coords, indexes=indexes
    )
  2. as_variable being called again in the expand_variable_dicts call in merge_core (should be avoided):

    xarray/xarray/core/merge.py

    Lines 493 to 497 in 21705e6

    coerced = coerce_pandas_values(objs)
    aligned = deep_align(
    coerced, join=join, copy=False, indexes=indexes, fill_value=fill_value
    )
    expanded = expand_variable_dicts(aligned)

We need to replace the appropriate objects in aligned with IndexVariables from indexes if indexes is not None.

@shoyer
Copy link
Member Author

shoyer commented Oct 2, 2019 via email

@dcherian
Copy link
Contributor

dcherian commented Oct 2, 2019

Haha I thought it was already in!

@dcherian
Copy link
Contributor

dcherian commented Oct 4, 2019

Adding

if not isinstance(coords, Frozen):
    coords.update(indexes)

before the merge_core call fixes this. Not sure if its right.

xarray/xarray/core/merge.py

Lines 455 to 462 in 283b4fe

def merge_data_and_coords(data, coords, compat="broadcast_equals", join="outer"):
"""Used in Dataset.__init__."""
objects = [data, coords]
explicit_coords = coords.keys()
indexes = dict(_extract_indexes_from_coords(coords))
return merge_core(
objects, compat, join, explicit_coords=explicit_coords, indexes=indexes
)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants