map_blocks #3276
Merged
Changes from 1 commit
82 commits
b090a9e map_block attempt 2 [dcherian]
3948798 Address reviews: errors, args + kwargs support. [dcherian]
4f159c8 Works with datasets! [dcherian]
9179f0b remove wrong comment. [dcherian]
20c5d5b Support chunks. [dcherian]
b16b237 infer template. [dcherian]
43ef2b7 cleanup [dcherian]
5ebf738 cleanup2 [dcherian]
8a460bb api.rst [dcherian]
505f3f0 simple shape change error check. [dcherian]
fe1982f Make test more complicated. [dcherian]
066eb59 Fix for when user function doesn't set DataArray.name [dcherian]
83eb310 Now _to_temp_dataset works. [dcherian]
008ce29 Add whats-new [dcherian]
adbe48e chunks kwarg makes no sense right now. [dcherian]
924bf69 review feedback: [dcherian]
8aed8e7 Support nondim coords in make_meta. [dcherian]
d0797f6 Add Dataset.unify_chunks [dcherian]
599b70a Merge branch 'master' into map_blocks_2 [dcherian]
765ca5d doc updates. [dcherian]
180bbf2 Merge remote-tracking branch 'upstream/master' into map_blocks_2 [dcherian]
f0de1db minor. [dcherian]
1251a5d update comment. [dcherian]
47a0e39 More complicated test dataset. Tests fail :X [dcherian]
fa44d32 Don't know why compute is needed. [dcherian]
a6e84ef work with DataArray nondim coords. [dcherian]
c28b402 fastpath unify_chunks [dcherian]
1694d03 comment. [dcherian]
cf04ec8 much improved tests. [dcherian]
3e9db26 Change args, kwargs syntax. [dcherian]
20fdde6 Add dataset, dataarray methods. [dcherian]
22e9c4e api.rst [dcherian]
b145787 docstrings. [dcherian]
f600c4a Fix unify_chunks. [dcherian]
4af5a67 Move assert_chunks_equal to xarray.testing. [dcherian]
3ca4b7b minor changes. [dcherian]
3345d25 Better error handling when inferring returned object [dcherian]
54c77dd wip [dcherian]
fb1ff0b Docstrings + nicer error message. [dcherian]
bad0855 wip [dcherian]
291e6e6 better to_array [dcherian]
b31537c remove unify_chunks in map_blocks + better tests. [dcherian]
72e7913 typing for unify_chunks [dcherian]
0a6bbed address more review comments. [dcherian]
210987e more unify_chunks tests. [dcherian]
582e0d5 Just use dask.core.utils.meta_from_array [dcherian]
d0fd87e get tests working. assert_equal needs a lot of fixing. [dcherian]
875264a more unify_chunks test. [dcherian]
0f03e37 assert_chunks_equal fixes. [dcherian]
8175d73 copy over meta_from_array. [dcherian]
6ab8737 minor fixes. [dcherian]
08c41b9 raise chunks error earlier and test for map_blocks raising chunk error [dcherian]
76bc23c fix. [dcherian]
49d3899 Type annotations
ae53b85 py35 compat
f6dfb12 make sure unify_chunks does not compute. [dcherian]
c73eda1 Make tests functional by call compute before assert_equal [dcherian]
8ad882b Update whats-new [dcherian]
aa4ea00 Merge remote-tracking branch 'upstream/master' into map_blocks_2 [dcherian]
3cda5ac Work with attributes. [dcherian]
49969a7 Support attrs and name changes. [dcherian]
6faf79e more assert_equal [dcherian]
47baf76 test changing coord attribute [dcherian]
1295499 Merge remote-tracking branch 'upstream/master' into map_blocks_2 [dcherian]
ce252f2 fix whats new [dcherian]
50ae13f rework tests to use fixtures (kind of) [dcherian]
cdcf221 more review changes. [dcherian]
f167537 cleanup [dcherian]
4390f73 more review feedback. [dcherian]
c936557 fix unify_chunks. [dcherian]
e34aafe Merge remote-tracking branch 'upstream/master' into map_blocks_2 [dcherian]
2c7938a read dask_array_compat :) [dcherian]
08ed873 Dask 1.2.0 compat. [dcherian]
67663aa Merge remote-tracking branch 'upstream/master' into map_blocks_2 [crusaderky]
99d61fc documentation polish [crusaderky]
687689e make_meta reflow [crusaderky]
f588cb6 cosmetic [crusaderky]
d476e2f polish [crusaderky]
26a6a0d Fix tests [crusaderky]
6491753 isort [crusaderky]
b227bea isort [crusaderky]
2a41906 Add func call to docstrings. [dcherian]
@@ -0,0 +1,141 @@
try:
    import dask
    import dask.array
    from dask.highlevelgraph import HighLevelGraph

except ImportError:
    pass

import itertools
import numpy as np

from .dataarray import DataArray
from .dataset import Dataset


def map_blocks(func, obj, *args, **kwargs):
    """
    Apply a function to each chunk of a DataArray or Dataset.

    Parameters
    ----------
    func: callable
        User-provided function that should accept DataArrays corresponding to one chunk.
    obj: DataArray, Dataset
        Chunks of this object will be provided to 'func'. The function must not change
        shape of the provided DataArray.
    args, kwargs:
        Passed on to func.

    Returns
    -------
    DataArray

    See Also
    --------
    dask.array.map_blocks
    """

    def _wrapper(func, obj, to_array, args, kwargs):
        if to_array:
            # this should be easier
            obj = obj.to_array().squeeze().drop("variable")

        result = func(obj, *args, **kwargs)

        if not isinstance(result, type(obj)):
            raise ValueError("Result is not the same type as input.")
        if result.shape != obj.shape:
            raise ValueError("Result does not have the same shape as input.")

        return result

    # if not isinstance(obj, DataArray):
    #     raise ValueError("map_blocks can only be used with DataArrays at present.")

    if isinstance(obj, DataArray):
        dataset = obj._to_temp_dataset()
        to_array = True
    else:
        dataset = obj
        to_array = False

    dataset_dims = list(dataset.dims)

    graph = {}
    gname = "map-%s-%s" % (dask.utils.funcname(func), dask.base.tokenize(dataset))

    # map dims to list of chunk indexes
    # If two different variables have different chunking along the same dim
    # .chunks will raise an error.
    chunks = dataset.chunks
    ichunk = {dim: range(len(chunks[dim])) for dim in chunks}
    # mapping from chunk index to slice bounds
    chunk_index_bounds = {dim: np.cumsum((0,) + chunks[dim]) for dim in chunks}

    # iterate over all possible chunk combinations
    for v in itertools.product(*ichunk.values()):
        chunk_index_dict = dict(zip(dataset_dims, v))

        # this will become [[name1, variable1],
        #                   [name2, variable2],
        #                   ...]
        # which is passed to dict and then to Dataset
        data_vars = []
        coords = []

        for name, variable in dataset.variables.items():
            # make a task that creates tuple of (dims, chunk)
            if dask.is_dask_collection(variable.data):
                var_dask_keys = variable.__dask_keys__()

                # recursively index into dask_keys nested list to get chunk
                chunk = var_dask_keys
                for dim in variable.dims:
                    chunk = chunk[chunk_index_dict[dim]]

                task_name = ("tuple-" + dask.base.tokenize(chunk),) + v
                graph[task_name] = (tuple, [variable.dims, chunk])
            else:
                # numpy array with possibly chunked dimensions
                # index into variable appropriately
                subsetter = dict()
                for dim in variable.dims:
                    if dim in chunk_index_dict:
                        which_chunk = chunk_index_dict[dim]
                        subsetter[dim] = slice(
                            chunk_index_bounds[dim][which_chunk],
                            chunk_index_bounds[dim][which_chunk + 1],
                        )

                subset = variable.isel(subsetter)
                task_name = (name + dask.base.tokenize(subset),) + v
                graph[task_name] = (tuple, [subset.dims, subset])

            # this task creates dict mapping variable name to above tuple
            if name in dataset.data_vars:
                data_vars.append([name, task_name])
            if name in dataset.coords:
                coords.append([name, task_name])

        graph[(gname,) + v] = (
            _wrapper,
            func,
            (Dataset, (dict, data_vars), (dict, coords), dataset.attrs),
            to_array,
            args,
            kwargs,
        )

    final_graph = HighLevelGraph.from_collections(name, graph, dependencies=[dataset])

    if isinstance(obj, DataArray):
        result = DataArray(
            dask.array.Array(
                final_graph, name=gname, chunks=obj.data.chunks, meta=obj.data._meta
            ),
            dims=obj.dims,
            coords=obj.coords,
        )

    return result
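
The chunk bookkeeping in `map_blocks` above (cumulative chunk sizes turned into slice bounds, then one task per combination of chunk indexes) can be tried out without dask. The following is a minimal sketch, not code from this PR: `chunk_slices` is a hypothetical helper, its input is assumed to have the same form as `dataset.chunks` (dimension name mapped to a tuple of chunk sizes), and it uses `itertools.accumulate` from the standard library in place of `np.cumsum` so that it runs on its own.

```python
import itertools


def chunk_slices(chunks):
    """Yield one {dim: slice} mapping per combination of chunks.

    `chunks` maps each dimension name to a tuple of chunk sizes
    (hypothetical helper, for illustration only).
    """
    dims = list(chunks)
    # Cumulative offsets per dimension, e.g. chunk sizes (2, 3) -> [0, 2, 5].
    bounds = {
        dim: list(itertools.accumulate((0,) + tuple(chunks[dim]))) for dim in dims
    }
    # One range of chunk indexes per dimension, in the same order as `dims`.
    ichunk = [range(len(chunks[dim])) for dim in dims]
    # Iterate over all possible chunk combinations.
    for index in itertools.product(*ichunk):
        yield {
            dim: slice(bounds[dim][i], bounds[dim][i + 1])
            for dim, i in zip(dims, index)
        }


# Two chunks along "x" (sizes 2 and 3) and a single chunk along "y".
slices = list(chunk_slices({"x": (2, 3), "y": (4,)}))
```

In this sketch the dimension names and the chunk-index ranges are both taken from the same mapping, so each index produced by `itertools.product` lines up with its dimension.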
type annotations please
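
One possible shape for the requested annotations, sketched against the signature shown in this commit. This is an assumption for illustration, not what the PR eventually merged: `DataArray` and `Dataset` are empty stand-in classes so the snippet runs without xarray, `T_XarrayObj` is a hypothetical type variable name, and the body simply calls `func` eagerly instead of building a dask graph.

```python
from typing import Any, Callable, TypeVar


class DataArray:
    """Stand-in for xarray.DataArray (assumption, for illustration only)."""


class Dataset:
    """Stand-in for xarray.Dataset (assumption, for illustration only)."""


# Constrained type variable: the annotations state that the result has the
# same type as the input, mirroring the isinstance check in _wrapper.
T_XarrayObj = TypeVar("T_XarrayObj", DataArray, Dataset)


def map_blocks(
    func: Callable[..., T_XarrayObj], obj: T_XarrayObj, *args: Any, **kwargs: Any
) -> T_XarrayObj:
    # Placeholder body: apply func to the whole object eagerly.
    result = func(obj, *args, **kwargs)
    if not isinstance(result, type(obj)):
        raise ValueError("Result is not the same type as input.")
    return result


identity_result = map_blocks(lambda block: block, DataArray())
```

With annotations of this kind, a static checker can flag a `func` whose declared return type does not match the type of `obj`, in addition to the runtime check.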