Implement interp for interpolating between chunks of data (dask) #4155
Merged
Changes from 9 commits (44 commits total in the pull request)
62c6385
Implement interp for interpolating between chunks of data (dask)
f6f7dad
do not forget extra points at the end
b0d8a5f
add tests
1a31457
add whats-new comment
9933c73
fix isort / black
cea826b
typo
44bbedf
update pull number
067b7f3
fix github pep8 warnings
c47a1d5
fix isort
7d505a1
clearer arguments in _dask_aware_interpnd
423b36d
typo
85ff539
fix for datetimelike index
6e9b50e
chunked interpolation does not work for high order interpolation (qua…
c63636f
Merge branch 'upstream' into chunked_interp
86cb592
fix whats new
5e26a4e
remove a useless import
3ca6e6d
use Variable instead of IndexVariable
a131b21
avoid some list to tuple conversion
67d2b36
black fix
f485958
more comments to explain _compute_chunks
42f8a3b
For orthogonal linear- and nearest-neighbor interpolation, the scalar…
ec3c400
better detection of Advanced interpolation
e231954
implement support of unsorted interpolation destination
061f5a8
rework the tests
623cb0b
fix for datetime index (bug introduced with unsorted destination)
b66d123
Variable is cheaper than DataArray
e211127
add warning if unsorted
e610268
simplify _compute_chunks
7547d56
add ghosts point in order to make quadratic and cubic method work in…
fd936dd
black
24f9460
forgot to remove an exception in test_upsample_interpolate_dask
dd2f273
fix filtering out-of-order warning
49bdefa
use extrapolate to check external points
d280867
Revert "add ghosts point in order to make quadratic and cubic method …
aeb7be1
Complete rewrite using blockwise
3c7d8c6
Merge branch 'upstream' into chunked_interp
0bc35d2
update whats-new.rst
0d5f618
reduce the diff
290a075
more decomposition of orthogonal interpolation
3f8718e
simplify _dask_aware_interpnd a little
562d5aa
fix dask interp when chunks are not aligned
62f059c
continue simplifying _dask_aware_interpnd
3d4d45c
update whats-new.rst
b60cddf
clean tests
@@ -2,7 +2,7 @@
 import warnings
 from functools import partial
 from numbers import Number
-from typing import Any, Callable, Dict, Hashable, Sequence, Union
+from typing import Any, Callable, Dict, Hashable, List, Sequence, Union

 import numpy as np
 import pandas as pd
@@ -544,13 +544,11 @@ def _get_valid_fill_mask(arr, dim, limit):
     ) <= limit


-def _assert_single_chunk(var, axes):
+def _single_chunk(var, axes):
     for axis in axes:
         if len(var.chunks[axis]) > 1 or var.chunks[axis][0] < var.shape[axis]:
-            raise NotImplementedError(
-                "Chunking along the dimension to be interpolated "
-                "({}) is not yet supported.".format(axis)
-            )
+            return False
+    return True


 def _localize(var, indexes_coords):
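The renamed `_single_chunk` predicate only inspects the chunk metadata: an axis counts as "single chunk" when exactly one chunk covers its full extent. A dependency-free sketch over plain chunk tuples (the shapes below are invented for illustration):

```python
def single_chunk(chunks, shape, axes):
    """True only if every listed axis is covered by one full-size chunk."""
    for axis in axes:
        if len(chunks[axis]) > 1 or chunks[axis][0] < shape[axis]:
            return False
    return True

# a (10, 20) array chunked as (10,) x (10, 10): axis 1 is split
print(single_chunk(((10,), (10, 10)), (10, 20), [1]))  # False
# the same array with axis 1 in one chunk of 20
print(single_chunk(((10,), (20,)), (10, 20), [1]))     # True
```

This mirrors the shape of dask's `Array.chunks` attribute (a tuple of per-axis chunk-size tuples); the change turns the old hard `NotImplementedError` into a dispatch condition.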
@@ -706,22 +704,76 @@ def interp_func(var, x, new_x, method, kwargs):
     if isinstance(var, dask_array_type):
         import dask.array as da

-        _assert_single_chunk(var, range(var.ndim - len(x), var.ndim))
-        chunks = var.chunks[: -len(x)] + new_x[0].shape
-        drop_axis = range(var.ndim - len(x), var.ndim)
-        new_axis = range(var.ndim - len(x), var.ndim - len(x) + new_x[0].ndim)
-        return da.map_blocks(
-            _interpnd,
-            var,
-            x,
-            new_x,
-            func,
-            kwargs,
-            dtype=var.dtype,
-            chunks=chunks,
-            new_axis=new_axis,
-            drop_axis=drop_axis,
-        )
+        # easier, and allows advanced interpolation
+        if _single_chunk(var, range(var.ndim - len(x), var.ndim)):
+            chunks = var.chunks[: -len(x)] + new_x[0].shape
+            drop_axis = range(var.ndim - len(x), var.ndim)
+            new_axis = range(var.ndim - len(x), var.ndim - len(x) + new_x[0].ndim)
+            return da.map_blocks(
+                _interpnd,
+                var,
+                x,
+                new_x,
+                func,
+                kwargs,
+                dtype=var.dtype,
+                chunks=chunks,
+                new_axis=new_axis,
+                drop_axis=drop_axis,
+            )
+
+        current_dims = [_x.name for _x in x]
+
+        # number of non-interpolated dimensions
+        nconst = var.ndim - len(x)
+
+        # chunk x
+        x = tuple(
+            da.from_array(_x, chunks=chunks)
+            for _x, chunks in zip(x, var.chunks[nconst:])
+        )
+
+        # duplicate the ghost cells of the array in the interpolated dimensions
+        var_with_ghost, x_with_ghost = _add_interp_ghost(var, x, nconst)
+
+        # compute final chunks
+        target_dims = set.union(*[set(_x.dims) for _x in new_x])
+        if target_dims - set(current_dims):
+            raise NotImplementedError(
+                "Advanced interpolation is not implemented with chunked dimension"
+            )
+        new_x = tuple([_x.set_dims(current_dims) for _x in new_x])
+        total_chunks = _compute_chunks(x, x_with_ghost, new_x)
+        final_chunks = var.chunks[: -len(x)] + tuple(total_chunks)
+
+        # chunk new_x
+        new_x = tuple(da.from_array(_x, chunks=total_chunks) for _x in new_x)
+
+        # reshape x_with_ghost
+        # TODO: remove it (see _dask_aware_interpnd)
+        x_with_ghost = da.meshgrid(*x_with_ghost, indexing="ij")
+
+        # compute on chunks
+        res = da.map_blocks(
+            _dask_aware_interpnd,
+            var_with_ghost,
+            func,
+            kwargs,
+            len(x_with_ghost),
+            *x_with_ghost,
+            *new_x,
+            dtype=var.dtype,
+            chunks=final_chunks,
+        )
+
+        # reshape res and remove empty chunks
+        # TODO: remove it by using drop_axis and new_axis in map_blocks
+        res = res.squeeze()
+        new_chunks = tuple(
+            [tuple([chunk for chunk in chunks if chunk > 0]) for chunks in res.chunks]
+        )
+        res = res.rechunk(new_chunks)
+        return res

     return _interpnd(var, x, new_x, func, kwargs)
@@ -751,3 +803,76 @@ def _interpnd(var, x, new_x, func, kwargs):
     # move back the interpolation axes to the last position
     rslt = rslt.transpose(range(-rslt.ndim + 1, 1))
     return rslt.reshape(rslt.shape[:-1] + new_x[0].shape)
+
+
+def _dask_aware_interpnd(var, func: Callable[..., Any], kwargs: Any, nx: int, *arrs):
+    """Wrapper for `_interpnd` allowing dask arrays to be used in `map_blocks`
+
+    The first `nx` arrays in `arrs` are the original coordinates,
+    the rest are the destination coordinates.
+    Currently this needs the original coordinates to be full arrays (meshgrid)
+
+    TODO: find a way to use 1d coordinates
+    """
+    from .dataarray import DataArray
+
+    _old_x, _new_x = arrs[:nx], arrs[nx:]
+
+    # reshape x (TODO REMOVE)
+    old_x = tuple(
+        [
+            np.moveaxis(tmp, dim, -1)[tuple([0] * (len(tmp.shape) - 1))]
+            for dim, tmp in enumerate(_old_x)
+        ]
+    )
+
+    new_x = tuple([DataArray(_x) for _x in _new_x])
+
+    return _interpnd(var, old_x, new_x, func, kwargs)
+
+
+def _add_interp_ghost(var, x, nconst: int):

[dcherian marked this conversation as resolved.]

+    """Duplicate the ghost cells of the array (values and coordinates)"""
+    import dask.array as da
+
+    bnd = {i: "none" for i in range(len(var.shape))}
+    depth = {i: 0 if i < nconst else 1 for i in range(len(var.shape))}
+
+    var_with_ghost = da.overlap.overlap(var, depth=depth, boundary=bnd)
+
+    x_with_ghost = tuple(
+        da.overlap.overlap(_x, depth={0: 1}, boundary={0: "none"}) for _x in x
+    )
+    return var_with_ghost, x_with_ghost
+
+
+def _compute_chunks(x, x_with_ghost, new_x):
+    """Compute equilibrated chunks of new_x
+
+    TODO: This only works if new_x is a set of 1d coordinates;
+    a more general function is needed for advanced interpolation with chunked dimensions
+    """

[Review: "Can you add more doc for this function? It is difficult to follow the logic for me..." / "is it better now?"]

+    chunks_end = [np.cumsum(sizes) - 1 for _x in x for sizes in _x.chunks]
+    chunks_end_with_ghost = [
+        np.cumsum(sizes) - 1 for _x in x_with_ghost for sizes in _x.chunks
+    ]
+    total_chunks = []
+    for dim, ce in enumerate(zip(chunks_end, chunks_end_with_ghost)):
+        l_new_x_ends: List[np.ndarray] = []
+        for iend, iend_with_ghost in zip(*ce):
+
+            arr = np.moveaxis(new_x[dim].data, dim, -1)
+            arr = arr[tuple([0] * (len(arr.shape) - 1))]

[pums974 marked this conversation as resolved.]

+
+            n_no_ghost = (arr <= x[dim][iend]).sum()
+            n_ghost = (arr <= x_with_ghost[dim][iend_with_ghost]).sum()
+
+            equil = np.ceil(0.5 * (n_no_ghost + n_ghost)).astype(int)
+
+            l_new_x_ends.append(equil)
+
+        new_x_ends = np.array(l_new_x_ends)
+        # do not forget extra points at the end
+        new_x_ends[-1] = len(arr)
+        chunks = new_x_ends[0], *(new_x_ends[1:] - new_x_ends[:-1])
+        total_chunks.append(tuple(chunks))
+    return total_chunks
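The chunk-balancing idea in `_compute_chunks` can be illustrated with a NumPy-only sketch (all coordinate values, chunk boundaries, and ghost offsets below are invented for illustration): for each source chunk, count how many destination points fall at or before the chunk's last coordinate, once without and once with the one-point ghost layer, and take the midpoint as the destination chunk boundary, so the overlap region is split evenly between neighbouring chunks.

```python
import numpy as np

# Hypothetical 1-d setup: a source coordinate of 8 points split into
# two chunks whose last indices are 3 and 7; with one ghost cell the
# first chunk effectively ends one index later.
x = np.linspace(0.0, 1.0, 8)
new_x = np.linspace(0.0, 1.0, 20)   # destination coordinate
chunk_ends = [3, 7]                 # last index of each source chunk
ghost_ends = [4, 7]                 # same, including the ghost cell

ends = []
for iend, ighost in zip(chunk_ends, ghost_ends):
    # destination points covered without / with the ghost layer
    n_no_ghost = int((new_x <= x[iend]).sum())
    n_ghost = int((new_x <= x[ighost]).sum())
    # split the overlap region evenly between neighbouring chunks
    ends.append(int(np.ceil(0.5 * (n_no_ghost + n_ghost))))

ends[-1] = len(new_x)               # do not forget extra points at the end
# difference the cumulative boundaries into per-chunk sizes
chunks = tuple(int(c) for c in (ends[0], *np.diff(ends)))
print(chunks)  # → (10, 10)
```

The 20 destination points are split 10/10 here because the midpoint of the with-ghost and without-ghost counts lands at 10; skewed source chunking would give unequal destination chunks.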
Review comment:
Is it safe to give the squeezing axis explicitly? What happens if the original array already has a size-one dimension?

Reply:
No, you're right, it's probably not safe, hence the TODO. But at the time I didn't manage to use drop_axis and new_axis... I'll give it another try tomorrow.
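The concern raised here can be reproduced with plain NumPy (array shapes invented for illustration): calling `squeeze()` with no axis argument drops every size-one dimension, including ones that belong to the original data, whereas naming the axis only drops the intended one.

```python
import numpy as np

# A result whose leading size-one dimension is real data, plus a
# trailing size-one dimension we would like to drop.
res = np.ones((1, 4, 1))

# squeeze() with no axis removes *all* size-one dimensions ...
print(res.squeeze().shape)         # → (4,)

# ... while naming the axis keeps the legitimate one intact.
print(res.squeeze(axis=-1).shape)  # → (1, 4)
```

dask arrays follow the same semantics, which is why an unqualified `res.squeeze()` in the chunked-interp code path could silently drop a size-one dimension of the input.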