Optimize `ffill`, `bfill` with dask when `limit` is specified #9771
Conversation
…h function, and incorporate the method parameter for the cumreduction on the push method
for more information, see https://pre-commit.ci
The tests that are failing look unrelated to the changes in this PR 🤔
Co-authored-by: Deepak Cherian <[email protected]>
@dcherian do you think it is a good idea to add a `keepdims` parameter to the `first` and `last` functions implemented in the `duck_array_ops.py` file? If I add that, I could use them directly in the cumreduction with the blelloch method; if not, I can just create a dummy wrapper, like the one I did for the `push` method, to receive a type parameter that is not used.
yes that sounds good to me.
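For readers following along: dask's blelloch-method scan expects its reduction callables to accept `keepdims`. A minimal sketch of what a `keepdims`-aware `first` could look like (hypothetical illustration, not the actual `duck_array_ops` code):

```python
import numpy as np

def first(values, axis, keepdims=False):
    # Take the first element along `axis`; `keepdims` preserves the reduced
    # axis with length 1, which scan-style combine steps rely on.
    out = np.take(values, 0, axis=axis)
    if keepdims:
        out = np.expand_dims(out, axis=axis)
    return out

a = np.arange(6).reshape(2, 3)
first(a, axis=1)                        # -> array([0, 3])
first(a, axis=1, keepdims=True).shape   # -> (2, 1)
```

A `last` variant would be identical except for taking index `-1`.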
…rameter for the last and first to make it compatible with the blelloch method
for more information, see https://pre-commit.ci
…method directly, they already have the parameter
for more information, see https://pre-commit.ci
`ffill`, `bfill` with dask when `limit` is specified
Great! Can you add a note to whats-new please? This is worth advertising.
Hi @dcherian, I think we could add this special case to try to improve performance when the limit is equal to or smaller than the chunk size on the axis (only one chunk would overlap). Is there any kind of benchmark I can use to see whether there is any real benefit from using map_overlap over cumsum when `n` is small?

```python
if n is not None and n <= array.chunksize[axis]:
    return array.map_overlap(_push, depth={axis: (n, 0)}, n=n, axis=axis)
```
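For context, the semantics of the `push` kernel being mapped here can be sketched in pure NumPy for the 1-D case (an illustrative toy, not xarray's actual `_push` implementation):

```python
import numpy as np

def push(a, n=None):
    """Forward-fill NaNs, propagating each valid value at most `n` steps."""
    out = np.asarray(a, dtype=float).copy()
    last, age = np.nan, 0  # last valid value seen, and its distance back
    for i in range(out.size):
        if np.isnan(out[i]):
            age += 1
            if not np.isnan(last) and (n is None or age <= n):
                out[i] = last
        else:
            last, age = out[i], 0
    return out

push([1.0, np.nan, np.nan, 4.0], n=1)  # -> array([ 1.,  1., nan,  4.])
```

With `map_overlap` and `depth={axis: (n, 0)}`, each chunk sees the `n` trailing elements of its predecessor, which is exactly the information this kernel needs to fill the first `n` positions of a chunk.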
I don't have one at hand. @phofl might have suggestions. Note that chunks can be non-uniform, so you'll have to check every element of that
Yes! The highlevel_api function in https://github.com/coiled/benchmarks/blob/main/tests/geospatial/workloads/climatology.py |
Thanks, I will try to run those benchmarks before modifying this PR, but from my understanding it should generate fewer tasks using map_overlap with a small `n`.
Yes, I believe that's why @phofl proposed that optimization. We just need to make sure to trigger it in the right circumstances. Please add a test for non-uniform chunk sizes where map_overlap would give you the wrong answer.
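To illustrate the non-uniform-chunks pitfall with a toy check (assuming the usual dask convention that `array.chunks[axis]` is the tuple of block lengths and `chunksize` reports the largest block):

```python
# Hypothetical guard for the special case: with non-uniform chunks the
# limit must be compared against every block, not just the largest one.
chunks = (4, 4, 2)   # a length-10 axis split unevenly
n = 3

naive_ok = n <= max(chunks)               # what `n <= chunksize[axis]` checks
correct_ok = all(c >= n for c in chunks)  # per-element check suggested above

print(naive_ok, correct_ok)  # -> True False
```

The naive check passes even though a depth-3 overlap spills past the final length-2 chunk, which is the scenario the requested test should cover.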
After understanding more of the code that @phofl wrote, I think the implementation works in the general case; the main issue is that it can load the whole array into memory when `n` is equal to the size of the array (based on my understanding of how map_overlap works). When I saw the PR I thought it was the same implementation as before, but what he did was define the overlap depth based on the limit, whereas the previous algorithm (the one before the cumreduction implementation) applied a map_blocks using the push function first and then a map_overlap with a single element of overlap instead of `n`, which caused the issue when one or more chunks were completely empty. In summary, I think it is best to use his algorithm when `n` is smaller than the chunk size of the axis, so we avoid loading more than two chunks at the same time.
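A rough back-of-the-envelope sketch of the memory concern (illustrative numbers only, assuming a one-sided overlap of depth `n` where each materialized block holds its own chunk plus up to `n` borrowed elements):

```python
def block_elements(chunk, depth, axis_len):
    # Elements one overlapped block holds: its own chunk plus up to
    # `depth` elements borrowed from preceding chunks (capped by what exists).
    return chunk + min(depth, axis_len - chunk)

axis_len, chunk = 1_000_000, 10_000
small_limit = block_elements(chunk, 100, axis_len)       # 10_100: ~one chunk
whole_array = block_elements(chunk, axis_len, axis_len)  # 1_000_000: entire axis
```

So per-task memory grows with the limit, and with `n` near the array length a single task can end up holding essentially the whole axis, which motivates restricting the map_overlap path to small `n`.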
In any case, how do you feel about merging this and optimizing later? I believe this is already a massive improvement.
Yes, that condition would prevent loading all the chunks; I just wanted to mention that using that algorithm in the general case would be problematic because it would load all the chunks at once, although it would still give the expected result in the end. I'm okay with merging without the special-case optimization and adding it in a further PR, but I'm also okay with adding that special case in this PR. I'll leave the final decision to you; I already have it implemented locally.
Ah then why don't you push what you have. |
Yes, this is what I wanted to do as well (I got side-tracked over the last few days, unfortunately). That is memory efficient and helps in some cases.
I didn't push it because the PR had the plan-to-merge label, so I didn't want to modify it without your approval. And sorry @phofl for not getting the idea of your original algorithm the first time I saw your PR; the good part is that now the algorithm behaves better in all scenarios. I'm not sure why the tests are not running; should the conflicts be resolved first for the tests to run?
Nice work @josephnowak and @phofl |
* main:
  Add download stats badges (pydata#9786)
  Fix open_mfdataset for list of fsspec files (pydata#9785)
  add 'User-Agent'-header to pooch.retrieve (pydata#9782)
  Optimize `ffill`, `bfill` with dask when `limit` is specified (pydata#9771)
* main: (24 commits)
  Bump minimum versions (#9796)
  Namespace-aware `xarray.ufuncs` (#9776)
  Add prettier and pygrep hooks to pre-commit hooks (#9644)
  `rolling.construct`: Add `sliding_window_kwargs` to pipe arguments down to `sliding_window_view` (#9720)
  Bump codecov/codecov-action from 4.6.0 to 5.0.2 in the actions group (#9793)
  Buffer types (#9787)
  Add download stats badges (#9786)
  Fix open_mfdataset for list of fsspec files (#9785)
  add 'User-Agent'-header to pooch.retrieve (#9782)
  Optimize `ffill`, `bfill` with dask when `limit` is specified (#9771)
  fix cf decoding of grid_mapping (#9765)
  Allow wrapping `np.ndarray` subclasses (#9760)
  Optimize polyfit (#9766)
  Use `map_overlap` for rolling reductions with Dask (#9770)
  fix html repr indexes section (#9768)
  Bump pypa/gh-action-pypi-publish from 1.11.0 to 1.12.2 in the actions group (#9763)
  unpin array-api-strict, as issues are resolved upstream (#9762)
  rewrite the `min_deps_check` script (#9754)
  CI runs ruff instead of pep8speaks (#9759)
  Specify copyright holders in main license file (#9756)
  ...
This improvement comes from the discussion in PR #9712.
whats-new.rst
api.rst