-
-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
interpolate_na: Add max_gap support. #3302
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome, looks good!
I had a cursory check through the algo; a couple more scenarios tested would be good, in particular gaps in the middle of the values
Any thoughts on maxgap
? limit_gap
? Neither clicks that well!
Thanks @max-sixty I've updated the tests.
I pulled |
Ah, agree we should align. I'm really not keen on that name but yes on balance; unless they're open to changing |
As per the pandas issue, sounds like |
👍 |
Nice feature. |
Thanks @stefraynaud . I'm having trouble figuring out defining the length of a gap in the irregular coordinate case. e.g.
What is the length of these three gaps given that xarray doesn't have any understanding of grids? |
I think using locations rather than counts would be great, but harder and doesn't have to be part of this PR. In the example above, it looks like 1 is aligned with 6 and 4 with 10, so the gap in locations along the y dimension would be 4? |
OK added test and now raises error for irregularly spaced coordinates. I agree that this should be good for now. |
xarray/tests/test_missing.py
Outdated
[0, 2, 5, 6, 7, 8, 10, 12, 14], | ||
[[6, 6, 6, 0, 2, 2, 0, 3, 3]], | ||
marks=pytest.mark.xfail( | ||
reason="max_gap with irregularly spaced coordinate." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Forgive me if I'm being slow—is the max_gap
measuring the locations (i.e. on the index), or the number of values? The example below seems to be counting two values, rather than measuring the space between the locations.
If that's right, why does it matter than the coordinates are irregular (or even non-monotonic)?
@dcherian In your example, as said @max-sixty, the middle gap has a length of I'll have a look the code, maybe for a new PR after this one. |
The thing I find weird is that for
the center gap's length =
@stefraynaud I don't have time to work on this now. Please feel free to modify this and open a new PR. You could try to push to this branch but I'm not sure it will work. |
IIUC, and please correct me if I'm wrong, the pandas version counts points rather than the distance between locations. Ideally we'd be able to do both, but even if we can only have one working correctly, that would be v good |
yes, I think you are right. I was thinking that it would be nice to have the number-of-nan-points and gap-length metrics converge for uniformly spaced coordinates but I don't think that's possible in any sensible way. |
Thanks @dnowacki-usgs that's a nice test. I think the right fix is to make Things left to do (or at least add xfail tests + errors):
|
* upstream/master: minor lint tweaks (pydata#3429) Hack around pydata#3440 (pydata#3442) Update Terminology page to account for multidimensional coordinates (pydata#3410) Use cftime master for upstream-dev build (pydata#3439) MAGA (Make Azure Green Again) (pydata#3436) Test that Dataset and DataArray resampling are identical (pydata#3412) Avoid multiplication DeprecationWarning in rasterio backend (pydata#3428) Sync with latest version of cftime (v1.0.4) (pydata#3430) Add cftime git tip to upstream-dev + temporarily pin cftime (pydata#3431)
* upstream/master: Escaping dtypes (pydata#3444) Html repr (pydata#3425)
Done for now. Ready for final review / testing. |
* upstream/master: __dask_tokenize__ (pydata#3446) Type check sentinel values (pydata#3472) Fix typo in docstring (pydata#3474) fix test suite warnings re `drop` (pydata#3460) Fix integrate docs (pydata#3469) Fix leap year condition in monthly means example (pydata#3464) Hypothesis tests for roundtrip to & from pandas (pydata#3285) unpin cftime (pydata#3463) Cleanup whatsnew (pydata#3462) enable xr.ALL_DIMS in xr.dot (pydata#3424) Merge stable into master (pydata#3457) upgrade black verison to 19.10b0 (pydata#3456) Remove outdated code related to compatibility with netcdftime (pydata#3450) Remove deprecated behavior from dataset.drop docstring (pydata#3451) jupyterlab dark theme (pydata#3443) Drop groups associated with nans in group variable (pydata#3406) Allow ellipsis (...) in transpose (pydata#3421) Another groupby.reduce bugfix. (pydata#3403) add icomoon license (pydata#3448)
This could use another round of testing / review |
Thanks for all your work @dcherian! Did a quick test with some real-world timeseries data I've been wanting to use with |
I'm going to merge this. Happy to make any other changes. |
Great, thanks @dcherian ! |
* upstream/master: Added fill_value for unstack (pydata#3541) Add DatasetGroupBy.quantile (pydata#3527) ensure rename does not change index type (pydata#3532) Leave empty slot when not using accessors interpolate_na: Add max_gap support. (pydata#3302) units & deprecation merge (pydata#3530) Fix set_index when an existing dimension becomes a level (pydata#3520) add Variable._replace (pydata#3528) Tests for module-level functions with units (pydata#3493) Harmonize `FillValue` and `missing_value` during encoding and decoding steps (pydata#3502) FUNDING.yml (pydata#3523) Allow appending datetime & boolean variables to zarr stores (pydata#3504) warn if dim is passed to rolling operations. (pydata#3513) Deprecate allow_lazy (pydata#3435) Recursive tokenization (pydata#3515)
* upstream/master: (22 commits) Added fill_value for unstack (pydata#3541) Add DatasetGroupBy.quantile (pydata#3527) ensure rename does not change index type (pydata#3532) Leave empty slot when not using accessors interpolate_na: Add max_gap support. (pydata#3302) units & deprecation merge (pydata#3530) Fix set_index when an existing dimension becomes a level (pydata#3520) add Variable._replace (pydata#3528) Tests for module-level functions with units (pydata#3493) Harmonize `FillValue` and `missing_value` during encoding and decoding steps (pydata#3502) FUNDING.yml (pydata#3523) Allow appending datetime & boolean variables to zarr stores (pydata#3504) warn if dim is passed to rolling operations. (pydata#3513) Deprecate allow_lazy (pydata#3435) Recursive tokenization (pydata#3515) format indexing.rst code with black (pydata#3511) add missing pint integration tests (pydata#3508) DOC: update bottleneck repo url (pydata#3507) add drop_sel, drop_vars, map to api.rst (pydata#3506) remove syntax warning (pydata#3505) ...
* master: (24 commits) Tweaks to release instructions (pydata#3555) Clarify conda environments for new contributors (pydata#3551) Revert to dev version 0.14.1 whatsnew (pydata#3547) sparse option to reindex and unstack (pydata#3542) Silence sphinx warnings (pydata#3516) Numpy 1.18 support (pydata#3537) tweak whats-new. (pydata#3540) small simplification of rename from pydata#3532 (pydata#3539) Added fill_value for unstack (pydata#3541) Add DatasetGroupBy.quantile (pydata#3527) ensure rename does not change index type (pydata#3532) Leave empty slot when not using accessors interpolate_na: Add max_gap support. (pydata#3302) units & deprecation merge (pydata#3530) Fix set_index when an existing dimension becomes a level (pydata#3520) add Variable._replace (pydata#3528) Tests for module-level functions with units (pydata#3493) Harmonize `FillValue` and `missing_value` during encoding and decoding steps (pydata#3502) FUNDING.yml (pydata#3523) ...
black . && mypy . && flake8
whats-new.rst
for all changes andapi.rst
for new API@dnowacki-usgs : can you look this over and test it out if you have time? feel free to push any changes to this branch :)