Add support for CFTimeIndex in get_clean_interp_index #3631
Conversation
…ype conversion. add overflow tests
…d added support for CFTimeIndex.
Thanks @huard for all your work on this. I think we're getting closer now to a better situation regarding interpolation with cftime coordinates.
xarray/core/duck_array_ops.py (Outdated)

    array = array.astype(np.float64)  # [ns]
    res = "ns"
else:
    array = np.asarray(array).astype("timedelta64[us]").astype(np.float64)  # [us]
So the potential for loss of precision is here, but only when array comes in with dtype np.timedelta64[ns] or finer. I feel like we can guard against this; we could do this in one of two ways:
1. Only pass dtype object arrays to this function and write a separate function to handle np.timedelta64 arrays.
2. Add some logic to handle np.timedelta64 arrays more robustly in this function, and name it something that reflects that both standard library timedeltas and NumPy timedeltas can be safely passed to it.

I probably prefer (1) as a solution.
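One way to sketch option (1) -- separate code paths for object arrays and np.timedelta64 arrays -- might look like the following. The function names and the unit handling here are hypothetical illustrations, not the PR's actual implementation:

```python
import datetime
import numpy as np


def py_timedelta_to_float(array, datetime_unit="ns"):
    # Hypothetical sketch: object arrays of datetime.timedelta are
    # converted to exact integer microseconds first (floor division by a
    # 1 us timedelta is exact), then scaled to the requested unit.
    us = np.array(
        [td // datetime.timedelta(microseconds=1) for td in array], dtype=np.int64
    )
    scale = {"ns": 1e-3, "us": 1.0, "ms": 1e3, "s": 1e6}[datetime_unit]
    return us / scale


def np_timedelta_to_float(array, datetime_unit="ns"):
    # np.timedelta64 input is cast directly to the requested unit,
    # avoiding an intermediate lossy ns -> us round trip.
    return array.astype(f"timedelta64[{datetime_unit}]").astype(np.float64)
```

Keeping the two paths separate means the object-array path never has to worry about NumPy's resolution semantics, and vice versa.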
I think there are two cases where we're losing precision:
1. Conversion from ns to us when the time steps are < 1 us.
2. Conversion to float of large timedelta[us] values. For example:

In [154]: np.float(13510798882111486) - np.float(13510798882111485)
Out[154]: 2.0

When we discussed loss of precision, I was more worried about (2) than (1), hence maybe some confusion earlier. I'm concerned (2) will lead to hard-to-track bugs.
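Both failure modes are easy to reproduce with plain NumPy:

```python
import numpy as np

# Case 1: casting ns-resolution timedeltas to us truncates
# sub-microsecond steps entirely.
ns = np.array([500], dtype="timedelta64[ns]")
truncated = ns.astype("timedelta64[us]")  # 500 ns becomes 0 us

# Case 2: float64 has a 53-bit mantissa, so integers above 2**53
# (~104 days of nanoseconds, ~285 years of microseconds) are no longer
# exactly representable and neighbouring values collapse or spread.
a, b = 13510798882111486, 13510798882111485
diff = float(a) - float(b)  # 2.0, not 1.0
```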
> When we discussed loss of precision, I was more worried about (2) than (1), hence maybe some confusion earlier. I'm concerned (2) will lead to hard-to-track bugs.
Ah, that makes sense. Indeed I had mainly been thinking about (1). Thanks for the clear example. My apologies for being dense earlier. At least in the short term, I guess I'm still not so worried about (2), primarily because it is already an issue in the existing code (i.e. even for np.datetime64[ns] dates). So this update is not changing the behavior materially; it's just extending the approach to allow for long time ranges between cftime dates.[1]
Earlier you raised a good question regarding whether there would be a benefit to keeping these values as integers as long as possible to avoid these floating point problems -- it seems like there could be in the case of differentiate and integrate -- however, that would probably require some refactoring to do the unit conversion after the operation involving the numeric times, rather than before, to avoid overflow. I'd be open to discussing this more, but I don't feel like it's necessarily a blocker here, unless you have an example that clearly shows otherwise.
Regardless, I really appreciate your careful thoughts and help with this. This has been useful discussion.
[1] I guess it's also important to note that we are currently naively differencing cftime dates to produce timedeltas, which even before float conversion is not microsecond-exact:
In [1]: import cftime
In [2]: cftime.DatetimeNoLeap(2000, 1, 1, 0, 0, 0, 123456) - cftime.DatetimeNoLeap(2000, 1, 1, 0, 0, 0, 123455)
Out[2]: datetime.timedelta(microseconds=8)
We have worked around this before in xarray; see exact_cftime_datetime_difference.
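For illustration, the component-wise idea behind that workaround can be shown with standard library datetimes (whose own subtraction is already exact -- the rounding problem above is specific to cftime's date arithmetic). The helper below is a hypothetical sketch of the approach for the proleptic Gregorian calendar, not xarray's actual exact_cftime_datetime_difference:

```python
import datetime


def exact_datetime_difference(a, b):
    # Rebuild each date as an exact integer count of microseconds from
    # its components and subtract with integer arithmetic, rather than
    # trusting the date library's subtraction (which, in cftime's case,
    # may round through floating point).
    def to_microseconds(d):
        days = datetime.date(d.year, d.month, d.day).toordinal()
        seconds = ((days * 24 + d.hour) * 60 + d.minute) * 60 + d.second
        return seconds * 10**6 + d.microsecond

    return datetime.timedelta(microseconds=to_microseconds(a) - to_microseconds(b))
```

A version for cftime would compute the day count with the calendar in question (noleap, 360_day, ...) instead of datetime.date.toordinal.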
Thanks @huard for these latest updates. I think this is pretty close.
This looks great @huard -- a couple more minor things:
- Could you update the docstrings for DataArray.interpolate_na and Dataset.interpolate_na to reflect that max_gap can now be a datetime.timedelta object too?
- Upstream failures are unrelated, though these Windows failures look real. I think it's likely due to imprecise cftime arithmetic. We should probably use assert_allclose instead of assert_equal for these tests.
Windows failures
================================== FAILURES ===================================
____ test_interpolate_na_max_gap_time_specifier[3H-<lambda>0-cftime_range] ____

da_time = <xarray.DataArray (t: 11)>
array([nan,  1.,  2., nan, nan,  5., nan, nan, nan, nan, 10.])
Coordinates:
  * t        (t) object 2001-01-01 00:00:00 ... 2001-01-01 10:00:00
max_gap = '3H', transform = <function <lambda> at 0x000001EB4C167E58>
time_range_func = <function cftime_range at 0x000001EB453B0B88>

    @requires_bottleneck
    @pytest.mark.parametrize("time_range_func", [pd.date_range, xr.cftime_range])
    @pytest.mark.parametrize("transform", [lambda x: x, lambda x: x.to_dataset(name="a")])
    @pytest.mark.parametrize(
        "max_gap", ["3H", np.timedelta64(3, "h"), pd.to_timedelta("3H")]
    )
    def test_interpolate_na_max_gap_time_specifier(
        da_time, max_gap, transform, time_range_func
    ):
        da_time["t"] = time_range_func("2001-01-01", freq="H", periods=11)
        expected = transform(
            da_time.copy(data=[np.nan, 1, 2, 3, 4, 5, np.nan, np.nan, np.nan, np.nan, 10])
        )
        actual = transform(da_time).interpolate_na("t", max_gap=max_gap)
>       assert_equal(actual, expected)
E       AssertionError: Left and right DataArray objects are not equal
E
E       Differing values:
E       L
E           array([nan,  1.,  2.,  3.,  4.,  5., nan, nan, nan, nan, 10.])
E       R
E           array([nan,  1.,  2.,  3.,  4.,  5., nan, nan, nan, nan, 10.])

xarray\tests\test_missing.py:568: AssertionError

The five remaining cftime_range parametrizations fail with the same assertion
(the printed arrays agree to display precision but differ bitwise):

_ test_interpolate_na_max_gap_time_specifier[3H-<lambda>1-cftime_range] _
_ test_interpolate_na_max_gap_time_specifier[max_gap1-<lambda>0-cftime_range] _
_ test_interpolate_na_max_gap_time_specifier[max_gap1-<lambda>1-cftime_range] _
_ test_interpolate_na_max_gap_time_specifier[max_gap2-<lambda>0-cftime_range] _
_ test_interpolate_na_max_gap_time_specifier[max_gap2-<lambda>1-cftime_range] _
Co-Authored-By: Spencer Clark <[email protected]>
Is the switch to …

I think so. If there's a branch fixing the assert_allclose failures, I can merge it here.

I think it's just this function: …

It seems like …

Hmm...I think it should treat NaNs as equal by default. Are you using the version defined here? (xarray/tests/__init__.py, line 170 at ecd67f4)

Ah! No, I tried numpy.testing's version. Works now. Thanks!

Thanks @huard and @spencerkclark. This took a solid amount of work. @spencerkclark merge if you think it's good to go?

Indeed thanks @huard -- I'll give this another once-over tomorrow. If everything checks out, which I anticipate it will, I'll merge it then.
@huard in looking things over again, I noticed a few things related to the implementation of timedelta_to_numeric that I think need attention. I think it would be helpful to have some explicit tests for it too. Sorry for not catching these earlier.
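For reference, one plausible shape for timedelta_to_numeric -- a hypothetical sketch, not necessarily the signature or dispatch order that was merged -- normalizing the accepted timedelta flavors to a float count of a given unit:

```python
import datetime

import numpy as np
import pandas as pd


def timedelta_to_numeric(value, datetime_unit="ns"):
    # Dividing two np.timedelta64 values returns a float and handles the
    # unit conversion exactly in integer nanoseconds first.
    if isinstance(value, np.timedelta64):
        return value / np.timedelta64(1, datetime_unit)
    if isinstance(value, datetime.timedelta):
        return np.timedelta64(value) / np.timedelta64(1, datetime_unit)
    if isinstance(value, str):
        # pandas parses offset strings like "3h" into a Timedelta.
        return pd.to_timedelta(value).to_timedelta64() / np.timedelta64(1, datetime_unit)
    raise TypeError(f"Expected a timedelta-like value, got {type(value)}")
```

Explicit tests would then cover each branch (np.timedelta64, datetime.timedelta, string) plus the TypeError path.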
Great, thanks @huard, in particular for the new test. It would be great if you could take care of some last few nits; then I think things should be ready to merge.
> I think it's clearer if both functions datetime_to_numeric and timedelta_to_numeric are side by side.

I see what you mean, particularly given the similarity in the handling of the datetime_unit argument. I think it's fine to leave them side by side for now. We can always move them later, given they are private functions.
…named array to value for pd_timedelta_to_float. removed pd_timedeltaindex_to_float.
Thanks @huard!

Thanks @spencerkclark for shepherding this to completion.
* master:
  Add support for CFTimeIndex in get_clean_interp_index (pydata#3631)
  sel with categorical index (pydata#3670)
  bump min deps for 0.15 (pydata#3713)
  setuptools-scm and isort tweaks (pydata#3720)
  Allow binned coordinates on 1D plots y-axis. (pydata#3685)
  apply_ufunc: Add meta kwarg + bump dask to 2.2 (pydata#3660)
  setuptools-scm and one-liner setup.py (pydata#3714)
  Feature/align in dot (pydata#3699)
  ENH: enable `H5NetCDFStore` to work with already open h5netcdf.File a… (pydata#3618)
  One-off isort run (pydata#3705)
  hardcoded xarray.__all__ (pydata#3703)
  Bump mypy to v0.761 (pydata#3704)
  remove DataArray and Dataset constructor deprecations for 0.15 (pydata#3560)
  Tests for variables with units (pydata#3654)
  Add an example notebook using apply_ufunc to vectorize 1D functions (pydata#3629)
  Use encoding['dtype'] over data.dtype when possible within CFMaskCoder.encode (pydata#3652)
- black . && mypy . && flake8
- whats-new.rst for all changes and api.rst for new API

Related to #3349
As suggested by @spencerkclark, index values are computed as a delta with respect to 1970-01-01.
At the moment, this fails if dates fall outside of the range for nanosecond timedeltas [1678 AD, 2262 AD]. Is this something we can fix?
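The [1678 AD, 2262 AD] bounds follow directly from storing nanoseconds in a signed 64-bit integer; a quick back-of-the-envelope check:

```python
import numpy as np

# int64 nanoseconds overflow at +/- 2**63 ns, i.e. roughly +/- 292
# years; centred on the 1970-01-01 epoch that gives approximately the
# [1678 AD, 2262 AD] window. A coarser unit such as timedelta64[us]
# widens the representable span by a factor of 1000.
span_years = 2**63 / (1e9 * 86400 * 365.2425)
print(round(1970 - span_years), round(1970 + span_years))  # 1678 2262
```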