(fix): extension array indexers #9671

Open
wants to merge 93 commits into
base: main
7b5f323
implement default_precision_timestamp, refactor coding/times.py and c…
kmuehlbauer Oct 10, 2024
8784f33
align tests with new time resolution behaviour
kmuehlbauer Oct 10, 2024
b45ab23
timedelta decoding, fsspec handling
kmuehlbauer Oct 10, 2024
39086ef
fixes in coding/times.py
kmuehlbauer Oct 13, 2024
df49a40
add docs on time coding
kmuehlbauer Oct 13, 2024
adb8ca3
attempt fixing doc tests
kmuehlbauer Oct 13, 2024
266b1ed
fix issue where out-of-bounds floating point values slipped in the pr…
kmuehlbauer Oct 14, 2024
6d5f13b
convert to UTC first before stripping of tz in _unpack_time_units_and…
kmuehlbauer Oct 14, 2024
5d68bfe
reorganize pandas compatibility code, remove unneeded code, attempt t…
kmuehlbauer Oct 14, 2024
07bba69
another attempt to finally fix mypy
kmuehlbauer Oct 14, 2024
6e7f0bb
refactor out _check_date_is_after_shift
kmuehlbauer Oct 14, 2024
b4a49bb
refactor out _maybe_strip_tz_from_timestamp
kmuehlbauer Oct 14, 2024
2e1ff4f
more refactoring in coding.times.py
kmuehlbauer Oct 14, 2024
d5a7da0
more refactoring in coding.times.py
kmuehlbauer Oct 14, 2024
821b68d
minor fix in time-coding.rst
kmuehlbauer Oct 14, 2024
d066edf
set default resolution to "s", which actually means, use pandas lowes…
kmuehlbauer Oct 14, 2024
ed22da1
Add section for default units, fix options
kmuehlbauer Oct 14, 2024
8bf23f4
attempt to fix typing
kmuehlbauer Oct 14, 2024
c3a2b39
attempt to fix typing
kmuehlbauer Oct 14, 2024
3c44aed
fix scalar datetime/timedelta
kmuehlbauer Oct 15, 2024
48be73a
fix user docs
kmuehlbauer Oct 15, 2024
7ac9983
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Oct 18, 2024
d86ad04
Fix variable tests, mostly datetime/timedelta is inittialized with us…
kmuehlbauer Oct 18, 2024
b5d0795
revert changes in _possible_convert_objects, this needs to be checked…
kmuehlbauer Oct 18, 2024
60324f0
fix doc link
kmuehlbauer Oct 18, 2024
c2bc4df
(fix): allow all extension array data types in pandas adapters
ilan-gold Oct 23, 2024
84569bc
(fix): dataframes have no `array` attr
ilan-gold Oct 23, 2024
90e390d
(fix): allow chunked numpy extension arrays because of `test_pandas_a…
ilan-gold Oct 24, 2024
7c32bd0
(fix): dtypes for `PandasIndex`
ilan-gold Oct 24, 2024
795ecf6
(chore): remove test for unnecessary conversion
ilan-gold Oct 24, 2024
8eca6e9
(revert): don't let through so much in `as_compatible_data`
ilan-gold Oct 24, 2024
fb91812
(fix): account for series -> numpy conversions
ilan-gold Oct 25, 2024
a06f2b1
(fix): ensure dtype check is for numpy type
ilan-gold Oct 25, 2024
14027e8
(fix): convert pandas `IntervalArray`
ilan-gold Oct 25, 2024
a47a96f
Merge branch 'main' into ig/fix_extension_indexer
ilan-gold Oct 25, 2024
6f2861a
Merge branch 'main' into any-time-resolution-2
kmuehlbauer Nov 8, 2024
1f07500
Apply suggestions from code review
kmuehlbauer Nov 8, 2024
798b444
Merge branch 'main' into any-time-resolution-2
kmuehlbauer Nov 8, 2024
f487599
Merge branch 'main' into any-time-resolution-2
kmuehlbauer Nov 16, 2024
20d6c9d
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Nov 16, 2024
7391948
remove outdated description
kmuehlbauer Nov 16, 2024
308091c
use set instead list
kmuehlbauer Nov 16, 2024
5f40b4e
remove global option
kmuehlbauer Nov 16, 2024
2a65d8d
mypy thinks `unit` is Literal, because the pandas-stubs suggest so, b…
kmuehlbauer Nov 17, 2024
43f7d61
ignore mypy arg-type
kmuehlbauer Nov 17, 2024
59934b9
fix docstring of `default_precision_timestamp`
kmuehlbauer Nov 17, 2024
a01f9f3
add 'time_unit'-kwarg to decode_cf and descendent functions with "ns"…
kmuehlbauer Nov 17, 2024
8b91128
fix tests
kmuehlbauer Nov 17, 2024
0e351ca
fix more tests
kmuehlbauer Nov 17, 2024
07a8e9c
fix docstring
kmuehlbauer Nov 17, 2024
2be5739
use pd.Timestamp(np.datetime64(cftime)) to convert from cftime to numpy
kmuehlbauer Nov 17, 2024
b9d0a8e
use dt = np.datetime64(cftime.isoformat()) to convert from cftime to …
kmuehlbauer Nov 18, 2024
08afc3b
fix time-coding.rst
kmuehlbauer Nov 18, 2024
edc55e1
use us in to_datetimeindex
kmuehlbauer Nov 18, 2024
bffe919
revert back to us for datetimeindex tests
kmuehlbauer Nov 18, 2024
150b982
estimate fitting resolution for floating point values, when decoding …
kmuehlbauer Nov 18, 2024
7113ceb
add test
kmuehlbauer Nov 18, 2024
7f47f0b
refactor floating point decoding
kmuehlbauer Nov 18, 2024
512808d
Merge branch 'main' into any-time-resolution-2
kmuehlbauer Nov 18, 2024
63c83f4
simplify recursive function, update tests
kmuehlbauer Nov 18, 2024
0efbbeb
more refactoring, update tests
kmuehlbauer Nov 19, 2024
2910250
add fixture, apply fixture to more tests.
kmuehlbauer Nov 19, 2024
57d8d72
update time-coding.rst
kmuehlbauer Nov 19, 2024
5333240
fix typing
kmuehlbauer Nov 19, 2024
6f35c81
try to fix test, remove stale print
kmuehlbauer Nov 19, 2024
d0c17a4
another attempt to fix test
kmuehlbauer Nov 19, 2024
b2b6bb1
debug failing test
kmuehlbauer Nov 19, 2024
5dbc8a7
refactor cftime fallback in datetime decoding
kmuehlbauer Nov 21, 2024
be0d3e0
Merge branch 'main' into any-time-resolution-2
kmuehlbauer Nov 21, 2024
f95408a
fix merge-collission
kmuehlbauer Nov 21, 2024
609e15c
Merge branch 'main' into any-time-resolution-2
kmuehlbauer Nov 21, 2024
ec7f165
use CFDatetimeCoder instance to transport unit/use_cftime
kmuehlbauer Nov 22, 2024
1f1cf1c
decode_times with CFDatetimeCoder
kmuehlbauer Nov 25, 2024
14b1a88
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Nov 25, 2024
05627dd
Merge branch 'main' into any-time-resolution-2
kmuehlbauer Nov 25, 2024
e7cbf3a
fix mypy, warning/error
kmuehlbauer Nov 26, 2024
fc87e04
api, docs, docstrings
kmuehlbauer Nov 26, 2024
9ae645e
Merge branch 'main' into any-time-resolution-2
kmuehlbauer Nov 26, 2024
6e3ca57
Merge branch 'main' into any-time-resolution-2
kmuehlbauer Nov 27, 2024
277d1c6
docs, whats-new.rst
kmuehlbauer Nov 27, 2024
81a9d94
fix whats-new.rst
kmuehlbauer Nov 27, 2024
be8642f
Merge branch 'main' into any-time-resolution-2
kmuehlbauer Nov 27, 2024
f3f62e5
Merge branch 'main' into any-time-resolution-2
kmuehlbauer Dec 2, 2024
c07df41
Merge remote-tracking branch 'origin/main' into any-time-resolution-2
kmuehlbauer Dec 10, 2024
ae49850
Merge branch 'any-time-resolution-2' into ig/fix_extension_indexer
ilan-gold Dec 10, 2024
e8f5aa8
Merge branch 'main' into ig/fix_extension_indexer
ilan-gold Dec 10, 2024
9653a01
fix tests after merge
kmuehlbauer Dec 10, 2024
a405f03
Merge branch 'any-time-resolution-2' into ig/fix_extension_indexer
ilan-gold Dec 10, 2024
503b313
(fix): `dtype` type handling
ilan-gold Dec 11, 2024
c8ab8f3
(fix): move out of type checking block
ilan-gold Dec 11, 2024
66e5b06
(fix): satisfy mypy
ilan-gold Dec 11, 2024
f9fde3a
(fix): doctest
ilan-gold Dec 11, 2024
8a3e834
(fix): `nbytes` test?
ilan-gold Dec 11, 2024
refactor cftime fallback in datetime decoding
kmuehlbauer committed Nov 21, 2024
commit 5dbc8a7ff46815e7ff4c9db48d9754f2a6dbae22
31 changes: 28 additions & 3 deletions xarray/coding/times.py
@@ -463,6 +463,11 @@ def decode_cf_datetime(
             cftype = type(dates[np.nanargmin(num_dates)])
             # create first day of gregorian calendar in current cf calendar type
             border = cftype(1582, 10, 15)
+            # "ns" borders
+            # between ['1677-09-21T00:12:43.145224193', '2262-04-11T23:47:16.854775807']
+            lower = cftype(1677, 9, 21, 0, 12, 43, 145224)
+            upper = cftype(2262, 4, 11, 23, 47, 16, 854775)
+
             # todo: check if test for minimum date is enough
             if (
                 dates[np.nanargmin(num_dates)] < border
@@ -477,9 +482,27 @@ def decode_cf_datetime(
                         SerializationWarning,
                         stacklevel=3,
                     )
+            elif time_unit == "ns" and (
+                (
+                    dates[np.nanargmin(num_dates)] < lower
+                    or dates[np.nanargmin(num_dates)] > upper
+                )
+                or (
+                    dates[np.nanargmax(num_dates)] < lower
+                    or dates[np.nanargmax(num_dates)] > upper
+                )
+            ):
+                warnings.warn(
+                    "Unable to decode time axis into full "
+                    "numpy.datetime64 objects, continuing using "
+                    "cftime.datetime objects instead, reason: dates out "
+                    "of range",
+                    SerializationWarning,
+                    stacklevel=3,
+                )
             else:
                 if _is_standard_calendar(calendar):
-                    dates = cftime_to_nptime(dates)
+                    dates = cftime_to_nptime(dates, time_unit=time_unit)
     elif use_cftime:
         dates = _decode_datetime_with_cftime(flat_num_dates, units, calendar)
     else:
@@ -605,7 +628,9 @@ def infer_timedelta_units(deltas) -> str:
     return _infer_time_units_from_diff(unique_timedeltas)


-def cftime_to_nptime(times, raise_on_invalid: bool = True) -> np.ndarray:
+def cftime_to_nptime(
+    times, raise_on_invalid: bool = True, time_unit: PDDatetimeUnitOptions = "ns"
+) -> np.ndarray:
     """Given an array of cftime.datetime objects, return an array of
     numpy.datetime64 objects of the same size

@@ -618,7 +643,7 @@ def cftime_to_nptime(times, raise_on_invalid: bool = True) -> np.ndarray:
         try:
             # We expect either "us" resolution or "s" resolution depending on
             # whether 'microseconds' are defined for the input or not.
-            dt = np.datetime64(t.isoformat())
+            dt = np.datetime64(t.isoformat()).astype(f"=M8[{time_unit}]")
         except ValueError as e:
             if raise_on_invalid:
                 raise ValueError(
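The `lower` and `upper` values added in `decode_cf_datetime` appear to be the `datetime64[ns]` range quoted in the comment, cut down to microseconds. The following is a minimal stdlib-only sketch of that arithmetic, not code from the PR; `EPOCH` and `NS_MAX` are illustrative names, and it assumes the usual int64 nanoseconds-since-1970 representation with the minimum int64 value reserved for NaT.

```python
from datetime import datetime, timedelta

# Assumption: datetime64[ns] stores int64 nanoseconds since the Unix epoch,
# with the smallest int64 reserved for NaT, so the usable range is symmetric.
EPOCH = datetime(1970, 1, 1)
NS_MAX = 2**63 - 1

# Reduce to whole microseconds, the finest field a datetime constructor takes.
upper = EPOCH + timedelta(microseconds=NS_MAX // 1000)
# Floor division rounds toward negative infinity for the negative bound.
lower = EPOCH + timedelta(microseconds=(-NS_MAX) // 1000)

print(lower.isoformat())  # 1677-09-21T00:12:43.145224
print(upper.isoformat())  # 2262-04-11T23:47:16.854775
```

Both results match the arguments passed to `cftype(...)` in the hunk above; the sub-microsecond digits (`193` and `807`) in the comment cannot be expressed in the microsecond field.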
82 changes: 41 additions & 41 deletions xarray/tests/test_coding_times.py
@@ -133,17 +133,13 @@ def test_cf_datetime(
         num_dates, units, calendar, only_use_cftime_datetimes=True
     )

-    min_y = np.ravel(np.atleast_1d(expected))[np.nanargmin(num_dates)]  # .year
-    max_y = np.ravel(np.atleast_1d(expected))[np.nanargmax(num_dates)]  # .year
-    typ = type(min_y)
-    border = typ(1582, 10, 15)
-    if (calendar == "proleptic_gregorian" and time_unit != "ns") or (
-        calendar in _STANDARD_CALENDARS and (min_y >= border and max_y >= border)
-    ):
-        expected = cftime_to_nptime(expected)
     with warnings.catch_warnings():
         warnings.filterwarnings("ignore", "Unable to decode time axis")
         actual = decode_cf_datetime(num_dates, units, calendar, time_unit=time_unit)
+
+    if actual.dtype.kind != "O":
+        expected = cftime_to_nptime(expected)
+
     abs_diff = np.asarray(abs(actual - expected)).ravel()
     abs_diff = pd.to_timedelta(abs_diff.tolist()).to_numpy()
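Several tests in this diff wrap `decode_cf_datetime` in `warnings.catch_warnings()` plus `warnings.filterwarnings("ignore", "Unable to decode time axis")`. As a rough illustration of how that filter behaves, here is a stdlib-only sketch; `fake_decode` is a hypothetical stand-in and not part of xarray.

```python
import warnings


def fake_decode(value):
    # Hypothetical stand-in for decode_cf_datetime: warns, then returns its input.
    warnings.warn(
        "Unable to decode time axis into full numpy.datetime64 objects",
        UserWarning,
        stacklevel=2,
    )
    return value


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # The message argument is a regex matched against the start of the warning
    # text; this filter is inserted in front of the "always" filter above.
    warnings.filterwarnings("ignore", "Unable to decode time axis")
    result = fake_decode(42)
    warnings.warn("some other warning")  # not matched, so it is recorded

print(result)                            # 42
print([str(w.message) for w in caught])  # ['some other warning']
```

Because only the message prefix is matched, the same filter silences both warning texts raised in `decode_cf_datetime`, whatever reason follows the prefix.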

@@ -164,7 +160,7 @@ def test_cf_datetime(


 @requires_cftime
-def test_decode_cf_datetime_overflow() -> None:
+def test_decode_cf_datetime_overflow(time_unit: PDDatetimeUnitOptions) -> None:
     # checks for
     # https://github.com/pydata/pandas/issues/14068
     # https://github.com/pydata/xarray/issues/975
@@ -174,13 +170,13 @@ def test_decode_cf_datetime_overflow() -> None:
     units = "days since 2000-01-01 00:00:00"

     # date after 2262 and before 1678
-    days = (-117608, 95795)
-    expected = (datetime(1677, 12, 31), datetime(2262, 4, 12))
+    days = (-117710, 95795)
+    expected = (datetime(1677, 9, 20), datetime(2262, 4, 12))

     for i, day in enumerate(days):
         with warnings.catch_warnings():
             warnings.filterwarnings("ignore", "Unable to decode time axis")
-            result = decode_cf_datetime(day, units)
+            result = decode_cf_datetime(day, units, time_unit=time_unit)
         assert result == expected[i]
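The new day offsets can be checked by hand. The sketch below is stdlib-only and assumes Python's proleptic Gregorian `datetime` agrees with the decoded calendar for these dates, all of which fall after 1582; `reference`, `old_days` and `new_days` are illustrative names.

```python
from datetime import datetime, timedelta

# Reference date taken from units = "days since 2000-01-01 00:00:00".
reference = datetime(2000, 1, 1)

old_days = (-117608, 95795)  # values removed by this commit
new_days = (-117710, 95795)  # values added by this commit

old_dates = [reference + timedelta(days=d) for d in old_days]
new_dates = [reference + timedelta(days=d) for d in new_days]

print(old_dates[0].date())  # 1677-12-31
print(new_dates[0].date())  # 1677-09-20
print(new_dates[1].date())  # 2262-04-12
```

The old lower date, 1677-12-31, lies inside the nanosecond range quoted in `times.py`, which starts on 1677-09-21. The new lower date sits one day before that boundary, and the upper date one day after 2262-04-11, so both values now fall outside the `"ns"` range.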


@@ -214,25 +210,22 @@ def test_decode_cf_datetime_non_iso_strings() -> None:

 @requires_cftime
 @pytest.mark.parametrize("calendar", _STANDARD_CALENDARS)
-def test_decode_standard_calendar_inside_timestamp_range(calendar, time_unit) -> None:
+def test_decode_standard_calendar_inside_timestamp_range(
+    calendar, time_unit: PDDatetimeUnitOptions
+) -> None:
     import cftime

     units = "days since 0001-01-01"
-    times = pd.date_range("2001-04-01-00", end="2001-04-30-23", unit="us", freq="h")
+    times = pd.date_range(
+        "2001-04-01-00", end="2001-04-30-23", unit=time_unit, freq="h"
+    )
     # to_pydatetime() will return microsecond
     time = cftime.date2num(times.to_pydatetime(), units, calendar=calendar)
     expected = times.values
-    print(expected)
     # for cftime we get "us" resolution
     # ns resolution is handled by cftime, too (OutOfBounds)
     actual = decode_cf_datetime(time, units, calendar=calendar, time_unit=time_unit)
-    print(actual, actual.dtype)
-    if calendar != "proleptic_gregorian" or time_unit == "ns":
-        unit = "us"
-    else:
-        unit = time_unit
-    expected_dtype = np.dtype(f"=M8[{unit}]")
-    assert actual.dtype == expected_dtype
+    assert actual.dtype == np.dtype(f"=M8[{time_unit}]")
     abs_diff = abs(actual - expected)
     # once we no longer support versions of netCDF4 older than 1.1.5,
     # we could do this check with near microsecond accuracy:
@@ -296,21 +289,20 @@ def test_decode_dates_outside_timestamp_range(

 @requires_cftime
 @pytest.mark.parametrize("calendar", _STANDARD_CALENDARS)
+@pytest.mark.parametrize("num_time", [735368, [735368], [[735368]]])
 def test_decode_standard_calendar_single_element_inside_timestamp_range(
-    calendar, time_unit: PDDatetimeUnitOptions
+    calendar,
+    time_unit: PDDatetimeUnitOptions,
+    num_time,
 ) -> None:
     units = "days since 0001-01-01"
-    unit = "s"
-    if calendar == "proleptic_gregorian" and time_unit != "ns":
-        unit = time_unit
-    for num_time in [735368, [735368], [[735368]]]:
-        with warnings.catch_warnings():
-            warnings.filterwarnings("ignore", "Unable to decode time axis")
-            actual = decode_cf_datetime(
-                num_time, units, calendar=calendar, time_unit=time_unit
-            )
+    with warnings.catch_warnings():
+        warnings.filterwarnings("ignore", "Unable to decode time axis")
+        actual = decode_cf_datetime(
+            num_time, units, calendar=calendar, time_unit=time_unit
+        )

-        assert actual.dtype == np.dtype(f"=M8[{unit}]")
+    assert actual.dtype == np.dtype(f"=M8[{time_unit}]")


 @requires_cftime
@@ -353,9 +345,6 @@ def test_decode_standard_calendar_multidim_time_inside_timestamp_range(
     import cftime

     units = "days since 0001-01-01"
-    unit = "s"
-    if calendar == "proleptic_gregorian" and time_unit != "ns":
-        unit = time_unit
     times1 = pd.date_range("2001-04-01", end="2001-04-05", freq="D")
     times2 = pd.date_range("2001-05-01", end="2001-05-05", freq="D")
     time1 = cftime.date2num(times1.to_pydatetime(), units, calendar=calendar)
@@ -370,7 +359,7 @@ def test_decode_standard_calendar_multidim_time_inside_timestamp_range(
     actual = decode_cf_datetime(
         mdim_time, units, calendar=calendar, time_unit=time_unit
     )
-    assert actual.dtype == np.dtype(f"=M8[{unit}]")
+    assert actual.dtype == np.dtype(f"=M8[{time_unit}]")

     abs_diff1 = abs(actual[:, 0] - expected1)
     abs_diff2 = abs(actual[:, 1] - expected2)
@@ -984,7 +973,9 @@ def test_use_cftime_default_standard_calendar_out_of_range(
 @requires_cftime
 @pytest.mark.parametrize("calendar", _NON_STANDARD_CALENDARS)
 @pytest.mark.parametrize("units_year", [1500, 2000, 2500])
-def test_use_cftime_default_non_standard_calendar(calendar, units_year) -> None:
+def test_use_cftime_default_non_standard_calendar(
+    calendar, units_year, time_unit
+) -> None:
     from cftime import num2date

     numerical_dates = [0, 1]
@@ -993,9 +984,18 @@ def test_use_cftime_default_non_standard_calendar(calendar, units_year) -> None:
         numerical_dates, units, calendar, only_use_cftime_datetimes=True
     )

-    with assert_no_warnings():
-        result = decode_cf_datetime(numerical_dates, units, calendar)
-        np.testing.assert_array_equal(result, expected)
+    if time_unit == "ns" and units_year == 2500:
+        with pytest.warns(SerializationWarning, match="Unable to decode time axis"):
+            result = decode_cf_datetime(
+                numerical_dates, units, calendar, time_unit=time_unit
+            )
+    else:
+        with assert_no_warnings():
+            result = decode_cf_datetime(
+                numerical_dates, units, calendar, time_unit=time_unit
+            )
+
+    np.testing.assert_array_equal(result, expected)


 @requires_cftime