(fix): extension array indexers #9671

Open
wants to merge 93 commits into
base: main
7b5f323
implement default_precision_timestamp, refactor coding/times.py and c…
kmuehlbauer Oct 10, 2024
8784f33
align tests with new time resolution behaviour
kmuehlbauer Oct 10, 2024
b45ab23
timedelta decoding, fsspec handling
kmuehlbauer Oct 10, 2024
39086ef
fixes in coding/times.py
kmuehlbauer Oct 13, 2024
df49a40
add docs on time coding
kmuehlbauer Oct 13, 2024
adb8ca3
attempt fixing doc tests
kmuehlbauer Oct 13, 2024
266b1ed
fix issue where out-of-bounds floating point values slipped in the pr…
kmuehlbauer Oct 14, 2024
6d5f13b
convert to UTC first before stripping of tz in _unpack_time_units_and…
kmuehlbauer Oct 14, 2024
5d68bfe
reorganize pandas compatibility code, remove unneeded code, attempt t…
kmuehlbauer Oct 14, 2024
07bba69
another attempt to finally fix mypy
kmuehlbauer Oct 14, 2024
6e7f0bb
refactor out _check_date_is_after_shift
kmuehlbauer Oct 14, 2024
b4a49bb
refactor out _maybe_strip_tz_from_timestamp
kmuehlbauer Oct 14, 2024
2e1ff4f
more refactoring in coding.times.py
kmuehlbauer Oct 14, 2024
d5a7da0
more refactoring in coding.times.py
kmuehlbauer Oct 14, 2024
821b68d
minor fix in time-coding.rst
kmuehlbauer Oct 14, 2024
d066edf
set default resolution to "s", which actually means, use pandas lowes…
kmuehlbauer Oct 14, 2024
ed22da1
Add section for default units, fix options
kmuehlbauer Oct 14, 2024
8bf23f4
attempt to fix typing
kmuehlbauer Oct 14, 2024
c3a2b39
attempt to fix typing
kmuehlbauer Oct 14, 2024
3c44aed
fix scalar datetime/timedelta
kmuehlbauer Oct 15, 2024
48be73a
fix user docs
kmuehlbauer Oct 15, 2024
7ac9983
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Oct 18, 2024
d86ad04
Fix variable tests, mostly datetime/timedelta is inittialized with us…
kmuehlbauer Oct 18, 2024
b5d0795
revert changes in _possible_convert_objects, this needs to be checked…
kmuehlbauer Oct 18, 2024
60324f0
fix doc link
kmuehlbauer Oct 18, 2024
c2bc4df
(fix): allow all extension array data types in pandas adapters
ilan-gold Oct 23, 2024
84569bc
(fix): dataframes have no `array` attr
ilan-gold Oct 23, 2024
90e390d
(fix): allow chunked numpy extension arrays because of `test_pandas_a…
ilan-gold Oct 24, 2024
7c32bd0
(fix): dtypes for `PandasIndex`
ilan-gold Oct 24, 2024
795ecf6
(chore): remove test for unnecessary conversion
ilan-gold Oct 24, 2024
8eca6e9
(revert): don't let through so much in `as_compatible_data`
ilan-gold Oct 24, 2024
fb91812
(fix): account for series -> numpy conversions
ilan-gold Oct 25, 2024
a06f2b1
(fix): ensure dtype check is for numpy type
ilan-gold Oct 25, 2024
14027e8
(fix): convert pandas `IntervalArray`
ilan-gold Oct 25, 2024
a47a96f
Merge branch 'main' into ig/fix_extension_indexer
ilan-gold Oct 25, 2024
6f2861a
Merge branch 'main' into any-time-resolution-2
kmuehlbauer Nov 8, 2024
1f07500
Apply suggestions from code review
kmuehlbauer Nov 8, 2024
798b444
Merge branch 'main' into any-time-resolution-2
kmuehlbauer Nov 8, 2024
f487599
Merge branch 'main' into any-time-resolution-2
kmuehlbauer Nov 16, 2024
20d6c9d
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Nov 16, 2024
7391948
remove outdated description
kmuehlbauer Nov 16, 2024
308091c
use set instead list
kmuehlbauer Nov 16, 2024
5f40b4e
remove global option
kmuehlbauer Nov 16, 2024
2a65d8d
mypy thinks `unit` is Literal, because the pandas-stubs suggest so, b…
kmuehlbauer Nov 17, 2024
43f7d61
ignore mypy arg-type
kmuehlbauer Nov 17, 2024
59934b9
fix docstring of `default_precision_timestamp`
kmuehlbauer Nov 17, 2024
a01f9f3
add 'time_unit'-kwarg to decode_cf and descendent functions with "ns"…
kmuehlbauer Nov 17, 2024
8b91128
fix tests
kmuehlbauer Nov 17, 2024
0e351ca
fix more tests
kmuehlbauer Nov 17, 2024
07a8e9c
fix docstring
kmuehlbauer Nov 17, 2024
2be5739
use pd.Timestamp(np.datetime64(cftime)) to convert from cftime to numpy
kmuehlbauer Nov 17, 2024
b9d0a8e
use dt = np.datetime64(cftime.isoformat()) to convert from cftime to …
kmuehlbauer Nov 18, 2024
08afc3b
fix time-coding.rst
kmuehlbauer Nov 18, 2024
edc55e1
use us in to_datetimeindex
kmuehlbauer Nov 18, 2024
bffe919
revert back to us for datetimeindex tests
kmuehlbauer Nov 18, 2024
150b982
estimate fitting resolution for floating point values, when decoding …
kmuehlbauer Nov 18, 2024
7113ceb
add test
kmuehlbauer Nov 18, 2024
7f47f0b
refactor floating point decoding
kmuehlbauer Nov 18, 2024
512808d
Merge branch 'main' into any-time-resolution-2
kmuehlbauer Nov 18, 2024
63c83f4
simplify recursive function, update tests
kmuehlbauer Nov 18, 2024
0efbbeb
more refactoring, update tests
kmuehlbauer Nov 19, 2024
2910250
add fixture, apply fixture to more tests.
kmuehlbauer Nov 19, 2024
57d8d72
update time-coding.rst
kmuehlbauer Nov 19, 2024
5333240
fix typing
kmuehlbauer Nov 19, 2024
6f35c81
try to fix test, remove stale print
kmuehlbauer Nov 19, 2024
d0c17a4
another attempt to fix test
kmuehlbauer Nov 19, 2024
b2b6bb1
debug failing test
kmuehlbauer Nov 19, 2024
5dbc8a7
refactor cftime fallback in datetime decoding
kmuehlbauer Nov 21, 2024
be0d3e0
Merge branch 'main' into any-time-resolution-2
kmuehlbauer Nov 21, 2024
f95408a
fix merge-collission
kmuehlbauer Nov 21, 2024
609e15c
Merge branch 'main' into any-time-resolution-2
kmuehlbauer Nov 21, 2024
ec7f165
use CFDatetimeCoder instance to transport unit/use_cftime
kmuehlbauer Nov 22, 2024
1f1cf1c
decode_times with CFDatetimeCoder
kmuehlbauer Nov 25, 2024
14b1a88
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Nov 25, 2024
05627dd
Merge branch 'main' into any-time-resolution-2
kmuehlbauer Nov 25, 2024
e7cbf3a
fix mypy, warning/error
kmuehlbauer Nov 26, 2024
fc87e04
api, docs, docstrings
kmuehlbauer Nov 26, 2024
9ae645e
Merge branch 'main' into any-time-resolution-2
kmuehlbauer Nov 26, 2024
6e3ca57
Merge branch 'main' into any-time-resolution-2
kmuehlbauer Nov 27, 2024
277d1c6
docs, whats-new.rst
kmuehlbauer Nov 27, 2024
81a9d94
fix whats-new.rst
kmuehlbauer Nov 27, 2024
be8642f
Merge branch 'main' into any-time-resolution-2
kmuehlbauer Nov 27, 2024
f3f62e5
Merge branch 'main' into any-time-resolution-2
kmuehlbauer Dec 2, 2024
c07df41
Merge remote-tracking branch 'origin/main' into any-time-resolution-2
kmuehlbauer Dec 10, 2024
ae49850
Merge branch 'any-time-resolution-2' into ig/fix_extension_indexer
ilan-gold Dec 10, 2024
e8f5aa8
Merge branch 'main' into ig/fix_extension_indexer
ilan-gold Dec 10, 2024
9653a01
fix tests after merge
kmuehlbauer Dec 10, 2024
a405f03
Merge branch 'any-time-resolution-2' into ig/fix_extension_indexer
ilan-gold Dec 10, 2024
503b313
(fix): `dtype` type handling
ilan-gold Dec 11, 2024
c8ab8f3
(fix): move out of type checking block
ilan-gold Dec 11, 2024
66e5b06
(fix): satisfy mypy
ilan-gold Dec 11, 2024
f9fde3a
(fix): doctest
ilan-gold Dec 11, 2024
8a3e834
(fix): `nbytes` test?
ilan-gold Dec 11, 2024
(fix): dtypes for PandasIndex
ilan-gold committed Oct 24, 2024
commit 7c32bd073751546921110fe0e9fba1671e7d4b7f
4 changes: 3 additions & 1 deletion xarray/core/indexes.py
@@ -601,7 +601,7 @@ def __init__(
         if pd.api.types.is_extension_array_dtype(index.dtype):
             cast(pd.api.extensions.ExtensionDtype, index.dtype)
             coord_dtype = index.dtype
-        else:
+        elif coord_dtype is None:
             coord_dtype = get_valid_numpy_dtype(index)
         self.coord_dtype = coord_dtype
 
@@ -698,6 +698,8 @@ def concat(
 
         if not indexes:
             coord_dtype = None
+        elif len(set(idx.coord_dtype for idx in indexes)) == 1:
+            coord_dtype = indexes[0].coord_dtype
         else:
             coord_dtype = np.result_type(*[idx.coord_dtype for idx in indexes])
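
The two hunks above change how `coord_dtype` is chosen. A minimal sketch of that branch order, outside xarray, may make the intent easier to follow. The helper names `pick_coord_dtype` and `concat_coord_dtype` are hypothetical and not part of this PR, and `index.dtype` is used as a simplified stand-in for xarray's `get_valid_numpy_dtype(index)`.

```python
import numpy as np
import pandas as pd


def pick_coord_dtype(index, coord_dtype=None):
    """Hypothetical stand-in for the dtype selection in the first hunk."""
    if pd.api.types.is_extension_array_dtype(index.dtype):
        # An extension dtype on the index takes precedence.
        return index.dtype
    if coord_dtype is None:
        # Simplification: xarray calls get_valid_numpy_dtype(index) here.
        return index.dtype
    # As the hunk reads, `elif coord_dtype is None` leaves a
    # caller-supplied dtype untouched instead of overwriting it.
    return coord_dtype


def concat_coord_dtype(dtypes):
    """Hypothetical stand-in for the dtype selection in the second hunk."""
    if not dtypes:
        return None
    if len(set(dtypes)) == 1:
        # Identical dtypes pass straight through, so np.result_type is
        # never asked to interpret a pandas extension dtype.
        return dtypes[0]
    return np.result_type(*dtypes)


tz_index = pd.date_range("2000-01-01", periods=2, tz="UTC")
int_index = pd.Index([1, 2, 3], dtype="int64")

print(pick_coord_dtype(tz_index))
print(pick_coord_dtype(int_index, coord_dtype=np.dtype("float32")))
print(concat_coord_dtype([tz_index.dtype, tz_index.dtype]))
print(concat_coord_dtype([np.dtype("int32"), np.dtype("float32")]))
```

The shortcut for a single shared dtype matters mainly for extension dtypes, which NumPy's promotion machinery is not designed to handle; mixed NumPy dtypes still fall through to `np.result_type`.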

57 changes: 32 additions & 25 deletions xarray/tests/test_variable.py
@@ -662,7 +662,7 @@ def test_pandas_categorical_dtype(self):
         data = pd.Categorical(np.arange(10, dtype="int64"))
         v = self.cls("x", data)
         print(v)  # should not error
-        assert v.dtype == "int64"
+        assert v.dtype == data.dtype
 
     def test_pandas_datetime64_with_tz(self):
         data = pd.date_range(
@@ -673,9 +673,12 @@ def test_pandas_datetime64_with_tz(self):
         )
         v = self.cls("x", data)
         print(v)  # should not error
-        if "America/New_York" in str(data.dtype):
-            # pandas is new enough that it has datetime64 with timezone dtype
-            assert v.dtype == "object"
+        if v.dtype == np.dtype("O"):
+            import dask.array as da
+
+            assert isinstance(v.data, da.Array)
+        else:
+            assert v.dtype == data.dtype
 
     def test_multiindex(self):
         idx = pd.MultiIndex.from_product([list("abc"), [0, 1]])
@@ -2404,8 +2407,8 @@ def test_pad(self, mode, xr_arg, np_arg):
 
     def test_pandas_categorical_dtype(self):
         data = pd.Categorical(np.arange(10, dtype="int64"))
-        with pytest.raises(ValueError, match="was found to be a Pandas ExtensionArray"):
-            self.cls("x", data)
+        v = self.cls("x", data)
+        assert (v.data.compute() == data.to_numpy()).all()
 
 
 @requires_sparse
@@ -2683,8 +2686,9 @@ def test_tz_datetime(self) -> None:
             warnings.simplefilter("ignore")
             actual2: T_DuckArray = as_compatible_data(series)
 
-        np.testing.assert_array_equal(actual2, np.asarray(series.values))
-        assert actual2.dtype == np.dtype("datetime64[s]")
+        assert (actual2 == series).all()
+        pd.testing.assert_extension_array_equal(actual2.array, series.array)
+        assert actual2.dtype == series.dtype
 
     def test_full_like(self) -> None:
         # For more thorough tests, see test_variable.py
@@ -2960,32 +2964,35 @@ def test_from_pint_wrapping_dask(self, Var):
 
 
 @pytest.mark.parametrize(
-    ("values", "unit"),
+    "values",
     [
-        (np.datetime64("2000-01-01", "ns"), "ns"),
-        (np.datetime64("2000-01-01", "s"), "s"),
-        (np.array([np.datetime64("2000-01-01", "ns")]), "ns"),
-        (np.array([np.datetime64("2000-01-01", "s")]), "s"),
-        (pd.date_range("2000", periods=1), "ns"),
-        (datetime(2000, 1, 1), "us"),
-        (np.array([datetime(2000, 1, 1)]), "ns"),
-        (pd.date_range("2000", periods=1, tz=pytz.timezone("America/New_York")), "ns"),
-        (
-            pd.Series(
-                pd.date_range("2000", periods=1, tz=pytz.timezone("America/New_York"))
-            ),
-            "ns",
+        np.datetime64("2000-01-01", "ns"),
+        np.datetime64("2000-01-01", "s"),
+        np.array([np.datetime64("2000-01-01", "ns")]),
+        np.array([np.datetime64("2000-01-01", "s")]),
+        pd.date_range("2000", periods=1),
+        datetime(2000, 1, 1),
+        np.array([datetime(2000, 1, 1)]),
+        pd.date_range("2000", periods=1, tz=pytz.timezone("America/New_York")),
+        pd.Series(
+            pd.date_range("2000", periods=1, tz=pytz.timezone("America/New_York"))
         ),
     ],
     ids=lambda x: f"{x}",
 )
-def test_datetime_conversion_warning(values, unit) -> None:
+def test_datetime_conversion_warning(values) -> None:
     # todo: needs discussion
     # todo: check, if this test is OK
     dims = ["time"] if isinstance(values, np.ndarray | pd.Index | pd.Series) else []
     var = Variable(dims, values)
     if var.dtype.kind == "M":
-        assert var.dtype == np.dtype(f"datetime64[{unit}]")
+        if not hasattr(values, "dtype"):
+            assert var.dtype == np.dtype("datetime64[us]")
+        elif values.dtype == np.dtype("O"):
+            # We assign a nicer dtype here
+            assert var.dtype == np.dtype("datetime64[ns]")
+        else:
+            assert var.dtype == values.dtype
     else:
         # The only case where a non-datetime64 dtype can occur currently is in
         # the case that the variable is backed by a timezone-aware
@@ -3027,7 +3034,7 @@ def test_pandas_two_only_datetime_conversion_warnings(
     var = Variable(["time"], data.astype(dtype))  # type: ignore[arg-type]
 
     if var.dtype.kind == "M":
-        assert var.dtype == np.dtype("datetime64[s]")
+        assert var.dtype == dtype
     else:
         # The only case where a non-datetime64 dtype can occur currently is in
         # the case that the variable is backed by a timezone-aware
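
The rewritten `test_datetime_conversion_warning` branches on the kind of input it receives. A small NumPy-only sketch of why three branches are needed follows. The helper `describe_input` is hypothetical: it only classifies inputs the way the test's `if`/`elif`/`else` does and says nothing about what `Variable` actually returns.

```python
from datetime import datetime

import numpy as np


def describe_input(values):
    """Hypothetical classifier mirroring the test's if/elif/else."""
    if not hasattr(values, "dtype"):
        return "no dtype"  # e.g. a plain datetime.datetime
    if values.dtype == np.dtype("O"):
        return "object dtype"  # e.g. an array of datetime.datetime
    return str(values.dtype)  # a dtype the test expects to be preserved


print(describe_input(datetime(2000, 1, 1)))
print(describe_input(np.array([datetime(2000, 1, 1)])))
print(describe_input(np.datetime64("2000-01-01", "s")))
print(describe_input(np.array([np.datetime64("2000-01-01", "ns")])))

# NumPy itself maps a datetime.datetime to microsecond resolution,
# which is consistent with the datetime64[us] expectation in the first branch.
print(np.datetime64(datetime(2000, 1, 1)).dtype)
```

Only inputs that already carry a `datetime64` dtype have a resolution to preserve; plain `datetime` objects and object arrays carry none, which is presumably why the test pins their expected units explicitly.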