-
-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CFTimeIndex #1252
CFTimeIndex #1252
Conversation
xarray/tests/test_netcdftimeindex.py
Outdated
self.date_type(1, 2, 1)]), | ||
self.da.sel(time=[True, True, False, False]) | ||
]: | ||
self.assertDataArrayIdentical(result, expected) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These sorts of tests would be much more natural if you use pytest fixtures
xarray/tests/test_netcdftimeindex.py
Outdated
|
||
|
||
@requires_netCDF4 | ||
class NetCDFTimeIndexTests(object): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you sure pytest runs these tests? I think it requires the class name to start with Test
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes the tests do run, though I agree I was a bit careless with the names of the classes:
https://travis-ci.org/pydata/xarray/jobs/198709337#L1877
I'll address that as I refactor things to take more advantage of pytest
.
xarray/tests/test_netcdftimeindex.py
Outdated
class DatetimeNoLeap(NetCDFTimeIndexTests, TestCase): | ||
def set_date_type(self): | ||
from netcdftime import DatetimeNoLeap | ||
self.date_type = DatetimeNoLeap |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These would also be really easy as parameterized fixtures
Thanks for the quick feedback on the tests @MaximilianR. Is this on the right track for doing things a little more idiomatically with import pytest
from xarray.tests import assert_array_equal
def netcdftime_date_types():
pytest.importorskip('netCDF4')
from netcdftime import (
DatetimeNoLeap, DatetimeJulian, DatetimeAllLeap,
DatetimeGregorian, DatetimeProlepticGregorian, Datetime360Day)
return [DatetimeNoLeap, DatetimeJulian, DatetimeAllLeap,
DatetimeGregorian, DatetimeProlepticGregorian, Datetime360Day]
@pytest.fixture(params=[])
def index(request):
from xarray.core.netcdftimeindex import NetCDFTimeIndex
date_type = request.param
dates = [date_type(1, 1, 1), date_type(1, 2, 1),
date_type(2, 1, 1), date_type(2, 2, 1)]
return NetCDFTimeIndex(dates)
@pytest.mark.parametrize('index', netcdftime_date_types(), indirect=True)
@pytest.mark.parametrize(('field', 'expected'), [
('year', [1, 1, 2, 2]),
('month', [1, 2, 1, 2]),
('day', [1, 1, 1, 1]),
('hour', [0, 0, 0, 0]),
('minute', [0, 0, 0, 0]),
('second', [0, 0, 0, 0]),
('microsecond', [0, 0, 0, 0])
], ids=['year', 'month', 'day', 'hour', 'minute', 'second', 'microsecond'])
def test_netcdftimeindex_field_accessors(index, field, expected):
result = getattr(index, field)
assert_array_equal(result, expected) |
@MaximilianR I think I'm getting the hang of it; ignore the above. I'll push a new update to the PR in a bit. |
Not far off, although no need to use things like More than happy to offer any guidance on tests if helpful - post an example. pytest is really nice, even if it takes a bit of time to get used to |
6257f8c
to
6496458
Compare
@MaximilianR 6496458 contains an updated version of the file containing the tests, updated to use Overall pytest seems to clean things up pretty nicely. One problem that I wish I had a better solution for happens when I am testing indexing operations in DataArrays, Series, and DataFrames. There are multiple ways of getting the same answer, which makes these tests a good candidate for using For instance, it would be great if I could write something like: @pytest.mark.parametrize('sel_arg', [
'0001',
slice('0001-01-01', '0001-12-30'),
[True, True, False, False],
slice(date_type(1, 1, 1), date_type(1, 12, 30)),
[date_type(1, 1, 1), date_type(1, 2, 1)]
], ids=['string', 'string-slice', 'bool-list', 'date-slice', 'date-list'])
def test_sel(da, index, sel_arg):
expected = xr.DataArray([1, 2], coords=[index[:2]], dims=['time'])
result = da.sel(time=sel_arg)
assert_identical(result, expected) But I can't use In any event, when you get a chance, please let me know if you have any comments / suggestions on my latest push. Thanks again for your help. |
xarray/core/netcdftimeindex.py
Outdated
import numpy as np | ||
import pandas as pd | ||
|
||
from pandas.lib import isscalar |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
V minor but there is an xarray version of this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And the pandas version isn't public API :)
xarray/core/netcdftimeindex.py
Outdated
|
||
|
||
def named(name, pattern): | ||
return '(?P<' + name + '>' + pattern + ')' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think .format
is faster (as well as idiomatic) because this way will build n strings
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should only be called once, probably at module import time, so it should not matter for performance. I would just go with whatever is most readable.
That looks awesome! Quite a turn around there. Yes, good point on the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like a very nice start!
Two limitations of the current design that are worth noting:
- It doesn't do resample
- It doesn't handle missing values
I don't think either of these are deal breakers
xarray/core/netcdftimeindex.py
Outdated
@@ -0,0 +1,180 @@ | |||
import re |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This shouldn't go in core
, since there's nothing tying it to core xarray internals. Instead, it should probably go in a new top level module, maybe a new directory alongside the contents of the existing conventions module (rename it to xarray.conventions.coding
?).
xarray/core/netcdftimeindex.py
Outdated
|
||
|
||
def named(name, pattern): | ||
return '(?P<' + name + '>' + pattern + ')' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should only be called once, probably at module import time, so it should not matter for performance. I would just go with whatever is most readable.
xarray/core/netcdftimeindex.py
Outdated
def parse_iso8601(datetime_string): | ||
basic_pattern = build_pattern(date_sep='', time_sep='') | ||
extended_pattern = build_pattern() | ||
patterns = [basic_pattern, extended_pattern] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Save this in as global variable.
xarray/core/netcdftimeindex.py
Outdated
for attr in ['year', 'month', 'day', 'hour', 'minute', 'second']: | ||
value = result.get(attr, None) | ||
if value is not None: | ||
replace[attr] = int(value) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that seconds can be fractional
xarray/core/netcdftimeindex.py
Outdated
return default.replace(**replace), resolution | ||
|
||
|
||
def _parsed_string_to_bounds(date_type, resolution, parsed): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note this is based on a pandas function
xarray/tests/test_netcdftimeindex.py
Outdated
@pytest.fixture | ||
def feb_days(date_type): | ||
from netcdftime import DatetimeAllLeap, Datetime360Day | ||
if date_type == DatetimeAllLeap: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use is
for type identity checks.
xarray/tests/test_netcdftimeindex.py
Outdated
|
||
expected = pd.Series([1, 2], index=index[:2]) | ||
for arg in range_args: | ||
pd.util.testing.assert_series_equal(series[arg], expected) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Be careful. I don't think this is public API. Better to stick with the .equals
.
xarray/core/netcdftimeindex.py
Outdated
import numpy as np | ||
import pandas as pd | ||
|
||
from pandas.lib import isscalar |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And the pandas version isn't public API :)
xarray/core/netcdftimeindex.py
Outdated
|
||
|
||
def build_pattern(date_sep='\-', datetime_sep='T', time_sep='\:'): | ||
pieces = [(None, 'year', '\d{4}'), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you need negative or five digit years?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Personally, I don't see myself needing it in the near future, but I'm not necessarily opposed to adding that support if others would find it useful.
It would make writing simple positive four-digit year dates more complicated though right? Would you always need the leading zero and the sign?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then let's not bother until someone asks. Per Wikipedia's ISO 8601 you can optionally use an expanded year representation with +
and -
. I don't think they would always be necessary but I haven't read the original document (which unfortunately I think is not available only).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI NCAR's TraCE simulation project is a 21k yr paleoclimate simulation. Not sure how they handle calendars/times. I know somebody who has analyzed data from this simulation; will ask what it looks like.
xarray/tests/test_netcdftimeindex.py
Outdated
'minute', 'minute-dash', 'second', 'second-dash', 'second-dec', | ||
'second-dec-dash']) | ||
def test_parse_iso8601(string, expected): | ||
from xarray.core.netcdftimeindex import parse_iso8601 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do your imports at the top level if at all possible.
Yes, this would be best addressed upstream in netcdftime. |
@shoyer thanks for your initial review comments. I'll try and push an update in the next few days. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@shoyer I have just pushed an update. Whenever you get a chance please have another look.
I have a few comments in-line, but more broadly please let me know if I handled moving netcdftimeindex.py
out of core
and into a new conventions
directory the way you wanted. Thanks!
'The microseconds of the datetime') | ||
date_type = property(get_date_type) | ||
|
||
def _partial_date_slice(self, resolution, parsed): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For now I tried to go as simple as possible here and in _get_string_slice
. I think trying to exactly mimic DatetimeIndex's behavior could get messy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could you add a few examples (either here or in the docstring) that describe what behavior is not covered in this implementation.
try: | ||
result = self.get_loc(key) | ||
return (is_scalar(result) or type(result) == slice or | ||
(isinstance(result, np.ndarray) and result.size)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Essentially all I want to do here is, if result
is a numpy array, check if it is not empty. Is there a cleaner way to do this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is about the best you can do
xarray/tests/test_netcdftimeindex.py
Outdated
return dict(year=year, month=month, day=day, hour=hour, | ||
minute=minute, second=second) | ||
|
||
ISO8601_STRING_TESTS = [ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this along the lines of what you were looking for here? I couldn't find a name for the second argument to pytest.mark.parametrize
, so I wasn't sure how to combine these two list arguments into a dict (but maybe I misunderstood what you were asking for).
@@ -35,10 +36,12 @@ def build_pattern(date_sep='\-', datetime_sep='T', time_sep='\:'): | |||
return '^' + trailing_optional(pattern_list) + '$' | |||
|
|||
|
|||
basic_pattern = build_pattern(date_sep='', time_sep='') | |||
extended_pattern = build_pattern() | |||
patterns = [basic_pattern, extended_pattern] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use all caps for global constants, and preface for an underscore to indicate that they are private variables, e.g., _BASIC_PATTERN
@@ -54,13 +57,22 @@ def _parse_iso8601_with_reso(date_type, timestr): | |||
for attr in ['year', 'month', 'day', 'hour', 'minute', 'second']: | |||
value = result.get(attr, None) | |||
if value is not None: | |||
# Note ISO8601 conventions allow for fractional seconds; casting | |||
# to an int means all seconds values get rounded down to the | |||
# nearest integer. TODO: Consider adding support for sub-second |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you should update the regex above to exclude fractional seconds if that doesn't work
|
||
if not isinstance(data[0], datetime): | ||
raise TypeError( | ||
'NetCDFTimeIndex requires netcdftime._netcdftime.datetime' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use the public API name netcdftime.datetime
.
Also, print the invalid object in the error message (using .format
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately the public API name actually represents a DatetimeProlepticGregorian
type, so for now to stick with public API imports, I've resorted to importing all six of the netcdftime datetime types.
In [1]: from netcdftime import datetime
In [2]: datetime(1, 1, 1)
Out[2]: netcdftime._netcdftime.DatetimeProlepticGregorian(1, 1, 1, 0, 0, 0, 0, -1, 1)
raise TypeError( | ||
'NetCDFTimeIndex requires netcdftime._netcdftime.datetime' | ||
' objects.') | ||
if not all(isinstance(value, type(data[0])) for value in data): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Create a variable for type(data[0])
outside the loop.
if not all(isinstance(value, type(data[0])) for value in data): | ||
raise TypeError( | ||
'NetCDFTimeIndex requires using netcdftime._netcdftime.datetime' | ||
' objects of all the same type.') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same concerns as above on the error message
if not isinstance(data[0], datetime): | ||
raise TypeError( | ||
'NetCDFTimeIndex requires netcdftime._netcdftime.datetime' | ||
' objects.') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I usually prefer to leave spaces at the lines instead of the the start of lines -- I think it looks slightly nicer.
xarray/tests/test_netcdftimeindex.py
Outdated
def test_parse_iso8601(string, expected): | ||
from xarray.core.netcdftimeindex import parse_iso8601 | ||
] | ||
ISO8601_STRING_TEST_IDS = [ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My thinking here was to use something like a dict, just to logically join together test names and parameters in the same place, e.g.,
ISO8601_STRING_TESTS = {
'year': ('1999', date_dict(year='1999')),
'month': ('199901', date_dict(year='1999', month='01')),
...
}
@pytest.mark.parmetrize(('string', 'expected', ISO8601_STRING_TESTS.values(),
ids=ISO8601_STRING_TESTS.keys())
try: | ||
result = self.get_loc(key) | ||
return (is_scalar(result) or type(result) == slice or | ||
(isinstance(result, np.ndarray) and result.size)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is about the best you can do
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the conventions module is exactly what I was thinking
return result | ||
"""Adapted from pandas.tseries.index.DatetimeIndex.get_loc""" | ||
if isinstance(key, pycompat.basestring): | ||
return self._get_string_slice(key) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 for fewer hard to predict special cases. Pandas is really inscrutable here.
xarray/tests/test_netcdftimeindex.py
Outdated
expected = xr.DataArray(1).assign_coords(time=index[0]) | ||
result = da.sel(time=date_type(1, 1, 1)) | ||
assert_identical(result, expected) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add some tests for sel
(both scalar and list) with both method='pad'
and method='nearest'
and optionally setting tolerance
@shoyer when you get a chance, things are ready for another review. I think the AppVeyor issues may be due to the version of netCDF4 used. Should we switch to the conda-forge channel to set up the environment there? |
def assert_all_same_netcdftime_datetimes(data): | ||
from netcdftime._netcdftime import datetime | ||
def assert_all_valid_date_type(data): | ||
from netcdftime import ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can just use datetime here -- these are all subclasses, so isinstance checks on the super class work fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I buried this in a comment (#1252 (comment)) above. Confusingly, netcdftime.datetime
does not refer to the super class:
In [1]: from netcdftime import datetime, DatetimeAllLeap
In [2]: datetime(1, 1, 1)
Out[2]: netcdftime._netcdftime.DatetimeProlepticGregorian(1, 1, 1, 0, 0, 0, 0, -1, 1)
In [3]: test = DatetimeAllLeap(1, 1, 1)
In [4]: isinstance(test, datetime)
Out[4]: False
In [5]: from netcdftime._netcdftime import datetime as super_datetime
In [6]: isinstance(test, super_datetime)
Out[6]: True
'NetCDFTimeIndex requires netcdftime._netcdftime.datetime' | ||
' objects.') | ||
if not all(isinstance(value, type(data[0])) for value in data): | ||
'NetCDFTimeIndex requires netcdftime._netcdftime.datetime ' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still prefer to use netcdftime.datetime
, since it's equivalent and doesn't refer to a private submodule (the same reason we don't reference xarray.core.dataset.Dataset
). The repr for netcdftime datetime should probably be fixed to refer to the shorter path.
Okay this seems like a netcdftime bug. Can you report this upstream?
…On Sat, Feb 11, 2017 at 4:18 PM Spencer Clark ***@***.***> wrote:
***@***.**** commented on this pull request.
------------------------------
In xarray/conventions/netcdftimeindex.py
<#1252>:
> @@ -120,23 +118,31 @@ def get_date_type(self):
return type(self._data[0])
-def assert_all_same_netcdftime_datetimes(data):
- from netcdftime._netcdftime import datetime
+def assert_all_valid_date_type(data):
+ from netcdftime import (
Sorry, I buried this in a comment (#1252 (comment)
<#1252 (comment)>)
above. Confusingly, netcdftime.datetime does not refer to the super class:
In [1]: from netcdftime import datetime, DatetimeAllLeap
In [2]: datetime(1, 1, 1)
Out[2]: netcdftime._netcdftime.DatetimeProlepticGregorian(1, 1, 1, 0, 0, 0, 0, -1, 1)
In [3]: test = DatetimeAllLeap(1, 1, 1)
In [4]: isinstance(test, datetime)
Out[4]: False
In [5]: from netcdftime._netcdftime import datetime as super_datetime
In [6]: isinstance(test, super_datetime)
Out[6]: True
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#1252>, or mute the thread
<https://github.com/notifications/unsubscribe-auth/ABKS1m9Egh8sElE3KoU08DEv-TvRn7KEks5rbk_JgaJpZM4L3tsR>
.
|
I must have inadvertently removed it during a merge.
Thanks @shoyer, it looks like that did the trick. I addressed your recent comments; let me know if you have any further feedback. |
See also #2098, which should fix failing builds on master |
Tests are green here now. @shoyer and @spencerkclark - are we waiting on anything else before merging? |
This could be ready. I'm happy to address any further concerns if anyone has them. |
xarray/core/common.py
Outdated
else: | ||
sample = var.data.ravel()[0] | ||
if isinstance(sample, dask_array_type): | ||
sample = sample.compute() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Evaluating dask arrays makes me cringe, but I think this is about the best we can currently do with NumPy's current dtype system. Fortunately this should not be common, anyways.
xarray/core/common.py
Outdated
except ImportError: | ||
return False | ||
else: | ||
sample = var.data.ravel()[0] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this could be potentially called on many arrays, let's be a little more careful before calculating sample
:
- Let's verify that the array has
dtype=object
(otherwise it can't contain cftime.datetime objects) - Let's verify that the array has
size > 0
before trying any elements.
xarray/coding/cftimeindex.py
Outdated
def assert_all_valid_date_type(data): | ||
import cftime | ||
|
||
valid_types = (cftime.DatetimeJulian, cftime.DatetimeNoLeap, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can probably be simplified to just the base cftime.datetime
?
A couple other things to think about from a usability perspective:
These should at least raise informative errors (use |
Thanks @shoyer, those are good questions. I addressed your inline comments. Let me know if you have anything else.
Through pandas, this raises an informative
I updated things such that if
If
|
OK, I'm happy with this. Time to merge I guess? |
Congrats! This is a great piece of work and will be very useful to the climate community. |
Okay, I'm going to merge now. Hopefully a few of us can stress test this a bit more prior to the next release. Thanks @spencerkclark for all the work here over the past 15 months!!! |
@shoyer, @jhamman, @maxim-lian, @spencerahill many thanks for the substantial feedback, help, and encouragement here. You guys are great! |
Congrats to everyone who made this happen, especially @spencerclark. This feature is going to make so many people happy! |
Credit also due to @rabernat for organizing the workshop in late 2016 where this effort got off the ground, and to @shoyer who sketched out an initial roadmap for the implementation at that meeting. So excited to have this in! In aospy alone, we'll be able to get rid of 100s (1000+?) of lines of code now that CFTime is in place. |
Indeed that meeting played an important role here. Thank you @rabernat! |
git diff upstream/master | flake8 --diff
This work in progress PR is a start on implementing a
NetCDFTimeIndex
, a subclass of pandas.Index, which closely mimics pandas.DatetimeIndex, but usesnetcdftime._netcdftime.datetime
objects. Currently implemented in the new index are:groupby
operations on attributes of date objectsThis index is meant as a step towards improving the handling of non-standard calendars and dates outside the range
Timestamp('1677-09-21 00:12:43.145225')
toTimestamp('2262-04-11 23:47:16.854775807')
.For now I have pushed only the code and some tests for the new index; I want to make sure the index is solid and well-tested before we consider integrating it into any of xarray's existing logic or writing any documentation.
Regarding the index, there are a couple remaining outstanding issues (that at least I'm aware of):
netcdftime._netcdftime.datetime
objects. This means one can attempt to index with an out-of-bounds string or datetime without raising an error. Could this possibly be addressed upstream? For example:get_value
method. I have taken @shoyer's suggested simplified approach from Towards a (temporary?) workaround for datetime issues at the xarray-level #1084 (comment), and tweaked it to also allow for slice indexing, so I think this is most of the way there. A remaining to-do for me, however, is to implement something to allow for integer-indexing outside ofiloc
, e.g. if you have a pandas.Seriesseries
, indexing with the syntaxseries[1]
orseries[1:3]
.Hopefully this is a decent start; in particular I'm not an expert in writing tests so please let me know if there are improvements I can make to the structure and / or style I've used so far. I'm happy to make changes. I appreciate your help.