CFTimeIndex #1252

spencerkclark · 2017-02-06T02:10:47Z

closes Towards a (temporary?) workaround for datetime issues at the xarray-level #1084
passes git diff upstream/master | flake8 --diff
tests added / passed
whatsnew entry

This work in progress PR is a start on implementing a NetCDFTimeIndex, a subclass of pandas.Index, which closely mimics pandas.DatetimeIndex, but uses netcdftime._netcdftime.datetime objects. Currently implemented in the new index are:

Partial datetime-string indexing (using strictly ISO8601-format strings, using a date parser implemented by @shoyer in Towards a (temporary?) workaround for datetime issues at the xarray-level #1084 (comment))
Field-accessors for year, month, day, hour, minute, second, and microsecond, to enable groupby operations on attributes of date objects
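
To illustrate the field-accessor idea, here is a minimal sketch in plain Python. `SimpleDateIndex` and `_field_accessor` are hypothetical names chosen for this illustration, and the standard-library `datetime` stands in for the netcdftime date types; it is not the code in this PR.

```python
from collections import defaultdict
from datetime import datetime


def _field_accessor(name, docstring=None):
    """Build a property returning the given field of every date."""
    def getter(self):
        return [getattr(date, name) for date in self._dates]
    getter.__name__ = name
    return property(getter, doc=docstring)


class SimpleDateIndex(object):
    """Hypothetical stand-in for an index of date objects."""

    def __init__(self, dates):
        self._dates = list(dates)

    year = _field_accessor('year', 'The year of each date')
    month = _field_accessor('month', 'The month of each date')


index = SimpleDateIndex([datetime(1, 1, 1), datetime(1, 2, 1),
                         datetime(2, 1, 1), datetime(2, 2, 1)])

# Group positions by year, the kind of operation the accessors enable.
positions_by_year = defaultdict(list)
for position, year in enumerate(index.year):
    positions_by_year[year].append(position)
```

Because each accessor returns one value per element, grouping on `index.year` or `index.month` needs no knowledge of the underlying calendar.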

This index is meant as a step towards improving the handling of non-standard calendars and dates outside the range Timestamp('1677-09-21 00:12:43.145225') to Timestamp('2262-04-11 23:47:16.854775807').
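
The range quoted above follows from storing timestamps as a signed 64-bit count of nanoseconds since 1970-01-01. The sketch below reproduces the bounds with the standard library only; since Python's `datetime` resolves microseconds, the last three digits are dropped. The constant names are illustrative.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)
INT64_MAX = 2 ** 63 - 1  # largest signed 64-bit integer, read as nanoseconds

# Convert nanoseconds to whole microseconds before building the offset.
offset = timedelta(microseconds=INT64_MAX // 1000)

upper = EPOCH + offset
lower = EPOCH - offset

print(upper)
print(lower)
```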

For now I have pushed only the code and some tests for the new index; I want to make sure the index is solid and well-tested before we consider integrating it into any of xarray's existing logic or writing any documentation.

Regarding the index, there are a couple remaining outstanding issues (that at least I'm aware of):

Currently one can create nonsensical datetimes using netcdftime._netcdftime.datetime objects. This means one can attempt to index with an out-of-bounds string or datetime without raising an error. Could this possibly be addressed upstream? For example:

In [1]: from netcdftime import DatetimeNoLeap

In [2]: DatetimeNoLeap(2000, 45, 45)
Out[2]: netcdftime._netcdftime.DatetimeNoLeap(2000, 45, 45, 0, 0, 0, 0, -1, 1)
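
For reference, a check of the kind that could reject such dates might look like the sketch below. `validate_noleap_date` is a hypothetical helper written for a 365-day ("noleap") calendar only; it is not part of netcdftime.

```python
# Days per month in a 365-day ("noleap") calendar: February always has 28.
NOLEAP_DAYS_IN_MONTH = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]


def validate_noleap_date(year, month, day):
    """Raise ValueError if (month, day) cannot occur in a noleap calendar."""
    if not 1 <= month <= 12:
        raise ValueError('month must be in 1..12, got {}'.format(month))
    days_in_month = NOLEAP_DAYS_IN_MONTH[month - 1]
    if not 1 <= day <= days_in_month:
        raise ValueError('day must be in 1..{} for month {}, got {}'.format(
            days_in_month, month, day))
```

With this check, the `(2000, 45, 45)` example above would fail on the month before the day is even examined.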

I am looking to enable this index to be used in pandas.Series and pandas.DataFrame objects as well; this requires implementing a get_value method. I have taken @shoyer's suggested simplified approach from Towards a (temporary?) workaround for datetime issues at the xarray-level #1084 (comment), and tweaked it to also allow for slice indexing, so I think this is most of the way there. A remaining to-do for me, however, is to implement something to allow for integer-indexing outside of iloc, e.g. if you have a pandas.Series series, indexing with the syntax series[1] or series[1:3].

Hopefully this is a decent start; in particular I'm not an expert in writing tests so please let me know if there are improvements I can make to the structure and / or style I've used so far. I'm happy to make changes. I appreciate your help.

max-sixty · 2017-02-06T02:56:15Z

xarray/tests/test_netcdftimeindex.py

+                              self.date_type(1, 2, 1)]),
+            self.da.sel(time=[True, True, False, False])
+        ]:
+            self.assertDataArrayIdentical(result, expected)


These sorts of tests would be much more natural if you use pytest fixtures

max-sixty · 2017-02-06T02:57:01Z

xarray/tests/test_netcdftimeindex.py

+
+
+@requires_netCDF4
+class NetCDFTimeIndexTests(object):


Are you sure pytest runs these tests? I think it requires the class name to start with Test

Yes the tests do run, though I agree I was a bit careless with the names of the classes:
https://travis-ci.org/pydata/xarray/jobs/198709337#L1877

I'll address that as I refactor things to take more advantage of pytest.
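
A likely reason the tests were collected despite the class names: pytest applies the `Test` prefix rule to plain classes, but it collects `unittest.TestCase` subclasses regardless of name. The sketch below shows the same mixin pattern with the standard library's loader; the class names are hypothetical and `datetime` stands in for a netcdftime date type.

```python
import unittest
from datetime import datetime


class FieldAccessorChecks(object):
    """Shared checks; not a TestCase itself, so it is never run directly."""

    def test_year(self):
        self.assertEqual(self.make_date().year, 1)


class DatetimeStandard(FieldAccessorChecks, unittest.TestCase):
    """Name lacks the ``Test`` prefix but inherits from TestCase."""

    def make_date(self):
        return datetime(1, 1, 1)


# Load and run the concrete class the way a unittest-aware runner would.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(DatetimeStandard)
result = unittest.TestResult()
suite.run(result)
```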

max-sixty · 2017-02-06T02:58:38Z

xarray/tests/test_netcdftimeindex.py

+class DatetimeNoLeap(NetCDFTimeIndexTests, TestCase):
+    def set_date_type(self):
+        from netcdftime import DatetimeNoLeap
+        self.date_type = DatetimeNoLeap


These would also be really easy as parameterized fixtures

spencerkclark · 2017-02-06T14:46:36Z

Thanks for the quick feedback on the tests @MaximilianR. Is this on the right track for doing things a little more idiomatically with pytest?

import pytest

from xarray.tests import assert_array_equal


def netcdftime_date_types():
    pytest.importorskip('netCDF4')

    from netcdftime import (
        DatetimeNoLeap, DatetimeJulian, DatetimeAllLeap,
        DatetimeGregorian, DatetimeProlepticGregorian, Datetime360Day)
    return [DatetimeNoLeap, DatetimeJulian, DatetimeAllLeap,
            DatetimeGregorian, DatetimeProlepticGregorian, Datetime360Day]


@pytest.fixture(params=[])
def index(request):
    from xarray.core.netcdftimeindex import NetCDFTimeIndex

    date_type = request.param
    dates = [date_type(1, 1, 1), date_type(1, 2, 1),
             date_type(2, 1, 1), date_type(2, 2, 1)]
    return NetCDFTimeIndex(dates)


@pytest.mark.parametrize('index', netcdftime_date_types(), indirect=True)
@pytest.mark.parametrize(('field', 'expected'), [
    ('year', [1, 1, 2, 2]),
    ('month', [1, 2, 1, 2]),
    ('day', [1, 1, 1, 1]),
    ('hour', [0, 0, 0, 0]),
    ('minute', [0, 0, 0, 0]),
    ('second', [0, 0, 0, 0]),
    ('microsecond', [0, 0, 0, 0])
], ids=['year', 'month', 'day', 'hour', 'minute', 'second', 'microsecond'])
def test_netcdftimeindex_field_accessors(index, field, expected):
    result = getattr(index, field)
    assert_array_equal(result, expected)

spencerkclark · 2017-02-06T16:12:18Z

@MaximilianR I think I'm getting the hang of it; ignore the above. I'll push a new update to the PR in a bit.

max-sixty · 2017-02-06T17:11:24Z

Not far off, although no need to use things like ids (or even indirect, although you can if you want for that).

More than happy to offer any guidance on tests if helpful - post an example. pytest is really nice, even if it takes a bit of time to get used to

spencerkclark · 2017-02-06T21:53:00Z

@MaximilianR 6496458 contains an updated version of the file containing the tests, updated to use pytest. I ended up using the ids keyword in places to clean up the test names that are output when running the tests in verbose mode, but I agree it's not super necessary.

Overall pytest seems to clean things up pretty nicely. One problem that I wish I had a better solution for happens when I am testing indexing operations in DataArrays, Series, and DataFrames. There are multiple ways of getting the same answer, which makes these tests a good candidate for using pytest.mark.parametrize; however, one of those ways, using netcdftime._netcdftime.datetime objects directly, depends on the date_type used.

For instance, it would be great if I could write something like:

@pytest.mark.parametrize('sel_arg', [
    '0001',
    slice('0001-01-01', '0001-12-30'),
    [True, True, False, False],
    slice(date_type(1, 1, 1), date_type(1, 12, 30)),
    [date_type(1, 1, 1), date_type(1, 2, 1)]
], ids=['string', 'string-slice', 'bool-list', 'date-slice', 'date-list'])
def test_sel(da, index, sel_arg):
    expected = xr.DataArray([1, 2], coords=[index[:2]], dims=['time'])
    result = da.sel(time=sel_arg)
    assert_identical(result, expected)

But I can't use date_type, which is a fixture in my current setup, in an argument to parametrize. Right now I've worked around this by resorting back to manually iterating over the cases by writing separate methods, but that's pretty verbose; might you happen to know of a cleaner way of setting things up in this case?
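
One possible workaround, sketched here under assumptions rather than taken from the PR, is to parametrize over small factory functions that accept `date_type` and to call them inside the test body. The arguments to `pytest.mark.parametrize` are evaluated at import time, before any fixture exists, so deferring construction avoids the problem. `SEL_ARG_FACTORIES` and `build_sel_arg` are hypothetical names, and the standard-library `datetime` stands in for the `date_type` fixture value.

```python
from datetime import datetime

# Each case is a function of date_type, so nothing date-specific is needed
# when the module is imported.
SEL_ARG_FACTORIES = {
    'string': lambda date_type: '0001',
    'bool-list': lambda date_type: [True, True, False, False],
    'date-slice': lambda date_type: slice(date_type(1, 1, 1),
                                          date_type(1, 12, 30)),
    'date-list': lambda date_type: [date_type(1, 1, 1), date_type(1, 2, 1)],
}


def build_sel_arg(case, date_type):
    """Resolve a named case into a concrete indexer for one date type."""
    return SEL_ARG_FACTORIES[case](date_type)


# In a pytest test one would parametrize over the case names and call
# build_sel_arg(case, date_type) with the value of the date_type fixture.
date_slice = build_sel_arg('date-slice', datetime)
```

The dictionary keys can double as the `ids` passed to `parametrize`, which keeps names and cases in one place.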

In any event, when you get a chance, please let me know if you have any comments / suggestions on my latest push. Thanks again for your help.

max-sixty · 2017-02-06T22:52:35Z

xarray/core/netcdftimeindex.py

+import numpy as np
+import pandas as pd
+
+from pandas.lib import isscalar


V minor but there is an xarray version of this

max-sixty · 2017-02-06T22:53:38Z

xarray/core/netcdftimeindex.py

+
+
+def named(name, pattern):
+    return '(?P<' + name + '>' + pattern + ')'


I think .format is faster (as well as idiomatic) because this way will build n strings

max-sixty · 2017-02-06T23:36:03Z

That looks awesome! Quite a turn around there.

Yes, good point on the date_type. Let me think for a bit on the best way

shoyer

Looks like a very nice start!

Two limitations of the current design that are worth noting:

It doesn't do resample
It doesn't handle missing values

I don't think either of these are deal breakers

shoyer · 2017-02-07T07:48:51Z

xarray/core/netcdftimeindex.py

@@ -0,0 +1,180 @@
+import re


This shouldn't go in core, since there's nothing tying it to core xarray internals. Instead, it should probably go in a new top level module, maybe a new directory alongside the contents of the existing conventions module (rename it to xarray.conventions.coding?).

shoyer · 2017-02-07T07:50:19Z

xarray/core/netcdftimeindex.py

+
+
+def named(name, pattern):
+    return '(?P<' + name + '>' + pattern + ')'


This should only be called once, probably at module import time, so it should not matter for performance. I would just go with whatever is most readable.
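
For comparison, a `.format`-based variant of `named` could look like the sketch below; the `year_month` pattern is only an illustration of how the helper composes, not the pattern used in the PR.

```python
import re


def named(name, pattern):
    """Wrap ``pattern`` in a named capturing group."""
    return '(?P<{}>{})'.format(name, pattern)


# Compose two named groups into a small year-month pattern.
year_month = re.compile('^' + named('year', r'\d{4}') + '-'
                        + named('month', r'\d{2}') + '$')
match = year_month.match('1999-01')
```

Both spellings produce the same string, so the choice comes down to readability, as noted above.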

shoyer · 2017-02-07T07:51:54Z

xarray/core/netcdftimeindex.py

+def parse_iso8601(datetime_string):
+    basic_pattern = build_pattern(date_sep='', time_sep='')
+    extended_pattern = build_pattern()
+    patterns = [basic_pattern, extended_pattern]


Save this as a global variable.

shoyer · 2017-02-07T07:53:07Z

xarray/core/netcdftimeindex.py

+    for attr in ['year', 'month', 'day', 'hour', 'minute', 'second']:
+        value = result.get(attr, None)
+        if value is not None:
+            replace[attr] = int(value)


Note that seconds can be fractional
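
A minimal sketch of how a fractional seconds field could be handled instead of being cast straight to `int`. `split_seconds` is a hypothetical helper; it assumes a `.` decimal separator and truncates anything finer than a microsecond.

```python
def split_seconds(second_string):
    """Split a seconds field like '07.25' into (second, microsecond)."""
    whole, _, fraction = second_string.partition('.')
    # Pad or truncate the fractional part to exactly six digits.
    microsecond = int((fraction + '000000')[:6]) if fraction else 0
    return int(whole), microsecond
```

The two returned values map directly onto the `second` and `microsecond` fields of a date object.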

shoyer · 2017-02-07T07:58:04Z

xarray/core/netcdftimeindex.py

+    return default.replace(**replace), resolution
+
+
+def _parsed_string_to_bounds(date_type, resolution, parsed):


Note this is based on a pandas function
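
The idea behind such a function is to turn a partially specified date plus its resolution into the first and last instants it covers. The sketch below is a simplified illustration using the standard-library `datetime` (so only the proleptic Gregorian calendar); the function in the PR also takes a `date_type` and is not reproduced here.

```python
from datetime import datetime, timedelta


def parsed_string_to_bounds(resolution, parsed):
    """Return the first and last instants covered by a partial date.

    ``parsed`` holds the parsed fields, with unspecified ones at defaults.
    """
    one_microsecond = timedelta(microseconds=1)
    if resolution == 'year':
        start = datetime(parsed.year, 1, 1)
        end = datetime(parsed.year + 1, 1, 1) - one_microsecond
    elif resolution == 'month':
        start = datetime(parsed.year, parsed.month, 1)
        if parsed.month == 12:
            next_month = datetime(parsed.year + 1, 1, 1)
        else:
            next_month = datetime(parsed.year, parsed.month + 1, 1)
        end = next_month - one_microsecond
    elif resolution == 'day':
        start = datetime(parsed.year, parsed.month, parsed.day)
        end = start + timedelta(days=1) - one_microsecond
    else:
        raise KeyError(resolution)
    return start, end
```

Indexing with a string such as `'1999'` then reduces to slicing between the two returned bounds.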

shoyer · 2017-02-07T08:15:59Z

xarray/tests/test_netcdftimeindex.py

+@pytest.fixture
+def feb_days(date_type):
+    from netcdftime import DatetimeAllLeap, Datetime360Day
+    if date_type == DatetimeAllLeap:


Use is for type identity checks.

shoyer · 2017-02-07T08:22:04Z

xarray/tests/test_netcdftimeindex.py

+
+    expected = pd.Series([1, 2], index=index[:2])
+    for arg in range_args:
+        pd.util.testing.assert_series_equal(series[arg], expected)


Be careful. I don't think this is public API. Better to stick with the .equals.

shoyer · 2017-02-07T08:25:28Z

xarray/core/netcdftimeindex.py

+import numpy as np
+import pandas as pd
+
+from pandas.lib import isscalar


And the pandas version isn't public API :)

shoyer · 2017-02-07T08:28:39Z

xarray/core/netcdftimeindex.py

+
+
+def build_pattern(date_sep='\-', datetime_sep='T', time_sep='\:'):
+    pieces = [(None, 'year', '\d{4}'),


Do you need negative or five digit years?

Personally, I don't see myself needing it in the near future, but I'm not necessarily opposed to adding that support if others would find it useful.

It would make writing simple positive four-digit year dates more complicated though right? Would you always need the leading zero and the sign?

Then let's not bother until someone asks. Per Wikipedia's ISO 8601 article, you can optionally use an expanded year representation with + and -. I don't think they would always be necessary, but I haven't read the original document (which unfortunately I think is not available online).

FYI NCAR's TraCE simulation project is a 21k yr paleoclimate simulation. Not sure how they handle calendars/times. I know somebody who has analyzed data from this simulation; will ask what it looks like.
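
If expanded years were ever needed, one way to keep plain four-digit years working unchanged is to accept either form. The pattern below is a sketch under assumptions: requiring a sign for the expanded form follows the description above, but the number of digits allowed is a choice made for this illustration, and `parse_year` is a hypothetical helper.

```python
import re

# Either a plain four-digit year, or a signed year with at least four digits.
YEAR_PATTERN = re.compile(r'^(?P<year>\d{4}|[+-]\d{4,})$')


def parse_year(string):
    """Return the year as an int, or raise ValueError if it is not recognized."""
    match = YEAR_PATTERN.match(string)
    if match is None:
        raise ValueError('not a recognized year: {!r}'.format(string))
    return int(match.group('year'))
```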

shoyer · 2017-02-07T08:30:47Z

xarray/tests/test_netcdftimeindex.py

+        'minute', 'minute-dash', 'second', 'second-dash', 'second-dec',
+        'second-dec-dash'])
+def test_parse_iso8601(string, expected):
+    from xarray.core.netcdftimeindex import parse_iso8601


Do your imports at the top level if at all possible.

shoyer · 2017-02-07T08:42:21Z

Currently one can create nonsensical datetimes using netcdftime._netcdftime.datetime objects. This means one can attempt to index with an out-of-bounds string or datetime without raising an error. Could this possibly be addressed upstream?

Yes, this would be best addressed upstream in netcdftime.

spencerkclark · 2017-02-08T00:29:41Z

@shoyer thanks for your initial review comments. I'll try and push an update in the next few days.

spencerkclark

@shoyer I have just pushed an update. Whenever you get a chance please have another look.

I have a few comments in-line, but more broadly please let me know if I handled moving netcdftimeindex.py out of core and into a new conventions directory the way you wanted. Thanks!

spencerkclark · 2017-02-10T15:24:40Z

xarray/conventions/netcdftimeindex.py

+                                  'The microseconds of the datetime')
+    date_type = property(get_date_type)
+
+    def _partial_date_slice(self, resolution, parsed):


For now I tried to go as simple as possible here and in _get_string_slice. I think trying to exactly mimic DatetimeIndex's behavior could get messy.

Could you add a few examples (either here or in the docstring) that describe what behavior is not covered in this implementation?

spencerkclark · 2017-02-10T15:27:09Z

xarray/conventions/netcdftimeindex.py

+        try:
+            result = self.get_loc(key)
+            return (is_scalar(result) or type(result) == slice or
+                    (isinstance(result, np.ndarray) and result.size))


Essentially all I want to do here is, if result is a numpy array, check if it is not empty. Is there a cleaner way to do this?

spencerkclark · 2017-02-10T15:37:44Z

xarray/tests/test_netcdftimeindex.py

+    return dict(year=year, month=month, day=day, hour=hour,
+                minute=minute, second=second)
+
+ISO8601_STRING_TESTS = [


Is this along the lines of what you were looking for here? I couldn't find a name for the second argument to pytest.mark.parametrize, so I wasn't sure how to combine these two list arguments into a dict (but maybe I misunderstood what you were asking for).

shoyer · 2017-02-10T16:55:02Z

xarray/conventions/netcdftimeindex.py

@@ -35,10 +36,12 @@ def build_pattern(date_sep='\-', datetime_sep='T', time_sep='\:'):
    return '^' + trailing_optional(pattern_list) + '$'


+basic_pattern = build_pattern(date_sep='', time_sep='')
+extended_pattern = build_pattern()
+patterns = [basic_pattern, extended_pattern]


Use all caps for global constants, and prefix them with an underscore to indicate that they are private variables, e.g., _BASIC_PATTERN.
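
Combined with the earlier suggestion to build the patterns once, the result might look like the sketch below. `_build_pattern` here is a deliberately simplified, hypothetical builder covering only year and month, and `parse_year_month` is an illustrative helper, not a function from the PR.

```python
import re


def _build_pattern(date_sep='-'):
    """Simplified, hypothetical builder covering only year and month."""
    return (r'^(?P<year>\d{4})(?:' + re.escape(date_sep)
            + r'(?P<month>\d{2}))?$')


# Built once at import time; the leading underscore marks them as private.
_BASIC_PATTERN = re.compile(_build_pattern(date_sep=''))
_EXTENDED_PATTERN = re.compile(_build_pattern())
_PATTERNS = (_BASIC_PATTERN, _EXTENDED_PATTERN)


def parse_year_month(string):
    """Try each precompiled pattern in turn and return the matched fields."""
    for pattern in _PATTERNS:
        match = pattern.match(string)
        if match is not None:
            return match.groupdict()
    raise ValueError('no ISO 8601 pattern matched {!r}'.format(string))
```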

shoyer · 2017-02-10T16:59:58Z

xarray/conventions/netcdftimeindex.py

@@ -54,13 +57,22 @@ def _parse_iso8601_with_reso(date_type, timestr):
    for attr in ['year', 'month', 'day', 'hour', 'minute', 'second']:
        value = result.get(attr, None)
        if value is not None:
+            # Note ISO8601 conventions allow for fractional seconds; casting
+            # to an int means all seconds values get rounded down to the
+            # nearest integer.  TODO: Consider adding support for sub-second


You should update the regex above to exclude fractional seconds if that doesn't work.

shoyer · 2017-02-10T17:00:23Z

xarray/conventions/netcdftimeindex.py

+
+    if not isinstance(data[0], datetime):
+        raise TypeError(
+            'NetCDFTimeIndex requires netcdftime._netcdftime.datetime'


Use the public API name netcdftime.datetime.

Also, print the invalid object in the error message (using .format)

Unfortunately the public API name actually represents a DatetimeProlepticGregorian type, so for now to stick with public API imports, I've resorted to importing all six of the netcdftime datetime types.

In [1]: from netcdftime import datetime

In [2]: datetime(1, 1, 1)
Out[2]: netcdftime._netcdftime.DatetimeProlepticGregorian(1, 1, 1, 0, 0, 0, 0, -1, 1)

shoyer · 2017-02-10T17:02:21Z

xarray/conventions/netcdftimeindex.py

+        raise TypeError(
+            'NetCDFTimeIndex requires netcdftime._netcdftime.datetime'
+            ' objects.')
+    if not all(isinstance(value, type(data[0])) for value in data):


Create a variable for type(data[0]) outside the loop.
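
Taken together with the earlier request to show the offending object in the message, the check could be sketched as follows. `assert_all_same_type` is a hypothetical, library-agnostic helper, not the function in the PR.

```python
def assert_all_same_type(data):
    """Raise TypeError unless every element is an instance of the
    first element's type."""
    sample_type = type(data[0])  # computed once, outside the loop
    for value in data:
        if not isinstance(value, sample_type):
            raise TypeError(
                'Expected all elements to be of type {}, but found {!r} '
                'of type {}.'.format(sample_type.__name__, value,
                                     type(value).__name__))
```

Including the value and its type in the message makes it easier to see which element broke the invariant.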

shoyer · 2017-02-10T17:02:51Z

xarray/conventions/netcdftimeindex.py

+    if not all(isinstance(value, type(data[0])) for value in data):
+        raise TypeError(
+            'NetCDFTimeIndex requires using netcdftime._netcdftime.datetime'
+            ' objects of all the same type.')


Same concerns as above on the error message

shoyer · 2017-02-10T17:03:45Z

xarray/conventions/netcdftimeindex.py

+    if not isinstance(data[0], datetime):
+        raise TypeError(
+            'NetCDFTimeIndex requires netcdftime._netcdftime.datetime'
+            ' objects.')


nit: I usually prefer to leave spaces at the end of lines instead of the start of lines -- I think it looks slightly nicer.

shoyer · 2017-02-10T17:08:55Z

xarray/tests/test_netcdftimeindex.py

-def test_parse_iso8601(string, expected):
-    from xarray.core.netcdftimeindex import parse_iso8601
+]
+ISO8601_STRING_TEST_IDS = [


My thinking here was to use something like a dict, just to logically join together test names and parameters in the same place, e.g.,

ISO8601_STRING_TESTS = {
    'year': ('1999', date_dict(year='1999')),
    'month': ('199901', date_dict(year='1999', month='01')),
    ...
}

@pytest.mark.parametrize(('string', 'expected'),
                         ISO8601_STRING_TESTS.values(),
                         ids=ISO8601_STRING_TESTS.keys())

shoyer · 2017-02-10T17:10:26Z

xarray/conventions/netcdftimeindex.py

+        try:
+            result = self.get_loc(key)
+            return (is_scalar(result) or type(result) == slice or
+                    (isinstance(result, np.ndarray) and result.size))


I think this is about the best you can do

shoyer

Yes, the conventions module is exactly what I was thinking

shoyer · 2017-02-10T17:12:21Z

xarray/conventions/netcdftimeindex.py

-                return result
+        """Adapted from pandas.tseries.index.DatetimeIndex.get_loc"""
+        if isinstance(key, pycompat.basestring):
+            return self._get_string_slice(key)


+1 for fewer hard to predict special cases. Pandas is really inscrutable here.

shoyer · 2017-02-10T17:16:47Z

xarray/tests/test_netcdftimeindex.py

+    expected = xr.DataArray(1).assign_coords(time=index[0])
+    result = da.sel(time=date_type(1, 1, 1))
+    assert_identical(result, expected)
+


Add some tests for sel (both scalar and list) with both method='pad' and method='nearest' and optionally setting tolerance
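
As a reminder of what such tests would exercise, the sketch below spells out the intended semantics in plain Python: `pad` picks the last entry at or before the target, `nearest` picks the closest entry, and `tolerance` caps the allowed distance. `locate` is a hypothetical helper for illustration only. It is not how pandas or xarray implement these lookups, and it does not model their tie-breaking rules.

```python
from bisect import bisect_right
from datetime import datetime, timedelta


def locate(sorted_dates, target, method, tolerance=None):
    """Return the position selected for ``target`` in ``sorted_dates``."""
    right = bisect_right(sorted_dates, target)
    if method == 'pad':
        # Last entry at or before the target, if there is one.
        candidates = [right - 1] if right > 0 else []
    elif method == 'nearest':
        # The neighbours on either side of the insertion point.
        candidates = [i for i in (right - 1, right)
                      if 0 <= i < len(sorted_dates)]
    else:
        raise ValueError('unsupported method: {!r}'.format(method))
    if not candidates:
        raise KeyError(target)
    best = min(candidates, key=lambda i: abs(sorted_dates[i] - target))
    if tolerance is not None and abs(sorted_dates[best] - target) > tolerance:
        raise KeyError(target)
    return best


dates = [datetime(1, 1, 1), datetime(1, 2, 1),
         datetime(2, 1, 1), datetime(2, 2, 1)]
padded = locate(dates, datetime(1, 1, 20), 'pad')
nearest = locate(dates, datetime(1, 1, 20), 'nearest')
```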

spencerkclark · 2017-02-11T23:30:43Z

@shoyer when you get a chance, things are ready for another review. I think the AppVeyor issues may be due to the version of netCDF4 used. Should we switch to the conda-forge channel to set up the environment there?

shoyer · 2017-02-12T00:05:35Z

xarray/conventions/netcdftimeindex.py

-def assert_all_same_netcdftime_datetimes(data):
-    from netcdftime._netcdftime import datetime
+def assert_all_valid_date_type(data):
+    from netcdftime import (


You can just use datetime here -- these are all subclasses, so isinstance checks on the super class work fine.

Sorry, I buried this in a comment (#1252 (comment)) above. Confusingly, netcdftime.datetime does not refer to the super class:

In [1]: from netcdftime import datetime, DatetimeAllLeap

In [2]: datetime(1, 1, 1)
Out[2]: netcdftime._netcdftime.DatetimeProlepticGregorian(1, 1, 1, 0, 0, 0, 0, -1, 1)

In [3]: test = DatetimeAllLeap(1, 1, 1)

In [4]: isinstance(test, datetime)
Out[4]: False

In [5]: from netcdftime._netcdftime import datetime as super_datetime

In [6]: isinstance(test, super_datetime)
Out[6]: True
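
The behavior in this session can be reproduced with a few stand-in classes. All names below are hypothetical; the sketch only mirrors the situation where a public name is bound to one subclass instead of the shared base class.

```python
class BaseDatetime(object):
    """Hypothetical base class shared by all calendar types."""


class ProlepticGregorian(BaseDatetime):
    pass


class AllLeap(BaseDatetime):
    pass


# The surprising situation: the public name points at one subclass
# rather than at the shared base class.
public_datetime = ProlepticGregorian

date = AllLeap()
print(isinstance(date, public_datetime))  # sibling class, so False
print(isinstance(date, BaseDatetime))     # shared base class, so True
```

This is why an `isinstance` check against the public name rejects dates from the other calendars.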

shoyer · 2017-02-12T00:08:11Z

xarray/conventions/netcdftimeindex.py

-            'NetCDFTimeIndex requires netcdftime._netcdftime.datetime'
-            ' objects.')
-    if not all(isinstance(value, type(data[0])) for value in data):
+            'NetCDFTimeIndex requires netcdftime._netcdftime.datetime '


I still prefer to use netcdftime.datetime, since it's equivalent and doesn't refer to a private submodule (the same reason we don't reference xarray.core.dataset.Dataset). The repr for netcdftime datetime should probably be fixed to refer to the shorter path.

shoyer · 2017-02-12T00:24:08Z

Okay this seems like a netcdftime bug. Can you report this upstream?

spencerkclark · 2018-05-02T00:45:57Z

In that case, we should probably add a temporary "pip" clause to the requirements file for windows, to install cftime from pypi instead for now.

Thanks @shoyer, it looks like that did the trick. I addressed your recent comments; let me know if you have any further feedback.

shoyer · 2018-05-02T00:46:47Z

See also #2098, which should fix failing builds on master

jhamman · 2018-05-11T05:33:13Z

Tests are green here now. @shoyer and @spencerkclark - are we waiting on anything else before merging?

spencerkclark · 2018-05-11T18:33:31Z

This could be ready. I'm happy to address any further concerns if anyone has them.

shoyer · 2018-05-12T06:41:04Z

xarray/core/common.py

+    else:
+        sample = var.data.ravel()[0]
+        if isinstance(sample, dask_array_type):
+            sample = sample.compute()


Evaluating dask arrays makes me cringe, but I think this is about the best we can currently do with NumPy's current dtype system. Fortunately this should not be common, anyways.

shoyer · 2018-05-12T06:45:19Z

xarray/core/common.py

+    except ImportError:
+        return False
+    else:
+        sample = var.data.ravel()[0]


Since this could be potentially called on many arrays, let's be a little more careful before calculating sample:

Let's verify that the array has dtype=object (otherwise it can't contain cftime.datetime objects)

Let's verify that the array has size > 0 before trying to access any elements.

shoyer · 2018-05-12T06:47:47Z

xarray/coding/cftimeindex.py

+def assert_all_valid_date_type(data):
+    import cftime
+
+    valid_types = (cftime.DatetimeJulian, cftime.DatetimeNoLeap,


This can probably be simplified to just the base cftime.datetime?

shoyer · 2018-05-12T06:50:45Z

A couple other things to think about from a usability perspective:

What happens when you try to resample along CFTimeIndex?
What happens when you try to plot a DataArray with a CFTimeIndex?

These should at least raise informative errors (use NotImplementedError).

spencerkclark · 2018-05-12T13:56:02Z

Thanks @shoyer, those are good questions. I addressed your inline comments. Let me know if you have anything else.

What happens when you try to resample along CFTimeIndex?

Through pandas, this raises an informative TypeError:

In [1]: from cftime import num2date

In [2]: import numpy as np

In [3]: import xarray as xr

In [4]: xr.set_options(enable_cftimeindex=True)
Out[4]: <xarray.core.options.set_options at 0x10c3a99d0>

In [5]: time = num2date(np.arange(5), units='days since 0001-01-01', calendar='noleap')

In [6]: data = xr.DataArray(np.arange(5), coords=[time], dims=['time'])

In [7]: data.resample(time='2D').mean()
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-8-577095ede89a> in <module>()
----> 1 data.resample(time='2D').mean()

/Users/spencerclark/xarray-dev/xarray/xarray/core/common.pyc in resample(self, freq, dim, how, skipna, closed, label, base, keep_attrs, **indexer)
    616         resampler = self._resample_cls(self, group=group, dim=dim_name,
    617                                        grouper=grouper,
--> 618                                        resample_dim=RESAMPLE_DIM)
    619
    620         return resampler

/Users/spencerclark/xarray-dev/xarray/xarray/core/resample.pyc in __init__(self, *args, **kwargs)
    128                              "cannot have the same name as actual dimension "
    129                              "('{}')! ".format(self._resample_dim, self._dim))
--> 130         super(DataArrayResample, self).__init__(*args, **kwargs)
    131
    132     def apply(self, func, shortcut=False, **kwargs):

/Users/spencerclark/xarray-dev/xarray/xarray/core/groupby.pyc in __init__(self, obj, group, squeeze, grouper, bins, cut_kwargs)
    233                 raise ValueError('index must be monotonic for resampling')
    234             s = pd.Series(np.arange(index.size), index)
--> 235             first_items = s.groupby(grouper).first()
    236             full_index = first_items.index
    237             if first_items.isnull().any():

//anaconda/envs/xarray-dev/lib/python2.7/site-packages/pandas/core/generic.pyc in groupby(self, by, axis, level, as_index, sort, group_keys, squeeze, **kwargs)
   5160         return groupby(self, by=by, axis=axis, level=level, as_index=as_index,
   5161                        sort=sort, group_keys=group_keys, squeeze=squeeze,
-> 5162                        **kwargs)
   5163
   5164     def asfreq(self, freq, method=None, how=None, normalize=False,

//anaconda/envs/xarray-dev/lib/python2.7/site-packages/pandas/core/groupby.pyc in groupby(obj, by, **kwds)
   1846         raise TypeError('invalid type: %s' % type(obj))
   1847
-> 1848     return klass(obj, by, **kwds)
   1849
   1850

//anaconda/envs/xarray-dev/lib/python2.7/site-packages/pandas/core/groupby.pyc in __init__(self, obj, keys, axis, level, grouper, exclusions, selection, as_index, sort, group_keys, squeeze, **kwargs)
    514                                                     level=level,
    515                                                     sort=sort,
--> 516                                                     mutated=self.mutated)
    517
    518         self.obj = obj

//anaconda/envs/xarray-dev/lib/python2.7/site-packages/pandas/core/groupby.pyc in _get_grouper(obj, key, axis, level, sort, mutated, validate)
   2848     # a passed-in Grouper, directly convert
   2849     if isinstance(key, Grouper):
-> 2850         binner, grouper, obj = key._get_grouper(obj, validate=False)
   2851         if key.key is None:
   2852             return grouper, [], obj

//anaconda/envs/xarray-dev/lib/python2.7/site-packages/pandas/core/resample.pyc in _get_grouper(self, obj, validate)
   1118     def _get_grouper(self, obj, validate=True):
   1119         # create the resampler and return our binner
-> 1120         r = self._get_resampler(obj)
   1121         r._set_binner()
   1122         return r.binner, r.grouper, r.obj

//anaconda/envs/xarray-dev/lib/python2.7/site-packages/pandas/core/resample.pyc in _get_resampler(self, obj, kind)
   1114         raise TypeError("Only valid with DatetimeIndex, "
   1115                         "TimedeltaIndex or PeriodIndex, "
-> 1116                         "but got an instance of %r" % type(ax).__name__)
   1117
   1118     def _get_grouper(self, obj, validate=True):

TypeError: Only valid with DatetimeIndex, TimedeltaIndex or PeriodIndex, but got an instance of 'CFTimeIndex'

What happens when you try to plot a DataArray with a CFTimeIndex?

I updated things such that if cftime.datetime objects are used as a coordinate when plotting, the error message looks like:

In [1]: from cftime import num2date

In [2]: import numpy as np

In [3]: import xarray as xr

In [4]: xr.set_options(enable_cftimeindex=True)
Out[4]: <xarray.core.options.set_options at 0x10af58850>

In [5]: time = num2date(np.arange(5), units='days since 0001-01-01', calendar='noleap')

In [6]: data = xr.DataArray(np.arange(5), coords=[time], dims=['time'])

In [7]: data.plot()
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-7-118f55e0b3d0> in <module>()
----> 1 data.plot()

/Users/spencerclark/xarray-dev/xarray/xarray/plot/plot.pyc in __call__(self, **kwargs)
    357
    358     def __call__(self, **kwargs):
--> 359         return plot(self._da, **kwargs)
    360
    361     @functools.wraps(hist)

/Users/spencerclark/xarray-dev/xarray/xarray/plot/plot.pyc in plot(darray, row, col, col_wrap, ax, rtol, subplot_kws, **kwargs)
    155     kwargs['ax'] = ax
    156
--> 157     return plotfunc(darray, **kwargs)
    158
    159

/Users/spencerclark/xarray-dev/xarray/xarray/plot/plot.pyc in line(darray, *args, **kwargs)
    258             yplt = darray.coords[ylabel]
    259
--> 260     _ensure_plottable(xplt)
    261
    262     primitive = ax.plot(xplt, yplt, *args, **kwargs)

/Users/spencerclark/xarray-dev/xarray/xarray/plot/plot.pyc in _ensure_plottable(*args)
     54         if not (_valid_numpy_subdtype(np.array(x), numpy_types) or
     55                 _valid_other_type(np.array(x), other_types)):
---> 56             raise TypeError('Plotting requires coordinates to be numeric '
     57                             'or dates of type np.datetime64 or '
     58                             'datetime.datetime.')

TypeError: Plotting requires coordinates to be numeric or dates of type np.datetime64 or datetime.datetime.

If cftime.datetime objects are the data requested to be plotted, the following error message results:

In [8]: data = xr.DataArray(time, coords=[np.arange(5)], dims=['x'])

In [9]: data.plot()
---------------------------------------------------------------------------
NotImplementedError                       Traceback (most recent call last)
<ipython-input-9-118f55e0b3d0> in <module>()
----> 1 data.plot()

/Users/spencerclark/xarray-dev/xarray/xarray/plot/plot.pyc in __call__(self, **kwargs)
    357
    358     def __call__(self, **kwargs):
--> 359         return plot(self._da, **kwargs)
    360
    361     @functools.wraps(hist)

/Users/spencerclark/xarray-dev/xarray/xarray/plot/plot.pyc in plot(darray, row, col, col_wrap, ax, rtol, subplot_kws, **kwargs)
    124
    125     if contains_cftime_datetimes(darray):
--> 126         raise NotImplementedError('Plotting arrays of cftime.datetime objects '
    127                                   'is currently not possible.')
    128

NotImplementedError: Plotting arrays of cftime.datetime objects is currently not possible.

shoyer · 2018-05-12T19:09:37Z

OK, I'm happy with this. Time to merge I guess?

fmaussion · 2018-05-12T20:37:34Z

Congrats! This is a great piece of work and will be very useful to the climate community.

jhamman · 2018-05-13T05:17:59Z

Okay, I'm going to merge now. Hopefully a few of us can stress test this a bit more prior to the next release. Thanks @spencerkclark for all the work here over the past 15 months!!!

spencerkclark · 2018-05-13T11:32:09Z

@shoyer, @jhamman, @maxim-lian, @spencerahill many thanks for the substantial feedback, help, and encouragement here. You guys are great!

rabernat · 2018-05-13T12:44:02Z

Congrats to everyone who made this happen, especially @spencerclark. This feature is going to make so many people happy!

spencerahill · 2018-05-13T19:21:41Z

Credit also due to @rabernat for organizing the workshop in late 2016 where this effort got off the ground, and to @shoyer who sketched out an initial roadmap for the implementation at that meeting.

So excited to have this in! In aospy alone, we'll be able to get rid of 100s (1000+?) of lines of code now that CFTime is in place.

spencerkclark · 2018-05-13T21:13:23Z

Indeed that meeting played an important role here. Thank you @rabernat!

max-sixty reviewed Feb 6, 2017

spencerkclark added 2 commits February 6, 2017 13:52

Start on implementing and testing NetCDFTimeIndex

e1e8223

TST Move to using pytest fixtures to structure tests

6496458

spencerkclark force-pushed the NetCDFTimeIndex branch from 6257f8c to 6496458 Compare February 6, 2017 18:53

max-sixty reviewed Feb 6, 2017

shoyer reviewed Feb 7, 2017

Address initial review comments

675b2f7

spencerkclark commented Feb 10, 2017

shoyer reviewed Feb 10, 2017

spencerkclark added 3 commits February 11, 2017 17:01

Address second round of review comments

7beddc1

Fix failing python3 tests

3cf03bc

Match test method name to method name

53b085c

shoyer reviewed Feb 12, 2017

This was referenced Feb 12, 2017

netcdftime.datetime refers to DatetimeProlepticGregorian Unidata/cftime#8

Closed

Raise error upon construction of out-of-bounds datetime? Unidata/cftime#9

Closed

fmaussion mentioned this pull request Feb 15, 2017

BUG: Resample on PeriodIndex not working? #1270

Closed

spencerkclark mentioned this pull request Feb 17, 2017

Switch AppVeyor CI to use conda env / requirements.yml #1274

Merged

Restore test case for pydata#2002 in test_coding_times.py

e66abe9

I must have inadvertently removed it during a merge.

spencerkclark added 2 commits May 1, 2018 21:06

Tweak dates out of range warning logic slightly to preserve current default

f25b0b6

Merge branch 'master' into NetCDFTimeIndex

b10cc73

jhamman mentioned this pull request May 10, 2018

time array with mixture of types decoded from non-standard calendar #2116

Closed

shoyer reviewed May 12, 2018

Address review comments

c318755

jhamman merged commit ebe0dd0 into pydata:master May 13, 2018

spencerkclark deleted the NetCDFTimeIndex branch May 13, 2018 11:32

spencerkclark mentioned this pull request May 13, 2018

Add cftime to doc/environment.yml #2126

Merged

spencerkclark mentioned this pull request May 13, 2018

cftime.datetime serialization example failing in latest doc build #2127

Closed

spencerahill mentioned this pull request May 13, 2018

Time limitation (between years 1678 and 2262) restrictive to climate community #789

Closed

jhamman mentioned this pull request May 16, 2018

add CFTimeIndex enabled date_range function #2142

Closed

spencerkclark mentioned this pull request May 19, 2018

Remove datetime workaround logic spencerahill/aospy#273

Merged

spencerahill mentioned this pull request May 28, 2018

Adding resample functionality to CFTimeIndex #2191

Closed

spencerkclark mentioned this pull request Jul 19, 2018

WIP Add a CFTimeIndex-enabled xr.cftime_range function #2301

Merged

spencerkclark mentioned this pull request Sep 25, 2018

xarray potential inconstistencies with cftime #2437

Closed

abkfenris mentioned this pull request May 15, 2019

Unit error gulfofmaine/buoy_barn#27

Open



def named(name, pattern):
    return '(?P<' + name + '>' + pattern + ')'

    return default.replace(**replace), resolution


def _parsed_string_to_bounds(date_type, resolution, parsed):
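
The signature above suggests a helper that turns a partially specified date into the first and last instants of the period it names, which is what partial datetime-string indexing needs in order to build a slice. The body is not shown in this thread, so what follows is only a rough sketch under stated assumptions: the function name `parsed_string_to_bounds_sketch` is hypothetical, `datetime.datetime` from the standard library stands in for a netcdftime/cftime date type, and only the 'year', 'month', and 'day' resolutions are handled.

```python
from datetime import datetime, timedelta


def parsed_string_to_bounds_sketch(date_type, resolution, parsed):
    """Return (start, end) of the period named by a partial date.

    Hypothetical illustration; not the code from this pull request.
    """
    one_us = timedelta(microseconds=1)
    if resolution == 'year':
        start = date_type(parsed.year, 1, 1)
        end = date_type(parsed.year + 1, 1, 1) - one_us
    elif resolution == 'month':
        start = date_type(parsed.year, parsed.month, 1)
        # Roll over to January of the next year after December.
        if parsed.month == 12:
            next_start = date_type(parsed.year + 1, 1, 1)
        else:
            next_start = date_type(parsed.year, parsed.month + 1, 1)
        end = next_start - one_us
    elif resolution == 'day':
        start = date_type(parsed.year, parsed.month, parsed.day)
        end = start + timedelta(days=1) - one_us
    else:
        raise ValueError('unsupported resolution: %r' % (resolution,))
    return start, end


# Bounds of February 2000 in the standard (Gregorian) calendar.
start, end = parsed_string_to_bounds_sketch(
    datetime, 'month', datetime(2000, 2, 1))
```

Computing the end bound as "start of the next period minus one microsecond" avoids hard-coding month lengths, so the same logic would carry over to a calendar whose date type defines its own month lengths.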



def build_pattern(date_sep='\-', datetime_sep='T', time_sep='\:'):
    pieces = [(None, 'year', '\d{4}'),
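
The `named` and `build_pattern` fragments above belong to the strict ISO 8601 parser behind partial datetime-string indexing. Since the list of pieces is cut off after the year entry, here is a minimal, self-contained sketch of how such a pattern could be assembled with only the standard library. The entries after 'year', the helpers `optional` and `trailing_optional`, and the wrapper `parse_partial_iso8601` are assumptions made for illustration, not necessarily what was merged.

```python
import re


def named(name, pattern):
    # Wrap a sub-pattern in a named capture group.
    return '(?P<' + name + '>' + pattern + ')'


def optional(pattern):
    # Hypothetical helper: a non-capturing, optional group.
    return '(?:' + pattern + ')?'


def trailing_optional(patterns):
    # Hypothetical helper: nest the patterns so that each later field
    # may only appear when every earlier field is present.
    if len(patterns) == 1:
        return patterns[0]
    return patterns[0] + optional(trailing_optional(patterns[1:]))


def build_pattern(date_sep=r'\-', datetime_sep='T', time_sep=r'\:'):
    # Entries after 'year' are assumed; the original list is truncated.
    pieces = [(None, 'year', r'\d{4}'),
              (date_sep, 'month', r'\d{2}'),
              (date_sep, 'day', r'\d{2}'),
              (datetime_sep, 'hour', r'\d{2}'),
              (time_sep, 'minute', r'\d{2}'),
              (time_sep, 'second', r'\d{2}')]
    parts = [(sep or '') + named(name, sub) for sep, name, sub in pieces]
    return '^' + trailing_optional(parts) + '$'


def parse_partial_iso8601(text):
    # Hypothetical wrapper: return only the fields that were supplied.
    match = re.match(build_pattern(), text)
    if match is None:
        raise ValueError('not a strict ISO 8601 string: %r' % (text,))
    return {key: int(value)
            for key, value in match.groupdict().items()
            if value is not None}


fields = parse_partial_iso8601('2000-02')
```

The set of keys in the result tells the caller how finely the string was specified (here, down to the month), which is the information a resolution-based bounds lookup would need.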

What happens when you try to resample along CFTimeIndex?

What happens when you try to plot a DataArray with a CFTimeIndex?
