CFTimeIndex #1252

Merged
merged 75 commits into from
May 13, 2018
Changes from 3 commits
e1e8223
Start on implementing and testing NetCDFTimeIndex
spencerkclark Feb 5, 2017
6496458
TST Move to using pytest fixtures to structure tests
spencerkclark Feb 6, 2017
675b2f7
Address initial review comments
spencerkclark Feb 10, 2017
7beddc1
Address second round of review comments
spencerkclark Feb 11, 2017
3cf03bc
Fix failing python3 tests
spencerkclark Feb 11, 2017
53b085c
Match test method name to method name
spencerkclark Feb 11, 2017
738979b
Merge branch 'master' of https://github.com/pydata/xarray into NetCDF…
spencerkclark Apr 16, 2017
a177f89
First attempts at integrating NetCDFTimeIndex into xarray
spencerkclark May 10, 2017
48ec519
Cleanup
spencerkclark May 11, 2017
9e76df6
Merge branch 'master' into NetCDFTimeIndex
spencerkclark May 11, 2017
2a7b439
Fix DataFrame and Series test failures for NetCDFTimeIndex
spencerkclark May 11, 2017
b942724
First pass at making NetCDFTimeIndex compatible with #1356
spencerkclark May 11, 2017
7845e6d
Merge branch 'master' into NetCDFTimeIndex
spencerkclark Jun 20, 2017
a9ed3c8
Address initial review comments
spencerkclark Jun 26, 2017
3e23ed5
Merge branch 'master' into NetCDFTimeIndex
spencerkclark Aug 25, 2017
a9f3548
Merge branch 'master' into NetCDFTimeIndex
spencerkclark Jan 22, 2018
f00f59a
Restore test_conventions.py
spencerkclark Jan 22, 2018
b34879d
Fix failing test in test_utils.py
spencerkclark Jan 22, 2018
e93b62d
flake8
spencerkclark Jan 22, 2018
61e8bc6
Merge branch 'master' into NetCDFTimeIndex
spencerkclark Feb 20, 2018
0244f58
Merge branch 'master' into NetCDFTimeIndex
spencerkclark Mar 1, 2018
32d7986
Update for standalone netcdftime
spencerkclark Mar 1, 2018
9855176
Address stickler-ci comments
spencerkclark Mar 1, 2018
8d61fdb
Skip test_format_netcdftime_datetime if netcdftime not installed
spencerkclark Mar 1, 2018
6b87da7
A start on documentation
spencerkclark Mar 9, 2018
812710c
Merge branch 'master' into NetCDFTimeIndex
spencerkclark Mar 9, 2018
3610e6e
Fix failing zarr tests related to netcdftime encoding
spencerkclark Mar 9, 2018
8f69a90
Simplify test_decode_standard_calendar_single_element_non_ns_range
spencerkclark Mar 9, 2018
cec909c
Address a couple review comments
spencerkclark Mar 10, 2018
422792b
Use else clause in _maybe_cast_to_netcdftimeindex
spencerkclark Mar 10, 2018
de74037
Start on adding enable_netcdftimeindex option
spencerkclark Mar 10, 2018
2993e3c
Continue parametrizing tests in test_coding_times.py
spencerkclark Mar 10, 2018
f3438fd
Update time-series.rst for enable_netcdftimeindex option
spencerkclark Mar 10, 2018
c35364e
Use :py:func: in rst for xarray.set_options
spencerkclark Mar 10, 2018
08f72dc
Merge branch 'master' into NetCDFTimeIndex
spencerkclark Mar 10, 2018
62ce0ae
Add a what's new entry and test that resample raises a TypeError
spencerkclark Mar 11, 2018
ff05005
Merge branch 'master' of https://github.com/pydata/xarray into NetCDF…
spencerkclark Mar 12, 2018
20fea63
Merge branch 'master' into NetCDFTimeIndex
spencerkclark Mar 16, 2018
d5a3cef
Move what's new entry to the version 0.10.3 section
spencerkclark Mar 16, 2018
e721d26
Add version-dependent pathway for importing netcdftime.datetime
spencerkclark Mar 17, 2018
5e1c4a8
Make NetCDFTimeIndex and date decoding/encoding compatible with datet…
spencerkclark Mar 20, 2018
257f086
Merge branch 'master' into NetCDFTimeIndex
spencerkclark Mar 20, 2018
00e8ada
Merge branch 'master' into NetCDFTimeIndex
spencerkclark Apr 12, 2018
c9d0454
Remove logic to make NetCDFTimeIndex compatible with datetime.datetime
spencerkclark Apr 12, 2018
f678714
Documentation edits
spencerkclark Apr 12, 2018
b03e38e
Ensure proper enable_netcdftimeindex option is used under lazy decoding
spencerkclark Apr 13, 2018
890dde0
Add fix and test for concatenating variables with a NetCDFTimeIndex
spencerkclark Apr 13, 2018
80e05ba
Merge branch 'master' into NetCDFTimeIndex
spencerkclark Apr 16, 2018
13c8358
Further namespace changes due to netcdftime/cftime renaming
spencerkclark Apr 16, 2018
ab46798
NetCDFTimeIndex -> CFTimeIndex
spencerkclark Apr 16, 2018
67fd335
Documentation updates
spencerkclark Apr 16, 2018
7041a8d
Only allow use of CFTimeIndex when using the standalone cftime
spencerkclark Apr 16, 2018
9df4e11
Fix errant what's new changes
spencerkclark Apr 16, 2018
9391463
flake8
spencerkclark Apr 16, 2018
da12ecd
Fix skip logic in test_cftimeindex.py
spencerkclark Apr 16, 2018
a6997ec
Use only_use_cftime_datetimes option in num2date
spencerkclark Apr 26, 2018
7302d7e
Merge branch 'master' into NetCDFTimeIndex
spencerkclark Apr 26, 2018
9dc5539
Require standalone cftime library for all new functionality
spencerkclark Apr 28, 2018
1aa8d86
Improve skipping logic in test_cftimeindex.py
spencerkclark Apr 28, 2018
ef3f2b1
Fix skipping logic in test_cftimeindex.py for when cftime or netcdftime
spencerkclark Apr 28, 2018
4fb5a90
Fix skip logic in Python 3.4 build for test_cftimeindex.py
spencerkclark Apr 28, 2018
1fd205a
Improve error messages when for when the standalone cftime is not ins…
spencerkclark Apr 28, 2018
58a0715
Tweak skip logic in test_accessors.py
spencerkclark Apr 28, 2018
ca4d7dd
flake8
spencerkclark Apr 28, 2018
3947aac
Address review comments
spencerkclark Apr 30, 2018
a395db0
Temporarily remove cftime from py27 build environment on windows
spencerkclark Apr 30, 2018
1b00bde
flake8
spencerkclark Apr 30, 2018
5fdcd20
Install cftime via pip for Python 2.7 on Windows
spencerkclark Apr 30, 2018
459211c
Merge branch 'master' into NetCDFTimeIndex
spencerkclark Apr 30, 2018
7e9bb20
flake8
spencerkclark Apr 30, 2018
247c9eb
Remove unnecessary new lines; simplify _maybe_cast_to_cftimeindex
spencerkclark May 1, 2018
e66abe9
Restore test case for #2002 in test_coding_times.py
spencerkclark May 1, 2018
f25b0b6
Tweak dates out of range warning logic slightly to preserve current d…
spencerkclark May 2, 2018
b10cc73
Merge branch 'master' into NetCDFTimeIndex
spencerkclark May 2, 2018
c318755
Address review comments
spencerkclark May 12, 2018
2 changes: 1 addition & 1 deletion xarray/__init__.py
@@ -14,7 +14,7 @@

 from .backends.api import (open_dataset, open_dataarray, open_mfdataset,
                            save_mfdataset)
-from .conventions import decode_cf
+from .conventions.coding import decode_cf

 try:
     from .version import version as __version__
5 changes: 3 additions & 2 deletions xarray/backends/api.py
@@ -10,8 +10,9 @@

 import numpy as np

-from .. import backends, conventions
+from .. import backends
 from .common import ArrayWriter, GLOBAL_LOCK
+from ..conventions import coding
 from ..core import indexing
 from ..core.combine import auto_combine
 from ..core.utils import close_on_error, is_remote_uri
@@ -217,7 +218,7 @@ def open_dataset(filename_or_obj, group=None, decode_cf=True,
     cache = chunks is None

     def maybe_decode_store(store, lock=False):
-        ds = conventions.decode_cf(
+        ds = coding.decode_cf(
             store, mask_and_scale=mask_and_scale, decode_times=decode_times,
             concat_characters=concat_characters, decode_coords=decode_coords,
             drop_variables=drop_variables)
2 changes: 1 addition & 1 deletion xarray/backends/common.py
@@ -8,7 +8,7 @@
 from collections import Mapping
 from distutils.version import StrictVersion

-from ..conventions import cf_encoder
+from ..conventions.coding import cf_encoder
 from ..core.utils import FrozenOrderedDict
 from ..core.pycompat import iteritems, dask_array_type

2 changes: 1 addition & 1 deletion xarray/backends/netCDF4_.py
@@ -7,7 +7,7 @@
 import numpy as np

 from .. import Variable
-from ..conventions import pop_to
+from ..conventions.coding import pop_to
 from ..core import indexing
 from ..core.utils import (FrozenOrderedDict, NDArrayMixin,
                           close_on_error, is_remote_uri)
5 changes: 3 additions & 2 deletions xarray/backends/netcdf3.py
@@ -5,7 +5,8 @@

 import numpy as np

-from .. import conventions, Variable
+from .. import Variable
+from ..conventions import coding
 from ..core import ops
 from ..core.pycompat import basestring, unicode_type, OrderedDict

@@ -56,7 +57,7 @@ def coerce_nc3_dtype(arr):

 def maybe_convert_to_char_array(data, dims):
     if data.dtype.kind == 'S' and data.dtype.itemsize > 1:
-        data = conventions.string_to_char(data)
+        data = coding.string_to_char(data)
         dims = dims + ('string%s' % data.shape[-1],)
     return data, dims

Empty file added xarray/conventions/__init__.py
Empty file.
12 changes: 6 additions & 6 deletions xarray/conventions.py → xarray/conventions/coding.py
@@ -11,10 +11,10 @@
 from collections import defaultdict
 from pandas.tslib import OutOfBoundsDatetime

-from .core import indexing, ops, utils
-from .core.formatting import format_timestamp, first_n_items, last_item
-from .core.variable import as_variable, Variable
-from .core.pycompat import iteritems, OrderedDict, PY3, basestring
+from ..core import indexing, ops, utils
+from ..core.formatting import format_timestamp, first_n_items, last_item
+from ..core.variable import as_variable, Variable
+from ..core.pycompat import iteritems, OrderedDict, PY3, basestring


 # standard calendars recognized by netcdftime
@@ -929,8 +929,8 @@ def decode_cf(obj, concat_characters=True, mask_and_scale=True,
     -------
     decoded : Dataset
     """
-    from .core.dataset import Dataset
-    from .backends.common import AbstractDataStore
+    from ..core.dataset import Dataset
+    from ..backends.common import AbstractDataStore

     if isinstance(obj, Dataset):
         vars = obj._variables
208 changes: 208 additions & 0 deletions xarray/conventions/netcdftimeindex.py
@@ -0,0 +1,208 @@
import re
from datetime import timedelta

import numpy as np
import pandas as pd

from xarray.core import pycompat
from xarray.core.utils import is_scalar


def named(name, pattern):
    return '(?P<' + name + '>' + pattern + ')'


def optional(x):
    return '(?:' + x + ')?'


def trailing_optional(xs):
    if not xs:
        return ''
    return xs[0] + optional(trailing_optional(xs[1:]))


def build_pattern(date_sep='\-', datetime_sep='T', time_sep='\:'):
    pieces = [(None, 'year', '\d{4}'),
              (date_sep, 'month', '\d{2}'),
              (date_sep, 'day', '\d{2}'),
              (datetime_sep, 'hour', '\d{2}'),
              (time_sep, 'minute', '\d{2}'),
              (time_sep, 'second', '\d{2}' + optional('\.\d+'))]
    pattern_list = []
    for sep, name, sub_pattern in pieces:
        pattern_list.append((sep if sep else '') + named(name, sub_pattern))
        # TODO: allow timezone offsets?
    return '^' + trailing_optional(pattern_list) + '$'


basic_pattern = build_pattern(date_sep='', time_sep='')
extended_pattern = build_pattern()
patterns = [basic_pattern, extended_pattern]
Member

Use all caps for global constants, and prefix them with an underscore to indicate that they are private variables, e.g., _BASIC_PATTERN.
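
To see what the nested optional groups do, here is a minimal sketch that rebuilds only the date part of the extended pattern. The three helper functions are copied from the diff above. The raw-string literals, the shortened `pieces` list, and the constant name `_EXTENDED_DATE_PATTERN` (which follows the naming suggestion in the comment above) are assumptions made for illustration, not code from the PR.

```python
import re


def named(name, pattern):
    return '(?P<' + name + '>' + pattern + ')'


def optional(x):
    return '(?:' + x + ')?'


def trailing_optional(xs):
    # Each later piece is nested inside an optional group that follows the
    # previous piece, so a string may stop after any field but not skip one.
    if not xs:
        return ''
    return xs[0] + optional(trailing_optional(xs[1:]))


# Hypothetical, date-only version of the extended pattern.
pieces = [named('year', r'\d{4}'),
          '-' + named('month', r'\d{2}'),
          '-' + named('day', r'\d{2}')]
_EXTENDED_DATE_PATTERN = '^' + trailing_optional(pieces) + '$'

year_month = re.match(_EXTENDED_DATE_PATTERN, '1999-01').groupdict()
full_date = re.match(_EXTENDED_DATE_PATTERN, '1999-01-15').groupdict()
skipped = re.match(_EXTENDED_DATE_PATTERN, '1999--15')
```

Fields that are absent from the string come back as `None` in the group dict, and a string that skips a field (`'1999--15'`) does not match at all.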



def parse_iso8601(datetime_string):
    for pattern in patterns:
        match = re.match(pattern, datetime_string)
        if match:
            return match.groupdict()
    raise ValueError('no ISO-8601 match for string: %s' % datetime_string)


def _parse_iso8601_with_reso(date_type, timestr):
    default = date_type(1, 1, 1)
    result = parse_iso8601(timestr)
    replace = {}

    for attr in ['year', 'month', 'day', 'hour', 'minute', 'second']:
        value = result.get(attr, None)
        if value is not None:
            # Note ISO8601 conventions allow for fractional seconds; casting
            # to an int means all seconds values get rounded down to the
            # nearest integer. TODO: Consider adding support for sub-second
Member

You should update the regex above to exclude fractional seconds if that doesn't work.

            # resolution?
            replace[attr] = int(value)
            resolution = attr

    return default.replace(**replace), resolution


def _parsed_string_to_bounds(date_type, resolution, parsed):
    """Generalization of
    pandas.tseries.index.DatetimeIndex._parsed_string_to_bounds
    for use with non-standard calendars and netcdftime._netcdftime.datetime
    objects.
    """
    if resolution == 'year':
        return (date_type(parsed.year, 1, 1),
                date_type(parsed.year + 1, 1, 1) - timedelta(microseconds=1))
    if resolution == 'month':
        if parsed.month == 12:
            end = date_type(parsed.year + 1, 1, 1) - timedelta(microseconds=1)
        else:
            end = (date_type(parsed.year, parsed.month + 1, 1) -
                   timedelta(microseconds=1))
        return date_type(parsed.year, parsed.month, 1), end
    if resolution == 'day':
        start = date_type(parsed.year, parsed.month, parsed.day)
        return start, start + timedelta(days=1, microseconds=-1)
    if resolution == 'hour':
        start = date_type(parsed.year, parsed.month, parsed.day, parsed.hour)
        return start, start + timedelta(hours=1, microseconds=-1)
    if resolution == 'minute':
        start = date_type(parsed.year, parsed.month, parsed.day, parsed.hour,
                          parsed.minute)
        return start, start + timedelta(minutes=1, microseconds=-1)
    if resolution == 'second':
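
The resolution logic above can be sketched on its own as follows. This is a stand-alone illustration rather than the PR's code: `datetime.datetime` stands in for a netcdftime date type, and `parse_with_resolution` and `FIELDS` are hypothetical names. It also shows why the fractional-seconds note matters: `int()` applied to a string such as `'30.5'` raises `ValueError` instead of rounding down.

```python
from datetime import datetime

FIELDS = ['year', 'month', 'day', 'hour', 'minute', 'second']


def parse_with_resolution(fields, date_type=datetime):
    """Build a date from matched fields and report the finest field present.

    `fields` maps field names to matched strings, or to None when absent,
    like the dict returned by `match.groupdict()`.
    """
    replace = {}
    resolution = None
    for attr in FIELDS:
        value = fields.get(attr)
        if value is not None:
            replace[attr] = int(value)  # int('30.5') raises ValueError
            resolution = attr
    return date_type(1, 1, 1).replace(**replace), resolution


date, resolution = parse_with_resolution(
    {'year': '1999', 'month': '01', 'day': None})

try:
    parse_with_resolution({'year': '1999', 'month': '01', 'day': '15',
                           'hour': '12', 'minute': '00', 'second': '30.5'})
    fractional_ok = True
except ValueError:
    fractional_ok = False
```

The last field found determines the resolution, so `'1999-01'` resolves to month resolution with unspecified fields left at their defaults.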
Member

Maybe I'm missing something obvious here but shouldn't all these ifs (except the first one) be elifs?

        start = date_type(parsed.year, parsed.month, parsed.day, parsed.hour,
                          parsed.minute, parsed.second)
        return start, start + timedelta(seconds=1, microseconds=-1)
    else:
        raise KeyError


def get_date_field(datetimes, field):
    """Adapted from pandas.tslib.get_date_field"""
    return [getattr(date, field) for date in datetimes]


def _field_accessor(name, docstring=None):
    """Adapted from pandas.tseries.index._field_accessor"""
    def f(self):
        return get_date_field(self._data, name)

    f.__name__ = name
    f.__doc__ = docstring
    return property(f)


def get_date_type(self):
    return type(self._data[0])


def assert_all_same_netcdftime_datetimes(data):
    from netcdftime._netcdftime import datetime

    if not isinstance(data[0], datetime):
        raise TypeError(
            'NetCDFTimeIndex requires netcdftime._netcdftime.datetime'
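
As a rough illustration of the month branch of `_parsed_string_to_bounds`, the sketch below computes the closed interval covered by a partial date such as `'2000-12'`. It assumes `datetime.datetime` as a stand-in for the netcdftime date type, and `month_bounds` is a hypothetical helper name, not part of the PR.

```python
from datetime import datetime, timedelta


def month_bounds(parsed, date_type=datetime):
    """Return the first and last representable instants of parsed's month."""
    start = date_type(parsed.year, parsed.month, 1)
    if parsed.month == 12:
        # December rolls over into January of the following year.
        next_start = date_type(parsed.year + 1, 1, 1)
    else:
        next_start = date_type(parsed.year, parsed.month + 1, 1)
    # The end bound is one microsecond before the next month begins.
    return start, next_start - timedelta(microseconds=1)


december = month_bounds(datetime(2000, 12, 1))
february = month_bounds(datetime(2000, 2, 1))
```

Since every branch in the function above returns, a chain of plain `if` statements selects the same branch as an `if`/`elif` chain would.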
Member

Use the public API name netcdftime.datetime.

Also, print the invalid object in the error message (using .format).

Member Author

Unfortunately, the public API name actually refers to a DatetimeProlepticGregorian type, so for now, to stick with public API imports, I've resorted to importing all six of the netcdftime datetime types.

In [1]: from netcdftime import datetime

In [2]: datetime(1, 1, 1)
Out[2]: netcdftime._netcdftime.DatetimeProlepticGregorian(1, 1, 1, 0, 0, 0, 0, -1, 1)

            ' objects.')
Member

nit: I usually prefer to leave spaces at the ends of lines instead of at the start of lines -- I think it looks slightly nicer.

    if not all(isinstance(value, type(data[0])) for value in data):
Member

Create a variable for type(data[0]) outside the loop.

        raise TypeError(
            'NetCDFTimeIndex requires using netcdftime._netcdftime.datetime'
            ' objects of all the same type.')
Member

Same concerns as above on the error message
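
One way the review suggestions on this function could fit together is sketched below: the sample type is computed once outside the loop, and the offending object is included in the message via `.format`. This is a hypothetical illustration, not the code that was eventually merged. `assert_all_same_type` and `error_message` are invented names, and the standard-library `date` and `datetime` classes stand in for the netcdftime types.

```python
from datetime import date, datetime


def assert_all_same_type(data, base_type):
    """Raise TypeError unless every element shares the type of the first."""
    first = data[0]
    sample_type = type(first)  # computed once, outside the loop
    if not isinstance(first, base_type):
        raise TypeError(
            'Expected {} objects. Got {!r} of type {}.'.format(
                base_type.__name__, first, sample_type.__name__))
    for value in data:
        if not isinstance(value, sample_type):
            raise TypeError(
                'Expected objects all of type {}. Got {!r} of type {}.'
                .format(sample_type.__name__, value, type(value).__name__))


def error_message(data, base_type):
    """Return the TypeError message for data, or None if the check passes."""
    try:
        assert_all_same_type(data, base_type)
    except TypeError as error:
        return str(error)
    return None


wrong_kind = error_message(['2000-01-01'], date)
mixed = error_message([datetime(2000, 1, 1), date(2000, 1, 2)], date)
uniform = error_message([date(2000, 1, 1), date(2000, 1, 2)], date)
```

Including the `repr` of the rejected value makes it easier to tell a wrong-type input apart from a mixed-calendar input.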



class NetCDFTimeIndex(pd.Index):
    def __new__(cls, data):
        result = object.__new__(cls)
        assert_all_same_netcdftime_datetimes(data)
        result._data = np.array(data)
        return result

    year = _field_accessor('year', 'The year of the datetime')
    month = _field_accessor('month', 'The month of the datetime')
    day = _field_accessor('day', 'The days of the datetime')
    hour = _field_accessor('hour', 'The hours of the datetime')
    minute = _field_accessor('minute', 'The minutes of the datetime')
    second = _field_accessor('second', 'The seconds of the datetime')
    microsecond = _field_accessor('microsecond',
                                  'The microseconds of the datetime')
    date_type = property(get_date_type)

    def _partial_date_slice(self, resolution, parsed):
Member Author

For now I tried to go as simple as possible here and in _get_string_slice. I think trying to exactly mimic DatetimeIndex's behavior could get messy.

Member

Could you add a few examples (either here or in the docstring) that describe what behavior is not covered in this implementation?

        """Adapted from
        pandas.tseries.index.DatetimeIndex._partial_date_slice"""
        start, end = _parsed_string_to_bounds(self.date_type, resolution,
                                              parsed)
        lhs_mask = (self._data >= start)
        rhs_mask = (self._data <= end)
        return (lhs_mask & rhs_mask).nonzero()[0]

    def _get_string_slice(self, key):
        """Adapted from pandas.tseries.index.DatetimeIndex._get_string_slice"""
        parsed, resolution = _parse_iso8601_with_reso(self.date_type, key)
        loc = self._partial_date_slice(resolution, parsed)
        return loc

    def get_loc(self, key, method=None, tolerance=None):
        """Adapted from pandas.tseries.index.DatetimeIndex.get_loc"""
        if isinstance(key, pycompat.basestring):
            return self._get_string_slice(key)
Member

+1 for fewer hard-to-predict special cases. Pandas is really inscrutable here.

        else:
            return pd.Index.get_loc(self, key, method=method,
                                    tolerance=tolerance)

    def _maybe_cast_slice_bound(self, label, side, kind):
        """Adapted from
        pandas.tseries.index.DatetimeIndex._maybe_cast_slice_bound"""
        if isinstance(label, pycompat.basestring):
            parsed, resolution = _parse_iso8601_with_reso(self.date_type,
                                                          label)
            start, end = _parsed_string_to_bounds(self.date_type, resolution,
                                                  parsed)
            if self.is_monotonic_decreasing and len(self):
                return end if side == 'left' else start
            return start if side == 'left' else end
        else:
            return label

    # TODO: Add ability to use integer range outside of iloc?
    # e.g. series[1:5].
    def get_value(self, series, key):
        """Adapted from pandas.tseries.index.DatetimeIndex.get_value"""
        if not isinstance(key, slice):
            return series.iloc[self.get_loc(key)]
        else:
            return series.iloc[self.slice_indexer(
                key.start, key.stop, key.step)]

    def __contains__(self, key):
        """Adapted from
        pandas.tseries.base.DatetimeIndexOpsMixin.__contains__"""
        try:
            result = self.get_loc(key)
            return (is_scalar(result) or type(result) == slice or
                    (isinstance(result, np.ndarray) and result.size))
Member Author

Essentially all I want to do here is, if result is a numpy array, check if it is not empty. Is there a cleaner way to do this?

Member

I think this is about the best you can do

        except (KeyError, TypeError, ValueError):
            return False
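
The core of the partial-string lookup is selecting the positions whose dates fall inside a closed interval, and membership then reduces to asking whether that selection is non-empty. The sketch below shows the idea with the standard library only. It is an assumption-laden stand-in: the PR builds NumPy boolean masks over `self._data`, whereas this version uses a list comprehension, `datetime.datetime` replaces the netcdftime types, and `positions_within` is a hypothetical name.

```python
from datetime import datetime, timedelta


def positions_within(dates, start, end):
    """Return positions of all dates in the closed interval [start, end]."""
    return [i for i, d in enumerate(dates) if start <= d <= end]


dates = [datetime(2000, 1, 15), datetime(2000, 2, 1),
         datetime(2000, 2, 20), datetime(2000, 3, 1)]

# Bounds for the partial string '2000-02'.
feb_start = datetime(2000, 2, 1)
feb_end = datetime(2000, 3, 1) - timedelta(microseconds=1)
february = positions_within(dates, feb_start, feb_end)

# Bounds for '2000-04', which matches nothing in `dates`.
apr_start = datetime(2000, 4, 1)
apr_end = datetime(2000, 5, 1) - timedelta(microseconds=1)
contains_april = len(positions_within(dates, apr_start, apr_end)) > 0
```

An empty selection is a valid lookup result rather than an error, which is why the membership check has to inspect the size of the result.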
2 changes: 1 addition & 1 deletion xarray/convert.py
@@ -7,7 +7,7 @@
 import numpy as np

 from .core.dataarray import DataArray
-from .conventions import (
+from .conventions.coding import (
     maybe_encode_timedelta, maybe_encode_datetime, decode_cf)

 ignored_attrs = set(['name', 'tileIndex'])
4 changes: 2 additions & 2 deletions xarray/core/dataset.py
@@ -16,8 +16,8 @@
 from . import indexing
 from . import alignment
 from . import formatting
-from .. import conventions
 from .alignment import align
+from ..conventions import coding
 from .coordinates import DatasetCoordinates, LevelCoordinatesSource, Indexes
 from .common import ImplementsDatasetReduce, BaseDataObject
 from .merge import (dataset_update_method, dataset_merge_method,
@@ -875,7 +875,7 @@ def dump_to_store(self, store, encoder=None, sync=True, encoding=None,
         """Store dataset contents to a backends.*DataStore object."""
         if encoding is None:
             encoding = {}
-        variables, attrs = conventions.encode_dataset_coordinates(self)
+        variables, attrs = coding.encode_dataset_coordinates(self)

         check_encoding = set()
         for k, enc in encoding.items():