Add DatetimeAccessor for accessing datetime fields via .dt attribute (#1356)

* Add DatetimeAccessor for accessing datetime fields via `.dt` attribute

* Cleaning up unit tests

* Cleaning up comments and warnings in accessors

* Indirectly access pandas tslib through Series accessors

* Re-factor injection of datetime field accessor properties

* Undo loop/injection of _get_date_field accessors

* Remove public-facing dt property

* Remove extra 'field' argument from _tslib_field_accessor

* Added support for dask arrays

* Added dask test cases
Fixed a bug where data wasn't computed in correct order

* Simplified _get_date_field for both dask/numpy arrays; additional code review cleanups

* Fixing flake8 complaints

* Adding whats-new entry

* Updated timeseries docs with note about dt accessor

* Moved season accessor to DatetimeAccessor

* Re-factor virtual variable logic to lean on DateTimeAccessor

* Added "Returns" documentation to _get_date_field
Fixed imports to facilitate more direct implementation of DateTimeAccessor as a property in DataArray
Moved _access_through_series to a top-level function in accessors.py so that dask serialization will hopefully work a bit better

* Adding timestamp accessor

* Hard-coding expected dtypes for each datetime field

* Fix typo in non-datetime virtual variable access

* Update What's New and timeseries docs
Daniel Rothenberg authored and shoyer committed Apr 29, 2017
1 parent ab4ffee commit 8f6a68e
Showing 8 changed files with 287 additions and 17 deletions.
23 changes: 20 additions & 3 deletions doc/time-series.rst
@@ -88,7 +88,22 @@ For more details, read the pandas documentation.
 Datetime components
 -------------------

-xarray supports a notion of "virtual" or "derived" coordinates for
+Similar `to pandas`_, the components of datetime objects contained in a
+given ``DataArray`` can be quickly computed using a special ``.dt`` accessor.
+
+.. _to pandas: http://pandas.pydata.org/pandas-docs/stable/basics.html#basics-dt-accessors
+
+.. ipython:: python
+
+    time = pd.date_range('2000-01-01', freq='6H', periods=365 * 4)
+    ds = xr.Dataset({'foo': ('time', np.arange(365 * 4)), 'time': time})
+    ds.time.dt.hour
+    ds.time.dt.dayofweek
+
+The ``.dt`` accessor works on both coordinate dimensions and
+multi-dimensional data.
+
+xarray also supports a notion of "virtual" or "derived" coordinates for
 `datetime components`__ implemented by pandas, including "year", "month",
 "day", "hour", "minute", "second", "dayofyear", "week", "dayofweek", "weekday"
 and "quarter":
@@ -100,11 +115,13 @@ __ http://pandas.pydata.org/pandas-docs/stable/api.html#time-date-components
     ds['time.month']
     ds['time.dayofyear']

-xarray adds ``'season'`` to the list of datetime components supported by pandas:
+For use as a derived coordinate, xarray adds ``'season'`` to the list of
+datetime components supported by pandas:

 .. ipython:: python

     ds['time.season']
+    ds['time'].dt.season

 The set of valid seasons consists of 'DJF', 'MAM', 'JJA' and 'SON', labeled by
 the first letters of the corresponding months.
@@ -124,7 +141,7 @@ calculate the mean by time of day:

 For upsampling or downsampling temporal resolutions, xarray offers a
 :py:meth:`~xarray.Dataset.resample` method building on the core functionality
-offered by the pandas method of the same name. Resample uses essentialy the
+offered by the pandas method of the same name. Resample uses essentially the
 same api as ``resample`` `in pandas`_.

 .. _in pandas: http://pandas.pydata.org/pandas-docs/stable/timeseries.html#up-and-downsampling
4 changes: 4 additions & 0 deletions doc/whats-new.rst
@@ -18,6 +18,10 @@ What's New
 v0.9.6 (unreleased)
 -------------------

+- Add ``.dt`` accessor to DataArrays for computing datetime-like properties
+  for the values they contain, similar to ``pandas.Series`` (:issue:`358`).
+  By `Daniel Rothenberg <https://github.com/darothen>`_.
+
 Enhancements
 ~~~~~~~~~~~~

1 change: 1 addition & 0 deletions xarray/__init__.py
@@ -1,6 +1,7 @@
 from __future__ import absolute_import
 from __future__ import division
 from __future__ import print_function
+
 from .core.alignment import align, broadcast, broadcast_arrays
 from .core.common import full_like, zeros_like, ones_like
 from .core.combine import concat, auto_combine
150 changes: 150 additions & 0 deletions xarray/core/accessors.py
@@ -0,0 +1,150 @@
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
+
+from .common import is_datetime_like
+from .pycompat import dask_array_type
+
+from functools import partial
+
+import numpy as np
+import pandas as pd
+
+
+def _season_from_months(months):
+    """Compute season (DJF, MAM, JJA, SON) from month ordinal
+    """
+    # TODO: Move "season" accessor upstream into pandas
+    seasons = np.array(['DJF', 'MAM', 'JJA', 'SON'])
+    months = np.asarray(months)
+    return seasons[(months // 3) % 4]
+
+
+def _access_through_series(values, name):
+    """Coerce an array of datetime-like values to a pandas Series and
+    access requested datetime component
+    """
+    values_as_series = pd.Series(values.ravel())
+    if name == "season":
+        months = values_as_series.dt.month.values
+        field_values = _season_from_months(months)
+    else:
+        field_values = getattr(values_as_series.dt, name).values
+    return field_values.reshape(values.shape)
+
+
+def _get_date_field(values, name, dtype):
+    """Indirectly access pandas' libts.get_date_field by wrapping data
+    as a Series and calling through `.dt` attribute.
+
+    Parameters
+    ----------
+    values : np.ndarray or dask.array-like
+        Array-like container of datetime-like values
+    name : str
+        Name of datetime field to access
+    dtype : dtype-like
+        dtype for output date field values
+
+    Returns
+    -------
+    datetime_fields : same type as values
+        Array-like of datetime fields accessed for each element in values
+
+    """
+    if isinstance(values, dask_array_type):
+        from dask.array import map_blocks
+        return map_blocks(_access_through_series,
+                          values, name, dtype=dtype)
+    else:
+        return _access_through_series(values, name)
+
+
+class DatetimeAccessor(object):
+    """Access datetime fields for DataArrays with datetime-like dtypes.
+
+    Similar to pandas, fields can be accessed through the `.dt` attribute
+    for applicable DataArrays:
+
+        >>> ds = xarray.Dataset({'time': pd.date_range(start='2000/01/01',
+        ...                                            freq='D', periods=100)})
+        >>> ds.time.dt
+        <xarray.core.accessors.DatetimeAccessor at 0x10c369f60>
+        >>> ds.time.dt.dayofyear[:5]
+        <xarray.DataArray 'dayofyear' (time: 5)>
+        array([1, 2, 3, 4, 5], dtype=int32)
+        Coordinates:
+          * time     (time) datetime64[ns] 2000-01-01 2000-01-02 2000-01-03 ...
+
+    All of the pandas fields are accessible here. Note that these fields are
+    not calendar-aware; if your datetimes are encoded with a non-Gregorian
+    calendar (e.g. a 360-day calendar) using netcdftime, then some fields like
+    `dayofyear` may not be accurate.
+
+    """
+    def __init__(self, xarray_obj):
+        if not is_datetime_like(xarray_obj.dtype):
+            raise TypeError("'dt' accessor only available for "
+                            "DataArray with datetime64 or timedelta64 dtype")
+        self._obj = xarray_obj
+
+    def _tslib_field_accessor(name, docstring=None, dtype=None):
+        def f(self, dtype=dtype):
+            if dtype is None:
+                dtype = self._obj.dtype
+            obj_type = type(self._obj)
+            result = _get_date_field(self._obj.data, name, dtype)
+            return obj_type(result, name=name,
+                            coords=self._obj.coords, dims=self._obj.dims)
+
+        f.__name__ = name
+        f.__doc__ = docstring
+        return property(f)
+
+    year = _tslib_field_accessor('year', "The year of the datetime", np.int64)
+    month = _tslib_field_accessor(
+        'month', "The month as January=1, December=12", np.int64
+    )
+    day = _tslib_field_accessor('day', "The days of the datetime", np.int64)
+    hour = _tslib_field_accessor('hour', "The hours of the datetime", np.int64)
+    minute = _tslib_field_accessor(
+        'minute', "The minutes of the datetime", np.int64
+    )
+    second = _tslib_field_accessor(
+        'second', "The seconds of the datetime", np.int64
+    )
+    microsecond = _tslib_field_accessor(
+        'microsecond', "The microseconds of the datetime", np.int64
+    )
+    nanosecond = _tslib_field_accessor(
+        'nanosecond', "The nanoseconds of the datetime", np.int64
+    )
+    weekofyear = _tslib_field_accessor(
+        'weekofyear', "The week ordinal of the year", np.int64
+    )
+    week = weekofyear
+    dayofweek = _tslib_field_accessor(
+        'dayofweek', "The day of the week with Monday=0, Sunday=6", np.int64
+    )
+    weekday = dayofweek
+
+    weekday_name = _tslib_field_accessor(
+        'weekday_name', "The name of day in a week (ex: Friday)", object
+    )
+
+    dayofyear = _tslib_field_accessor(
+        'dayofyear', "The ordinal day of the year", np.int64
+    )
+    quarter = _tslib_field_accessor('quarter', "The quarter of the date")
+    days_in_month = _tslib_field_accessor(
+        'days_in_month', "The number of days in the month", np.int64
+    )
+    daysinmonth = days_in_month
+
+    season = _tslib_field_accessor(
+        "season", "Season of the year (ex: DJF)", object
+    )
+
+    time = _tslib_field_accessor(
+        "time", "Timestamps corresponding to datetimes", object
+    )
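
Note on `_access_through_series` in the file above: it flattens the input, reads the field through a pandas Series, and restores the original shape. The following is only a minimal sketch of that round trip, written against public pandas and NumPy calls. The helper name `access_through_series` and the sample data are illustrative assumptions, not part of the commit.

```python
import numpy as np
import pandas as pd


def access_through_series(values, name):
    """Hypothetical stand-in: flatten, read a `.dt` field, restore shape."""
    # pandas' `.dt` accessor works on 1-D Series, so flatten first.
    as_series = pd.Series(values.ravel())
    field = getattr(as_series.dt, name).to_numpy()
    # Put the extracted field back into the shape of the input array.
    return field.reshape(values.shape)


# Assumed sample data: six consecutive days arranged as a 2 x 3 array.
times = pd.date_range("2000-01-01", periods=6, freq="D").to_numpy()
times = times.reshape(2, 3)

days = access_through_series(times, "day")
print(days.shape)
print(days.tolist())
```

Because the helper only sees a plain array and a field name, the same function can be applied block by block to a chunked array, which appears to be why the commit keeps it at module level.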
7 changes: 7 additions & 0 deletions xarray/core/common.py
@@ -761,3 +761,10 @@ def ones_like(other, dtype=None):
     """Shorthand for full_like(other, 1, dtype)
     """
     return full_like(other, 1, dtype)
+
+
+def is_datetime_like(dtype):
+    """Check if a dtype is a subclass of the numpy datetime types
+    """
+    return (np.issubdtype(dtype, np.datetime64) or
+            np.issubdtype(dtype, np.timedelta64))
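
Note on `is_datetime_like`: a small sketch of how the check behaves for a few dtypes. The function body is copied from the hunk above; the dtypes tested are chosen here purely for illustration.

```python
import numpy as np


def is_datetime_like(dtype):
    """Return True for datetime64 and timedelta64 dtypes (copied from the hunk)."""
    return (np.issubdtype(dtype, np.datetime64) or
            np.issubdtype(dtype, np.timedelta64))


# Illustrative dtypes: two time-like, two not.
checks = {
    "datetime64[ns]": is_datetime_like(np.dtype("datetime64[ns]")),
    "timedelta64[s]": is_datetime_like(np.dtype("timedelta64[s]")),
    "int8": is_datetime_like(np.dtype("int8")),
    "float64": is_datetime_like(np.dtype("float64")),
}
print(checks)
```

This is the guard that makes `DatetimeAccessor.__init__` raise `TypeError` for non-time data, as exercised by `test_not_datetime_type`.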
2 changes: 2 additions & 0 deletions xarray/core/dataarray.py
@@ -15,6 +15,7 @@
 from . import rolling
 from . import ops
 from . import utils
+from .accessors import DatetimeAccessor
 from .alignment import align, reindex_like_indexers
 from .common import AbstractArray, BaseDataObject
 from .coordinates import (DataArrayCoordinates, LevelCoordinatesSource,
@@ -158,6 +159,7 @@ class DataArray(AbstractArray, BaseDataObject):
     """
     _groupby_cls = groupby.DataArrayGroupBy
     _rolling_cls = rolling.DataArrayRolling
+    dt = property(DatetimeAccessor)

     def __init__(self, data, coords=None, dims=None, name=None,
                  attrs=None, encoding=None, fastpath=False):
21 changes: 7 additions & 14 deletions xarray/core/dataset.py
@@ -21,7 +21,7 @@
 from .. import conventions
 from .alignment import align
 from .coordinates import DatasetCoordinates, LevelCoordinatesSource, Indexes
-from .common import ImplementsDatasetReduce, BaseDataObject
+from .common import ImplementsDatasetReduce, BaseDataObject, is_datetime_like
 from .merge import (dataset_update_method, dataset_merge_method,
                     merge_data_and_coords)
 from .utils import (Frozen, SortedKeysDict, maybe_wrap_array, hashable,
@@ -32,6 +32,8 @@
                        integer_types, dask_array_type, range)
 from .options import OPTIONS

+import xarray as xr
+
 # list of attributes of pd.DatetimeIndex that are ndarrays of time info
 _DATETIMEINDEX_COMPONENTS = ['year', 'month', 'day', 'hour', 'minute',
                              'second', 'microsecond', 'nanosecond', 'date',
@@ -74,20 +76,11 @@ def _get_virtual_variable(variables, key, level_vars=None, dim_sizes=None):
         virtual_var = ref_var
         var_name = key
     else:
-        if ref_var.ndim == 1:
-            date = ref_var.to_index()
-        elif ref_var.ndim == 0:
-            date = pd.Timestamp(ref_var.values)
-        else:
-            raise KeyError(key)
-
-        if var_name == 'season':
-            # TODO: move 'season' into pandas itself
-            seasons = np.array(['DJF', 'MAM', 'JJA', 'SON'])
-            month = date.month
-            data = seasons[(month // 3) % 4]
+        if is_datetime_like(ref_var.dtype):
+            ref_var = xr.DataArray(ref_var)
+            data = getattr(ref_var.dt, var_name).data
         else:
-            data = getattr(date, var_name)
+            data = getattr(ref_var, var_name).data
         virtual_var = Variable(ref_var.dims, data)

     return ref_name, var_name, virtual_var
96 changes: 96 additions & 0 deletions xarray/tests/test_accessors.py
@@ -0,0 +1,96 @@
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
+
+import xarray as xr
+import numpy as np
+import pandas as pd
+
+from . import TestCase, requires_dask
+
+
+class TestDatetimeAccessor(TestCase):
+    def setUp(self):
+        nt = 100
+        data = np.random.rand(10, 10, nt)
+        lons = np.linspace(0, 11, 10)
+        lats = np.linspace(0, 20, 10)
+        self.times = pd.date_range(start="2000/01/01", freq='H', periods=nt)
+
+        self.data = xr.DataArray(data, coords=[lons, lats, self.times],
+                                 dims=['lon', 'lat', 'time'], name='data')
+
+        self.times_arr = np.random.choice(self.times, size=(10, 10, nt))
+        self.times_data = xr.DataArray(self.times_arr,
+                                       coords=[lons, lats, self.times],
+                                       dims=['lon', 'lat', 'time'],
+                                       name='data')
+
+    def test_field_access(self):
+        years = xr.DataArray(self.times.year, name='year',
+                             coords=[self.times, ], dims=['time', ])
+        months = xr.DataArray(self.times.month, name='month',
+                              coords=[self.times, ], dims=['time', ])
+        days = xr.DataArray(self.times.day, name='day',
+                            coords=[self.times, ], dims=['time', ])
+        hours = xr.DataArray(self.times.hour, name='hour',
+                             coords=[self.times, ], dims=['time', ])
+
+        self.assertDataArrayEqual(years, self.data.time.dt.year)
+        self.assertDataArrayEqual(months, self.data.time.dt.month)
+        self.assertDataArrayEqual(days, self.data.time.dt.day)
+        self.assertDataArrayEqual(hours, self.data.time.dt.hour)
+
+    def test_not_datetime_type(self):
+        nontime_data = self.data.copy()
+        int_data = np.arange(len(self.data.time)).astype('int8')
+        nontime_data['time'].values = int_data
+        with self.assertRaisesRegexp(TypeError, 'dt'):
+            nontime_data.time.dt
+
+    @requires_dask
+    def test_dask_field_access(self):
+        import dask.array as da
+
+        years = self.times_data.dt.year
+        months = self.times_data.dt.month
+        hours = self.times_data.dt.hour
+        days = self.times_data.dt.day
+
+        dask_times_arr = da.from_array(self.times_arr, chunks=(5, 5, 50))
+        dask_times_2d = xr.DataArray(dask_times_arr,
+                                     coords=self.data.coords,
+                                     dims=self.data.dims,
+                                     name='data')
+        dask_year = dask_times_2d.dt.year
+        dask_month = dask_times_2d.dt.month
+        dask_day = dask_times_2d.dt.day
+        dask_hour = dask_times_2d.dt.hour
+
+        # Test that the data isn't eagerly evaluated
+        assert isinstance(dask_year.data, da.Array)
+        assert isinstance(dask_month.data, da.Array)
+        assert isinstance(dask_day.data, da.Array)
+        assert isinstance(dask_hour.data, da.Array)
+
+        # Double check that outcome chunksize is unchanged
+        dask_chunks = dask_times_2d.chunks
+        self.assertEqual(dask_year.data.chunks, dask_chunks)
+        self.assertEqual(dask_month.data.chunks, dask_chunks)
+        self.assertEqual(dask_day.data.chunks, dask_chunks)
+        self.assertEqual(dask_hour.data.chunks, dask_chunks)
+
+        # Check the actual output from the accessors
+        self.assertDataArrayEqual(years, dask_year.compute())
+        self.assertDataArrayEqual(months, dask_month.compute())
+        self.assertDataArrayEqual(days, dask_day.compute())
+        self.assertDataArrayEqual(hours, dask_hour.compute())
+
+    def test_seasons(self):
+        dates = pd.date_range(start="2000/01/01", freq="M", periods=12)
+        dates = xr.DataArray(dates)
+        seasons = ["DJF", "DJF", "MAM", "MAM", "MAM", "JJA", "JJA", "JJA",
+                   "SON", "SON", "SON", "DJF"]
+        seasons = xr.DataArray(seasons)
+
+        self.assertArrayEqual(seasons.values, dates.dt.season.values)
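
Note on `test_seasons`: the expected labels follow from the index arithmetic `(months // 3) % 4` used in `_season_from_months`. The sketch below reproduces that arithmetic with NumPy alone; the variable names are chosen here for illustration.

```python
import numpy as np

# Season labels in the order used by the commit's lookup table.
seasons = np.array(["DJF", "MAM", "JJA", "SON"])

# Month ordinals 1..12 (January=1, December=12).
months = np.arange(1, 13)

# Integer-divide by 3 to get 0,0,1,1,1,2,2,2,3,3,3,4, then wrap
# December's 4 back to 0 with the modulo so it joins DJF.
index = (months // 3) % 4
labels = seasons[index]

print(index.tolist())
print(labels.tolist())
```

The modulo step is what places December in the same season as January and February, even though the three months span two calendar years.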
