Calendar utilities #5233

Merged
merged 45 commits into from
Dec 30, 2021
Changes from 10 commits
Commits
3e72df9
dt.calendar and date_range
aulemahal Apr 21, 2021
1c37bbd
Migrate calendar utils from xclim | add dt.calendar
aulemahal Apr 28, 2021
1ffd74c
Merge remote-tracking branch 'upstream/master' into calendar-utils
aulemahal Apr 28, 2021
39079e3
upd whats new
aulemahal Apr 28, 2021
11d15ee
skip calendar tests with no cftime
aulemahal Apr 28, 2021
d8ec022
add requires cftime 1.1.0
aulemahal Apr 28, 2021
8fe0a94
import date_ranges in main
aulemahal Apr 28, 2021
f47f823
Apply suggestions from code review
aulemahal Apr 30, 2021
c58e2ae
Merge remote-tracking branch 'upstream/master' into calendar-utils
aulemahal Apr 30, 2021
c311002
Add docs - use already existing is np datetime func
aulemahal Apr 30, 2021
10cf483
Merge remote-tracking branch 'upstream/master' into calendar-utils
aulemahal May 7, 2021
d9e174a
update from suggestions
aulemahal May 7, 2021
84ebc89
Merge remote-tracking branch 'upstream/master' into calendar-utils
aulemahal May 17, 2021
9d6254b
Apply suggestions from code review
aulemahal May 17, 2021
976b3cf
Merge branch 'calendar-utils' of https://github.com/aulemahal/xarray …
aulemahal May 17, 2021
0fce9cb
Modifications following review
aulemahal May 17, 2021
aa74140
Add DataArray and Dataset methods
aulemahal May 17, 2021
bc7a912
use proper type annotation
aulemahal May 17, 2021
5aa9732
Apply suggestions from code review
aulemahal Aug 13, 2021
ca566bd
some more modifications after review
aulemahal Aug 13, 2021
2d7201f
merge main
aulemahal Aug 13, 2021
f307834
merge main
aulemahal Aug 31, 2021
a3e9fb2
Apply suggestions from code review
aulemahal Aug 31, 2021
97909f7
Finish applying suggestions from review
aulemahal Aug 31, 2021
507c501
Put back missing @require_cftime
aulemahal Aug 31, 2021
599882f
Merge branch 'main' into calendar-utils
aulemahal Sep 23, 2021
44be4e5
Apply suggestions from code review
aulemahal Sep 23, 2021
aa03268
Merge branch 'calendar-utils' of github.com:aulemahal/xarray into cal…
aulemahal Sep 23, 2021
c4570d8
Add tests - few fixes
aulemahal Sep 23, 2021
92ab8ba
Merge branch 'main' into calendar-utils
aulemahal Sep 29, 2021
2088230
wrap docstrings
aulemahal Sep 29, 2021
d5b50dc
Change way of importing/testing for cftime
aulemahal Sep 29, 2021
dc9338e
Upd the weather-climate doc page
aulemahal Sep 29, 2021
822529c
fix doc examples
aulemahal Sep 29, 2021
b86de04
Neat docs
aulemahal Sep 30, 2021
d7efe8e
fix in tests after review
aulemahal Oct 12, 2021
4430350
Apply suggestions from code review
aulemahal Oct 18, 2021
790be22
Better explain missing in notes - copy changes to obj methods
aulemahal Oct 18, 2021
eb96222
Merge branch 'main' into calendar-utils
aulemahal Oct 18, 2021
0726082
Merge branch 'main' into calendar-utils
aulemahal Oct 26, 2021
b6f53a8
Apply suggestions from code review
aulemahal Oct 26, 2021
4f02ca7
Merge branch 'calendar-utils' of github.com:aulemahal/xarray into cal…
aulemahal Oct 26, 2021
0f8d5d5
Merge branch 'main' into calendar-utils
aulemahal Dec 13, 2021
2c023a4
Remove unused import
aulemahal Dec 13, 2021
5aa7470
Merge branch 'main' into pr/5233
Illviljan Dec 29, 2021
1 change: 1 addition & 0 deletions doc/api-hidden.rst
@@ -287,6 +287,7 @@
core.accessor_dt.DatetimeAccessor.floor
core.accessor_dt.DatetimeAccessor.round
core.accessor_dt.DatetimeAccessor.strftime
core.accessor_dt.DatetimeAccessor.calendar
core.accessor_dt.DatetimeAccessor.date
core.accessor_dt.DatetimeAccessor.day
core.accessor_dt.DatetimeAccessor.dayofweek
3 changes: 3 additions & 0 deletions doc/api.rst
@@ -512,6 +512,7 @@ Datetimelike properties
DataArray.dt.season
DataArray.dt.time
DataArray.dt.date
DataArray.dt.calendar
DataArray.dt.is_month_start
DataArray.dt.is_month_end
DataArray.dt.is_quarter_end
@@ -835,6 +836,8 @@ Creating custom indexes
:toctree: generated/

cftime_range
date_range
date_range_like

Faceting
--------
2 changes: 2 additions & 0 deletions doc/whats-new.rst
@@ -92,6 +92,8 @@ New Features
expand, ``False`` to always collapse, or ``default`` to expand unless over a
pre-defined limit (:pull:`5126`).
By `Tom White <https://github.com/tomwhite>`_.
- Added calendar utilities :py:func:`convert_calendar`, :py:func:`interp_calendar`, :py:func:`date_range`, :py:func:`date_range_like` and :py:attr:`DataArray.dt.calendar` (:pull:`5233`).
By `Pascal Bourgault <https://github.com/aulemahal>`_.
- Prevent passing `concat_dim` to :py:func:`xarray.open_mfdataset` when
`combine='by_coords'` is specified, which should never have been possible (as
:py:func:`xarray.combine_by_coords` has no `concat_dim` argument to pass to).
4 changes: 3 additions & 1 deletion xarray/__init__.py
@@ -11,7 +11,7 @@
)
from .backends.rasterio_ import open_rasterio
from .backends.zarr import open_zarr
from .coding.cftime_offsets import cftime_range
from .coding.cftime_offsets import cftime_range, date_range, date_range_like
from .coding.cftimeindex import CFTimeIndex
from .coding.frequencies import infer_freq
from .conventions import SerializationWarning, decode_cf
@@ -52,6 +52,8 @@
"combine_by_coords",
"combine_nested",
"concat",
"date_range",
"date_range_like",
"decode_cf",
"dot",
"cov",
283 changes: 283 additions & 0 deletions xarray/coding/calendar_ops.py
@@ -0,0 +1,283 @@
from datetime import timedelta

import numpy as np

from ..core.common import is_np_datetime_like
from .cftime_offsets import date_range_like, get_date_type
from .times import (
    _is_numpy_compatible_time_range,
    _is_standard_calendar,
    cftime_to_nptime,
    convert_cftimes,
)

try:
    import cftime
except ImportError:
    cftime = None


def _days_in_year(year, calendar, use_cftime=True):
    """Return the number of days in the input year according to the input calendar."""
    return (
        (
            get_date_type(calendar, use_cftime=use_cftime)(year + 1, 1, 1)
            - timedelta(days=1)
        )
        .timetuple()
        .tm_yday
    )
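
The trick in `_days_in_year` is to build January 1st of the following year, step back one day to land on December 31st, and read that date's day of year. A minimal standard-library sketch of the same idea follows. It assumes the proleptic Gregorian calendar only, with `datetime.date` standing in for the class returned by `get_date_type`; the function name `days_in_year_standard` is made up for this example.

```python
from datetime import date, timedelta


def days_in_year_standard(year):
    """Number of days in `year` for the proleptic Gregorian calendar.

    Take January 1st of the next year, step back one day to
    December 31st, and read its day of year.
    """
    last_day = date(year + 1, 1, 1) - timedelta(days=1)
    return last_day.timetuple().tm_yday


print(days_in_year_standard(2000))  # leap year
print(days_in_year_standard(2001))  # regular year
print(days_in_year_standard(1900))  # century year that is not a leap year
```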


def convert_calendar(
    ds,
    target,
    dim="time",
    align_on=None,
    missing=None,
    use_cftime=None,
):
    """Convert the Dataset or DataArray to another calendar.

    Only converts the individual timestamps; does not modify any data except by dropping invalid/surplus dates or inserting missing dates.

    If the source and target calendars are either no_leap, all_leap or a standard type, only the type of the time array is modified.
    When converting from a calendar with leap years to one without, the 29th of February is removed from the array.
    In the other direction the 29th of February will be missing in the output, unless `missing` is specified, in which case that value is inserted.

    For conversions involving `360_day` calendars, see Notes.

    This method is safe to use with sub-daily data as it doesn't touch the time part of the timestamps.

    Parameters
    ----------
    ds : DataArray or Dataset
        Input array/dataset with a time coordinate of a valid dtype (datetime64 or a cftime.datetime).
    target : str
        The target calendar name.
    dim : str
        Name of the time coordinate.
    align_on : {None, 'date', 'year'}
        Must be specified when either source or target is a `360_day` calendar, ignored otherwise. See Notes.
    missing : any, optional
        A value to use for filling in dates in the target that were missing in the source.
        Default (None) is not to fill values, so the output time axis might be non-continuous.
    use_cftime : bool, optional
        Whether to use cftime objects in the output, only valid if `target` is one of {"proleptic_gregorian", "gregorian" or "standard"}.
        If True, the new time axis uses cftime objects. If None (default), it uses numpy objects if the date range permits it, and cftime ones if not.
        If False, it uses numpy objects or fails.

    Returns
    -------
    Copy of source with the time coordinate converted to the target calendar.
        If `missing` was None (default), invalid dates in the new calendar are dropped, but missing dates are not inserted.
        If `missing` was given, the new data is reindexed to have a continuous time axis, filling missing values with `missing`.

    Notes
    -----
    If one of the source or target calendars is `360_day`, `align_on` must be specified and two options are offered.

    "year"
        The dates are translated according to their rank in the year (dayofyear), ignoring their original month and day information,
        meaning that the missing/surplus days are added/removed at regular intervals.

        From a `360_day` to a standard calendar, the output will be missing the following dates (day of year in parentheses):
            To a leap year:
                January 31st (31), March 31st (91), June 1st (153), July 31st (213), October 1st (275) and November 30th (335).
            To a non-leap year:
                February 6th (37), April 19th (109), July 2nd (183), September 12th (255), November 25th (329).

        From a standard calendar to a `360_day`, the following dates in the source array will be dropped:
            From a leap year:
                January 31st (31), April 1st (92), June 1st (153), August 1st (214), October 1st (275), December 1st (336).
            From a non-leap year:
                February 6th (37), April 20th (110), July 2nd (183), September 13th (256), November 25th (329).

        This option is best used on daily and subdaily data.

    "date"
        The month/day information is conserved and invalid dates are dropped from the output. This means that when converting from
        a `360_day` to a standard calendar, all 31sts (Jan, March, May, July, August, October and December) will be missing as there are no equivalent
        dates in the `360_day` calendar, and the 29th (on non-leap years) and 30th of February will be dropped as there are no equivalent dates in
        a standard calendar.

        This option is best used with data on a frequency coarser than daily.
    """
    # In the following the calendar name "default" is an
    # internal hack to mean pandas-backed standard calendar
    from ..core.dataarray import DataArray

    time = ds[dim]  # for convenience

    # Arguments Checks for target
    if use_cftime is not True:
        # Then we check if pandas is possible.
        if _is_standard_calendar(target):
            if _is_numpy_compatible_time_range(time):
                # Conversion is possible with pandas, force False if it was None.
                use_cftime = False
            elif use_cftime is False:
                raise ValueError(
                    "Source time range is not valid for numpy datetimes. Try using `use_cftime=True`."
                )
            # else : Default to cftime
        elif use_cftime is False:
            # target calendar is cftime-only.
            raise ValueError(
                f"Calendar '{target}' is only valid with cftime. Try using `use_cftime=True`."
            )
        else:
            use_cftime = True

    # Get source
    source = time.dt.calendar

    src_cal = "default" if is_np_datetime_like(time.dtype) else source
    tgt_cal = target if use_cftime else "default"
    if src_cal == tgt_cal:
        return ds

    if (source == "360_day" or target == "360_day") and align_on is None:
        raise ValueError(
            "Argument `align_on` must be specified with either 'date' or "
            "'year' when converting to or from a '360_day' calendar."
        )

    if source != "360_day" and target != "360_day":
        align_on = "date"

    out = ds.copy()

    if align_on == "year":
        # Special case for conversion involving 360_day calendar
        # Instead of translating dates directly, this tries to keep the position within a year similar.
        def _yearly_interp_doy(time):
            # Returns the nearest day in the target calendar of the corresponding "decimal year" in the source calendar
            yr = int(time.dt.year[0])
            return np.round(
                _days_in_year(yr, target, use_cftime)
                * time.dt.dayofyear
                / _days_in_year(yr, source, use_cftime)
            ).astype(int)

        def _convert_datetime(date, new_doy, calendar):
            """Convert a datetime object to another calendar.

            Redefines the day of year (thus ignoring month and day information from the source datetime).
            Nanosecond information is lost as cftime.datetime doesn't support it.
            """
            new_date = cftime.num2date(
                new_doy - 1,
                f"days since {date.year}-01-01",
                calendar=calendar if use_cftime else "standard",
            )
            try:
                return get_date_type(calendar, use_cftime)(
                    date.year,
                    new_date.month,
                    new_date.day,
                    date.hour,
                    date.minute,
                    date.second,
                    date.microsecond,
                )
            except ValueError:
                return np.nan

        new_doy = time.groupby(f"{dim}.year").map(_yearly_interp_doy)

        # Convert the source datetimes, but override the doy with our new doys
        out[dim] = DataArray(
            [
                _convert_datetime(date, newdoy, target)
                for date, newdoy in zip(time.variable._data.array, new_doy)
            ],
            dims=(dim,),
            name=dim,
        )
        # Remove duplicate timestamps, happens when reducing the number of days
        out = out.isel({dim: np.unique(out[dim], return_index=True)[1]})
    elif align_on == "date":
        if use_cftime:
            # Use the Index version of the 1D array
            new_times = convert_cftimes(
                time.variable._data.array, get_date_type(target), missing=np.nan
            )
        else:
            new_times = cftime_to_nptime(time.values, raise_on_invalid=False)
        out[dim] = new_times

        # Remove NaN that were put on invalid dates in target calendar
        out = out.where(out[dim].notnull(), drop=True)

    if missing is not None:
        time_target = date_range_like(time, calendar=target, use_cftime=use_cftime)
        out = out.reindex({dim: time_target}, fill_value=missing)

    # Copy attrs but remove `calendar` if still present.
    out[dim].attrs.update(time.attrs)
    out[dim].attrs.pop("calendar", None)
    return out
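
The `align_on="year"` branch rescales each day of year by the ratio of the two year lengths and rounds to the nearest integer, which is why gaps (or duplicates) appear at regular intervals. The sketch below reproduces only that arithmetic with the standard library, to check which target days are never produced when going from a 360-day year to a 365- or 366-day year. The helper names are invented for this example, and Python's built-in `round` is assumed to stand in for `np.round` (both round halves to the nearest even integer).

```python
from datetime import date, timedelta


def map_doy(doy, days_src, days_tgt):
    """Scale a 1-based day of year from the source to the target year length.

    Python's round() rounds halves to even, like np.round in the diff.
    """
    return round(days_tgt * doy / days_src)


def missing_target_days(days_src, days_tgt):
    """Target days of year that no source day maps onto."""
    hit = {map_doy(doy, days_src, days_tgt) for doy in range(1, days_src + 1)}
    return [doy for doy in range(1, days_tgt + 1) if doy not in hit]


def doy_to_date(year, doy):
    """Standard-calendar date for a 1-based day of year."""
    return date(year, 1, 1) + timedelta(days=doy - 1)


# 2001 is a regular year and 2000 a leap year in the standard calendar.
for year, days_tgt in ((2001, 365), (2000, 366)):
    missing = missing_target_days(360, days_tgt)
    print(days_tgt, [(doy, doy_to_date(year, doy).isoformat()) for doy in missing])
```

With a 360-day source, this arithmetic skips days 37, 109, 183, 255 and 329 of a 365-day year, and days 31, 91, 153, 213, 275 and 335 of a 366-day year; day 275 of a leap year is October 1st.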


def _datetime_to_decimal_year(times, calendar=None):
    """Convert a datetime DataArray to decimal years according to its calendar or the given one.

    Decimal years are the number of years since 0001-01-01 00:00:00 AD.
    Ex: '2000-03-01 12:00' is 2000.1653 in a standard calendar, 2000.16301 in a "noleap" or 2000.16806 in a "360_day".
    """
    from ..core.dataarray import DataArray

    calendar = calendar or times.dt.calendar

    if is_np_datetime_like(times.dtype):
        times = times.copy(
            data=convert_cftimes(times.values, get_date_type("standard"))
        )

    def _make_index(time):
        year = int(time.dt.year[0])
        doys = cftime.date2num(time, f"days since {year:04d}-01-01", calendar=calendar)
        return DataArray(
            year + doys / _days_in_year(year, calendar),
            dims=time.dims,
            coords=time.coords,
            name="time",
        )

    return times.groupby("time.year").map(_make_index)
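
The decimal-year values quoted in the docstring can be checked by hand: the result is the year plus the number of days elapsed since January 1st divided by the length of that year in the given calendar. The following sketch does that arithmetic for '2000-03-01 12:00'. The function `decimal_year` is a hypothetical stand-in written for this check and does not use cftime.

```python
def decimal_year(year, days_since_jan1, days_in_year):
    """Decimal year: the year plus the elapsed fraction of that year."""
    return year + days_since_jan1 / days_in_year


# '2000-03-01 12:00' lies 60.5 days into 2000 in the standard calendar
# (January has 31 days, February 29) and in 360_day (two 30-day months),
# but only 59.5 days into 2000 in noleap, which has no February 29th.
standard = decimal_year(2000, 31 + 29 + 0.5, 366)
noleap = decimal_year(2000, 31 + 28 + 0.5, 365)
day360 = decimal_year(2000, 30 + 30 + 0.5, 360)
print(round(standard, 5), round(noleap, 5), round(day360, 5))
```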


def interp_calendar(source, target, dim="time"):
    """Interpolates a DataArray/Dataset to another calendar based on decimal year measure.

    Each timestamp in source and target is first converted to its decimal year equivalent,
    then source is interpolated on the target coordinate. The decimal year is the number of
    years since 0001-01-01 AD.
    Ex: '2000-03-01 12:00' is 2000.1653 in a standard calendar or 2000.16301 in a 'noleap' calendar.

    This method should be used with daily data or coarser. Sub-daily results will have a modified daily cycle.

    Parameters
    ----------
    source : DataArray or Dataset
        The source data to interpolate, must have a time coordinate of a valid dtype (np.datetime64 or cftime objects)
    target : DataArray
        The target time coordinate of a valid dtype (np.datetime64 or cftime objects)
    dim : str
        The time coordinate name.

    Returns
    -------
    DataArray or Dataset
        The source interpolated on the decimal years of target.
    """
    cal_src = source[dim].dt.calendar
    cal_tgt = target.dt.calendar

    out = source.copy()
    out[dim] = _datetime_to_decimal_year(source[dim], calendar=cal_src).drop_vars(dim)
    target_idx = _datetime_to_decimal_year(target, calendar=cal_tgt)
    out = out.interp(**{dim: target_idx})
    out[dim] = target
    return out
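
The idea behind `interp_calendar` is that both time axes are mapped onto a common decimal-year axis, after which ordinary linear interpolation applies. The sketch below illustrates this with plain Python lists: a daily series on a 365-day year is resampled onto the daily timestamps of a 360-day year. Everything here is an assumption made for illustration: the helper `interp_linear`, the synthetic data, and the choice to clamp at the ends of the source range are not taken from the PR, which delegates to the object's `interp` method.

```python
from bisect import bisect_right


def interp_linear(x, xs, ys):
    """Linearly interpolate ys (sampled at sorted xs) at x, clamping at the ends."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    i = bisect_right(xs, x)
    x0, x1 = xs[i - 1], xs[i]
    y0, y1 = ys[i - 1], ys[i]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


year = 2001
# Source: daily values on a 365-day (noleap) year, stamped at midnight.
src_axis = [year + day / 365 for day in range(365)]
src_vals = [float(day) for day in range(365)]
# Target: daily timestamps of a 360_day year on the same decimal-year axis.
tgt_axis = [year + day / 360 for day in range(360)]

tgt_vals = [interp_linear(t, src_axis, src_vals) for t in tgt_axis]
print(len(tgt_vals), tgt_vals[0], round(tgt_vals[180], 3))
```

Because the source value equals the source day index, the interpolated value at the middle of the 360-day year falls halfway between source days 182 and 183, which shows how timestamps shift relative to the original daily cycle.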