Merge pull request pandas-dev#7832 from sinhrks/period_mult
ENH: PeriodIndex can accept freq with mult
jreback committed Sep 3, 2015
2 parents 9aafd6d + 2d870f9 commit 8aeaf02
Showing 18 changed files with 1,063 additions and 360 deletions.
27 changes: 22 additions & 5 deletions doc/source/timeseries.rst
@@ -591,7 +591,7 @@ various docstrings for the classes.
These operations (``apply``, ``rollforward`` and ``rollback``) preserve time (hour, minute, etc.) information by default. To reset the time, use the ``normalize=True`` keyword when creating the offset instance. If ``normalize=True``, the result is normalized after the function is applied.


.. ipython:: python
day = Day()
day.apply(Timestamp('2014-01-01 09:00'))
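The ``normalize`` semantics can be illustrated without pandas. The sketch below is a plain-Python model, not the pandas implementation: ``shift_days`` is a hypothetical helper standing in for ``Day(n).apply``. It keeps the time of day by default and, with ``normalize=True``, truncates to midnight after the shift.

```python
from datetime import datetime, timedelta


def shift_days(ts: datetime, n: int = 1, normalize: bool = False) -> datetime:
    """Hypothetical stand-in for Day(n).apply(ts); not a pandas API."""
    # Apply the offset first: the time of day is carried through unchanged.
    result = ts + timedelta(days=n)
    # With normalize=True, truncate to midnight *after* the shift.
    if normalize:
        result = result.replace(hour=0, minute=0, second=0, microsecond=0)
    return result


ts = datetime(2014, 1, 1, 9, 0)
print(shift_days(ts))                  # 2014-01-02 09:00:00
print(shift_days(ts, normalize=True))  # 2014-01-02 00:00:00
```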
@@ -1257,8 +1257,10 @@ be created with the convenience function ``period_range``.

Period
~~~~~~

A ``Period`` represents a span of time (e.g., a day, a month, a quarter, etc).
It can be created using a frequency alias passed via the ``freq`` keyword, which specifies its span.
Because ``freq`` represents the span of a ``Period``, it cannot be negative (e.g. ``"-3D"``).

.. ipython:: python
@@ -1268,11 +1270,10 @@ It can be created using a frequency alias:
Period('2012-1-1 19:00', freq='H')
Unlike time stamped data, pandas does not support frequencies at multiples of
DateOffsets (e.g., '3Min') for periods.
Period('2012-1-1 19:00', freq='5H')
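The positivity rule for ``freq`` can be sketched as a small parser. This is a simplification under stated assumptions: ``parse_period_freq`` is hypothetical, handles only simple aliases such as ``"5H"`` or ``"M"`` (not anchored ones like ``"A-DEC"``), and only mirrors the ``freq.n <= 0`` check that ``_maybe_convert_freq`` performs in the ``period.pyx`` diff of this commit.

```python
import re

# Hypothetical pattern: optional signed count followed by a letter-only alias.
_FREQ_RE = re.compile(r"^(-?\d+)?([A-Za-z]+)$")


def parse_period_freq(freq: str) -> tuple[int, str]:
    """Hypothetical parser: split '3D' into (3, 'D'); not a pandas API."""
    match = _FREQ_RE.match(freq.strip())
    if match is None:
        raise ValueError(f"Invalid frequency: {freq!r}")
    count, base = match.groups()
    n = int(count) if count is not None else 1
    # A period's freq is a span of time, so the multiple must be positive.
    if n <= 0:
        raise ValueError(
            f"Frequency must be positive, because it represents span: {freq}")
    return n, base


print(parse_period_freq("5H"))  # (5, 'H')
print(parse_period_freq("M"))   # (1, 'M')
```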
Adding and subtracting integers from periods shifts the period by its own
frequency. Arithmetic is not allowed between ``Period`` objects with different ``freq`` (span).

.. ipython:: python
@@ -1282,6 +1283,15 @@ frequency.
p - 3
p = Period('2012-01', freq='2M')
p + 2
p - 1
p == Period('2012-01', freq='3M')
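Shifting by integers can be modelled as ordinal arithmetic, in the spirit of the ``self.ordinal + other * self.freq.n`` change in the ``period.pyx`` diff of this commit. ``MonthPeriod`` below is a hypothetical, monthly-only sketch, not a pandas class: its ordinal counts months since year 0, and comparing periods of different spans raises ``ValueError``, as ``__richcmp__`` does in the diff.

```python
class MonthPeriod:
    """Hypothetical model of a monthly period with a multiplied span.

    `ordinal` counts months since year 0; `mult` is the span in months
    (the n of a freq such as '2M').
    """

    def __init__(self, ordinal: int, mult: int = 1):
        if mult <= 0:
            raise ValueError("mult must be positive: it represents a span")
        self.ordinal = ordinal
        self.mult = mult

    @classmethod
    def from_ym(cls, year: int, month: int, mult: int = 1) -> "MonthPeriod":
        return cls(year * 12 + (month - 1), mult)

    def __add__(self, k: int) -> "MonthPeriod":
        # Adding an integer moves the period by k whole spans.
        return MonthPeriod(self.ordinal + k * self.mult, self.mult)

    def __sub__(self, k: int) -> "MonthPeriod":
        return MonthPeriod(self.ordinal - k * self.mult, self.mult)

    def __eq__(self, other: object) -> bool:
        if not isinstance(other, MonthPeriod):
            return NotImplemented
        # Periods with different spans are not comparable.
        if other.mult != self.mult:
            raise ValueError(
                f"Input has different freq={other.mult}M "
                f"from Period(freq={self.mult}M)")
        return self.ordinal == other.ordinal

    def __repr__(self) -> str:
        year, month0 = divmod(self.ordinal, 12)
        return f"MonthPeriod('{year:04d}-{month0 + 1:02d}', '{self.mult}M')"


p = MonthPeriod.from_ym(2012, 1, mult=2)
print(p + 2)  # MonthPeriod('2012-05', '2M')
print(p - 1)  # MonthPeriod('2011-11', '2M')
try:
    p == MonthPeriod.from_ym(2012, 1, mult=3)
except ValueError as exc:
    print(exc)  # Input has different freq=3M from Period(freq=2M)
```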
If the ``Period`` freq is daily or higher (``D``, ``H``, ``T``, ``S``, ``L``, ``U``, ``N``), ``offsets`` and ``timedelta``-like objects can be added if the result can have the same freq. Otherwise, a ``ValueError`` is raised.
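The rule for adding ``timedelta``-like values can be sketched as a divisibility check on nanoseconds, loosely following ``_add_delta`` in the ``period.pyx`` diff of this commit. The unit table and the ``add_delta`` helper are hypothetical; as in that code, the delta is measured against the base unit (the freq's ``rule_code``), not against the multiplied span.

```python
from datetime import timedelta

# Hypothetical table: length of one base unit in nanoseconds.
_BASE_NANOS = {
    "D": 86_400 * 10**9,
    "H": 3_600 * 10**9,
    "T": 60 * 10**9,
    "S": 10**9,
}


def _to_nanos(delta: timedelta) -> int:
    # Exact integer conversion (avoids float rounding in total_seconds()).
    seconds = delta.days * 86_400 + delta.seconds
    return (seconds * 10**6 + delta.microseconds) * 1_000


def add_delta(ordinal: int, base: str, delta: timedelta) -> int:
    """Hypothetical: shift an ordinal counted in `base` units by `delta`."""
    nanos = _to_nanos(delta)
    unit = _BASE_NANOS[base]
    # The result must land on a whole base unit; otherwise reject it.
    if nanos % unit:
        raise ValueError(f"Input cannot be converted to Period(freq={base})")
    return ordinal + nanos // unit


print(add_delta(10, "H", timedelta(hours=3)))  # 13
print(add_delta(10, "D", timedelta(days=-2)))  # 8
```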

.. ipython:: python
@@ -1335,6 +1345,13 @@ The ``PeriodIndex`` constructor can also be used directly:
PeriodIndex(['2011-1', '2011-2', '2011-3'], freq='M')
Passing a multiplied frequency outputs a sequence of ``Period`` objects,
each with the multiplied span.

.. ipython:: python
PeriodIndex(start='2014-01', freq='3M', periods=4)
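The sequence produced by a multiplied frequency can be reproduced with month ordinals. ``monthly_period_range`` is a hypothetical helper, not a pandas API; it only computes the ``'YYYY-MM'`` label of each period's first month, stepping forward by the span.

```python
def monthly_period_range(start: str, periods: int, mult: int = 1) -> list[str]:
    """Hypothetical helper: labels of `periods` consecutive monthly periods,
    each spanning `mult` months, beginning at `start` ('YYYY-MM')."""
    if mult <= 0:
        raise ValueError("mult must be positive")
    year, month = (int(part) for part in start.split("-"))
    first = year * 12 + (month - 1)  # month ordinal of the first period
    labels = []
    for i in range(periods):
        # Each successive period starts one full span later.
        y, m0 = divmod(first + i * mult, 12)
        labels.append(f"{y:04d}-{m0 + 1:02d}")
    return labels


print(monthly_period_range("2014-01", periods=4, mult=3))
# ['2014-01', '2014-04', '2014-07', '2014-10']
```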
Just like ``DatetimeIndex``, a ``PeriodIndex`` can also be used to index pandas
objects:

27 changes: 26 additions & 1 deletion doc/source/whatsnew/v0.17.0.txt
@@ -120,6 +120,32 @@ We are now supporting a ``Series.dt.strftime`` method for datetime-likes to gene

The string format is as the python standard library and details can be found `here <https://docs.python.org/2/library/datetime.html#strftime-and-strptime-behavior>`_

.. _whatsnew_0170.periodfreq:

Period Frequency Enhancement
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

``Period``, ``PeriodIndex`` and ``period_range`` can now accept a multiplied freq. Also, ``Period.freq`` and ``PeriodIndex.freq`` are now stored as a ``DateOffset`` instance, as in ``DatetimeIndex``, rather than as a ``str`` (:issue:`7811`).

A multiplied freq represents a span of the corresponding length. The example below creates a period of 3 days. Addition and subtraction shift the period by its span.

.. ipython:: python

p = pd.Period('2015-08-01', freq='3D')
p
p + 1
p - 2
p.to_timestamp()
p.to_timestamp(how='E')
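The 3-day example can be checked by hand with the standard ``datetime`` module. ``shift_span`` and ``span_bounds`` are hypothetical helpers that model calendar days only, not pandas timestamps; the last day in ``span_bounds`` corresponds to the ``ordinal + mult1 - 1`` step that ``asfreq`` applies for ``how='E'`` in the ``period.pyx`` diff of this commit.

```python
from datetime import date, timedelta


def shift_span(start: date, mult: int, k: int) -> date:
    """Hypothetical: first day of a `mult`-day period after adding k."""
    # Adding k moves the period by k whole spans of `mult` days.
    return start + timedelta(days=k * mult)


def span_bounds(start: date, mult: int) -> tuple[date, date]:
    """Hypothetical: (first day, last day) covered by a `mult`-day period."""
    # The last day is start + (mult - 1) days, mirroring `ordinal + mult1 - 1`.
    return start, start + timedelta(days=mult - 1)


p = date(2015, 8, 1)  # models the 3-day period starting 2015-08-01
print(shift_span(p, 3, 1))   # 2015-08-04
print(shift_span(p, 3, -2))  # 2015-07-26
print(span_bounds(p, 3))     # (datetime.date(2015, 8, 1), datetime.date(2015, 8, 3))
```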

You can also use a multiplied freq in ``PeriodIndex`` and ``period_range``.

.. ipython:: python

idx = pd.period_range('2015-08-01', periods=4, freq='2D')
idx
idx + 1

.. _whatsnew_0170.enhancements.sas_xport:

Support for SAS XPORT files
@@ -198,7 +224,6 @@ Other enhancements
- ``pd.Timedelta.total_seconds()`` now returns the Timedelta duration to ns precision (previously microsecond precision) (:issue:`10939`)

- ``.as_blocks`` will now take an optional ``copy`` argument to return a copy of the data; the default is to copy (no change in behavior from prior versions) (:issue:`9607`)

- ``regex`` argument to ``DataFrame.filter`` now handles numeric column names instead of raising ``ValueError`` (:issue:`10384`).
- ``pd.read_stata`` will now read Stata 118 type files. (:issue:`9882`)

Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
14 changes: 13 additions & 1 deletion pandas/io/tests/test_pickle.py
@@ -17,6 +17,8 @@
from pandas.compat import u
from pandas.util.misc import is_little_endian
import pandas
from pandas.tseries.offsets import Day, MonthEnd


class TestPickle():
"""
@@ -90,6 +92,10 @@ def read_pickles(self, version):
if 'ts' in data['series']:
self._validate_timeseries(data['series']['ts'], self.data['series']['ts'])
self._validate_frequency(data['series']['ts'])
if 'index' in data:
if 'period' in data['index']:
self._validate_periodindex(data['index']['period'],
self.data['index']['period'])
n += 1
assert n > 0, 'Pickle files are not tested'

@@ -162,7 +168,6 @@ def _validate_timeseries(self, pickled, current):

def _validate_frequency(self, pickled):
# GH 9291
from pandas.tseries.offsets import Day
freq = pickled.index.freq
result = freq + Day(1)
tm.assert_equal(result, Day(2))
@@ -175,6 +180,13 @@ def _validate_frequency(self, pickled):
tm.assert_equal(isinstance(result, pandas.Timedelta), True)
tm.assert_equal(result, pandas.Timedelta(days=1, nanoseconds=1))

def _validate_periodindex(self, pickled, current):
tm.assert_index_equal(pickled, current)
tm.assertIsInstance(pickled.freq, MonthEnd)
tm.assert_equal(pickled.freq, MonthEnd())
tm.assert_equal(pickled.freqstr, 'M')
tm.assert_index_equal(pickled.shift(2), current.shift(2))


if __name__ == '__main__':
import nose
97 changes: 61 additions & 36 deletions pandas/src/period.pyx
@@ -615,6 +615,9 @@ cdef ndarray[int64_t] localize_dt64arr_to_period(ndarray[int64_t] stamps,
return result


_DIFFERENT_FREQ_ERROR = "Input has different freq={1} from Period(freq={0})"


cdef class Period(object):
"""
Represents a period of time
@@ -624,8 +627,7 @@ cdef class Period(object):
value : Period or compat.string_types, default None
The time period represented (e.g., '4Q2005')
freq : str, default None
e.g., 'B' for businessday. Must be a singular rule-code (e.g. 5T is not
allowed).
One of pandas period strings or corresponding objects
year : int, default None
month : int, default 1
quarter : int, default None
@@ -641,12 +643,33 @@ cdef class Period(object):
_comparables = ['name','freqstr']
_typ = 'period'

@classmethod
def _maybe_convert_freq(cls, object freq):

if isinstance(freq, compat.string_types):
from pandas.tseries.frequencies import _period_alias_dict
freq = _period_alias_dict.get(freq, freq)
elif isinstance(freq, (int, tuple)):
from pandas.tseries.frequencies import get_freq_code as _gfc
from pandas.tseries.frequencies import _get_freq_str
code, stride = _gfc(freq)
freq = _get_freq_str(code, stride)

from pandas.tseries.frequencies import to_offset
freq = to_offset(freq)

if freq.n <= 0:
raise ValueError('Frequency must be positive, because it'
' represents span: {0}'.format(freq.freqstr))

return freq

@classmethod
def _from_ordinal(cls, ordinal, freq):
""" fast creation from an ordinal and freq that are already validated! """
self = Period.__new__(cls)
self.ordinal = ordinal
self.freq = freq
self.freq = cls._maybe_convert_freq(freq)
return self

def __init__(self, value=None, freq=None, ordinal=None,
@@ -659,8 +682,6 @@ cdef class Period(object):
# periods such as A, Q, etc. Every five minutes would be, e.g.,
# ('T', 5) but may be passed in as a string like '5T'

self.freq = None

# ordinal is the period offset from the gregorian proleptic epoch

if ordinal is not None and value is not None:
@@ -675,9 +696,8 @@ cdef class Period(object):
elif value is None:
if freq is None:
raise ValueError("If value is None, freq cannot be None")

ordinal = _ordinal_from_fields(year, month, quarter, day,
hour, minute, second, freq)

elif isinstance(value, Period):
other = value
@@ -698,8 +718,8 @@ cdef class Period(object):
if lib.is_integer(value):
value = str(value)
value = value.upper()

dt, _, reso = parse_time_string(value, freq)

if freq is None:
try:
freq = frequencies.Resolution.get_freq(reso)
@@ -723,24 +743,22 @@ cdef class Period(object):
raise ValueError(msg)

base, mult = _gfc(freq)
if mult != 1:
# TODO: Better error message - this is slightly confusing
raise ValueError('Only mult == 1 supported')

if ordinal is None:
self.ordinal = get_period_ordinal(dt.year, dt.month, dt.day,
dt.hour, dt.minute, dt.second, dt.microsecond, 0,
base)
dt.hour, dt.minute, dt.second,
dt.microsecond, 0, base)
else:
self.ordinal = ordinal

self.freq = frequencies._get_freq_str(base)
self.freq = self._maybe_convert_freq(freq)

def __richcmp__(self, other, op):
if isinstance(other, Period):
from pandas.tseries.frequencies import get_freq_code as _gfc
if other.freq != self.freq:
raise ValueError("Cannot compare non-conforming periods")
msg = _DIFFERENT_FREQ_ERROR.format(self.freqstr, other.freqstr)
raise ValueError(msg)
if self.ordinal == tslib.iNaT or other.ordinal == tslib.iNaT:
return _nat_scalar_rules[op]
return PyObject_RichCompareBool(self.ordinal, other.ordinal, op)
@@ -758,7 +776,7 @@ cdef class Period(object):
def _add_delta(self, other):
from pandas.tseries import frequencies
if isinstance(other, (timedelta, np.timedelta64, offsets.Tick, Timedelta)):
offset = frequencies.to_offset(self.freq)
offset = frequencies.to_offset(self.freq.rule_code)
if isinstance(offset, offsets.Tick):
nanos = tslib._delta_to_nanoseconds(other)
offset_nanos = tslib._delta_to_nanoseconds(offset)
@@ -769,18 +787,21 @@ cdef class Period(object):
else:
ordinal = self.ordinal + (nanos // offset_nanos)
return Period(ordinal=ordinal, freq=self.freq)
msg = 'Input cannot be converted to Period(freq={0})'
raise ValueError(msg.format(self.freqstr))
elif isinstance(other, offsets.DateOffset):
freqstr = frequencies.get_standard_freq(other)
base = frequencies.get_base_alias(freqstr)

if base == self.freq:
if base == self.freq.rule_code:
if self.ordinal == tslib.iNaT:
ordinal = self.ordinal
else:
ordinal = self.ordinal + other.n
return Period(ordinal=ordinal, freq=self.freq)

raise ValueError("Input has different freq from Period(freq={0})".format(self.freq))
msg = _DIFFERENT_FREQ_ERROR.format(self.freqstr, other.freqstr)
raise ValueError(msg)
else: # pragma no cover
return NotImplemented

def __add__(self, other):
if isinstance(other, (timedelta, np.timedelta64,
@@ -790,7 +811,7 @@ cdef class Period(object):
if self.ordinal == tslib.iNaT:
ordinal = self.ordinal
else:
ordinal = self.ordinal + other
ordinal = self.ordinal + other * self.freq.n
return Period(ordinal=ordinal, freq=self.freq)
else: # pragma: no cover
return NotImplemented
@@ -804,7 +825,7 @@ cdef class Period(object):
if self.ordinal == tslib.iNaT:
ordinal = self.ordinal
else:
ordinal = self.ordinal - other
ordinal = self.ordinal - other * self.freq.n
return Period(ordinal=ordinal, freq=self.freq)
elif isinstance(other, Period):
if other.freq != self.freq:
@@ -836,13 +857,18 @@ cdef class Period(object):
base1, mult1 = _gfc(self.freq)
base2, mult2 = _gfc(freq)

if mult2 != 1:
raise ValueError('Only mult == 1 supported')

end = how == 'E'
new_ordinal = period_asfreq(self.ordinal, base1, base2, end)
if self.ordinal == tslib.iNaT:
ordinal = self.ordinal
else:
# mult1 can't be negative or 0
end = how == 'E'
if end:
ordinal = self.ordinal + mult1 - 1
else:
ordinal = self.ordinal
ordinal = period_asfreq(ordinal, base1, base2, end)

return Period(ordinal=new_ordinal, freq=base2)
return Period(ordinal=ordinal, freq=freq)

@property
def start_time(self):
@@ -853,7 +879,8 @@ cdef class Period(object):
if self.ordinal == tslib.iNaT:
ordinal = self.ordinal
else:
ordinal = (self + 1).start_time.value - 1
# freq.n can't be negative or 0
ordinal = (self + self.freq.n).start_time.value - 1
return Timestamp(ordinal)

def to_timestamp(self, freq=None, how='start', tz=None):
@@ -947,14 +974,15 @@ cdef class Period(object):
def __str__(self):
return self.__unicode__()

@property
def freqstr(self):
return self.freq.freqstr

def __repr__(self):
from pandas.tseries import frequencies
from pandas.tseries.frequencies import get_freq_code as _gfc
base, mult = _gfc(self.freq)
formatted = period_format(self.ordinal, base)
freqstr = frequencies._reverse_period_code_map[base]

return "Period('%s', '%s')" % (formatted, freqstr)
return "Period('%s', '%s')" % (formatted, self.freqstr)

def __unicode__(self):
"""
@@ -1123,9 +1151,6 @@ def _ordinal_from_fields(year, month, quarter, day, hour, minute,
second, freq):
from pandas.tseries.frequencies import get_freq_code as _gfc
base, mult = _gfc(freq)
if mult != 1:
raise ValueError('Only mult == 1 supported')

if quarter is not None:
year, month = _quarter_to_myear(year, quarter, freq)
