ENH: linearly spaced date_range (GH 20808) #20846

Merged 12 commits on May 3, 2018
1 change: 1 addition & 0 deletions doc/source/whatsnew/v0.23.0.txt
@@ -450,6 +450,7 @@ Other Enhancements
 - Updated :meth:`DataFrame.to_gbq` and :meth:`pandas.read_gbq` signature and documentation to reflect changes from
   the Pandas-GBQ library version 0.4.0. Adds intersphinx mapping to Pandas-GBQ
   library. (:issue:`20564`)
+- :func:`date_range` now returns a linearly spaced ``DatetimeIndex`` if ``start``, ``end``, and ``periods`` are specified, but ``freq`` is not. (:issue:`20808`)

 .. _whatsnew_0230.api_breaking:
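
The changelog entry above can be checked directly. This is a minimal sketch, assuming a pandas release that includes this change (0.23.0 or later); the variable names are illustrative only.

```python
import pandas as pd

# start, end and periods are all given and freq is omitted, so the result
# is evenly spaced and includes both endpoints.
idx = pd.date_range(start="2018-04-24", end="2018-04-27", periods=3)

# Differences between consecutive elements; all equal for a linear spacing.
gaps = idx.to_series().diff().dropna()

print(idx)
print(gaps.unique())
```

Three points over three days give a constant gap of 36 hours, which does not correspond to the default daily frequency.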

61 changes: 57 additions & 4 deletions pandas/core/indexes/datetimes.py
@@ -2583,13 +2583,15 @@ def _generate_regular_range(start, end, periods, freq):
     return data


-def date_range(start=None, end=None, periods=None, freq='D', tz=None,
+def date_range(start=None, end=None, periods=None, freq=None, tz=None,
                normalize=False, name=None, closed=None, **kwargs):
     """
     Return a fixed frequency DatetimeIndex.

-    Exactly two of the three parameters `start`, `end` and `periods`
-    must be specified.
+    At least two of the three parameters `start`, `end` and `periods`
+    must be specified. If all three are specified and `freq` is omitted,
+    the resulting DatetimeIndex will have `periods` linearly spaced
+    elements between `start` and `end` (closed on both sides).

     Parameters
     ----------
@@ -2616,6 +2618,8 @@ def date_range(start=None, end=None, periods=None, freq='D', tz=None,
         the 'left', 'right', or both sides (None, the default).
     **kwargs
         For compatibility. Has no effect on the result.
+        Exception: when `start`, `end`, and `periods` are specified but
+        `freq` is not, they are passed on to `pd.to_datetime`.

     Returns
     -------
@@ -2631,7 +2635,7 @@ def date_range(start=None, end=None, periods=None, freq='D', tz=None,
     --------
     **Specifying the values**

-    The next three examples generate the same `DatetimeIndex`, but vary
+    The next four examples generate the same `DatetimeIndex`, but vary
     the combination of `start`, `end` and `periods`.

     Specify `start` and `end`, with the default daily frequency.
@@ -2655,6 +2659,13 @@ def date_range(start=None, end=None, periods=None, freq='D', tz=None,
                    '2017-12-29', '2017-12-30', '2017-12-31', '2018-01-01'],
                   dtype='datetime64[ns]', freq='D')

+    Specify `start`, `end`, and `periods`; the frequency is generated
+    automatically (linearly spaced).
+
+    >>> pd.date_range(start='2018-04-24', end='2018-04-27', periods=3)
+    DatetimeIndex(['2018-04-24 00:00:00', '2018-04-25 12:00:00',
+                   '2018-04-27 00:00:00'], dtype='datetime64[ns]', freq=None)
+
     **Other Parameters**

     Changed the `freq` (frequency) to ``'M'`` (month end frequency).
@@ -2704,7 +2715,49 @@ def date_range(start=None, end=None, periods=None, freq='D', tz=None,
     >>> pd.date_range(start='2017-01-01', end='2017-01-04', closed='right')
     DatetimeIndex(['2017-01-02', '2017-01-03', '2017-01-04'],
                   dtype='datetime64[ns]', freq='D')
+
+    Pass extra keyword arguments (kwargs) through to the `pd.to_datetime`
+    call that is used when all three parameters `start`, `end`, and
+    `periods` are given. If this results in anything other than a
+    DatetimeIndex (as in this example), you cannot specify `tz` or `name`.
+
+    >>> pd.date_range('2018-04-24', '2018-04-27', periods=3, box=False)
+    array(['2018-04-24T00:00:00.000000000', '2018-04-25T12:00:00.000000000',
+           '2018-04-27T00:00:00.000000000'], dtype='datetime64[ns]')
     """
+
+    # See https://github.com/pandas-dev/pandas/issues/20808
Contributor

This needs to go in the DatetimeIndex (DTI) constructor itself. We already do some of this validation there, so it needs to fit in with that. Further, you don't need to handle the other things you are repeating here, e.g. tz, which are already handled.

+    if freq is None and com._all_not_none(periods, start, end):
+        if is_float(periods):
+            periods = int(periods)
+        elif not is_integer(periods):
+            msg = 'periods must be a number, got {periods}'
+            raise TypeError(msg.format(periods=periods))
+
+        start = Timestamp(start, tz=tz)
+        end = Timestamp(end, tz=tz)
+
+        if normalize:
+            start = libts.normalize_date(start)
+            end = libts.normalize_date(end)
+
+        di = tools.to_datetime(np.linspace(start.value, end.value, periods),
+                               **kwargs)
+
+        try:
+            if tz is not None:
+                di = di.tz_localize('UTC').tz_convert(tz)
+            if name is not None:
+                di.name = name
+        except AttributeError:
+            raise AttributeError("To specify the timezone or a name, the "
+                                 "result has to be a DatetimeIndex!")
+
+        return di
+
+    if freq is None:
+        freq = 'D'
+
     return DatetimeIndex(start=start, end=end, periods=periods,
                          freq=freq, tz=tz, normalize=normalize, name=name,
                          closed=closed, **kwargs)
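
The linearly spaced branch in this hunk amounts to validating `periods` and then interpolating between two timestamps. The following is a standard-library sketch of that idea, not the pandas implementation: `linspace_datetimes` is a hypothetical helper, it works at `datetime` (microsecond) resolution rather than on nanosecond integers with `np.linspace`, and its handling of `periods` below 1 is an assumption.

```python
from datetime import datetime
from numbers import Real


def linspace_datetimes(start, end, periods):
    """Return `periods` evenly spaced datetimes from start to end, inclusive.

    Hypothetical helper for illustration; not part of pandas.
    """
    # Mirror the validation in the diff: numbers are accepted (floats are
    # truncated to int), anything else raises TypeError.
    if not isinstance(periods, Real):
        raise TypeError("periods must be a number, got {!r}".format(periods))
    periods = int(periods)
    if periods < 1:
        # Assumption: reject empty ranges instead of returning an empty list.
        raise ValueError("periods must be at least 1")
    if periods == 1:
        return [start]
    span = end - start
    # Scale the full span per element so the last element equals `end`.
    return [start + span * i / (periods - 1) for i in range(periods)]


result = linspace_datetimes(datetime(2018, 4, 24), datetime(2018, 4, 27), 3)
for value in result:
    print(value.isoformat())
```

Multiplying the whole span by `i` before dividing avoids accumulating rounding error from a pre-rounded step, so both endpoints are reproduced exactly.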
31 changes: 31 additions & 0 deletions pandas/tests/indexes/datetimes/test_date_range.py
@@ -162,6 +162,37 @@ def test_date_range_ambiguous_arguments(self):
         with tm.assert_raises_regex(ValueError, msg):
             date_range(start, end, periods=10, freq='s')

+    def test_date_range_convenience_periods(self):
Member

Could you also test this with the tz argument specified? It would also be good to test a tz where there is a daylight saving time transition between start and end.

+        # GH 20808
+        rng = date_range('2018-04-24', '2018-04-27', periods=3)
+        exp = DatetimeIndex(['2018-04-24 00:00:00', '2018-04-25 12:00:00',
+                             '2018-04-27 00:00:00'], freq=None)
+
+        tm.assert_index_equal(rng, exp)
+
+        # Test if kwargs work for the to_datetime function used
+        rng = date_range('2018-04-24', '2018-04-27', periods=3, box=False)
+        exp = np.array(['2018-04-24T00:00:00', '2018-04-25T12:00:00',
+                        '2018-04-27T00:00:00'], dtype='datetime64[ns]')
+
+        tm.assert_numpy_array_equal(rng, exp)
+
+        # Test if spacing remains linear if tz changes to dst in range
+        rng = date_range('2018-04-01 01:00:00', '2018-04-01 04:00:00',
+                         tz='Australia/Sydney', periods=3)
+        exp = DatetimeIndex(['2018-03-31 14:00:00', '2018-03-31 16:00:00',
+                             '2018-03-31 18:00:00'],
+                            tz='UTC').tz_convert('Australia/Sydney')
+
+        tm.assert_index_equal(rng, exp)
+
+        # Test AttributeError is raised if result is not a DatetimeIndex
+        msg = ("To specify the timezone or a name, the "
+               "result has to be a DatetimeIndex!")
+        with tm.assert_raises_regex(AttributeError, msg):
+            rng = date_range('2018-04-24', '2018-04-27', periods=3.3,
+                             name="abc", box=False)
+
     def test_date_range_businesshour(self):
         idx = DatetimeIndex(['2014-07-04 09:00', '2014-07-04 10:00',
                              '2014-07-04 11:00',