BUG: to_timedelta() won't accept all freq units returned by infer_freq() #36769

rmaunder · 2020-10-01T09:55:04Z

[ x ] I have checked that this issue has not already been reported.
[ x ] I have confirmed this bug exists on the latest version of pandas.
(optional) I have confirmed this bug exists on the master branch of pandas.

Code Sample, a copy-pastable example

datetime_idx = pd.date_range('07-10-16 00:00', periods = 4, freq = '2H')
td_infer = pd.to_timedelta(pd.infer_freq(datetime_idx) ) # Fine

datetime_idx = pd.date_range('07-10-16 00:00', periods = 4, freq = 'H')
td_infer = pd.to_timedelta(pd.infer_freq(datetime_idx) ) # Fails.... : ValueError: unit abbreviation w/o a number

datetime_idx = pd.date_range('07-10-16 00:00', periods = 4, freq = '1H') # datetime drops to 'H'
td_infer = pd.to_timedelta(pd.infer_freq(datetime_idx) ) # Fails.... : ValueError: unit abbreviation w/o a number

td_infer = pd.to_timedelta('1' + pd.infer_freq(datetime_idx) ) # Works...

Problem description

to_timedelta() rejects unitary time units like 'H', 'D' etc which infer_freq() returns. Strings like '2H' etc are accepted. Even if you create the date range with '1H' freq it is dropped back to 'H' by date_range(). So you are forced to handle this special case and prepend a '1' which feels very clumsy.

Expected Output

to_timedelta() should treat strings like 'H' as for '1H' - so output of infer_freq() can be safely passed through.

Output of `pd.show_versions()`

INSTALLED VERSIONS

commit : 2a7d332
python : 3.6.4.final.0
python-bits : 64
OS : Darwin
OS-release : 19.6.0
Version : Darwin Kernel Version 19.6.0: Mon Aug 31 22:12:52 PDT 2020; root:xnu-6153.141.2~1/RELEASE_X86_64
machine : x86_64
processor : i386
byteorder : little
LC_ALL : None
LANG : None
LOCALE : None.None
pandas : 1.1.2
numpy : 1.19.2
pytz : 2020.1
dateutil : 2.8.1
pip : 20.2.3
setuptools : 28.8.0
Cython : None
pytest : 6.0.1
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 2.11.2
IPython : None
pandas_datareader: None
bs4 : None
bottleneck : None
fsspec : None
fastparquet : None
gcsfs : None
matplotlib : 3.3.2
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pytables : None
pyxlsb : None
s3fs : None
scipy : 1.5.2
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
numba : None

The text was updated successfully, but these errors were encountered:

atpage · 2023-02-23T14:55:43Z

Confirmed here with pandas 1.5.2.

import datetime as dt
import pandas as pd

start = dt.datetime.now()
idx = pd.DatetimeIndex(
    [start+dt.timedelta(seconds=i) for i in range(10)]
)

print(idx.inferred_freq)  # prints 'S'

print(pd.to_timedelta(idx.inferred_freq))  # raises ValueError

The last line works, though, when replacing seconds=i with e.g. seconds=2*i.

jbrockmendel · 2023-08-24T02:17:49Z

Why would you expect this to work? to_offset(obj.inferred_freq) should always work, but not to_timedelta.

rmaunder · 2023-08-24T06:33:52Z

I expect things to be consistent. At least within the same library. A time frequency is basically a repeated application of a time delta from shifted start. Therefore it's quite logical to expect to_timedelta to accept the frequency from infer_freq as an time offset.

inter_freq() returns string such as 3H, 2H, H. All of these are accepted by to_timedelta except the last case where '1H' is required. For unitary values either the 1 should be there always (mandatory) for all strings or it's assumed to be 1 when multiplier is applied. Or both can be used interchangeably and consistently

To put it another way, with this issue it's easy to write code that functions perfectly with input data with a consistent multi unit frequency such as 2H, 3H, 2T, 3T but then suddenly fails when unitary versions are used. That seems like the definition of a 'bug' to me (if not the most critical one)

rmaunder · 2023-08-24T06:43:03Z

Btw see that for to_timedelta in 2.0 it states:

"Changed in version 2.0: Strings with units ‘M’, ‘Y’ and ‘y’ do not represent unambiguous timedelta values and will raise an exception."

However the same argument (that a year, month are variable duration) applies regardless if they are specified with 1 prefix or not.

jbrockmendel · 2023-09-16T16:31:25Z

A time frequency is basically a repeated application of a time delta from shifted start

False. infer_freq returns a string corresponding to a DateOffset (FWIW id rather just return the DateOffset itself, but that ship has sailed). Many DateOffsets do behave like timedeltas, in particular those that are Tick subclasses. But others, e.g. BDay, are distinct from timedeltas.

richard-younergy · 2023-12-14T10:29:19Z

Ok I accept the point that infer_freq() can return some 'frequencies' such as BDay that can't be treated as a consistent fixed timedelta. So not all values can be passed.

However the main point remains: - infer_freq returns consistent unitary offsets such as hourly as 'H', whereas to_timedelta() requires it to be specified as '1H'.
That still seems quite inconsistent, and in a way that code that functions for ages with some input data will suddenly fail. I don't see a logical reason why either:

infer_freq returns it as '1H'
to_timedelta accepts 'H' and '1H ' as equivalent to one another

#2 seems much preferable.

rmaunder added Bug Needs Triage Issue that has not been reviewed by a pandas team member labels Oct 1, 2020

rmaunder changed the title ~~BUG:~~ BUG: to_timedelta() won't accept all freq units returned by infer_freq() Oct 1, 2020

gauchm mentioned this issue Mar 26, 2021

datautils.utils.infer_frequency: pd.infer_freq() may produce result which pd.to_timedelta() can't handle neuralhydrology/neuralhydrology#30

Closed

mroeschke added Frequency DateOffsets Timedelta Timedelta data type and removed Needs Triage Issue that has not been reviewed by a pandas team member labels Aug 13, 2021

jbrockmendel added the Closing Candidate May be closeable, needs more eyeballs label Aug 24, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

BUG: to_timedelta() won't accept all freq units returned by infer_freq() #36769

BUG: to_timedelta() won't accept all freq units returned by infer_freq() #36769

rmaunder commented Oct 1, 2020 •

edited

Loading

INSTALLED VERSIONS

atpage commented Feb 23, 2023

jbrockmendel commented Aug 24, 2023

rmaunder commented Aug 24, 2023 •

edited

Loading

rmaunder commented Aug 24, 2023 •

edited

Loading

jbrockmendel commented Sep 16, 2023

richard-younergy commented Dec 14, 2023

BUG: to_timedelta() won't accept all freq units returned by infer_freq() #36769

BUG: to_timedelta() won't accept all freq units returned by infer_freq() #36769

Comments

rmaunder commented Oct 1, 2020 • edited Loading

Code Sample, a copy-pastable example

Problem description

Expected Output

Output of pd.show_versions()

INSTALLED VERSIONS

atpage commented Feb 23, 2023

jbrockmendel commented Aug 24, 2023

rmaunder commented Aug 24, 2023 • edited Loading

rmaunder commented Aug 24, 2023 • edited Loading

jbrockmendel commented Sep 16, 2023

richard-younergy commented Dec 14, 2023

rmaunder commented Oct 1, 2020 •

edited

Loading

Output of `pd.show_versions()`

rmaunder commented Aug 24, 2023 •

edited

Loading

rmaunder commented Aug 24, 2023 •

edited

Loading