Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: to_timedelta() won't accept all freq units returned by infer_freq() #36769

Open
1 task
rmaunder opened this issue Oct 1, 2020 · 6 comments
Open
1 task
Labels
Bug Closing Candidate May be closeable, needs more eyeballs Frequency DateOffsets Timedelta Timedelta data type

Comments

@rmaunder
Copy link

rmaunder commented Oct 1, 2020

  • [ x ] I have checked that this issue has not already been reported.

  • [ x ] I have confirmed this bug exists on the latest version of pandas.

  • (optional) I have confirmed this bug exists on the master branch of pandas.

Code Sample, a copy-pastable example

datetime_idx = pd.date_range('07-10-16 00:00', periods = 4, freq = '2H')
td_infer = pd.to_timedelta(pd.infer_freq(datetime_idx) ) # Fine

datetime_idx = pd.date_range('07-10-16 00:00', periods = 4, freq = 'H')
td_infer = pd.to_timedelta(pd.infer_freq(datetime_idx) ) # Fails.... : ValueError: unit abbreviation w/o a number

datetime_idx = pd.date_range('07-10-16 00:00', periods = 4, freq = '1H') # datetime drops to 'H'
td_infer = pd.to_timedelta(pd.infer_freq(datetime_idx) ) # Fails.... : ValueError: unit abbreviation w/o a number

td_infer = pd.to_timedelta('1' + pd.infer_freq(datetime_idx) ) # Works...

Problem description

to_timedelta() rejects unitary time units like 'H', 'D' etc which infer_freq() returns. Strings like '2H' etc are accepted. Even if you create the date range with '1H' freq it is dropped back to 'H' by date_range(). So you are forced to handle this special case and prepend a '1' which feels very clumsy.

Expected Output

to_timedelta() should treat strings like 'H' as for '1H' - so output of infer_freq() can be safely passed through.

Output of pd.show_versions()

INSTALLED VERSIONS

commit : 2a7d332
python : 3.6.4.final.0
python-bits : 64
OS : Darwin
OS-release : 19.6.0
Version : Darwin Kernel Version 19.6.0: Mon Aug 31 22:12:52 PDT 2020; root:xnu-6153.141.2~1/RELEASE_X86_64
machine : x86_64
processor : i386
byteorder : little
LC_ALL : None
LANG : None
LOCALE : None.None
pandas : 1.1.2
numpy : 1.19.2
pytz : 2020.1
dateutil : 2.8.1
pip : 20.2.3
setuptools : 28.8.0
Cython : None
pytest : 6.0.1
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 2.11.2
IPython : None
pandas_datareader: None
bs4 : None
bottleneck : None
fsspec : None
fastparquet : None
gcsfs : None
matplotlib : 3.3.2
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pytables : None
pyxlsb : None
s3fs : None
scipy : 1.5.2
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
numba : None

@rmaunder rmaunder added Bug Needs Triage Issue that has not been reviewed by a pandas team member labels Oct 1, 2020
@rmaunder rmaunder changed the title BUG: BUG: to_timedelta() won't accept all freq units returned by infer_freq() Oct 1, 2020
@mroeschke mroeschke added Frequency DateOffsets Timedelta Timedelta data type and removed Needs Triage Issue that has not been reviewed by a pandas team member labels Aug 13, 2021
@atpage
Copy link

atpage commented Feb 23, 2023

Confirmed here with pandas 1.5.2.

import datetime as dt
import pandas as pd

start = dt.datetime.now()
idx = pd.DatetimeIndex(
    [start+dt.timedelta(seconds=i) for i in range(10)]
)

print(idx.inferred_freq)  # prints 'S'

print(pd.to_timedelta(idx.inferred_freq))  # raises ValueError

The last line works, though, when replacing seconds=i with e.g. seconds=2*i.

@jbrockmendel
Copy link
Member

Why would you expect this to work? to_offset(obj.inferred_freq) should always work, but not to_timedelta.

@jbrockmendel jbrockmendel added the Closing Candidate May be closeable, needs more eyeballs label Aug 24, 2023
@rmaunder
Copy link
Author

rmaunder commented Aug 24, 2023

I expect things to be consistent. At least within the same library. A time frequency is basically a repeated application of a time delta from shifted start. Therefore it's quite logical to expect to_timedelta to accept the frequency from infer_freq as an time offset.

inter_freq() returns string such as 3H, 2H, H. All of these are accepted by to_timedelta except the last case where '1H' is required. For unitary values either the 1 should be there always (mandatory) for all strings or it's assumed to be 1 when multiplier is applied. Or both can be used interchangeably and consistently

To put it another way, with this issue it's easy to write code that functions perfectly with input data with a consistent multi unit frequency such as 2H, 3H, 2T, 3T but then suddenly fails when unitary versions are used. That seems like the definition of a 'bug' to me (if not the most critical one)

@rmaunder
Copy link
Author

rmaunder commented Aug 24, 2023

Btw see that for to_timedelta in 2.0 it states:

"Changed in version 2.0: Strings with units ‘M’, ‘Y’ and ‘y’ do not represent unambiguous timedelta values and will raise an exception."

However the same argument (that a year, month are variable duration) applies regardless if they are specified with 1 prefix or not.

@jbrockmendel
Copy link
Member

A time frequency is basically a repeated application of a time delta from shifted start

False. infer_freq returns a string corresponding to a DateOffset (FWIW id rather just return the DateOffset itself, but that ship has sailed). Many DateOffsets do behave like timedeltas, in particular those that are Tick subclasses. But others, e.g. BDay, are distinct from timedeltas.

@richard-younergy
Copy link

Ok I accept the point that infer_freq() can return some 'frequencies' such as BDay that can't be treated as a consistent fixed timedelta. So not all values can be passed.

However the main point remains: - infer_freq returns consistent unitary offsets such as hourly as 'H', whereas to_timedelta() requires it to be specified as '1H'.
That still seems quite inconsistent, and in a way that code that functions for ages with some input data will suddenly fail. I don't see a logical reason why either:

  1. infer_freq returns it as '1H'
  2. to_timedelta accepts 'H' and '1H ' as equivalent to one another

#2 seems much preferable.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Closing Candidate May be closeable, needs more eyeballs Frequency DateOffsets Timedelta Timedelta data type
Projects
None yet
Development

No branches or pull requests

5 participants