BUG: DataFrame quantile with only datetime dtypes should provide better error message #7308

Closed
TomAugspurger opened this issue Jun 2, 2014 · 6 comments · Fixed by #46096
Labels: Bug, quantile (quantile method)

@TomAugspurger
Contributor

In [41]: df = DataFrame({"A": [1, 2, 3], "B": [2, 3, 4]})

In [42]: df['C'] = pd.date_range('2014-01-01', periods=3, freq='m')

In [43]: df
Out[43]: 
   A  B          C
0  1  2 2014-01-31
1  2  3 2014-02-28
2  3  4 2014-03-31

In [44]: df[['C']].quantile(.5)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-44-1e5bdc20c0ca> in <module>()
----> 1 df[['C']].quantile(.5)

/Users/tom/Envs/pandas-dev/lib/python2.7/site-packages/pandas/pandas/core/frame.pyc in quantile(self, q, axis, numeric_only)
   4192         quantiles = [[f(vals, x) for x in per]
   4193                      for (_, vals) in data.iteritems()]
-> 4194         result = DataFrame(quantiles, index=data._info_axis, columns=q).T
   4195         if len(is_dt_col) > 0:
   4196             result[is_dt_col] = result[is_dt_col].applymap(lib.Timestamp)

/Users/tom/Envs/pandas-dev/lib/python2.7/site-packages/pandas/pandas/core/frame.pyc in __init__(self, data, index, columns, dtype, copy)
    253             else:
    254                 mgr = self._init_ndarray(data, index, columns, dtype=dtype,
--> 255                                          copy=copy)
    256         elif isinstance(data, collections.Iterator):
    257             raise TypeError("data argument can't be an iterator")

/Users/tom/Envs/pandas-dev/lib/python2.7/site-packages/pandas/pandas/core/frame.pyc in _init_ndarray(self, values, index, columns, dtype, copy)
    365             columns = _ensure_index(columns)
    366 
--> 367         return create_block_manager_from_blocks([values.T], [columns, index])
    368 
    369     @property

/Users/tom/Envs/pandas-dev/lib/python2.7/site-packages/pandas/pandas/core/internals.pyc in create_block_manager_from_blocks(blocks, axes)
   3183         blocks = [getattr(b, 'values', b) for b in blocks]
   3184         tot_items = sum(b.shape[0] for b in blocks)
-> 3185         construction_error(tot_items, blocks[0].shape[1:], axes, e)
   3186 
   3187 

/Users/tom/Envs/pandas-dev/lib/python2.7/site-packages/pandas/pandas/core/internals.pyc in construction_error(tot_items, block_shape, axes, e)
   3164         raise e
   3165     raise ValueError("Shape of passed values is {0}, indices imply {1}".format(
-> 3166         passed,implied))
   3167 
   3168 

ValueError: Shape of passed values is (0, 0), indices imply (1, 0)
@TomAugspurger TomAugspurger added this to the 0.14.1 milestone Jun 2, 2014
@TomAugspurger TomAugspurger self-assigned this Jun 2, 2014
@TomAugspurger TomAugspurger changed the title BUG: quantile fails on width 1 DataFrame with datetime dtypes BUG: DataFrame quantile with only datetime dtypes should provide better error message Jun 2, 2014
@TomAugspurger
Contributor Author

Actually, it's got nothing to do with just having one column (like my original title suggested).

quantile has a numeric_only kwarg. When every column is non-numeric, maybe we could do better by either

  • inferring that numeric_only=False is what you want, since there are 0 numeric cols
  • returning a better error message

Maybe I should change the numeric_only kwarg to 'infer', where 'infer' means:

  1. If any numeric dtypes, exclude all non-numeric
  2. If no numeric dtypes, don't exclude non-numeric

and it will still accept True and False as before.
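
The 'infer' rules above could be sketched roughly as follows. This is only an illustration: `select_quantile_columns` is a hypothetical helper, not part of pandas, and it reads rule 2 as applying when no column is numeric (an assumption). It only decides which columns a quantile call would operate on.

```python
import pandas as pd


def select_quantile_columns(df, numeric_only="infer"):
    """Hypothetical helper (not a pandas API): pick the columns to operate on."""
    if numeric_only == "infer":
        numeric = df.select_dtypes(include="number")
        # Rule 1: if any numeric columns exist, keep only those.
        # Rule 2 (assumed reading): if none exist, keep every column.
        return numeric if numeric.shape[1] > 0 else df
    if numeric_only:
        # numeric_only=True: drop all non-numeric columns.
        return df.select_dtypes(include="number")
    # numeric_only=False: keep every column.
    return df


df = pd.DataFrame({"A": [1, 2, 3], "B": [2, 3, 4]})
# Explicit month-end dates, so the sketch does not depend on a frequency alias.
df["C"] = pd.to_datetime(["2014-01-31", "2014-02-28", "2014-03-31"])

mixed_cols = list(select_quantile_columns(df).columns)
datetime_only_cols = list(select_quantile_columns(df[["C"]]).columns)
```

With the mixed frame the helper keeps A and B; with the datetime-only frame it keeps C instead of returning an empty selection.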

@jreback
Contributor

jreback commented Jun 2, 2014

hmm, numeric_only is not in the docstring.

I think this should include datetime/timedelta by default, so numeric_only is really an odd choice here (it was probably OK before quantile could handle these). You almost always want to exclude object dtypes (and bool types).

@jreback jreback modified the milestones: 0.15.0, 0.14.1 Jun 22, 2014
@jreback jreback modified the milestones: 0.16.0, Next Major Release Mar 6, 2015
@mroeschke mroeschke removed the Error Reporting Incorrect or improved errors from pandas label Sep 29, 2019
@jbrockmendel jbrockmendel added the quantile quantile method label Oct 22, 2019
@mroeschke
Member

Looks to give a reasonable result now. Could use a test

In [32]: df = DataFrame({"A": [1, 2, 3], "B": [2, 3, 4]})
    ...: df['C'] = pd.date_range('2014-01-01', periods=3, freq='m')

In [33]: df[['C']].quantile(.5)
Out[33]: Series([], Name: 0.5, dtype: float64)

In [34]: pd.__version__
Out[34]: '1.1.0.dev0+1108.gcad602e16'

@mroeschke mroeschke added good first issue Needs Tests Unit test(s) needed to prevent regressions and removed Bug quantile quantile method labels Apr 5, 2020
@jbrockmendel
Member

I think the discussion above suggests this isn't a reasonable result, and that we don't want to drop the datetime column here.
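
One way to pin that expectation down is a small regression test. The sketch below is an assumption about what such a test might look like, not the test that was eventually added to pandas; the function name is made up. It passes numeric_only=False explicitly so it does not rely on any particular default.

```python
import pandas as pd


def test_quantile_datetime_only_frame():
    # Hypothetical test name. Explicit dates avoid depending on a frequency alias.
    df = pd.DataFrame(
        {"C": pd.to_datetime(["2014-01-31", "2014-02-28", "2014-03-31"])}
    )

    # Ask for datetime columns to be included explicitly.
    result = df.quantile(0.5, numeric_only=False)

    # The datetime column must survive, and the median of three
    # sorted values is the middle one.
    assert list(result.index) == ["C"]
    assert result["C"] == pd.Timestamp("2014-02-28")
    return result
```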

@mroeschke mroeschke added Bug and removed Needs Tests Unit test(s) needed to prevent regressions good first issue labels Apr 11, 2021
@mroeschke mroeschke added the quantile quantile method label Apr 11, 2021
@NumberPiOso
Contributor

Nowadays the parameter is defined in the docstring as follows:

numeric_only : bool, default True

    If False, the quantile of datetime and timedelta data will be computed as well.

So the results

df.quantile(0.5)
A    2.0
B    3.0
Name: 0.5, dtype: float64


df.quantile(0.5, numeric_only=False)
A                    2.0
B                    3.0
C    2014-02-28 00:00:00

So this is not a bug anymore; it looks more like an enhancement request to make numeric_only default to False.
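
Since the outcome depends on the default, a version-independent way to get either result is to pass numeric_only explicitly. A minimal sketch using the same frame as in the report, with explicit dates in place of the date_range call (the variable names are only for illustration):

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [2, 3, 4]})
df["C"] = pd.to_datetime(["2014-01-31", "2014-02-28", "2014-03-31"])

# Only the numeric columns A and B take part.
numeric_result = df.quantile(0.5, numeric_only=True)

# The datetime column C is included as well.
full_result = df.quantile(0.5, numeric_only=False)
```

Being explicit here keeps the call from changing meaning if the default is ever switched.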

@NumberPiOso
Contributor

take

NumberPiOso added a commit to NumberPiOso/pandas that referenced this issue Mar 2, 2022
@jreback jreback modified the milestones: Contributions Welcome, 1.5 Mar 2, 2022
NumberPiOso added a commit to NumberPiOso/pandas that referenced this issue Mar 2, 2022
NumberPiOso added a commit to NumberPiOso/pandas that referenced this issue Mar 8, 2022
NumberPiOso added a commit to NumberPiOso/pandas that referenced this issue Mar 9, 2022