CLN: pd.TimeGrouper #26477
Codecov Report
@@ Coverage Diff @@
## master #26477 +/- ##
==========================================
- Coverage 91.75% 91.74% -0.01%
==========================================
Files 174 174
Lines 50765 50759 -6
==========================================
- Hits 46578 46568 -10
- Misses 4187 4191 +4
Codecov Report
@@ Coverage Diff @@
## master #26477 +/- ##
==========================================
+ Coverage 91.75% 91.75% +<.01%
==========================================
Files 174 174
Lines 50765 50673 -92
==========================================
- Hits 46578 46494 -84
+ Misses 4187 4179 -8
@@ -11,6 +11,7 @@
import pandas as pd
from pandas import DataFrame, Index, MultiIndex, Series, Timestamp, date_range
from pandas.core.groupby.ops import BinGrouper
from pandas.core.resample import TimeGrouper
Shouldn't we remove this entirely from resample as well?
Internally, TimeGrouper still holds a lot of the core metadata for resampling. The reason why the TimeGrouper isn't needed at the toplevel is because of this shortcut in Grouper:

    def __new__(cls, *args, **kwargs):
        if kwargs.get('freq') is not None:
            from pandas.core.resample import TimeGrouper
            cls = TimeGrouper
        return super().__new__(cls)

So AFAICT, the internal TimeGrouper is still needed.
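To make the dispatch concrete, here is a minimal, self-contained sketch of the same `__new__` pattern. The two classes below are toy stand-ins defined only for illustration (they are not the pandas classes and carry none of the resampling logic); the sketch only shows how constructing the base class with a `freq` keyword can hand back a subclass instance.

```python
class Grouper:
    """Toy stand-in for the base grouper (illustration only, not pandas)."""

    def __new__(cls, *args, **kwargs):
        # When a frequency is given, swap the class being instantiated
        # for the time-aware subclass before allocating the object.
        if kwargs.get("freq") is not None:
            cls = TimeGrouper
        return super().__new__(cls)

    def __init__(self, key=None, freq=None):
        self.key = key
        self.freq = freq


class TimeGrouper(Grouper):
    """Toy stand-in for the time-based subclass (illustration only)."""


# No freq: an ordinary Grouper comes back.
plain = Grouper(key="date")

# With freq: the caller asked for Grouper but receives a TimeGrouper.
timed = Grouper(key="date", freq="5min")

print(type(plain).__name__, type(timed).__name__)
```

Because the returned object is still an instance of the base class, Python goes on to call `__init__` with the original arguments, so the caller never needs to name the subclass directly. Note that the swap only triggers when `freq` is passed as a keyword, since the check reads `kwargs`.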
see my comments below, we don't want to use this in the user facing tests
Would there be any objection to replacing TimeGrouper internally as well? Always looking to reduce GroupBy complexity so getting rid of an entire class would be helpful
I doubt this would be easy
TimeGrouper does a lot of stuff
Figured as much. Even short of a full class removal, it would still be nice to remove any methods that are now unused internally and keep paring down the groupby code. If I see any opportunities I'll push a PR.
Definitely +1 for having Grouper adopt TimeGrouper code. Should be possible.
@@ -365,10 +366,8 @@ def sumfunc_value(x):
return x.value.sum()

expected = df.groupby(pd.Grouper(key='date')).apply(sumfunc_value)
can you change these to use Grouper (here and below)
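As a rough illustration of what the requested change looks like from the user-facing side, the sketch below groups by a datetime column through `pd.Grouper` with a `freq` argument rather than importing `TimeGrouper`. The frame, column names, and daily frequency are made up for the example and are not taken from the test file under review.

```python
import pandas as pd

# Hypothetical data: three readings spread over two calendar days.
df = pd.DataFrame(
    {
        "date": pd.to_datetime(
            ["2019-05-01 09:00", "2019-05-01 17:00", "2019-05-02 12:00"]
        ),
        "value": [1, 2, 3],
    }
)

# Group on the "date" column in daily bins using the public Grouper API.
daily = df.groupby(pd.Grouper(key="date", freq="D"))["value"].sum()

print(daily)
```

Passing `freq` is what routes the grouping through the time-based machinery, so tests written this way exercise the same code path without referencing the internal class.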
thanks @mroeschke
git diff upstream/master -u -- "*.py" | flake8 --diff