-
-
Notifications
You must be signed in to change notification settings - Fork 18.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix for BUG: grouping with tz-aware: Values falls after last bin #24973
Changes from 4 commits
1940cfd
dd42200
f8cb89a
eccca8e
0454dc3
b009f17
66861ab
dda1c5c
3edb267
c0bc3a5
e631821
5c07d25
df7d650
bcc0599
9c7d0e1
47eb40b
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -1413,6 +1413,16 @@ def _get_time_bins(self, ax): | |
ambiguous='infer', | ||
nonexistent='shift_forward') | ||
|
||
# GH #24972 | ||
# In edge case of tz-aware grouping binner last index can be | ||
# less than the ax.max() variable in data object, this happens | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. should go in _adjust_bin_edges There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. there are a lot of issues with generating the remaining bins when I try to move it there. if this way of generating the bins is the correct one then I believe that it is better to leave it here. in _adjust_bin_edges we lose information about ax tz There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Or this should possibly be fixed in There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. ok, let me apply the patch you suggested in the issue description, it does look better for me. |
||
# because of normalization and DST time change | ||
if len(binner) > 1 and binner[-1] < ax.max(): | ||
extra_date_range = pd.date_range(binner[-1], ax.max() + self.freq, | ||
freq=self.freq, tz=binner[-1].tz, | ||
name=ax.name) | ||
binner = labels = binner.append(extra_date_range[1:]) | ||
|
||
ax_values = ax.asi8 | ||
binner, bin_edges = self._adjust_bin_edges(binner, ax_values) | ||
|
||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -1744,3 +1744,19 @@ def test_groupby_agg_ohlc_non_first(): | |
result = df.groupby(pd.Grouper(freq='D')).agg(['sum', 'ohlc']) | ||
|
||
tm.assert_frame_equal(result, expected) | ||
|
||
|
||
def test_groupby_with_dst_time_change(): | ||
# GH 24972 | ||
import pandas as pd | ||
index = pd.DatetimeIndex([1478064900001000000, 1480037118776792000], | ||
tz='UTC').tz_convert('America/Chicago') | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. no import needed There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. right, sorry |
||
|
||
df = pd.DataFrame([1, 2], index=index) | ||
result = df.groupby(pd.Grouper(freq='1d')).last() | ||
expected_index_values = pd.date_range('2016-11-02', '2016-11-24', | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. needs to go with other dst grouping tests There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. sorry, didn't find them right away, can you point me to few? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Probably in There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. moved |
||
freq='d', tz='America/Chicago') | ||
|
||
index = pd.DatetimeIndex(expected_index_values) | ||
expected = pd.DataFrame([1.0] + ([np.nan] * 21) + [2.0], index=index) | ||
assert_frame_equal(result, expected) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use
:meth:`DataFrame.groupby
with a separate:class:Grouper
andfreq='1D'
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done