The groupby filter output differs if we run an aggregate function before it #17091
Comments
Hmm... that's really odd. Surprise in-place modifications aren't nice. A PR to patch this is welcome!
@gfyoung and @arnaudlegout I ran into the same problem and did some investigation. It turns out that when calling …
Also, if you do this:
The test will fail. It only passes because of this bug. Am I missing anything here? Many thanks.
So groupers can keep state, though they really shouldn't. This might be a wider problem; it would be helpful if you could look into it.
I'll be happy to!
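To make the "groupers can keep state" point concrete, here is a toy sketch of the pattern. `ToyGroupBy`, `_selection`, `_exclude_key` and `leak_state` are hypothetical names invented for illustration; this is not pandas' actual implementation. The buggy variant has the aggregation store a column selection on the object and never reset it, so a later `filter()` sees the reduced column set. The fixed variant scopes the selection with a context manager that restores the previous state.

```python
from contextlib import contextmanager


class ToyGroupBy:
    """Hypothetical, heavily simplified stand-in for a groupby object.

    Not pandas' implementation: it only mimics the state-leak pattern,
    where an aggregation stores a column selection on the object itself.
    """

    def __init__(self, columns, key, leak_state=True):
        self.columns = list(columns)
        self.key = key
        self.leak_state = leak_state
        self._selection = None  # None means "use all columns"

    def _selected_columns(self):
        return self.columns if self._selection is None else self._selection

    @contextmanager
    def _exclude_key(self):
        # Temporarily drop the grouping key, then restore the previous state.
        previous = self._selection
        self._selection = [c for c in self.columns if c != self.key]
        try:
            yield
        finally:
            self._selection = previous

    def sum(self):
        if self.leak_state:
            # Buggy pattern: the selection is set and never reset.
            self._selection = [c for c in self.columns if c != self.key]
            return self._selected_columns()
        # Fixed pattern: the selection only lives for this call.
        with self._exclude_key():
            return self._selected_columns()

    def filter(self):
        # filter() is supposed to see all of the original columns.
        return self._selected_columns()


buggy = ToyGroupBy(["key", "value"], key="key")
print(buggy.filter())  # ['key', 'value']
buggy.sum()
print(buggy.filter())  # ['value']  (state leaked from sum())

fixed = ToyGroupBy(["key", "value"], key="key", leak_state=False)
fixed.sum()
print(fixed.filter())  # ['key', 'value']
```

Scoping the temporary selection with `try`/`finally` means the original state is restored even if the aggregation raises, which is the property the thread is asking for: one method call should not change what the next one returns.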
Code Sample, a copy-pastable example if possible
Problem description
Here is the output of the above code
I don't understand why the key column is in the output of the first call to filter, whereas when an aggregate function (here g.sum()) is run before the filter, the key column disappears. If I use as_index=False for the groupby, the column is correctly preserved (as expected).
It looks like the aggregate function somehow changes the groupby object, whereas my understanding of a groupby object is that each method call returns a new object and does not modify the original groupby object.
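The original code sample and its output were not preserved in this copy of the issue, so the following is a minimal reproduction sketch, assuming a frame with a grouping column named `key` and a numeric column named `value` (both names are assumptions). It checks the behaviour the report expects: `filter()` returns rows of the original frame with all of its columns, and calling an aggregation such as `g.sum()` first should not change that.

```python
import pandas as pd

# Hypothetical data: the column names "key" and "value" are assumptions.
df = pd.DataFrame({"key": ["a", "a", "b"], "value": [1, 2, 3]})
g = df.groupby("key")


def keep_all(group):
    # Keep every group, so filter() should give back the whole frame.
    return True


before = g.filter(keep_all)

g.sum()  # an aggregation; it should not alter later filter() results

after = g.filter(keep_all)

print(list(before.columns))
print(before.equals(after))
```

On a pandas version without this bug, both calls return the original two columns and `before.equals(after)` is `True`; on the affected version (0.20.2 in this report), the second call lost the `key` column.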
Expected Output
Output of pd.show_versions()
pandas: 0.20.2
pytest: None
pip: 9.0.1
setuptools: 27.2.0
Cython: None
numpy: 1.13.1
scipy: 0.19.1
xarray: None
IPython: 6.1.0
sphinx: None
patsy: None
dateutil: 2.6.0
pytz: 2017.2
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 2.0.2
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 0.999
sqlalchemy: 1.1.11
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
pandas_gbq: None
pandas_datareader: None