-
Notifications
You must be signed in to change notification settings - Fork 915
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[FEA] Support named aggregations in df.groupby().agg()
#15967
Comments
Hi! First timer here. I'm eager to contribute and would like to take on this issue. Could you please assign it to me? Thanks! |
Related request: #15118 |
Hi @NeilGeorge1, thanks for your interest in this issue! I'll assign it to you now. Also, take a look at our contributing guidelines here. |
Hi @Matt711 |
## Description This PR resolves two bugs in the recent pr #15945 ## external issue labeling Recent runs show that it is labeling [issues created](#15967) by team members as `External` Using graphQL to explore the authorAssociation shows `"authorAssociation": "MEMBER"` - I've updated the permissions to be specific to the job in an attempt to ensure that we have the permissions we need. Testing this action in personal repos shows it works as expected so not 100% on what's going on. A PR was also unable to run due to the token only having read permissions, so hopefully this is a two birds one stone fix. It may be beneficial to re-run https://github.com/rapidsai/cudf/actions/runs/9462546964/job/26065765728 with debug mode on to see if `author_association` is different to the action (which would be concerning) *edit test* ## project automation This fixes the workflow incorrectly calling my personal workflows for testing. ## Checklist - [x] I am familiar with the [Contributing Guidelines](https://github.com/rapidsai/cudf/blob/HEAD/CONTRIBUTING.md). - [ ] ~New or existing tests cover these changes.~ - [ ] ~The documentation is up to date with these changes.~
Hi, I've successfully implemented the feature as described above. However, while running pytest to verify the functionality across the project, I encountered an issue: 'module cudf is missing' , which is necessary for the tests to run properly. Could you please provide guidance on how to resolve this? It seems that cudf should have been included or configured differently in the environment setup. Should i manually install cudf? Thank you for your assistance. |
Having cloned the repo, have you built cudf? If not, that is probably your issue. The simplest (IMO) way to do this is to use the devcontainer setup in the repository and then (inside the devcontainer) run |
Oh yes I did figure that out eventually. However, I'm facing errors after running pytest related to cuDF, likely because the installed NVIDIA drivers are not able to function properly. Since I'm using a pc without a gpu is this the root of the issue? |
Yes, you can build cudf without a gpu, but you won't be able to run it. Best thing to do is to push a branch and open a PR, we can then review and run things through CI. |
Ok thank you, I'll push the branch soon and open a PR. Looking forward to your review and CI validation. |
Is your feature request related to a problem? Please describe.
I wish cudf python supported column name keywords in
df.groupby().agg()
See pandas docs here
Describe the solution you'd like
Let's add support for column name keywords in
df.groupby().agg()
. This should be a python-only change for API convenience. It will prevent host fallback incudf.pandas
Describe alternatives you've considered
Instead of:
df.groupby('a').agg(mycol=('b','sum'))
we can say
df.groupby('a').agg({'b': 'sum'}).rename(columns={'b': 'named_agg'})
and renaming columns doesn't work with a list of aggregations on the same column
df.groupby('a').agg({'b': ['sum', 'mean']}).rename(columns={???: 'sum'})
It seems like we could write a convenience API here to capture any
kwargs
and their tuples to make new columns.Additional context
Many nice pandas queries for TPC-H were published in the fireducks-dev repo, and the
agg
column name keywords style is common.The text was updated successfully, but these errors were encountered: