[FEA] libcudf Series correlation (Pearson) #1267

Open
beckernick opened this issue Mar 22, 2019 · 5 comments
Labels
feature request New feature or request libcudf Affects libcudf (C++/CUDA) code. Python Affects Python cuDF API.

Comments

@beckernick
Member

beckernick commented Mar 22, 2019

Is your feature request related to a problem? Please describe.
As a cuDF user, I want to calculate the correlation of two series. Pearson correlation is likely the most commonly used as it is the default in Pandas (API docs).

Describe the solution you'd like
I'd like to be able to do this with series1.corr(series2) and also on DataFrame and Groupby objects.

Describe alternatives you've considered
The alternative is to actually calculate the correlation manually, which is cumbersome.
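For reference, here is a minimal sketch of what the manual calculation involves. It uses plain Python lists rather than cuDF objects, and the function name `pearson_corr` is hypothetical, not an existing cuDF or libcudf API. With cuDF, each `sum` below would presumably be replaced by a column reduction.

```python
import math


def pearson_corr(xs, ys):
    """Pearson correlation of two equal-length numeric sequences.

    Hypothetical helper for illustration only; not a cuDF API.
    Returns NaN when the result is undefined (fewer than two points,
    or either input has zero variance).
    """
    if len(xs) != len(ys):
        raise ValueError("inputs must have the same length")
    n = len(xs)
    if n < 2:
        return math.nan

    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    # Sums of centered products; the 1/(n-1) factors cancel in the ratio.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)

    denom = math.sqrt(var_x * var_y)
    if denom == 0:
        return math.nan
    return cov / denom
```

Even in this simple form it takes two means, three further reductions, and explicit handling of the degenerate cases, which is the boilerplate a built-in `corr` would hide.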

@beckernick beckernick added Needs Triage Need team to review and classify feature request New feature or request labels Mar 22, 2019
@kkraus14 kkraus14 added Python Affects Python cuDF API. libcudf Affects libcudf (C++/CUDA) code. and removed Needs Triage Need team to review and classify labels Mar 27, 2019
@beckernick beckernick changed the title [FEA] Series level correlation [FEA] libcuDF correlation Series and Groupby Oct 3, 2019
@beckernick
Member Author

beckernick commented Oct 3, 2019

Updating this to explicitly refer to a libcuDF implementation now that #2719 has merged (providing Series.corr)

@beckernick
Member Author

Updating this to also refer to DataFrame level correlation

@beckernick beckernick changed the title [FEA] libcuDF correlation Series and Groupby [FEA] libcuDF correlation Series, DataFrame, and Groupby Dec 30, 2019
@rnyak
Contributor

rnyak commented Feb 13, 2020

@beckernick Can we now apply corr() after groupby? Basically, I want to be able to do the following with gdf.

ans = pdf[['id2', 'id4', 'v1', 'v2']].groupby(['id2', 'id4']).apply(lambda x: pd.Series({'r2': x.corr()['v1']['v2']}))

Thanks.

@beckernick
Member Author

Assuming you mean once #4140 merges, we cannot. Groupby correlation via the standard API will require a libcuDF implementation.

@beckernick
Member Author

@jrhemstad discussed this today in the context of groupbys. Supporting correlation (and implicitly covariance) in the groupby machinery would require additional design, as the aggregation takes more than one input column. I'm going to file a new issue to summarize and consolidate further discussion for the groupby aggregation.
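To illustrate why this differs from single-column aggregations such as sum or mean, here is a rough sketch in plain Python. It is not the libcuDF design; the function name `grouped_pearson` and the accumulator layout are assumptions made for illustration. Each group has to accumulate state from two value columns at once, including the cross term `sum(x*y)`, which no single-input aggregation can produce.

```python
import math
from collections import defaultdict


def grouped_pearson(keys, xs, ys):
    """Per-group Pearson correlation of xs and ys, grouped by keys.

    Hypothetical illustration only; not the libcudf groupby API.
    Each group keeps six running values: n, sum(x), sum(y),
    sum(x*x), sum(y*y), and the two-column cross term sum(x*y).
    """
    acc = defaultdict(lambda: [0, 0.0, 0.0, 0.0, 0.0, 0.0])
    for k, x, y in zip(keys, xs, ys):
        a = acc[k]
        a[0] += 1
        a[1] += x
        a[2] += y
        a[3] += x * x
        a[4] += y * y
        a[5] += x * y  # needs both inputs in the same aggregation

    result = {}
    for k, (n, sx, sy, sxx, syy, sxy) in acc.items():
        cov = n * sxy - sx * sy
        var_x = n * sxx - sx * sx
        var_y = n * syy - sy * sy
        # Undefined for a single row or a constant column in the group.
        if var_x <= 0 or var_y <= 0:
            result[k] = math.nan
        else:
            result[k] = cov / math.sqrt(var_x * var_y)
    return result
```

The raw-sums formulation is used here because it is a single pass and easy to read; it is known to be less numerically stable than a centered or Welford-style update, so a real implementation might well choose differently.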

@beckernick beckernick changed the title [FEA] libcuDF correlation Series, DataFrame, and Groupby [FEA] libcuDF Series correlation (Pearson) Jul 8, 2021
@vyasr vyasr changed the title [FEA] libcuDF Series correlation (Pearson) [FEA] Series correlation (Pearson) Jul 11, 2022
@vyasr vyasr changed the title [FEA] Series correlation (Pearson) [FEA] libcudf Series correlation (Pearson) Jul 11, 2022
@vyasr vyasr added this to cuDF Python Nov 5, 2024
Projects
Status: Todo
Development

No branches or pull requests

3 participants