Add 'spearman' correlation method for dataframe.corr and series.corr
#7141
…rame-add-spearman [REVIEW]Add 'spearman' correlation matrix in dataframe.py rapidsai#6804
Can one of the admins verify this patch? |
ok to test |
Thanks for contributing! Can you add unit tests for this method to match pandas output?
dataframe.corr
Thanks very much for contributing this! In addition to what @isVoid raised, can you make it so that method is a kwarg that defaults to pearson, so that we match the pandas API here? https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.corr.html
This PR has been marked stale due to no recent activity in the past 30d. Please close this PR if it is no longer required. Otherwise, please respond with a comment indicating any updates. This PR will be marked rotten if there is no activity in the next 60d. |
Here is the unit test script for the DataFrame output. In the coming days I will add the Series calculation, matching Series.corr in the pandas API, plus a unit test for Series. Update: after discussing with @beckernick, I modified the unit test a bit and updated the unit test script.
@dominicshanshan The unit test looks great! In cudf, we use pytest to organize our unit tests. For existing cudf/python/cudf/cudf/tests/test_stats.py Lines 396 to 402 in c929ba1
It's likely you can combine the cudf/python/cudf/cudf/tests/test_stats.py Lines 72 to 73 in c929ba1
where test_series_std will run 3 times with ddof parameterized as 0, 1 and 2.
It's also likely you will need cudf/python/cudf/cudf/tests/utils.py Line 69 in 2234554
to compare the result. You can specify check_exact=False for float equality.
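As a rough sketch of the parametrization pattern described above: the test below runs three times, once per ddof value, and compares floats with a tolerance instead of exact equality. The std function, the reference_std helper and DATA are hypothetical stand-ins, not the cuDF test code, and math.isclose stands in for the comparison utility mentioned in the comment.

```python
import math
import statistics

import pytest

# Hypothetical sample data used only for this illustration.
DATA = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]


def std(values, ddof=0):
    """Hypothetical function under test: standard deviation with `ddof`."""
    m = sum(values) / len(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / (len(values) - ddof))


def reference_std(values, ddof):
    """Reference value derived from the standard library's population variance."""
    n = len(values)
    return math.sqrt(statistics.pvariance(values) * n / (n - ddof))


@pytest.mark.parametrize("ddof", [0, 1, 2])
def test_std_matches_reference(ddof):
    # Tolerance-based comparison, since exact float equality is too strict.
    assert math.isclose(std(DATA, ddof=ddof), reference_std(DATA, ddof), rel_tol=1e-9)
```

Running pytest on a file containing this test reports three test cases, one for each ddof value.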
I tested. It seems that if your test datatype is float, it will use [...]. I will follow your guide and give you the pytest version tomorrow.
Awesome, and please change the base of this PR to |
Please find the pytest version of the unit test for the spearman correlation function. I will also finish this feature for Series, with a pytest test, and will change to the 0.19 branch.
add correlation docstring
black series.py passed
check again
flake8 check
I applied some changes to avoid using external only APIs. The below are some new changes to reduce duplication. I think the PR is pretty close!
Co-authored-by: Michael Wang <[email protected]>
Codecov Report
@@              Coverage Diff               @@
##         branch-22.04    #7141     +/-   ##
==============================================
+ Coverage       86.13%   86.17%   +0.03%
==============================================
  Files             139      141       +2
  Lines           22438    22508      +70
==============================================
+ Hits            19328    19397      +69
- Misses           3110     3111       +1
Continue to review full report at Codecov.
@isVoid, I appreciate your help! Why do you suggest avoiding external-only APIs?
Outside of the context of this PR, there are many use cases where we only want the column names and do some work with them. Internally cuDF stores column names as keys of the In this PR, however, it's sort of a corner case in that we do actually want to explicitly construct an index from the column names. Even so, we would use the For more about |
Thanks! Just one more comment that you can feel free to address at your own discretion.
LGTM! great job and thanks!
@gpucibot merge |
Follow-up work to fix documentation from #7141 before the 22.04 release. Authors: - Bradley Dice (https://github.com/bdice) Approvers: - https://github.com/brandon-b-miller URL: #10493
Closes #6804
Adds 'spearman' correlation method for dataframe.corr