-
Notifications
You must be signed in to change notification settings - Fork 27
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implemented spearman's correlation #773
base: develop
Are you sure you want to change the base?
Conversation
d1ce166
to
81977ab
Compare
I have already spoken to @lbluett during the sprints about the need to add tests. |
6fea731
to
5f8d29d
Compare
Added testing for spearman and testing for pearson & spearman divergence |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks very much @lbluett for this PR.
I mostly help with documentation. Just a heads up that I often do my reviews in batches, so I might come back later and provide some more feedback.
While I haven't yet had a chance to look through your PR in detail, I wanted to provide some initial feedback.
I've made three review comments.
Additionally, scores
has recently started including examples in the docstrings. It would be great if you could please add an example(s) to the docstring.
For an idea of how to do this, you can take a look at these docstrings:
Example in interval score docstring: https://github.com/nci/scores/blob/develop/src/scores/continuous/interval_impl.py#L206
Example in twCRPS for ensembles docstring: https://github.com/nci/scores/blob/develop/src/scores/probability/crps_impl.py#L1019
`scores.continuous.correlation.pearsonr` | ||
|
||
Reference: | ||
https://en.wikipedia.org/wiki/Spearman%27s_rank_correlation_coefficient |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a journal article that first defines this metric (and, if relevant, a journal article that defines the specific implementation being used)? If so, that article(s) should be cited here.
Note, we now follow APA (7th edition) formatting style for citations - here is a link to their page for citing journal articles.
For more information about the scores
approach to citing references, see the 5th dot point here: https://scores.readthedocs.io/en/stable/contributing.html#submitting-a-pull-request-for-a-new-metric-statistical-technique-or-tool
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@lbluett @nicholasloveday is this the original paper: https://doi.org/10.2307/1412159 ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I think that is the original paper.
I think that it may also be nice to still keep the wiki link there too as it is easier to read and provides more context.
Hi Liam. The thing to do here to deal with the merge/updates is:
Let me know if you'd like me to help out with that. I've recently discovered that once someone raises a PR, I can push onto their feature branches to update things, so I'm happy to help out if it's going to be helpful. |
Actually disregard previous comment, yes I'm very stuck in my own mess. Would appreciate it if @tennlee could rebase it properly for me. Serves me right for procrastinating it for two weeks... |
c6abe0e
to
9321a90
Compare
Thanks @lbluett . I am having good fun working out how to resolve this properly. Option 1: This will combine all of your work into a single commit on the feature branch. You will then need to resolve the conflicts, but you will only need to do it once. You can then force-push this back to your fork with Option 2: Option 3: I am very happy to do Option 1 for you, but I thought you might like to consider Option 2. I wouldn't bother with the third option, but you are welcome to if you like. Let me know if you'd like me to take care of option 1, but if you'd like the opportunity to try out the other options for yourself, that's fine also. In terms of why |
Also, if you haven't already since Sunday, make sure to update your environment with the new versions of black, mypy and pylint. |
@lbluett let me know if you'd like to make a time to catch up virtually, and we can go through it together on a screen share |
I've sent you a request on Discord! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this pull request. I have left some minor feedback.
The major change that I'd like to see is for the tutorial to be more focused on the Spearman's rank correlation coefficient and less of a copy and paste of the Pearson's tutorial
|
||
|
||
def spearmanr( | ||
fcst: xr.DataArray, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be easy enough to extend this to also work with xr.Datasets?
@@ -82,7 +95,7 @@ | |||
(DA4_CORR, DA5_CORR, "space", None, EXP_CORR_DIFF_SIZE), | |||
], | |||
) | |||
def test_correlation(da1, da2, reduce_dims, preserve_dims, expected): | |||
def test_pearson_correlation(da1, da2, reduce_dims, preserve_dims, expected): | |||
""" | |||
Tests continuous.correlation |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tests continuous.correlation | |
Tests continuous.correlation.pearsnonr |
@@ -0,0 +1,416 @@ | |||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This tutorial is mostly a duplication of the Persons correlation tutorial.
I suggest setting this up to be more focused on the Spearmans correlation coef.
Rather than just using the same synthetic forecast and observation data, it would be far more useful if you generated synthetic time series with non-linear relationships, rather than linear relationships; and then constructed a story around that data.
You could then at the end say/show that pearsons == Spearmans when the relationship is linear.
9321a90
to
792218f
Compare
@lbluett Just ignore the mypy issues for now. They are occurring on develop as well, so I will fix them there instead. They sneak in when the tool versions change but the code hasn't. For reasons I don't understand, it seems like they don't always get picked up on a local run of the tools - perhaps some kind of caching that's not obvious to me. Anyhow, it's not your problem to fix. |
author Liam Bluett <[email protected]> 1732590092 +1000 committer Liam Bluett <[email protected]> 1734391465 +1000 Implemented spearman's correlationship Modified notebook to remove noise and add an explanation and reference. Add Spearman's to gallery Change notebook metadata to use 'Python 3 (ipykernel)' and 'python3' rather than custom 'ml' kernel. Testing for spearman implemented Maintainer notes followed, notebook fixed... again Modified notebook to remove noise and add an explanation and reference. cleanup more Add Spearman's to gallery Testing for spearman implemented Maintainer notes followed, notebook fixed... again Notebook kernel changed for testing Update src/scores/continuous/correlation/correlation_impl.py add pyfunc for hyperlink Co-authored-by: Stephanie Chong <[email protected]> Signed-off-by: Liam Bluett <[email protected]> reorder alphabetically
792218f
to
c12e6d4
Compare
Implementation of Spearman's correlation from #313
I've added an example notebook based off of the Pearson's notebook.
Development for new xarray-based metrics
reduce_dims
,preserve_dims
, andweights
args.xr.DataArrays
andxr.Datasets
if possibleDocstrings
Testing of new xarray-based metrics
xr.DataArrays
andxr.Datasets
Tutorial notebook
Documentation