Implemented spearman's correlation #773

lbluett · 2024-11-26T03:11:03Z

Implementation of Spearman's correlation from #313
I've added an example notebook based off of the Pearson's notebook.
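For readers new to the metric: Spearman's rank correlation is Pearson's correlation applied to the ranks of the data rather than to the raw values. The sketch below is a dependency-free illustration of that definition, not the code in this PR. The function names are illustrative, and the xarray-specific features (reduce_dims, preserve_dims, weights) are deliberately left out.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the window while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        # Mean of the 1-based positions start+1 .. end+1.
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson's correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman's rank correlation: Pearson's correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Made-up values for illustration only; fcst contains one tie.
fcst = [1.0, 2.0, 2.0, 4.0, 8.0]
obs = [0.5, 1.5, 1.0, 3.0, 9.0]
rho = spearman(fcst, obs)
print(rho)
```

Averaging the ranks of tied values is one common convention; whether the PR handles ties the same way is not stated in this thread.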

Development for new xarray-based metrics

Works with n-dimensional data and includes reduce_dims, preserve_dims, and weights args.
Typehints added
Add error handling
[?] Imported into the API
Works with both xr.DataArrays and xr.Datasets if possible

Docstrings

Docstrings complete and follow Napoleon (google) style
Maths equation added
Reference to paper/webpage is in docstring. The preferred referencing style for journal articles is APA (7th edition)
Code example added

Testing of new xarray-based metrics

100% unit test coverage
Test that metric is compatible with dask.
Test that metrics work with inputs that contain NaNs
Test that broadcasting with xarray works
Test both reduce and preserve dims arguments work
Test that errors are raised as expected
Test that it works with both xr.DataArrays and xr.Datasets

Tutorial notebook

Short introduction to why you would use that metric and what it tells you
A link to a reference
A "things to try next" section at the end
Add notebook to Tutorial_Gallery.ipynb
Optional - a detailed discussion of how the metric works at the end of the notebook

Documentation

[?] Add the score to the API documentation
[?] Add the score to the included list of metrics and tools

tennlee · 2024-11-26T12:37:44Z

I have already spoken to @lbluett during the sprints about the need to add tests.

lbluett · 2024-11-27T00:20:14Z

Added testing for spearman and testing for pearson & spearman divergence

Steph-Chong

Thanks very much @lbluett for this PR.

I mostly help with documentation. Just a heads up that I often do my reviews in batches, so I might come back later and provide some more feedback.

While I haven't yet had a chance to look through your PR in detail, I wanted to provide some initial feedback.

I've made three review comments.

Additionally, scores has recently started including examples in the docstrings. It would be great if you could please add an example(s) to the docstring.

For an idea of how to do this, you can take a look at these docstrings:

Example in interval score docstring: https://github.com/nci/scores/blob/develop/src/scores/continuous/interval_impl.py#L206

Example in twCRPS for ensembles docstring: https://github.com/nci/scores/blob/develop/src/scores/probability/crps_impl.py#L1019

docs/included.md

src/scores/continuous/correlation/correlation_impl.py

Steph-Chong · 2024-11-27T01:53:49Z

src/scores/continuous/correlation/correlation_impl.py

+    `scores.continuous.correlation.pearsonr`
+
+    Reference:
+        https://en.wikipedia.org/wiki/Spearman%27s_rank_correlation_coefficient


Is there a journal article that first defines this metric (and, if relevant, a journal article that defines the specific implementation being used)? If so, that article(s) should be cited here.

Note, we now follow APA (7th edition) formatting style for citations - here is a link to their page for citing journal articles.

For more information about the scores approach to citing references, see the 5th dot point here: https://scores.readthedocs.io/en/stable/contributing.html#submitting-a-pull-request-for-a-new-metric-statistical-technique-or-tool

tennlee · 2024-11-29

@lbluett @nicholasloveday is this the original paper: https://doi.org/10.2307/1412159 ?

nicholasloveday · 2024-11-30

Yes, I think that is the original paper.
I think it would also be nice to keep the wiki link, as it is easier to read and provides more context.

tennlee · 2024-11-29T05:12:16Z

Hi Liam. The thing to do here to deal with the merge/updates is:

Sync the develop branch of your fork using the GitHub web UI, which you have probably already done
Use 'git switch develop' to change your local branch and then run 'git pull' to fetch that down from the server
Use 'git switch feature/spearman_rank' to switch to your feature branch
Use 'git rebase' (possibly 'git rebase -i') to rebase your feature branch
Use 'git push -f' to over-write your fork's feature branch with the rebased version

Let me know if you'd like me to help out with that. I've recently discovered that once someone raises a PR, I can push onto their feature branches to update things, so I'm happy to help out if it's going to be helpful.

nicholasloveday · 2024-12-11T01:02:57Z

Hi @lbluett, I'll review this shortly. Can you please let us know if you want to do the rebase that @tennlee suggested above or if you would like @tennlee to do it for you?

lbluett · 2024-12-11T03:23:35Z

> Hi @lbluett, I'll review this shortly. Can you please let us know if you want to do the rebase that @tennlee suggested above or if you would like @tennlee to do it for you?

Actually disregard previous comment, yes I'm very stuck in my own mess. Would appreciate it if @tennlee could rebase it properly for me. Serves me right for procrastinating it for two weeks...

lbluett · 2024-12-11T03:37:37Z

I had synced my feature branch in c6abe0e, so I've reset it to the state it was in when @tennlee suggested the rebase (9321a90) and force-pushed that.

tennlee · 2024-12-11T07:04:57Z

Thanks @lbluett . I am having good fun working out how to resolve this properly.

Option 1:
git rebase -i develop and then change all but the first commit to "squash".

This will combine all of your work into a single commit on the feature branch. You will then need to resolve the conflicts, but you will only need to do it once. You can then force-push this back to your fork with git push -f, and it will seem like you did all the work in a single commit.

Option 2:
An alternative is to keep some of the history, but squash (combine) the less-significant commits, resulting in say two or three commits rather than the many that are there currently. This keeps a bit more of the history, but then you still need to work through the conflicts for each "picked" commit.

Option 3:
You could also choose to work through every single commit using git rebase resolving conflicts for each commit in the sequence. It will work fine, but it requires a lot more manual steps.

I am very happy to do Option 1 for you, but I thought you might like to consider Option 2. I wouldn't bother with the third option, but you are welcome to if you like. Let me know if you'd like me to take care of option 1, but if you'd like the opportunity to try out the other options for yourself, that's fine also.

In terms of why git rebase (without the -i) is getting complicated, I think it's because some of the commits in the history seem to have been created twice for some reason, so old conflicts are resurfacing in later commits. I'm not entirely sure how that eventuated, to be honest. But it has the effect of needing every commit in the rebase to be manually updated for conflicts, rather than it just occurring on the relevant commit in the history.
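Option 1 above amounts to editing the todo list that git rebase -i opens: the first commit stays as "pick" and every later "pick" becomes "squash". As a rough illustration of that edit (a sketch only, with a hypothetical helper name and made-up commit hashes), the transformation looks like this:

```python
def squash_all_but_first(todo_text):
    """Rewrite a rebase todo so every 'pick' after the first becomes 'squash'."""
    rewritten = []
    seen_first_pick = False
    for line in todo_text.splitlines():
        parts = line.split(maxsplit=1)
        if parts and parts[0] == "pick":
            if seen_first_pick:
                # Keep the hash and message, change only the command word.
                line = "squash " + (parts[1] if len(parts) > 1 else "")
            seen_first_pick = True
        # Comment lines and blank lines pass through unchanged.
        rewritten.append(line)
    return "\n".join(rewritten)


# Hypothetical todo list; the hashes are invented for illustration.
todo = """pick aaa1111 Implemented spearman's correlation
pick bbb2222 Add Spearman's to gallery
pick ccc3333 Testing for spearman implemented"""

print(squash_all_but_first(todo))
```

In practice this edit is made by hand in the editor that git opens; the function only shows which lines change.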

tennlee · 2024-12-11T07:16:12Z

Also, if you haven't already since Sunday, make sure to update your environment with the new versions of black, mypy and pylint.

tennlee · 2024-12-11T22:36:29Z

@lbluett let me know if you'd like to make a time to catch up virtually, and we can go through it together on a screen share

lbluett · 2024-12-11T22:48:39Z

> @lbluett let me know if you'd like to make a time to catch up virtually, and we can go through it together on a screen share

I've sent you a request on Discord!

nicholasloveday

Thanks for this pull request. I have left some minor feedback.

The major change that I'd like to see is for the tutorial to be more focused on Spearman's rank correlation coefficient and less of a copy and paste of the Pearson's tutorial.

nicholasloveday · 2024-12-11T23:13:59Z

src/scores/continuous/correlation/correlation_impl.py

+
+
+def spearmanr(
+    fcst: xr.DataArray,


Would it be easy enough to extend this to also work with xr.Datasets?

nicholasloveday · 2024-12-11T23:23:01Z

tests/continuous/test_correlation.py

@@ -82,7 +95,7 @@
        (DA4_CORR, DA5_CORR, "space", None, EXP_CORR_DIFF_SIZE),
    ],
 )
-def test_correlation(da1, da2, reduce_dims, preserve_dims, expected):
+def test_pearson_correlation(da1, da2, reduce_dims, preserve_dims, expected):
    """
    Tests continuous.correlation


Suggested change

Tests continuous.correlation

Tests continuous.correlation.pearsonr

nicholasloveday · 2024-12-12T04:28:47Z

tutorials/Spearmans_Correlation.ipynb

@@ -0,0 +1,416 @@
+{


This tutorial is mostly a duplication of the Pearson's correlation tutorial.

I suggest setting this up to be more focused on Spearman's correlation coefficient.

Instead of reusing the same synthetic forecast and observation data, it would be far more useful to generate synthetic time series with non-linear relationships and then construct a story around that data.

You could then at the end say/show that Pearson's == Spearman's when the relationship is linear.
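A minimal sketch of the kind of comparison suggested here, using only the standard library and made-up synthetic data (the helper names are illustrative and this is not the tutorial's code): an exponential relationship is monotonic but not linear, so Spearman's coefficient stays at 1 while Pearson's drops below it.

```python
from math import exp, sqrt


def ranks(values):
    """1-based ranks, assuming the values contain no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for position, index in enumerate(order, start=1):
        result[index] = position
    return result


def pearson(x, y):
    """Pearson's correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman's rank correlation: Pearson's correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Synthetic, noise-free series for illustration.
x = [i / 2 for i in range(20)]
linear = [2.0 * v + 1.0 for v in x]   # exactly linear in x
nonlinear = [exp(v) for v in x]       # monotonic but strongly non-linear

print("linear:    pearson", pearson(x, linear), "spearman", spearman(x, linear))
print("nonlinear: pearson", pearson(x, nonlinear), "spearman", spearman(x, nonlinear))
```

The two coefficients coincide here only because the linear series is noise-free; with noisy linear data they would typically be close but not identical.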

tennlee · 2024-12-17T00:19:37Z

@lbluett Just ignore the mypy issues for now. They are occurring on develop as well, so I will fix them there instead. They sneak in when the tool versions change but the code hasn't. For reasons I don't understand, it seems like they don't always get picked up on a local run of the tools - perhaps some kind of caching that's not obvious to me. Anyhow, it's not your problem to fix.

author Liam Bluett <[email protected]> 1732590092 +1000
committer Liam Bluett <[email protected]> 1734391465 +1000

Implemented spearman's correlationship

Modified notebook to remove noise and add an explanation and reference.
Add Spearman's to gallery
Change notebook metadata to use 'Python 3 (ipykernel)' and 'python3' rather than custom 'ml' kernel.
Testing for spearman implemented
Maintainer notes followed, notebook fixed... again
Modified notebook to remove noise and add an explanation and reference.
cleanup more
Add Spearman's to gallery
Testing for spearman implemented
Maintainer notes followed, notebook fixed... again
Notebook kernel changed for testing
Update src/scores/continuous/correlation/correlation_impl.py
add pyfunc for hyperlink

Co-authored-by: Stephanie Chong <[email protected]>
Signed-off-by: Liam Bluett <[email protected]>

reorder alphabetically

lbluett force-pushed the feature/spearman_rank branch from d1ce166 to 81977ab Compare November 26, 2024 03:34

tennlee linked an issue Nov 26, 2024 that may be closed by this pull request

Request: Spearman rank correlation #313

Open

lbluett force-pushed the feature/spearman_rank branch from 6fea731 to 5f8d29d Compare November 26, 2024 23:53

Steph-Chong reviewed Nov 27, 2024


tennlee added this to the Version 2.1 milestone Dec 5, 2024

lbluett force-pushed the feature/spearman_rank branch from c6abe0e to 9321a90 Compare December 11, 2024 03:35

lbluett changed the title ~~Implemented spearman's correlationship~~ Implemented spearman's correlation Dec 11, 2024

nicholasloveday requested changes Dec 12, 2024


lbluett force-pushed the feature/spearman_rank branch from 9321a90 to 792218f Compare December 16, 2024 23:35

lbluett added 2 commits December 17, 2024 11:07

black formatting, rebased

c12e6d4

lbluett force-pushed the feature/spearman_rank branch from 792218f to c12e6d4 Compare December 17, 2024 01:39

