Implemented spearman's correlation #773

Open: wants to merge 2 commits into base: develop

lbluett
Contributor

@lbluett lbluett commented Nov 26, 2024

Implementation of Spearman's correlation from #313.
I've added an example notebook based on the Pearson's notebook.

Development for new xarray-based metrics

  • Works with n-dimensional data and includes reduce_dims, preserve_dims, and weights args.
  • Typehints added
  • Add error handling
  • [?] Imported into the API
  • Works with both xr.DataArrays and xr.Datasets if possible

Docstrings

  • Docstrings complete and follow Napoleon (google) style
  • Maths equation added
  • Reference to paper/webpage is in docstring. The preferred referencing style for journal articles is APA (7th edition)
  • Code example added

Testing of new xarray-based metrics

  • 100% unit test coverage
  • Test that metric is compatible with dask.
  • Test that metrics work with inputs that contain NaNs
  • Test that broadcasting with xarray works
  • Test both reduce and preserve dims arguments work
  • Test that errors are raised as expected
  • Test that it works with both xr.DataArrays and xr.Datasets

Tutorial notebook

  • Short introduction to why you would use that metric and what it tells you
  • A link to a reference
  • A "things to try next" section at the end
  • Add notebook to Tutorial_Gallery.ipynb
  • Optional - a detailed discussion of how the metric works at the end of the notebook

Documentation

@lbluett lbluett force-pushed the feature/spearman_rank branch from d1ce166 to 81977ab Compare November 26, 2024 03:34
@tennlee tennlee linked an issue Nov 26, 2024 that may be closed by this pull request
@tennlee
Collaborator

tennlee commented Nov 26, 2024

I have already spoken to @lbluett during the sprints about the need to add tests.

@lbluett lbluett force-pushed the feature/spearman_rank branch from 6fea731 to 5f8d29d Compare November 26, 2024 23:53
@lbluett
Contributor Author

lbluett commented Nov 27, 2024

Added testing for spearman and testing for pearson & spearman divergence

Collaborator

@Steph-Chong Steph-Chong left a comment

Thanks very much @lbluett for this PR.

I mostly help with documentation. Just a heads up that I often do my reviews in batches, so I might come back later and provide some more feedback.

While I haven't yet had a chance to look through your PR in detail, I wanted to provide some initial feedback.

I've made three review comments.

Additionally, scores has recently started including examples in the docstrings. It would be great if you could please add an example(s) to the docstring.

For an idea of how to do this, you can take a look at these docstrings:

Example in interval score docstring: https://github.com/nci/scores/blob/develop/src/scores/continuous/interval_impl.py#L206

Example in twCRPS for ensembles docstring: https://github.com/nci/scores/blob/develop/src/scores/probability/crps_impl.py#L1019
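
As a rough idea of the shape such a docstring could take, here is a minimal sketch of a Napoleon (Google) style docstring with an `Examples` section. The function `rank_values` is a hypothetical stand-in used only for illustration. It is not part of scores, and it does not handle tied values.

```python
def rank_values(values):
    """Assign 1-based ranks to a sequence of distinct numbers.

    Hypothetical helper for illustration only; ties are not handled.

    Args:
        values: A sequence of distinct numeric values.

    Returns:
        A list of integer ranks, where the smallest value has rank 1.

    Examples:
        >>> rank_values([10.0, 30.0, 20.0])
        [1, 3, 2]
    """
    # Indices of the values in ascending order of value.
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    # The item at sorted position p receives rank p (starting from 1).
    for position, index in enumerate(order, start=1):
        ranks[index] = position
    return ranks
```

Writing the example in doctest form means it can be checked automatically, for instance with `python -m doctest`.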

docs/included.md (outdated, resolved)
src/scores/continuous/correlation/correlation_impl.py (outdated, resolved)
`scores.continuous.correlation.pearsonr`

Reference:
https://en.wikipedia.org/wiki/Spearman%27s_rank_correlation_coefficient
Collaborator

@Steph-Chong Steph-Chong Nov 27, 2024

Is there a journal article that first defines this metric (and, if relevant, a journal article that defines the specific implementation being used)? If so, that article(s) should be cited here.

Note, we now follow APA (7th edition) formatting style for citations - here is a link to their page for citing journal articles.

For more information about the scores approach to citing references, see the 5th dot point here: https://scores.readthedocs.io/en/stable/contributing.html#submitting-a-pull-request-for-a-new-metric-statistical-technique-or-tool

Collaborator

Collaborator

Yes, I think that is the original paper.
I think that it may also be nice to still keep the wiki link there too as it is easier to read and provides more context.

@tennlee
Collaborator

tennlee commented Nov 29, 2024

Hi Liam. The thing to do here to deal with the merge/updates is:

  1. Sync the develop branch of your fork using the GitHub web UI, which you have probably already done
  2. Use 'git switch develop' to change your local branch and then run 'git pull' to fetch that down from the server
  3. Use 'git switch feature/spearman_rank' to switch to your feature branch
  4. Use 'git rebase develop' (possibly 'git rebase -i develop') to rebase your feature branch onto develop
  5. Use 'git push -f' to over-write your fork's feature branch with the rebased version

Let me know if you'd like me to help out with that. I've recently discovered that once someone raises a PR, I can push onto their feature branches to update things, so I'm happy to help out if it's going to be helpful.

@tennlee tennlee added this to the Version 2.1 milestone Dec 5, 2024
@nicholasloveday
Collaborator

Hi @lbluett, I'll review this shortly. Can you please let us know if you want to do the rebase that @tennlee suggested above or if you would like @tennlee to do it for you?

@lbluett
Contributor Author

lbluett commented Dec 11, 2024

> Hi @lbluett, I'll review this shortly. Can you please let us know if you want to do the rebase that @tennlee suggested above or if you would like @tennlee to do it for you?

Actually, disregard my previous comment. Yes, I'm very stuck in my own mess. I would appreciate it if @tennlee could rebase it properly for me. Serves me right for procrastinating on it for two weeks...

@lbluett lbluett force-pushed the feature/spearman_rank branch from c6abe0e to 9321a90 Compare December 11, 2024 03:35
@lbluett
Contributor Author

lbluett commented Dec 11, 2024

I had synced my feature branch in c6abe0e, so I've reset it and pushed it back to the original state it was in when @tennlee gave me the suggestions to rebase (9321a90).

@tennlee
Collaborator

tennlee commented Dec 11, 2024

Thanks @lbluett. I am having good fun working out how to resolve this properly.

Option 1:
git rebase -i develop and then change all but the first commit to "squash".

This will combine all of your work into a single commit on the feature branch. You will then need to resolve the conflicts, but you will only need to do it once. You can then force-push this back to your fork with git push -f, and it will seem like you did all the work in a single commit.

Option 2:
An alternative is to keep some of the history, but squash (combine) the less-significant commits, resulting in say two or three commits rather than the many that are there currently. This keeps a bit more of the history, but then you still need to work through the conflicts for each "picked" commit.

Option 3:
You could also choose to work through every single commit using git rebase, resolving conflicts for each commit in the sequence. It will work fine, but it requires a lot more manual steps.

I am very happy to do Option 1 for you, but I thought you might like to consider Option 2. I wouldn't bother with the third option, but you are welcome to if you like. Let me know if you'd like me to take care of option 1, but if you'd like the opportunity to try out the other options for yourself, that's fine also.

In terms of why git rebase (without the -i) is getting complicated, I think it's because some of the commits in the history seem to have been created twice for some reason, so old conflicts are resurfacing again in later commits. I'm not entirely sure how that eventuated, to be honest. But it has the effect of needing every commit on the rebase to be manually updated for conflicts, rather than it just occurring on the relevant commit in the history.

@tennlee
Collaborator

tennlee commented Dec 11, 2024

Also, if you haven't already since Sunday, make sure to update your environment with the new versions of black, mypy and pylint.

@lbluett lbluett changed the title from "Implemented spearman's correlationship" to "Implemented spearman's correlation" Dec 11, 2024
@tennlee
Collaborator

tennlee commented Dec 11, 2024

@lbluett let me know if you'd like to make a time to catch up virtually, and we can go through it together on a screen share

@lbluett
Contributor Author

lbluett commented Dec 11, 2024

> @lbluett let me know if you'd like to make a time to catch up virtually, and we can go through it together on a screen share

I've sent you a request on Discord!

Collaborator

@nicholasloveday nicholasloveday left a comment

Thanks for this pull request. I have left some minor feedback.

The major change that I'd like to see is for the tutorial to be more focused on Spearman's rank correlation coefficient and less of a copy and paste of the Pearson's tutorial.



def spearmanr(
fcst: xr.DataArray,
Collaborator

Would it be easy enough to extend this to also work with xr.Datasets?

@@ -82,7 +95,7 @@
(DA4_CORR, DA5_CORR, "space", None, EXP_CORR_DIFF_SIZE),
],
)
-def test_correlation(da1, da2, reduce_dims, preserve_dims, expected):
+def test_pearson_correlation(da1, da2, reduce_dims, preserve_dims, expected):
"""
Tests continuous.correlation
Collaborator

Suggested change
-Tests continuous.correlation
+Tests continuous.correlation.pearsonr

@@ -0,0 +1,416 @@
{
Collaborator

This tutorial is mostly a duplication of the Pearson's correlation tutorial.

I suggest setting this up to be more focused on the Spearman's correlation coefficient.

Rather than reusing the same synthetic forecast and observation data, it would be far more useful to generate synthetic time series with non-linear (rather than linear) relationships, and then construct a story around that data.

You could then, at the end, say/show that Pearson's == Spearman's when the relationship is linear.
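
To illustrate the kind of comparison suggested here, below is a rough sketch only, not the scores implementation. The helper names `average_ranks`, `pearson` and `spearman` are hypothetical, and plain Python lists stand in for xarray objects. It relies on the standard result that Spearman's coefficient equals Pearson's correlation computed on the ranks of the data. With a monotonic but non-linear relationship the two coefficients diverge, while for a noise-free increasing linear relationship they agree.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the window over a run of tied values.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson's correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def spearman(x, y):
    """Spearman's coefficient: Pearson's correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Synthetic data (made up for illustration).
obs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
fcst_nonlinear = [math.exp(v) for v in obs]  # monotonic, non-linear
fcst_linear = [2.0 * v + 1.0 for v in obs]   # exactly linear

print("non-linear:", pearson(obs, fcst_nonlinear), spearman(obs, fcst_nonlinear))
print("linear:    ", pearson(obs, fcst_linear), spearman(obs, fcst_linear))
```

In the non-linear case Spearman's coefficient stays at (numerically) 1 because the ordering is perfectly preserved, whereas Pearson's drops below 1. A tutorial built around data like this would show what the rank-based metric adds.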

@lbluett lbluett force-pushed the feature/spearman_rank branch from 9321a90 to 792218f Compare December 16, 2024 23:35
@tennlee
Collaborator

tennlee commented Dec 17, 2024

@lbluett Just ignore the mypy issues for now. They are occurring on develop as well, so I will fix them there instead. They sneak in when the tool versions change but the code hasn't. For reasons I don't understand, it seems like they don't always get picked up on a local run of the tools - perhaps some kind of caching that's not obvious to me. Anyhow, it's not your problem to fix.

author Liam Bluett <[email protected]> 1732590092 +1000
committer Liam Bluett <[email protected]> 1734391465 +1000

Implemented spearman's correlationship

Modified notebook to remove noise and add an explanation and reference.

Add Spearman's to gallery

Change notebook metadata to use 'Python 3 (ipykernel)' and 'python3' rather than custom 'ml' kernel.

Testing for spearman implemented

Maintainer notes followed, notebook fixed... again

Modified notebook to remove noise and add an explanation and reference.

cleanup more

Add Spearman's to gallery

Testing for spearman implemented

Maintainer notes followed, notebook fixed... again

Notebook kernel changed for testing

Update src/scores/continuous/correlation/correlation_impl.py

add pyfunc for hyperlink

Co-authored-by: Stephanie Chong <[email protected]>
Signed-off-by: Liam Bluett <[email protected]>

reorder alphabetically
@lbluett lbluett force-pushed the feature/spearman_rank branch from 792218f to c12e6d4 Compare December 17, 2024 01:39
Development

Successfully merging this pull request may close these issues.

Request: Spearman rank correlation
4 participants