(still under development) add NSE function #217

engrmahadi · 2024-04-23T08:26:16Z

A new score or metric should be developed on a separate feature branch, rebased against the main branch. Each merge request should include:

The implementation of the new metric or score in xarray, ideally with support for pandas and dask
100% unit test coverage
A tutorial notebook showcasing the use of that metric or score, ideally based on the standard sample data
API documentation (docstrings) using Napoleon (Google) style, clearly explaining how to use the metric
A reference to the paper that describes the metric, added to the API documentation
For metrics which do not have a paper reference, an online source or reference should be provided
Metrics that are still under development, or that have not yet had an academic publication, will be placed in a holding area within the API (i.e. scores.emerging) until the method has been properly published and peer reviewed. The 'emerging' area of the API is subject to rapid change; it holds methods of sufficient community interest to include, similar to a 'preprint' of a score or metric.
Add your score to summary_table_of_scores.md in the documentation

All merge requests should comply with the coding standards outlined in this document. Merge requests will undergo both a code review and a science review. The code review will focus on coding style, performance and test coverage. The science review will focus on the mathematical correctness of the implementation and the suitability of the method for inclusion within 'scores'.

A GitHub ticket should be created explaining the metric which is being implemented and why it is useful.

added NSE function

nicholasloveday

Thanks for making a start on this @engrmahadi

A few general comments:

Can you please update it so that it works with n-dimensional arrays and appropriately preserves dimensions?
Can you please add some unit tests? This will make it easier for you to check that it is working.
Can you delete the .DS_Store file? Consider adding it to the .gitignore.
Support for lists as inputs hasn't been done anywhere else in scores. Do you have a need for it? If so, we should discuss whether scores supports lists or whether we should just let users convert their lists to xarray or pandas objects (which is what I would prefer) (@tennlee).

Please reach out if you want any help, particularly with setting up the unit tests.

src/scores/continuous/standard_impl.py

nicholasloveday · 2024-04-23T22:50:27Z

src/scores/continuous/standard_impl.py

+    return fcst
+
+
+def nse(fcst, obs, reduce_dims=None, preserve_dims=None, weights=None, angular=False):


Can you please add type hints?

Also add a "*" after obs, to force the remaining arguments to be keyword-only.
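A minimal sketch of the signature these two requests point towards, with `is_angular` as requested further down. The aliases `XarrayLike` and `DimsLike` are hypothetical stand-ins for scores' own typing aliases, and the `NotImplementedError` body is a placeholder, not the merged implementation. The bare `*` is what makes every argument after `obs` keyword-only.

```python
from collections.abc import Hashable, Iterable
from typing import Optional, Union

import xarray as xr

# Hypothetical stand-ins for the type aliases scores defines in its own typing module
XarrayLike = Union[xr.DataArray, xr.Dataset]
DimsLike = Optional[Union[Hashable, Iterable[Hashable]]]


def nse(
    fcst: XarrayLike,
    obs: XarrayLike,
    *,  # everything after this marker must be passed by keyword
    reduce_dims: DimsLike = None,
    preserve_dims: DimsLike = None,
    weights: Optional[xr.DataArray] = None,
    is_angular: bool = False,
) -> XarrayLike:
    """Calculate the Nash-Sutcliffe efficiency (signature sketch only)."""
    raise NotImplementedError("signature sketch only")


# Passing reduce_dims positionally is rejected before the body ever runs
try:
    nse(xr.DataArray([1.0]), xr.DataArray([1.0]), "time")
except TypeError as err:
    print("TypeError:", err)
```

With the `*` in place, a call such as `nse(fcst, obs, "time")` fails loudly instead of silently binding `"time"` to `reduce_dims`.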

nicholasloveday · 2024-04-23T22:51:13Z

src/scores/continuous/standard_impl.py

+    # if not isinstance(obs, xr.DataArray):
+    #    obs = xr.DataArray(obs)


Please delete the commented-out code.

nicholasloveday · 2024-04-23T22:52:20Z

src/scores/continuous/standard_impl.py

+
+def lst_to_array(fcst):
+    # Convert lists to xarray DataArrays
+    if not isinstance(fcst, xr.DataArray):


This is problematic if the input is an xr.Dataset

nicholasloveday · 2024-04-23T22:52:59Z

src/scores/continuous/standard_impl.py

+    return fcst
+
+
+def nse(fcst, obs, reduce_dims=None, preserve_dims=None, weights=None, angular=False):


Can you make all args (apart from fcst and obs) to be keyword args?

nicholasloveday · 2024-04-23T22:53:56Z

src/scores/continuous/standard_impl.py

+    return fcst
+
+
+def nse(fcst, obs, reduce_dims=None, preserve_dims=None, weights=None, angular=False):


We are now using is_angular instead of angular in the other metrics. Can you update it to be is_angular?

nicholasloveday · 2024-04-23T22:58:07Z

src/scores/continuous/standard_impl.py

+# add NSE code
+
+
+def lst_to_array(fcst):


If this is run on obs data too, perhaps the arg name should be updated to something more general.

Can you also add a docstring and typehint?

We need to consider whether we want to support lists as inputs to scores functions. Do you have a use case for this? If you don't, then I'd consider dropping support for lists and letting the user convert their data to xarray or pandas. @tennlee do you have an opinion on scores supporting lists as inputs?

nicholasloveday · 2024-04-23T23:00:25Z

src/scores/continuous/standard_impl.py

+        reduce_dims (FlexibleDimensionTypes, optional): Dimensions to reduce along. Defaults to None.
+        preserve_dims (FlexibleDimensionTypes, optional): Dimensions to preserve. Defaults to None.
+        weights (xr.DataArray, optional): Weights to apply to error calculation. Defaults to None.
+        angular (bool, optional): Whether to treat data as angular (circular). Defaults to False.


Suggested change

angular (bool, optional): Whether to treat data as angular (circular). Defaults to False.

is_angular (bool, optional): Whether to treat data as angular (circular). Defaults to False.

nicholasloveday · 2024-04-23T23:08:12Z

src/scores/continuous/standard_impl.py

+        mean_obs = np.mean(obs)
+        diff_mean = obs - mean_obs
+    # calculate nse
+    nse = 1 - np.sum((error) ** 2) / np.sum((diff_mean) ** 2)


You are shadowing a name from the outer scope (this local variable has the same name as the function).

Also, this is not handling dimensions correctly. If you run this code trying to preserve a dimension, it will produce an error later on when scores.utils.gather_dimensions is called.
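A sketch of a dimension-aware calculation, assuming xarray inputs. `_reduce_dims` is a hypothetical, simplified stand-in for `scores.utils.gather_dimensions`, and two behaviours are design choices made here rather than settled in this PR: points where either input is NaN are dropped, and a zero denominator yields NaN. The result is stored in `result`, so the function name is not shadowed.

```python
from collections.abc import Hashable, Iterable
from typing import List, Optional, Union

import numpy as np
import xarray as xr

DimsLike = Optional[Union[Hashable, Iterable[Hashable]]]


def _reduce_dims(fcst, obs, reduce_dims: DimsLike, preserve_dims: DimsLike) -> List[Hashable]:
    """Hypothetical stand-in for scores.utils.gather_dimensions: which dims to reduce."""
    if reduce_dims is not None and preserve_dims is not None:
        raise ValueError("Specify at most one of reduce_dims and preserve_dims")
    all_dims = set(fcst.dims) | set(obs.dims)

    def as_set(dims):
        return {dims} if isinstance(dims, str) else set(dims)

    if preserve_dims is not None:
        return list(all_dims - as_set(preserve_dims))
    if reduce_dims is not None:
        return list(as_set(reduce_dims))
    return list(all_dims)


def nse_sketch(fcst, obs, *, reduce_dims: DimsLike = None, preserve_dims: DimsLike = None):
    """Nash-Sutcliffe efficiency that keeps every dimension it is not asked to reduce."""
    dims = _reduce_dims(fcst, obs, reduce_dims, preserve_dims)
    # Only compare points where both the forecast and the observation are present
    both = fcst.notnull() & obs.notnull()
    fcst, obs = fcst.where(both), obs.where(both)
    sse = ((fcst - obs) ** 2).sum(dim=dims)
    sst = ((obs - obs.mean(dim=dims)) ** 2).sum(dim=dims)
    # Named `result` (not `nse`) to avoid shadowing; a zero denominator gives NaN
    result = 1 - sse / sst.where(sst != 0)
    return result


# The dimension names "station" and "time" are illustrative only
fcst = xr.DataArray(np.array([[3, 4, 5, 6, 7]] * 3, dtype=float), dims=("station", "time"))
obs = xr.DataArray(np.array([[2, 3, 4, 5, 6]] * 3, dtype=float), dims=("station", "time"))
print(nse_sketch(fcst, obs, preserve_dims="station").values)  # [0.5 0.5 0.5]
```

Preserving `station` returns one NSE value per station instead of collapsing to a scalar.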

nicholasloveday · 2024-04-23T23:16:20Z

src/scores/continuous/standard_impl.py

+    # Apply weights
+    if weights is not None:
+        error = error * weights
+        diff_mean = diff_mean * weights


This weighting isn't doing anything as diff_mean isn't used anywhere
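For reference, NSE as defined by Nash and Sutcliffe (1970), with forecasts $f_i$, observations $o_i$ and observed mean $\bar{o}$ over the reduced dimensions. The weighted form on the right is one possible convention, assumed here by analogy with how elementwise weights multiply squared errors: each weight $w_i$ multiplies the corresponding squared term in both sums, so neither sum can be left unweighted.

```latex
\mathrm{NSE} = 1 - \frac{\sum_i (f_i - o_i)^2}{\sum_i (o_i - \bar{o})^2},
\qquad
\mathrm{NSE}_w = 1 - \frac{\sum_i w_i\,(f_i - o_i)^2}{\sum_i w_i\,(o_i - \bar{o})^2}
```

Whether $\bar{o}$ itself should be a weighted mean is a further choice that this sketch leaves open.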

tennlee · 2024-04-24T02:07:39Z

I'm sure you're already aware of this, but obviously unit test coverage will be needed.

Minor release 0.8.1

engrmahadi · 2024-05-17T10:20:08Z

All checks have passed. Please take a look, @nicholasloveday @tennlee.
Thank you.

tennlee · 2024-05-25T11:36:13Z

Thanks for the update, Mahadi. I will try to review the changes this week; I have not been able to get to it before now. I just wanted to acknowledge your update and let you know that I will look through this soon.

nicholasloveday

I've had a look through this again Hasan. Thanks for some of those updates. Once you've addressed all this feedback, I'll review the tutorial notebook.

Please reach out to me if you have any questions so that we can get this pull request over the line.

nicholasloveday · 2024-05-30T07:31:09Z

.gitignore

@@ -108,3 +108,5 @@ dmypy.json
 # Cython debug symbols
 cython_debug/

+#ignore directory
+.DS_Store


Thanks for adding this. It looks like this pull request is still trying to add that file. Can you please remove the file?

nicholasloveday · 2024-05-30T07:32:11Z

docs/conf.py

@@ -9,7 +9,7 @@

 project = "scores"
 copyright = "Licensed under Apache 2.0 - https://www.apache.org/licenses/LICENSE-2.0"
-release = "0.9"
+release = "0.8.1"


I don't think this change is needed.

nicholasloveday · 2024-05-30T07:32:27Z

src/scores/__init__.py

@@ -13,7 +13,7 @@
 import scores.sample_data
 import scores.stats.statistical_tests  # noqa: F401

-__version__ = "0.9"
+__version__ = "0.8.1"


Remove this change.

nicholasloveday · 2024-05-30T07:59:27Z

src/scores/continuous/standard_impl.py

+        fcst (FlexibleArrayType or list): Forecast or predicted variables.
+        obs (FlexibleArrayType or list): Observed variables.
+        reduce_dims (FlexibleDimensionTypes, optional): Dimensions to reduce along. Defaults to None.
+        preserve_dims (FlexibleDimensionTypes, optional): Dimensions to preserve. Defaults to None.
+        weights (xr.DataArray, optional): Weights to apply to error calculation. Defaults to None.
+        angular (bool, optional): Whether to treat data as angular (circular). Defaults to False.


If you add in type hints, you can remove the types from the docstring.

nicholasloveday · 2024-05-30T08:02:58Z

src/scores/continuous/standard_impl.py

+    if isinstance(fcst, xr.Dataset):
+        data_variable_name = list(fcst.data_vars.keys())[0]  # Get the name of the first data variable
+        fcst = fcst.to_array(dim=data_variable_name)


I don't think that we want to convert xr.Datasets to xr.DataArrays. I think this will only take out the first data_var anyway.

The code needs to be updated to work on Datasets with all data_vars.

I ran across a similar thing in another area of scores, where the calculation was really only valid on a DataArray. In that case, it was possible to split out each variable from a Dataset, do the calculation on each variable, then reconstruct the results. Is the same kind of thing going on here?
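A sketch of the split, calculate and recombine approach described above. `_nse_dataarray` is a hypothetical per-variable NSE over all dimensions; it assumes `fcst` and `obs` contain the same data variables and does no NaN or zero-denominator handling.

```python
import xarray as xr


def _nse_dataarray(fcst: xr.DataArray, obs: xr.DataArray) -> xr.DataArray:
    """Hypothetical per-variable NSE over all dimensions of a single DataArray."""
    sse = ((fcst - obs) ** 2).sum()
    sst = ((obs - obs.mean()) ** 2).sum()
    return 1 - sse / sst


def nse_per_variable(fcst: xr.Dataset, obs: xr.Dataset) -> xr.Dataset:
    """Run the DataArray calculation on every data variable, then rebuild a Dataset."""
    return xr.Dataset({name: _nse_dataarray(fcst[name], obs[name]) for name in fcst.data_vars})


# Illustrative variables; every data_var is scored, not only the first one
fcst = xr.Dataset({"rain": ("time", [3.0, 4, 5, 6, 7]), "temp": ("time", [1.0, 2, 3, 4, 5])})
obs = xr.Dataset({"rain": ("time", [2.0, 3, 4, 5, 6]), "temp": ("time", [1.0, 2, 3, 4, 5])})
print(nse_per_variable(fcst, obs))
```

A dict comprehension is used rather than `Dataset.map` because the calculation needs matching variables from two Datasets at once.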

nicholasloveday · 2024-05-30T08:07:35Z

tests/continuous/test_nse.py

+    fcst_data = np.array([[3, 4, 5, 6, 7], [3, 4, 5, 6, 7], [3, 4, 5, 6, 7]])
+    obs_data = np.array([[2, 3, 4, 5, 6], [2, 3, 4, 5, 6], [2, 3, 4, 5, 6]])


Please add some NaNs in to ensure the function correctly handles missing data

nicholasloveday · 2024-05-30T08:09:14Z

tests/continuous/test_nse.py

@@ -0,0 +1,71 @@
+import numpy as np


Can you also add tests that check that the behaviour is correct, if the denominator is zero?

Can you add tests that check that the preserve_dims and reduce_dims work as expected?

nicholasloveday · 2024-05-30T08:12:30Z

tests/continuous/test_nse.py

+    obs = np.array([2, 3, 4, 5, 6])
+    weights = np.array([1, 2, 3, 2, 1])
+    nse_value = nse(fcst, obs, weights=weights)
+    assert nse_value == 0.5


Just making a note to check this output since the weights aren't doing anything in the current implementation of this function. I wouldn't expect the value to be 0.5, but I'd need to work it out with pen and paper.
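A pen-and-paper check, assuming `fcst = [3, 4, 5, 6, 7]` (that line is not shown in the snippet; this matches the rows of the 2-D test data above). With weights ignored the result is exactly 0.5; if each weight instead multiplies the corresponding squared term in both sums (one possible convention, assumed here, with an unweighted $\bar{o}$), the result changes.

```latex
\begin{aligned}
\text{unweighted:}\quad & \bar{o} = 4,\quad \sum_i (f_i - o_i)^2 = 5,\quad \sum_i (o_i - \bar{o})^2 = 4 + 1 + 0 + 1 + 4 = 10,\quad \mathrm{NSE} = 1 - \tfrac{5}{10} = 0.5 \\
\text{weighted:}\quad & \sum_i w_i (f_i - o_i)^2 = 1 + 2 + 3 + 2 + 1 = 9,\quad \sum_i w_i (o_i - \bar{o})^2 = 4 + 2 + 0 + 2 + 4 = 12,\quad \mathrm{NSE}_w = 1 - \tfrac{9}{12} = 0.25
\end{aligned}
```

So an expected value of 0.5 is consistent with the weights having no effect.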

nicholasloveday · 2024-05-30T08:13:05Z

src/scores/continuous/standard_impl.py

+        ValueError: If the input arrays are of different lengths or incompatible types.
+
+    References:
+        - references


Suggested change

- references

nicholasloveday · 2024-05-30T08:16:28Z

src/scores/continuous/standard_impl.py

+    References:
+        - references
+        - https://en.wikipedia.org/wiki/Nash–Sutcliffe_model_efficiency_coefficient
+        - https://hess.copernicus.org/articles/26/4801/2022/


This paper doesn't really focus on NSE; it only uses it in an applied sense. I suggest that you remove this reference and cite the original paper:
Nash, J.E. and Sutcliffe, J.V., 1970. River flow forecasting through conceptual models part I—A discussion of principles. Journal of hydrology, 10(3), pp.282-290.

nikeethr · 2025-01-24T02:51:57Z

@engrmahadi @nicholasloveday @tennlee @rob-taggart

This branch is very stale. If you're happy for me to finish this issue, the options are:

try to sync up with develop and address the comments in this commit, OR
if simple enough: cherry-pick the tutorial, tests and references from this PR; close this PR; and re-implement in a new PR, following the paradigm of how MSE is implemented in scores or even reusing MSE (my preference).

Miscellaneous notes

There are alternative/modified NSE metrics: https://en.wikipedia.org/wiki/Nash%E2%80%93Sutcliffe_model_efficiency_coefficient

NNSE - useful for machine learning, as NSE does not have a lower bound
NSE1 - l1 norm - reduces sensitivity to extreme values
LNSE - logarithmic transform prior to NSE - increases the weight given to smaller obs

All of these alternatives seem relatively trivial to implement and may be useful, if we want to provide alternatives for certain conditions.
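A rough sketch of the three variants on plain 1-D NumPy arrays, following the definitions on the Wikipedia page linked above: NNSE rescales NSE as 1 / (2 - NSE), NSE1 swaps squared deviations for absolute ones, and LNSE applies NSE to log-transformed values. The function names are hypothetical, and none of them handle NaNs, dimensions or a zero denominator; LNSE also assumes strictly positive data.

```python
import numpy as np


def nse_1d(fcst: np.ndarray, obs: np.ndarray) -> float:
    """Classic NSE on 1-D arrays (illustration only, no NaN handling)."""
    return float(1 - np.sum((fcst - obs) ** 2) / np.sum((obs - obs.mean()) ** 2))


def nnse(fcst: np.ndarray, obs: np.ndarray) -> float:
    """Normalised NSE: maps NSE from (-inf, 1] onto (0, 1]."""
    return 1 / (2 - nse_1d(fcst, obs))


def nse1(fcst: np.ndarray, obs: np.ndarray) -> float:
    """L1-norm NSE: absolute instead of squared deviations, less sensitive to extremes."""
    return float(1 - np.sum(np.abs(fcst - obs)) / np.sum(np.abs(obs - obs.mean())))


def lnse(fcst: np.ndarray, obs: np.ndarray) -> float:
    """NSE of log-transformed values; assumes every value is strictly positive."""
    return nse_1d(np.log(fcst), np.log(obs))


f = np.array([3.0, 4, 5, 6, 7])
o = np.array([2.0, 3, 4, 5, 6])
print(nse_1d(f, o), nnse(f, o), nse1(f, o))
```

Because each variant differs only in the transform or norm, they could plausibly share the same dimension and weighting machinery as NSE.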

Related:
KGE (Kling–Gupta efficiency) - widely regarded as an improvement on NSE (according to wiki anyway). Can't remember if this is already in scores.
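For comparison, the commonly cited form of KGE (Gupta et al., 2009), where $r$ is the Pearson correlation between forecasts and observations, $\sigma$ and $\mu$ are standard deviations and means, and subscripts $f$ and $o$ denote forecast and observation. Unlike NSE, it separates correlation, variability and bias into distinct terms.

```latex
\mathrm{KGE} = 1 - \sqrt{(r - 1)^2 + (\alpha - 1)^2 + (\beta - 1)^2},
\qquad
\alpha = \frac{\sigma_f}{\sigma_o},
\qquad
\beta = \frac{\mu_f}{\mu_o}
```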

add NSE function

47c779d

engrmahadi requested a review from nicholasloveday April 23, 2024 10:36

engrmahadi added 2 commits April 23, 2024 20:45

Apply black formatting

93cb092

Apply black formatting

6d057ff

nicholasloveday requested changes Apr 23, 2024


tennlee changed the title ~~add NSE function~~ (still under development) add NSE function May 14, 2024

tennlee added this to the Wishlist milestone May 14, 2024

tennlee and others added 12 commits May 16, 2024 17:45

Set up versioning for release 0.8.1

76c00d9

Merge branch 'develop' into minor_release_8_1

49a57ed

Merge pull request #393 from nci/minor_release_8_1

efb7a8f

Minor release 0.8.1

NSE Tutorial and pytest file

03313cf

NSE Tutorial file

4cca0ca

removed list and pandasseries updated tutorial tests files

7465441

Merge branch 'main' of https://github.com/nci/scores into nash_nse

59ddc13

update tests files

cda5290

update score function as failed for angular_difference files

e5a1427

run black on all the files

951c8aa

run black on functions file

7517a68

run isort on files

afee19a

nicholasloveday requested changes May 30, 2024


addressed comments Oct 2024

5f692c4
