-
Notifications
You must be signed in to change notification settings - Fork 27
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
(still under development) add NSE function #217
base: develop
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for making a start on this @engrmahadi
A few general comments:
- Can you please update it so that it works with n-dimensional arrays and appropriately preserves dimensions.
- Can you please add some unit tests. This will make it easier for you to check that it is working.
- Can you delete the
.DS_Store
file. consider adding it to the.gitignore
- Support for lists as inputs hasn't been done anywhere else in scores. Do you have a need for it. If so we should discuss whether scores supports lists or if we should just let the users convert their lists to xarray or pandas objects (which is what I would prefer) (@tennlee).
Please reach out if you want any help, particularly with setting up the unit tests.
return fcst | ||
|
||
|
||
def nse(fcst, obs, reduce_dims=None, preserve_dims=None, weights=None, angular=False): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please add type hints?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also add a "*" after obs, to force the keyword arguments to be keyword-only.
# if not isinstance(obs, xr.DataArray): | ||
# obs = xr.DataArray(obs) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Delete commented out code
|
||
def lst_to_array(fcst): | ||
# Convert lists to xarray DataArrays | ||
if not isinstance(fcst, xr.DataArray): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is problematic if the input is an xr.Dataset
return fcst | ||
|
||
|
||
def nse(fcst, obs, reduce_dims=None, preserve_dims=None, weights=None, angular=False): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you make all args (apart from fcst
and obs
) to be keyword args?
return fcst | ||
|
||
|
||
def nse(fcst, obs, reduce_dims=None, preserve_dims=None, weights=None, angular=False): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are now using is_angular
instead of angular
in the other metrics. Can you update it to be is_angular
?
# add NSE code | ||
|
||
|
||
def lst_to_array(fcst): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this is run on obs data too, perhaps the arg name should be updated to something more general.
Can you also add a docstring and typehint?
We need to consider if we want to support lists
as inputs to scores functions. Do you have a use case for this? If you don't have a use case for this, then I consider dropping support for lists and allow the user to convert their data to xarray or pandas. @tennlee do you have an opinion on scores supporting lists as inputs?
reduce_dims (FlexibleDimensionTypes, optional): Dimensions to reduce along. Defaults to None. | ||
preserve_dims (FlexibleDimensionTypes, optional): Dimensions to preserve. Defaults to None. | ||
weights (xr.DataArray, optional): Weights to apply to error calculation. Defaults to None. | ||
angular (bool, optional): Whether to treat data as angular (circular). Defaults to False. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
angular (bool, optional): Whether to treat data as angular (circular). Defaults to False. | |
is_angular (bool, optional): Whether to treat data as angular (circular). Defaults to False. |
mean_obs = np.mean(obs) | ||
diff_mean = obs - mean_obs | ||
# calculate nse | ||
nse = 1 - np.sum((error) ** 2) / np.sum((diff_mean) ** 2) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are redefining the name from out of scope (this object has the same name as the function)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, this is not handling dimensions correctly. If you run this code trying to preserve a dimension, it will produce an error later on when scores.utils.gather_dimensions
is called.
# Apply weights | ||
if weights is not None: | ||
error = error * weights | ||
diff_mean = diff_mean * weights |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This weighting isn't doing anything as diff_mean
isn't used anywhere
I'm sure you're already aware of this, but obviously unit test coverage will be needed. |
Minor release 0.8.1
All checks have passed . Please check @nicholasloveday @tennlee . |
Thanks for the update Mahadi. I will try and review the changes this week - I have not been able to get to it before now. I just wanted to acknowledge your update and let you know that I will come and look through this soon. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've had a look through this again Hasan. Thanks for some of those updates. Once you've addressed all this feedback, I'll review the tutorial notebook.
Please reach out to me if you have any questions so that we can get this pull request over the line.
@@ -108,3 +108,5 @@ dmypy.json | |||
# Cython debug symbols | |||
cython_debug/ | |||
|
|||
#ignore directory | |||
.DS_Store |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for adding this. It looks like this pull request is still trying to add that file. Can you please remove the file?
docs/conf.py
Outdated
@@ -9,7 +9,7 @@ | |||
|
|||
project = "scores" | |||
copyright = "Licensed under Apache 2.0 - https://www.apache.org/licenses/LICENSE-2.0" | |||
release = "0.9" | |||
release = "0.8.1" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this change is needed.
src/scores/__init__.py
Outdated
@@ -13,7 +13,7 @@ | |||
import scores.sample_data | |||
import scores.stats.statistical_tests # noqa: F401 | |||
|
|||
__version__ = "0.9" | |||
__version__ = "0.8.1" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove this change.
fcst (FlexibleArrayType or list): Forecast or predicted variables. | ||
obs (FlexibleArrayType or list): Observed variables. | ||
reduce_dims (FlexibleDimensionTypes, optional): Dimensions to reduce along. Defaults to None. | ||
preserve_dims (FlexibleDimensionTypes, optional): Dimensions to preserve. Defaults to None. | ||
weights (xr.DataArray, optional): Weights to apply to error calculation. Defaults to None. | ||
angular (bool, optional): Whether to treat data as angular (circular). Defaults to False. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you add in type hints, you can remove the types from the docstring.
if isinstance(fcst, xr.Dataset): | ||
data_variable_name = list(fcst.data_vars.keys())[0] # Get the name of the first data variable | ||
fcst = fcst.to_array(dim=data_variable_name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think that we want to convert xr.Datasets to xr.DataArrays. I think this will only take out the first data_var anyway.
The code needs to be updated to work on Datasets with all data_vars
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I ran across a similar thing in another area of scores, whereby the calculation was really only valid on a DataArray. In that case, it would be possible to split out each variable from a DataSet and then do the calculation on each variable, then reconstruct the results. Is the same kind of thing going on here?
fcst_data = np.array([[3, 4, 5, 6, 7], [3, 4, 5, 6, 7], [3, 4, 5, 6, 7]]) | ||
obs_data = np.array([[2, 3, 4, 5, 6], [2, 3, 4, 5, 6], [2, 3, 4, 5, 6]]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add some NaNs in to ensure the function correctly handles missing data
@@ -0,0 +1,71 @@ | |||
import numpy as np |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you also add tests that check that the behaviour is correct, if the denominator is zero?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add tests that check that the preserve_dims
and reduce_dims
work as expected?
obs = np.array([2, 3, 4, 5, 6]) | ||
weights = np.array([1, 2, 3, 2, 1]) | ||
nse_value = nse(fcst, obs, weights=weights) | ||
assert nse_value == 0.5 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just making a note to check this output since the weights aren't doing anything in the current implementation of this function. I wouldn't expect the value to be 0.5, but I'd need to work it out with pen and paper.
ValueError: If the input arrays are of different lengths or incompatible types. | ||
|
||
References: | ||
- references |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- references |
References: | ||
- references | ||
- https://en.wikipedia.org/wiki/Nash–Sutcliffe_model_efficiency_coefficient | ||
- https://hess.copernicus.org/articles/26/4801/2022/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This paper doesn't really focus on NSE - it uses it in an applied sense. I suggest that you remove this reference and use the original paper
Nash, J.E. and Sutcliffe, J.V., 1970. River flow forecasting through conceptual models part I—A discussion of principles. Journal of hydrology, 10(3), pp.282-290.
@engrmahadi @nicholasloveday @tennlee @rob-taggart This branch is very stale. If you're happy with me finishing up this issue options are:
Miscellaneous notes There are alternative/modified NSE metrics: https://en.wikipedia.org/wiki/Nash%E2%80%93Sutcliffe_model_efficiency_coefficient NNSE - useful for machine learning, as NSE does not have a lower bound All of these alternatives seem relatively trivial to implement and may be useful, if we want to provide alternatives for certain conditions. Related: |
A new score or metric should be developed on a separate feature branch, rebased against the main branch. Each merge request should include:
All merge requests should comply with the coding standards outlined in this document. Merge requests will undergo both a code review and a science review. The code review will focus on coding style, performance and test coverage. The science review will focus on the mathematical correctness of the implementation and the suitability of the method for inclusion within 'scores'.
A github ticket should be created explaining the metric which is being implemented and why it is useful.
added NSE function