
SEEPS #809 (Open)
Wants to merge 14 commits into base: develop
README.md — 2 changes: 1 addition & 1 deletion

@@ -20,7 +20,7 @@ Below is a **curated selection** of the metrics, tools and statistical tests inc
|----------------------- |----------------- |-------------- |
| **[Continuous](https://scores.readthedocs.io/en/stable/included.html#continuous)** |Scores for evaluating single-valued continuous forecasts. |MAE, MSE, RMSE, Additive Bias, Multiplicative Bias, Percent Bias, Pearson's Correlation Coefficient, Kling-Gupta Efficiency, Flip-Flop Index, Quantile Loss, Quantile Interval Score, Interval Score, Murphy Score, and threshold weighted scores for expectiles, quantiles and Huber Loss. |
| **[Probability](https://scores.readthedocs.io/en/stable/included.html#probability)** |Scores for evaluating forecasts that are expressed as predictive distributions, ensembles, and probabilities of binary events. |Brier Score, Continuous Ranked Probability Score (CRPS) for Cumulative Density Functions (CDF) and ensembles (including threshold weighted versions), Receiver Operating Characteristic (ROC), Isotonic Regression (reliability diagrams). |
- | **[Categorical](https://scores.readthedocs.io/en/stable/included.html#categorical)** |Scores for evaluating forecasts of categories. |18 binary contingency table (confusion matrix) metrics and the FIxed Risk Multicategorical (FIRM) Score. |
+ | **[Categorical](https://scores.readthedocs.io/en/stable/included.html#categorical)** |Scores for evaluating forecasts of categories. |18 binary contingency table (confusion matrix) metrics, the FIxed Risk Multicategorical (FIRM) Score, and the SEEPS score. |
| **[Spatial](https://scores.readthedocs.io/en/stable/included.html#spatial)** |Scores that take into account spatial structure. |Fractions Skill Score. |
| **[Statistical Tests](https://scores.readthedocs.io/en/stable/included.html#statistical-tests)** |Tools to conduct statistical tests and generate confidence intervals. |Diebold Mariano. |
| **[Processing Tools](https://scores.readthedocs.io/en/stable/included.html#processing-tools-for-preparing-data)** |Tools to pre-process data. |Data matching, Discretisation, Cumulative Density Function Manipulation. |
docs/api.md — 1 change: 1 addition & 0 deletions

@@ -68,6 +68,7 @@
:members:
.. autoclass:: scores.categorical.EventOperator
:members:
.. autofunction:: scores.categorical.seeps
```

## scores.spatial
docs/included.md — 4 changes: 4 additions & 0 deletions

@@ -559,6 +559,10 @@
- [API](api.md#scores.categorical.probability_of_false_detection)
- [Tutorial](project:./tutorials/ROC.md)
- [Probability of false detection (WWRP/WGNE Joint Working Group on Forecast Verification Research)](https://www.cawcr.gov.au/projects/verification/#POFD)
* - Stable Equitable Error in Probability Space (SEEPS)
- [API](api.md#scores.categorical.seeps)
- [Tutorial](project:./tutorials/SEEPS.md)
- [Rodwell et al. (2010)](https://doi.org/10.1002/qj.656)
* - Threshold Event Operator
- [API](api.md#scores.categorical.ThresholdEventOperator)
- [Tutorial](project:./tutorials/Binary_Contingency_Scores.md)
src/scores/categorical/__init__.py — 3 changes: 2 additions & 1 deletion

@@ -12,12 +12,13 @@
EventOperator,
ThresholdEventOperator,
)
- from scores.categorical.multicategorical_impl import firm
+ from scores.categorical.multicategorical_impl import firm, seeps

__all__ = [
"probability_of_detection",
"probability_of_false_detection",
"firm",
"seeps",
"BasicContingencyManager",
"BinaryContingencyManager",
"ThresholdEventOperator",
src/scores/categorical/multicategorical_impl.py — 178 changes: 176 additions & 2 deletions

@@ -3,12 +3,13 @@
"""

from collections.abc import Sequence
- from typing import Iterable, Optional, Union
+ from typing import Optional, Union

import numpy as np
import xarray as xr

from scores.functions import apply_weights
from scores.processing import broadcast_and_match_nan
from scores.typing import FlexibleDimensionTypes
from scores.utils import check_dims, gather_dimensions

@@ -132,7 +133,7 @@ def firm(  # pylint: disable=too-many-arguments

def _check_firm_inputs(
obs, risk_parameter, categorical_thresholds, threshold_weights, discount_distance, threshold_assignment
- ):  # pylint: disable=too-many-positional-arguments
+ ):
"""
Checks that the FIRM inputs are suitable
"""
@@ -244,3 +245,176 @@ def _single_category_score(
)
score = score.transpose(*fcst.dims)
return score


> **Collaborator:** General comment about this implementation. It looks like you are fixing the ratio of the climatological probability of light rain to the climatological probability of heavy rain to be 2. I'm fairly sure that is the choice made by ECMWF in their use of SEEPS. But with this choice, if p1 is given then p2 and p3 are completely determined using p1 + p2 + p3 = 1 and p2 = 2 * p3, so you don't need to have p3 as an argument if p1 is already an argument. I'd recommend one of the following: either
>
> 1. remove p3 as an argument, or
> 2. allow flexibility in the ratio of light to heavy climatological probabilities, in which case p3 could be specified or (my preference) the ratio could be specified (with a default value of 2).

> **Collaborator:** Also, if you select option 1, you have p3 = (1 - p1) / 3.

def seeps(  # pylint: disable=too-many-arguments, too-many-locals

fcst: xr.DataArray,
obs: xr.DataArray,
p1: xr.DataArray,
p3: xr.DataArray,
light_heavy_threshold: xr.DataArray,
*,  # Force remaining arguments to be keyword-only
dry_light_threshold: Optional[float] = 0.2,
mask_clim_extremes: Optional[bool] = True,
min_masked_value: Optional[float] = 0.1,
max_masked_value: Optional[float] = 0.85,
reduce_dims: Optional[FlexibleDimensionTypes] = None,
preserve_dims: Optional[FlexibleDimensionTypes] = None,
weights: Optional[xr.DataArray] = None,
) -> xr.DataArray:
r"""
Calculates the stable equitable error in probability space (SEEPS) score.

When used to evaluate precipitation forecasts, the SEEPS score calculates the
performance of a forecast across three categories:

- Dry weather (e.g., less than or equal to 0.2mm),
- Light precipitation (the climatological lower two-thirds of rainfall above
the dry threshold),
- Heavy precipitation (the climatological upper one-third of rainfall above
the dry threshold).

> **Collaborator:** This category is defined using conditional probabilities, so it may be better to use that kind of language, e.g. "the climatological lower two-thirds of rainfall, conditioned on it raining". Similarly for the heavy precipitation description.
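As a hedged illustration (not part of the PR), the three-way split described above can be sketched with NumPy; `categorise` is a hypothetical helper name, and the 5.0 mm light/heavy threshold is a placeholder, since in `seeps` that threshold comes from the climatology at each point:

```python
import numpy as np

def categorise(precip, dry_light_threshold=0.2, light_heavy_threshold=5.0):
    """Assign category 1 (dry), 2 (light) or 3 (heavy) to each rainfall value.

    Dry weather is <= dry_light_threshold; light precipitation is inclusive
    of light_heavy_threshold, matching the function's docstring.
    """
    precip = np.asarray(precip, dtype=float)
    return np.where(
        precip <= dry_light_threshold,
        1,
        np.where(precip <= light_heavy_threshold, 2, 3),
    )

# 0.0 mm is dry, 3.0 mm is light (<= 5.0 mm), 12.0 mm is heavy
print(categorise([0.0, 3.0, 12.0]))  # [1 2 3]
```

Note that both thresholds are inclusive on the lower category, which matches the `<=` comparisons used later in the implementation.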

The SEEPS penalty matrix is defined as


.. math::
s = \frac{1}{2} \left(
\begin{matrix}
0 & \frac{1}{1-p_1} & \frac{1}{p_3}+\frac{1}{1-p_1} \\
\frac{1}{p_1} & 0 & \frac{1}{p_3} \\
\frac{1}{p_1}+\frac{1}{1-p_3} & \frac{1}{1-p_3} & 0
\end{matrix}
\right)


where
- :math:`p_1` is the climatological probability of the dry weather category
- :math:`p_3` is the climatological probability of the heavy precipitation category.
- The rows correspond to the forecast category (dry, light, heavy).
- The columns correspond to the observation category (dry, light, heavy).
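The matrix above can be sanity-checked with a small NumPy sketch for scalar climatological probabilities (`seeps_penalty_matrix` is an illustrative name, not the PR's xarray implementation):

```python
import numpy as np

def seeps_penalty_matrix(p1: float, p3: float) -> np.ndarray:
    """SEEPS penalty matrix: row i is the forecast category and column j
    the observed category, ordered (dry, light, heavy)."""
    return 0.5 * np.array(
        [
            [0.0, 1 / (1 - p1), 1 / p3 + 1 / (1 - p1)],
            [1 / p1, 0.0, 1 / p3],
            [1 / p1 + 1 / (1 - p3), 1 / (1 - p3), 0.0],
        ]
    )

s = seeps_penalty_matrix(0.55, 0.15)  # p3 = (1 - 0.55) / 3 under the 2:1 ratio
# Correct forecasts (the diagonal) incur no penalty
assert np.allclose(np.diag(s), 0.0)
# A two-category miss costs the sum of the corresponding one-category misses
assert np.isclose(s[0, 2], s[0, 1] + s[1, 2])
```

The last assertion illustrates why boundary values of p1 or p3 are problematic: as p1 approaches 1 or p3 approaches 0, individual penalties diverge, which is the reviewer's point about requiring strict inequalities.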

> **Collaborator:** Consider adding a reference to the formula for the penalty matrix. The reference would be Rodwell et al. (2010), Eq. (15).

Note that although :math:`p_2` does not appear in the penalty matrix, it is defined as
:math:`p_2 = 2p_3` with :math:`p_1 + p_2 + p_3 = 1`, which means that the light
precipitation category is twice as likely to occur climatologically as the
heavy precipitation category.

> **Collaborator:** I'm not sure that this paragraph is required. However, if you want to discuss p2, I would define it as the climatological probability of light rain, and then mention that p2 = 2 * p3. I'm not sure what p_1 + p_2 + p_3 = 1 adds to this, other than to confirm that the three categories are mutually exclusive and cover the entire outcome space.
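The reviewer's observation that, under the fixed 2:1 ratio, p1 alone determines the other two probabilities can be checked in a few lines (`probs_from_p1` is a hypothetical helper, not part of the PR):

```python
def probs_from_p1(p1: float) -> tuple[float, float, float]:
    """Given p1 and the constraints p1 + p2 + p3 = 1 and p2 = 2 * p3,
    solve for the remaining probabilities: p3 = (1 - p1) / 3."""
    p3 = (1 - p1) / 3
    p2 = 2 * p3
    return p1, p2, p3

p1, p2, p3 = probs_from_p1(0.55)
print(round(p2, 4), round(p3, 4))  # 0.3 0.15
```

This is why the reviewer suggests that passing both `p1` and `p3` is redundant once the 2:1 ratio is fixed.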

This implementation of the score is negatively oriented, meaning that lower scores are better.
Sometimes in the literature, a SEEPS skill score is used, which is defined as 1 - SEEPS.

> **Collaborator:** Is it called a SEEPS skill score in the literature? I'm inclined to reserve the term "skill score" for the usual meaning (i.e., skill_score = 1 - mean_score_fcst / mean_score_ref) in the scores package where possible. Consider rewording to: "Sometimes in the literature, a positively oriented version of SEEPS is used, which is defined as 1 - SEEPS."
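To make the reviewer's distinction concrete, here is a hedged numerical sketch contrasting the two notions (the mean scores are made up):

```python
# Hypothetical mean SEEPS values (lower is better for this negatively
# oriented score)
mean_seeps_fcst = 0.4
mean_seeps_ref = 0.8  # e.g. from a climatological reference forecast

# Conventional skill score, relative to a reference forecast
conventional_skill = 1 - mean_seeps_fcst / mean_seeps_ref

# "1 - SEEPS": a positively oriented rescaling of the score itself,
# with no reference forecast involved
positively_oriented = 1 - mean_seeps_fcst

print(conventional_skill, positively_oriented)
```

The two quantities differ in general, which is the reviewer's argument for not calling 1 - SEEPS a "skill score".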

By default, the scores are only calculated for points where :math:`p_1 \in [0.1, 0.85]`
as per Rodwell et al. (2010). This can be changed by setting ``mask_clim_extremes`` to ``False`` or
by changing the ``min_masked_value`` and ``max_masked_value`` parameters.
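A minimal illustration of that masking with xarray's `where` (illustrative values; not the PR's exact code):

```python
import xarray as xr

p1 = xr.DataArray([0.05, 0.3, 0.9], dims=["station"])
score = xr.DataArray([0.7, 0.4, 0.2], dims=["station"])

# Keep scores only where 0.1 <= p1 <= 0.85, per Rodwell et al. (2010);
# points outside this range become NaN
masked = score.where((p1 >= 0.1) & (p1 <= 0.85))
print(masked.values)  # [nan 0.4 nan]
```

Points with a very high or very low climatological probability of dry weather are excluded because the penalty matrix entries become extreme there.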

For further details on generating the p1 and p3 arrays, see Rodwell et al. (2010).

Args:
fcst: An array of real-valued forecasts.

> **Collaborator:** Consider adding "(e.g. precipitation forecasts in mm)" for clarity.

obs: An array of real-valued observations.

> **Collaborator:** Consider adding "(e.g. precipitation observations in mm)" for clarity.

p1: The climatological probability of the dry weather category.
p3: The climatological probability of the heavy precipitation category.

> **Collaborator:** Consider removing `p3`, as discussed above.

light_heavy_threshold: An array of the rainfall thresholds that separates
light and heavy precipitation. Light precipitation is inclusive of this
threshold.

> **Collaborator:** Consider: "An array of the rainfall thresholds (e.g. in mm) that separates light and heavy precipitation."

> **Collaborator:** Consider: "The threshold itself is included in the light precipitation category."

dry_light_threshold: The threshold that separates dry weather from light precipitation.
Defaults to 0.2. Dry weather is defined as less than or equal to this threshold.

> **Collaborator:** Consider: "The threshold (in mm) that separates dry weather from light precipitation."

> **Collaborator:** I'm wondering why it should default to 0.2. Is 0.2 mm commonly used in evaluation of global NWP in the literature?

mask_clim_extremes: If True, mask out the climatological extremes.

> **Collaborator:** I'd prefer a more explicit explanation, e.g. "If True, does not calculate the SEEPS score for cases where p1 < min_masked_value or p1 > max_masked_value. NaN is returned instead."

min_masked_value: Points with climatological probabilities of dry weather
less than this value are masked. Defaults to 0.1.

> **Collaborator:** I'm not sure how to think about the mask. Is the mask the area where results are computed, or the area where the results are hidden? If the latter, I'd prefer lower_mask_threshold and upper_mask_threshold instead of min_masked_value and max_masked_value. The reason is, if (say) min_masked_value=0.1, then values lower than 0.1 are hidden, which means that 0.1 is a minimum mask value.

max_masked_value: Points with climatological probabilities of dry weather
greater than this value are masked. Defaults to 0.85.
reduce_dims: Optionally specify which dimensions to reduce when
calculating the SEEPS score. All other dimensions will be preserved. As a
special case, 'all' will allow all dimensions to be reduced. Only one
of `reduce_dims` and `preserve_dims` can be supplied. The default behaviour
if neither are supplied is to reduce all dims.
preserve_dims: Optionally specify which dimensions to preserve
when calculating SEEPS. All other dimensions will be reduced.
As a special case, 'all' will allow all dimensions to be
preserved. In this case, the result will be in the same
shape/dimensionality as the forecast, and the errors will be
the SEEPS score at each point (i.e. single-value comparison
against observed), and the forecast and observed dimensions
must match precisely. Only one of `reduce_dims` and `preserve_dims` can be
supplied. The default behaviour if neither are supplied is to reduce all dims.

> **Collaborator:** "when calculating the SEEPS score" (to be consistent with the reduce_dims descriptor).
weights: Optionally provide an array for weighted averaging (e.g. by area).

Returns:
An xarray DataArray containing the SEEPS score.

Raises:
ValueError: if any values in `p1` are outside the range [0, 1].
ValueError: if any values in `p3` are outside the range [0, 1].

> **Collaborator:** You really require 0 < p1 < 1 (strict inequalities), otherwise some entries in the SEEPS scoring matrix will be infinity.

> **Collaborator:** The `p3` check could be removed if `p3` is removed as an argument.


References:
Rodwell, M. J., Richardson, D. S., Hewson, T. D., & Haiden, T. (2010).
A new equitable score suitable for verifying precipitation in numerical
weather prediction. Quarterly Journal of the Royal Meteorological Society,
136(650), 1344–1363. https://doi.org/10.1002/qj.656

Examples:
>>> import numpy as np
>>> import xarray as xr
>>> from scores.categorical import seeps
>>> fcst = xr.DataArray(np.random.rand(4, 6, 8), dims=['time', 'lat', 'lon'])
>>> obs = xr.DataArray(np.random.rand(4, 6, 8), dims=['time', 'lat', 'lon'])
>>> p1 = xr.DataArray(np.random.rand(6, 8), dims=['lat', 'lon'])
>>> p3 = (1 - p1) / 3
>>> light_heavy_threshold = 2 * xr.DataArray(np.random.rand(4, 6, 8), dims=['time', 'lat', 'lon'])
>>> seeps(fcst, obs, p1, p3, light_heavy_threshold=light_heavy_threshold)
"""
if p1.min() < 0 or p1.max() > 1:
raise ValueError("`p1` must have values between 0 and 1 inclusive")
if p3.min() < 0 or p3.max() > 1:
raise ValueError("`p3` must have values between 0 and 1 inclusive")

> **Collaborator:** This should be `p1.min() <= 0 or p1.max() >= 1`.

> **Collaborator:** The message should then read "`p1` must have values strictly between 0 and 1".

reduce_dims = gather_dimensions(fcst.dims, obs.dims, reduce_dims=reduce_dims, preserve_dims=preserve_dims)
fcst, obs = broadcast_and_match_nan(fcst, obs)

# Penalties for index i, j in the penalty matrix. Row i corresponds to the
# forecast category while row j corresponds to the observation category
# row 1 of the penalty matrix
index_12 = 1 / (1 - p1)
> **Collaborator:** I'm wondering whether penalty_12 is a better label. Not requesting you change anything though; the code is readable as it is.

index_13 = (1 / p3) + (1 / (1 - p1))
# row 2 of the penalty matrix
index_21 = 1 / p1
index_23 = 1 / p3
# row 3 of the penalty matrix
index_31 = (1 / p1) + (1 / (1 - p3))
index_32 = 1 / (1 - p3)

# Get conditions for each category
fcst1_condition = fcst <= dry_light_threshold
fcst2_condition = (fcst > dry_light_threshold) & (fcst <= light_heavy_threshold)
fcst3_condition = fcst > light_heavy_threshold

obs1_condition = obs <= dry_light_threshold
obs2_condition = (obs > dry_light_threshold) & (obs <= light_heavy_threshold)
obs3_condition = obs > light_heavy_threshold

# Calculate the penalties
> **Collaborator:** Below doesn't feel very pythonesque, but I'm sure I've done much worse ;)

> **Collaborator:** Though it probably uses less memory than converting fcst and obs to categories, determining a contingency table and multiplying by the penalty matrix.

result = fcst.copy() * 0
result = result.where(~(fcst1_condition & obs2_condition), index_12.broadcast_like(result))
result = result.where(~(fcst1_condition & obs3_condition), index_13.broadcast_like(result))
result = result.where(~(fcst2_condition & obs1_condition), index_21.broadcast_like(result))
result = result.where(~(fcst2_condition & obs3_condition), index_23.broadcast_like(result))
result = result.where(~(fcst3_condition & obs1_condition), index_31.broadcast_like(result))
result = result.where(~(fcst3_condition & obs2_condition), index_32.broadcast_like(result))

result = result / 2
# propagate NaNs from any of the inputs
result = result.where(
~np.isnan(fcst)
& ~np.isnan(obs)
& ~np.isnan(p1)
& ~np.isnan(p3)
& ~np.isnan(light_heavy_threshold)
& ~np.isnan(dry_light_threshold)
)

if mask_clim_extremes:
result = result.where(np.logical_and(p1 <= max_masked_value, p1 >= min_masked_value))
> **Collaborator:** OK, I see here that the mask is the area where you want to calculate values.


result = apply_weights(result, weights=weights)
result = result.mean(dim=reduce_dims)

return result