-
Notifications
You must be signed in to change notification settings - Fork 27
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SEEPS #809
base: develop
Are you sure you want to change the base?
SEEPS #809
Changes from all commits
caaaa29
f99d0b3
fac6dbe
e59c7c0
9ca5f2e
4422bbc
ff3e92f
f968e42
3317a67
797da11
4b8753a
b6c590e
3c15c9d
d46b3a3
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -3,12 +3,13 @@ | |
""" | ||
|
||
from collections.abc import Sequence | ||
from typing import Iterable, Optional, Union | ||
from typing import Optional, Union | ||
|
||
import numpy as np | ||
import xarray as xr | ||
|
||
from scores.functions import apply_weights | ||
from scores.processing import broadcast_and_match_nan | ||
from scores.typing import FlexibleDimensionTypes | ||
from scores.utils import check_dims, gather_dimensions | ||
|
||
|
@@ -132,7 +133,7 @@ def firm( # pylint: disable=too-many-arguments | |
|
||
def _check_firm_inputs( | ||
obs, risk_parameter, categorical_thresholds, threshold_weights, discount_distance, threshold_assignment | ||
): # pylint: disable=too-many-positional-arguments | ||
): | ||
""" | ||
Checks that the FIRM inputs are suitable | ||
""" | ||
|
@@ -244,3 +245,176 @@ def _single_category_score( | |
) | ||
score = score.transpose(*fcst.dims) | ||
return score | ||
|
||
|
||
def seeps( # pylint: disable=too-many-arguments, too-many-locals | ||
fcst: xr.DataArray, | ||
obs: xr.DataArray, | ||
p1: xr.DataArray, | ||
p3: xr.DataArray, | ||
light_heavy_threshold: xr.DataArray, | ||
*, # Force keywords arguments to be keyword-only | ||
dry_light_threshold: Optional[float] = 0.2, | ||
mask_clim_extremes: Optional[bool] = True, | ||
min_masked_value: Optional[float] = 0.1, | ||
max_masked_value: Optional[float] = 0.85, | ||
reduce_dims: Optional[FlexibleDimensionTypes] = None, | ||
preserve_dims: Optional[FlexibleDimensionTypes] = None, | ||
weights: Optional[xr.DataArray] = None, | ||
) -> xr.DataArray: | ||
r""" | ||
Calculates the stable equitable error in probability space (SEEPS) score. | ||
|
||
When used to evaluate precipitation forecasts, the SEEPS score calculates the | ||
performance of a forecast across three categories: | ||
|
||
- Dry weather (e.g., less than or equal to 0.2mm), | ||
- Light precipitation (the climatological lower two-thirds of rainfall above | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. this category is defined using conditional probabilities, so it maybe better to use that kind of language e.g. the climatological lower two thirds of rainfall, conditioned on it raining Similarly for the heavy precipitation description |
||
the dry threshold), | ||
- Heavy precipitation (the climatological upper one-third of rainfall above | ||
the dry threshold). | ||
|
||
The SEEPS penalty matrix is defined as | ||
|
||
|
||
.. math:: | ||
s = \frac{1}{2} \left( | ||
\begin{matrix} | ||
0 & \frac{1}{1-p_1} & \frac{1}{p_3}+\frac{1}{1-p_1} \\ | ||
\frac{1}{p_1} & 0 & \frac{1}{p_3} \\ | ||
\frac{1}{p_1}+\frac{1}{1-p_3} & \frac{1}{1-p_3} & 0 | ||
\end{matrix} | ||
\right) | ||
|
||
|
||
where | ||
- :math:`p_1` is the climatological probability of the dry weather category | ||
- :math:`p_3` is the climatological probability of the heavy precipitation category. | ||
- The rows correspond to the forecast category (dry, light, heavy). | ||
- The columns correspond to the observation category (dry, light, heavy). | ||
|
||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. consider adding a reference to the formula for the penalty matrix. The reference would be Rodwell et. al. 2010, Eq. (15). |
||
Note that although :math:`p_2`, does not appear in the penalty matrix, it is defined as | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'm not sure that this paragraph is required. However, if you want to discuss p2, I would define it as the climatological probability of light rain. And then I would mention that p2 = 2 * p3. I'm not sure what p_1 + p_2 + p_3 = 1 adds to this, other than to confirm that the three categories are mutually exclusive and cover the entire outcome space. |
||
:math:`p_2 = 2p_3` with :math:`p_1 + p_2 + p_3 = 1` which means that the light | ||
precipitation category is twice as likely to occur climatologically as the | ||
heavy precipitation category. | ||
|
||
This implementation of the score is negatively oriented, meaning that lower scores are better. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Is it called a SEEPS skill score in the literature? I'm inclined to reserve the term "skill score" to the usual meaning (i.e, skill_score = 1 - mean_score_fcst / mean_score_ref) Consider rewording to: Sometimes in the literature, a positive oriented version of SEEPS is used, which is defined as 1 - SEEPS. |
||
Sometimes in the literature, a SEEPS skill score is used, which is defined as 1 - SEEPS. | ||
|
||
By default, the scores are only calculated for points where :math:`p_1 \in [0.1, 0.85]` | ||
as per Rodwell et al. (2010). This can be changed by setting ``mask_clim_extremes`` to ``False`` or | ||
by changing the ``min_masked_value`` and ``max_masked_value`` parameters. | ||
|
||
For further details on generating the p1 and p3 arrays, see Rodwell et al. (2010). | ||
|
||
Args: | ||
fcst: An array of real-valued forecasts. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. consider adding "(e.g. precipitation forecasts in mm)" for clarity |
||
obs: An array of real-valued observations. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. consider adding "(e.g. precipitation observations in mm)" for clarity |
||
p1: The climatological probability of the dry weather category. | ||
p3: The climatological probability of the heavy precipitation category. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Consider removing, as discussed above |
||
light_heavy_threshold: An array of the rainfall thresholds that separates | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Consider: An array of the rainfall thresholds (e.g. in mm) that separates light and heavy precipitation. |
||
light and heavy precipitation. Light precipitation is inclusive of this | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Consider: "The threshold itself is included in the light precipitation category" |
||
threshold. | ||
dry_light_threshold: The threshold that separates dry weather from light precipitation. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Consider: The threshold (in mm) that separates dry weather from light precipitation. |
||
Defaults to 0.2. Dry weather is defined as less than or equal to this threshold. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'm wondering why it should default to 0.2. Is 0.2 mm commonly used in evaluation of global NWP in the literature? |
||
mask_clim_extremes: If True, mask out the climatological extremes. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'd prefer a more explicit explanation. e.g. If True, does not calculates SEEPS score for cases where the |
||
min_masked_value: Points with climatolgical probabilities of dry weather | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'm not sure how to think about the mask. Is the mask the area where results are computed, or the area where the results are hidden? If the latter, I'd prefer |
||
less than this value are masked. Defaults to 0.1. | ||
max_masked_value: Points with climatolgical probabilities of dry weather | ||
greater than this value are masked. Defaults to 0.85. | ||
reduce_dims: Optionally specify which dimensions to reduce when | ||
calculating the SEEPS score. All other dimensions will be preserved. As a | ||
special case, 'all' will allow all dimensions to be reduced. Only one | ||
of `reduce_dims` and `preserve_dims` can be supplied. The default behaviour | ||
if neither are supplied is to reduce all dims. | ||
preserve_dims: Optionally specify which dimensions to preserve | ||
when calculating SEEPS. All other dimensions will be reduced. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. "when calculating the SEEPS score" (to be consisent with reduce_dims descriptor) |
||
As a special case, 'all' will allow all dimensions to be | ||
preserved. In this case, the result will be in the same | ||
shape/dimensionality as the forecast, and the errors will be | ||
the SEEPS score at each point (i.e. single-value comparison | ||
against observed), and the forecast and observed dimensions | ||
must match precisely. Only one of `reduce_dims` and `preserve_dims` can be | ||
supplied. The default behaviour if neither are supplied is to reduce all dims. | ||
weights: Optionally provide an array for weighted averaging (e.g. by area). | ||
|
||
Returns: | ||
An xarray DataArray containing the SEEPS score. | ||
|
||
Raises: | ||
ValueError: if any values in `p1` are outside the range [0, 1]. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. You really require |
||
ValueError: if any values in `p3` are outside the range [0, 1]. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. could be removed in p3 is removed |
||
|
||
References: | ||
Rodwell, M. J., Richardson, D. S., Hewson, T. D., & Haiden, T. (2010). | ||
A new equitable score suitable for verifying precipitation in numerical | ||
weather prediction. Quarterly Journal of the Royal Meteorological Society, | ||
136(650), 1344–1363. https://doi.org/10.1002/qj.656 | ||
|
||
Examples: | ||
>>> import numpy as np | ||
>>> import xarray as xr | ||
>>> from scores.categorical import seeps | ||
>>> fcst = xr.DataArray(np.random.rand(4, 6, 8), dims=['time', 'lat', 'lon']) | ||
>>> obs = xr.DataArray(np.random.rand(4, 6, 8), dims=['time', 'lat', 'lon']) | ||
>>> p1 = xr.DataArray(np.random.rand(6, 8), dims=['lat', 'lon']) | ||
>>> p3 = (1 - p1) / 3 | ||
>>> light_heavy_threshold = 2 * xr.DataArray(np.random.rand(4, 6, 8), dims=['time', 'lat', 'lon']) | ||
>>> seeps(fcst, obs, p1, p3, light_heavy_threshold=light_heavy_threshold) | ||
""" | ||
if p1.min() < 0 or p1.max() > 1: | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. p1.min() <= 0 or p1.max() >= 1 |
||
raise ValueError("`p1` must have values between 0 and 1 inclusive") | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. " |
||
if p3.min() < 0 or p3.max() > 1: | ||
raise ValueError("`p3` must have values between 0 and 1 inclusive") | ||
|
||
reduce_dims = gather_dimensions(fcst.dims, obs.dims, reduce_dims=reduce_dims, preserve_dims=preserve_dims) | ||
fcst, obs = broadcast_and_match_nan(fcst, obs) | ||
|
||
# Penalties for index i, j in the penalty matrix. Row i corresponds to the | ||
# forecast category while row j corresponds to the observation category | ||
# row 1 of the penalty matrix | ||
index_12 = 1 / (1 - p1) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'm wondering whether |
||
index_13 = (1 / p3) + (1 / (1 - p1)) | ||
# row 2 of the penalty matrix | ||
index_21 = 1 / p1 | ||
index_23 = 1 / p3 | ||
# row 3 of the penalty matrix | ||
index_31 = (1 / p1) + (1 / (1 - p3)) | ||
index_32 = 1 / (1 - p3) | ||
|
||
# Get conditions for each category | ||
fcst1_condition = fcst <= dry_light_threshold | ||
fcst2_condition = (fcst > dry_light_threshold) & (fcst <= light_heavy_threshold) | ||
fcst3_condition = fcst > light_heavy_threshold | ||
|
||
obs1_condition = obs <= dry_light_threshold | ||
obs2_condition = (obs > dry_light_threshold) & (obs <= light_heavy_threshold) | ||
obs3_condition = obs > light_heavy_threshold | ||
|
||
# Calculate the penalties | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. below doesn't feel very pythonesque, but I'm sure I've done much worse ;) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Though it probably uses less memory than converting fcst and obs to a categories, determining a contingency tables and multiplying by the penalty matrix |
||
result = fcst.copy() * 0 | ||
result = result.where(~(fcst1_condition & obs2_condition), index_12.broadcast_like(result)) | ||
result = result.where(~(fcst1_condition & obs3_condition), index_13.broadcast_like(result)) | ||
result = result.where(~(fcst2_condition & obs1_condition), index_21.broadcast_like(result)) | ||
result = result.where(~(fcst2_condition & obs3_condition), index_23.broadcast_like(result)) | ||
result = result.where(~(fcst3_condition & obs1_condition), index_31.broadcast_like(result)) | ||
result = result.where(~(fcst3_condition & obs2_condition), index_32.broadcast_like(result)) | ||
|
||
result = result / 2 | ||
# return NaNs | ||
result = result.where( | ||
~np.isnan(fcst) | ||
& ~np.isnan(obs) | ||
& ~np.isnan(p1) | ||
& ~np.isnan(p3) | ||
& ~np.isnan(light_heavy_threshold) | ||
& ~np.isnan(dry_light_threshold) | ||
) | ||
|
||
if mask_clim_extremes: | ||
result = result.where(np.logical_and(p1 <= max_masked_value, p1 >= min_masked_value)) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. OK, I see here that the mask is the area where you want to calculate values |
||
|
||
result = apply_weights(result, weights=weights) | ||
result = result.mean(dim=reduce_dims) | ||
|
||
return result |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
General comment about this implementation. It looks like you are fixing the ratio of the climatological probability of light rain to the climatological probability of heavy rain to be 2. I'm fairly sure that is the choice made by ECMWF in their use of SEEPS.
But with this choice if p1 is given then p2 and p3 are completely determined using
p1+p2+p3=1
andp2=2 * p3
. So you don't need to havep3
as an argument ifp1
is already an argument.I'd recommend one of the following: either
p3
as an argumentThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, if you select option 1, you have
p3 = (1 - p1) / 3