Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add seasonal statistics file to copernicus module #156

Open
wants to merge 5 commits into
base: develop
Choose a base branch
from

Conversation

DahyannAraya
Copy link
Collaborator

@DahyannAraya DahyannAraya commented Jan 28, 2025

This section is part of the ongoing development of the "Copernicus Forecast Module" (see PR #142).

Changes introduced in this PR:
• Added seasonal_statistics.py to the climada_petals.hazard.copernicus_interface module. It processes climate data from NetCDF/GRIB files, computes daily and monthly statistics, and generates ensemble statistics. The module assigns forecast periods, aggregates data, and derives statistical insights for climate risk assessments. Functions include heat index calculations, time grouping, and statistical analysis.

• Added test_seasonal_statistics.py to validate seasonal forecast indices, ensuring accurate calculations for Tmean, Tmin, Tmax, and ensemble statistics. It includes tests for monthly datasets, error handling, and robustness, specifically testing calculate_heat_indices_metrics, calculate_statistics_from_index, calculate_monthly_dataset, and monthly_periods_from_valid_times.

Thank you, Luca. Please review seasonal_statistics.py and test_seasonal_statistics.py. Let me know if you need more information.

PR Author Checklist

PR Reviewer Checklist

Copy link
Collaborator

@luseverin luseverin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey Dahyann, I finished reviewing your PR, great work!

I have made a few comments, you can have a look at them separatly for more details. Here are my main points:

  • The code and its structure looks good to me overall. However, I think it would benefit from some refactoring and more details in the documentation to improve its user-friednliness.
  • For instance, I really like the class index_definitions, but I feel like it would be beneficial to use it more extensively in the other modules e.g. consistently use the variable names that are defined there throughout your code, or use the fact that you defined there the variables required for the computation of each indices in the calcule_heat_indices_metrics function
  • Many lines are not covered by tests (see Jenkins). I am not an expert in testing but maybe you could check if the tests you wrote are sufficient or not. Apart from this the tests I saw looked good to me.
  • I was also a bit confused about the fact that for most functions of the heat_index module, you require an input in Kelvin, but you immediatly convert it to Celsius and eventually returns the value to Celsius. Maybe you could clarify on that.
  • In the index_definitions there are also some lines that appear duplicated, so you might want to check into that.


---

File to calculate different seasonal forecast indices.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternative wording: "File to calculate different indices from seasonal forecasts"



def kelvin_to_fahrenheit(kelvin):
return (kelvin - 273.15) * 9 / 5 + 32
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line is not covered by tests. One could also think of checking that the input type is numeric and that the range is within the Kelvin scale. But maybe it is not so necessary. It also depends on how much user friendly you want to be or not..



def fahrenheit_to_kelvin(fahrenheit):
return (fahrenheit - 32) * 5 / 9 + 273.15
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment as for the kelvin_to_fahrenheit function. The comment also applies to celsius_to_kelvin, kelvin_to_celsius.

float or array-like
Relative humidity as a percentage (0-100) or as a decimal value (0-1), depending on the `as_percentage` setting.
"""
t2c = t2k - 273.15
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here why not use the functions that you defined just before i.e. kelvin_to_celsius?

Relative humidity as a percentage (0-100) or as a decimal value (0-1), depending on the `as_percentage` setting.
"""
t2c = t2k - 273.15
tdc = tdk - 273.15
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am also not sure I understand why you specifically ask for an input value in Kelvin if it is to convert it to Celsius..

with xr.open_dataset(input_file_name, engine=engine) as daily_ds:
# Handling various indices
if index_metric == "Tmean":
da_index = daily_ds["t2m_mean"] - 273.15 # Kelvin to Celsius
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here why not used the function defined in heat_index.py to convert Kelvin to Celsius?

Parameters
----------
input_file_name : str
Path to the input data file containing temperature and dewpoint information.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reading this it is not really clear to me what should the format of the input file (.csv, netcdf?), or what the input file should actually contain. Later I see that you also ask for e.g. windspeeds so I would suggest to write:
Path to the input data file containing the variables required for the computation of the desired index. Information on the required variables can be found in the index_definitions class.

daily_ds["t2m_mean"], daily_ds["d2m_mean"]
)
elif index_metric == "AT":
u10 = daily_ds.get("u10_max", daily_ds.get("10m_u_component_of_wind"))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here I see you only test if the required variables are present in the case of wind. Maybe you could make it more consistent by e.g. writing a function test_if_variables that would check if the required variables are in the input file, using the variables field in the index_definitions. Then you would only need to call this function once instead of doing a check for each index.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then you could also collect the units from the index_definitions and add it as an attribute of your array once at the end instead of repeating it in each condition.

monthly = da_index.groupby("forecast_month").sum(dim="step")
else:
raise ValueError(
f"Unknown method {method} to compute monthly data. Please use 'mean' or 'sum'."
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be written "count" instead of "sum"

label = f"ensemble_p{int(level * 100)}"
ds_stats[label] = ensemble_percentiles.sel(quantile=level)

return ds_stats
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder here if it could be helpful compute everything by default but still let the user define what statistics they want to compute. It might be worth if some calculations are costly (e.g. the quantiles). However it might make the functions a bit less compact.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants