-
Notifications
You must be signed in to change notification settings - Fork 50
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Creation of Static and Forcing Fields #34
Comments
@leifdenby As we discussed in the meeting: the boundary mask certainly has high importance as it is not a simple static feature but potentially used in loss-calculation and graph-design. As you can see from the plots, I only got so far as to create a very basic working example for MEPS+ERA5 as a basis for discussion and future PRs. It would be great to get your boundary masking expertise here 🤝 |
|
For many static (climatological) surface parameters I guess ECOCLIMAP-II would work: https://opensource.umr-cnrm.fr/projects/ecoclimap/wiki (1 km resolution) |
Regarding TOA irradiance. What about something like this: def toa_solirr(lat, lon, day, hr_utc):
dec = -23.45 * np.cos(np.pi/180*360/365*(day-81)) * np.pi/180
hr_lst = hr_utc + lon/15
hr_angle = 15 * (hr_lst - 12)
return np.sin(lat*np.pi/180) * np.sin(dec) + np.cos(lat*np.pi/180) * np.cos(dec) * np.cos(hr_angle*np.pi/180) Is that also too slow?
|
This looks great I think! How general is this (e.g. how hard to use it for another domain, another projection)? Can you share the full code for this? What is the unit for the distances now? meters?
Since it is entirely parallelizable that should be quite fast. I tested a parallel version with broadcasting: import numpy as np
import xarray as xr
import time
zarr_path = "data/era5_latlon/ll.zarr"
ll_xds = xr.open_zarr(zarr_path)
lon_1d = ll_xds.longitude.to_numpy() # (num_lon,)
lat_1d = ll_xds.latitude.to_numpy() # (num_lat,)
# dim order is (lat, lon, day, hr)
def toa_solirr(lat, lon, day, hr_utc):
lat = lat[:, np.newaxis, np.newaxis, np.newaxis]
lon = lon[np.newaxis, :, np.newaxis, np.newaxis]
day = day[np.newaxis, np.newaxis, :, np.newaxis]
hr_utc = hr_utc[np.newaxis, np.newaxis, np.newaxis, :]
dec = -23.45 * np.cos(np.pi / 180 * 360 / 365 * (day - 81)) * np.pi / 180
hr_lst = hr_utc + lon / 15
hr_angle = 15 * (hr_lst - 12)
return np.sin(lat * np.pi / 180) * np.sin(dec) + np.cos(
lat * np.pi / 180
) * np.cos(dec) * np.cos(hr_angle * np.pi / 180)
day = np.arange(7)
hour = np.arange(24)
start_time = time.time()
toa = toa_solirr(lat_1d, lon_1d, day, hour)
end_time = time.time()
print(f"Took: {end_time-start_time} s") (quick test code, but does the trick) |
The difference to GC implementation is that they focus on having something very similar to what goes into ERA5 while mine opts for as simple as possible. Maybe too simple? It does for example not care about the Sun-Earth distance that causes ca 7 % variation during the year. Still, I think other things contribute more to the error than imperfections in this forcing. However, I've not checked the output from my version with more precise ones - should be done. |
Here is an update with a correction to the declination. It now outputs irradiance [Wm**-2] so that it can be checked more easily. Also input is given by DataArray coordinates. def toa_solirr(dt, lon, lat):
# same dim order as in xda
lon = lon.values[np.newaxis, :, np.newaxis]
lat = lat.values[np.newaxis, np.newaxis, :]
day = dt.dayofyear.values[:, np.newaxis, np.newaxis]
hr_utc = dt.hour.values[:, np.newaxis, np.newaxis]
dec = np.pi/180 * 23.45 * np.sin(2*np.pi * (284+day) / 365)
hr_lst = hr_utc + lon / 15
hr_angle = 15 * (hr_lst - 12)
cos_sza = np.sin(lat * np.pi / 180) * np.sin(dec) + np.cos(
lat * np.pi / 180
) * np.cos(dec) * np.cos(hr_angle * np.pi / 180)
return np.fmax(0, 1366 * cos_sza)
toa = toa_solirr(xda['time'].dt, xda['longitude'], xda['latitude']) I tried to get it to work as a _ufunc like this but to no avail. Maybe someone familiar with Dask can make it work?
|
The datetime forcings are now implemented in #54 without the need for a |
Closing this as superseded by mllam/mllam-data-prep#29 |
Summary
For international collaboration a selection of state variables, as well as static and forcing features were selected (see here for reference: Sheet). While the variables are provided by the user in a zarr-archive (usually NWP-model output), the additional forcing and static features are often not available. I suggest to provide the user with the option to generate these fields using a new script create_forcings.py. The script works for state and boundary domains alike, simple visualizations were provided to make this issue easier to understand. Please feel free to improve on any part of the script and help writing PRs 🚀
New fields
Questions / Issues:
Function Details and Rationale
calculate_datetime_forcing(ds, args)
This function calculates the datetime forcing for the neural LAM model by generating sinusoidal representations of the hour of the day and the time of the year. These features help the model understand diurnal and seasonal variations. Previously these were part of the PyTorch Dataset, provided in
__get_item__
; it certainly makes more sense to calculate these in advance, should be more efficient and allows for cleaner Dataset code.generate_land_sea_mask(xy_state, tempdir, projection, high_res_factor=10)
This function generates a land-sea mask for the neural LAM model. The shapefile for the land-sea mask comes from natural-earth at 50m resolution. Percentage of gridcell covered by land is calculated by rasterization of high-res shapefile. This generates static fields, for some regions this mask will vary in time as parts of the sea freeze.
create_boundary_mask(lonlat_state, lonlat_boundary, boundary_thickness, overlap=False)
This function creates a boundary mask for the neural LAM model. The boundary mask defines the edges of the model's domain and helps in handling boundary conditions, which are essential for limited area modeling. It allows for overlap (Flatbread) or non-overlapping boundaries (Donut). The borders of the interior state domain are identified using
anemoi-dataset
functionalitites for lat-lon matching of boundary and state. The thickness of the boundary is applied using binary_dilution, which is a terrible approach -> should be calculated in meters in projection of state.generate_toa_radiation_forcing(ds, lonlat)
This function pre-computes all static features related to the grid nodes for top-of-atmosphere (TOA) radiation. This functionality is provided by the graphcast repository. It is however quite slow for large domains and again depends on lat lon. Maybe there are better options.
Usage
The example script is based on MEPS (since we all have that data) and ERA5, directly from the google weatherbench cloud storage. To run it, store the script on the first level of the NeuralLAM repo, with
meps_example
in the data-folder:The text was updated successfully, but these errors were encountered: