Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for reading, defining and outputting CF-compliant projection info #38

Open
wants to merge 28 commits into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
28 commits
Select commit Hold shift + click to select a range
1683d16
add DANRA projection information
observingClouds Nov 11, 2024
aa87576
add projection handlers
observingClouds Nov 13, 2024
4826e20
add doctest CI
observingClouds Nov 13, 2024
b67998d
add pyproj dependency
observingClouds Nov 13, 2024
ba4caa7
save proj source instead of object
observingClouds Nov 13, 2024
9244fdf
add projection info to output
observingClouds Nov 13, 2024
8dd485a
ignore coordinates for projection evaluation
observingClouds Nov 13, 2024
a8eacc4
fix doctest
observingClouds Nov 13, 2024
db737ab
allow slight rounding deviations in doctest
observingClouds Nov 14, 2024
44dff49
make projections optional
observingClouds Nov 14, 2024
fdb984b
fix VALID_EXAMPLE_CONFIG_YAML to work with current codebase
observingClouds Nov 14, 2024
91ddfc2
add optional projection key to config schema
observingClouds Nov 14, 2024
ac0792c
first working implementation
observingClouds Nov 14, 2024
d895094
improve type annotation and docu
observingClouds Nov 14, 2024
ae35d39
fix example for lat,lon retrieval
observingClouds Nov 14, 2024
05c3921
fix doctest
observingClouds Nov 14, 2024
350cf92
document changes
observingClouds Nov 14, 2024
18980b3
fix docstring
observingClouds Nov 14, 2024
5da27df
change standard CRS variable name
observingClouds Nov 18, 2024
0677415
use cartopy plotting compatible WKT
observingClouds Nov 26, 2024
967e9bd
fix typo
observingClouds Nov 26, 2024
b830f95
add details to changes
observingClouds Nov 26, 2024
a584f91
Merge branch 'main' into projection_support_stage1
observingClouds Nov 26, 2024
ed7250a
add warning for overwriting of dataset projection info
observingClouds Nov 26, 2024
a6005f6
use latest schema version for VALID_EXAMPLE_CONFIG_YAML
observingClouds Nov 26, 2024
a63502d
fix linting
observingClouds Nov 26, 2024
d4480c0
fix early evaluation of var_name, altitude
observingClouds Nov 26, 2024
7e5c79c
fix linting
observingClouds Nov 26, 2024
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 4 additions & 0 deletions .github/workflows/python-package-pip.yml
Original file line number Diff line number Diff line change
Expand Up @@ -24,6 +24,10 @@ jobs:
python -m pip install .
python -m pip install pytest

- name: Run doctests
run: |
python -m pytest --doctest-modules mllam_data_prep/

- name: Run tests
env:
AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
Expand Down
2 changes: 2 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -32,6 +32,8 @@ interface and addresses bugs around optional dependencies for

### Added

- add support for reading, setting (via `projections["__common__"]` section in config) and outputting CF-Conventions complient projection information. ![\#38](https://github.com/mllam/mllam-data-prep/pull/38)
- add `doctest-modules` to CI. ![\#38](https://github.com/mllam/mllam-data-prep/pull/38)
- add optional output path argument to parser. ![\#26](https://github.com/mllam/mllam-data-prep/pull/26)

### Changed
Expand Down
20 changes: 20 additions & 0 deletions example.danra.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -42,6 +42,26 @@ inputs:
altitude:
values: [100, ]
units: m
projections:
__common__:
observingClouds marked this conversation as resolved.
Show resolved Hide resolved
dims: [x, y]
attributes:
crs_wkt: 'PROJCRS["DMI HARMONIE DANRA lambert projection", BASEGEOGCRS["DMI HARMONIE DANRA lambert CRS", DATUM["DMI HARMONIE DANRA lambert datum", ELLIPSOID["Sphere", 6367470, 0, LENGTHUNIT["metre", 1, ID["EPSG", 9001]]]], PRIMEM["Greenwich", 0, ANGLEUNIT["degree", 0.0174532925199433, ID["EPSG", 8901]]], ID["EPSG",4035]], CONVERSION["Lambert Conic Conformal (2SP)", METHOD["Lambert Conic Conformal (2SP)", ID["EPSG", 9802]], PARAMETER["Latitude of false origin", 56.7, ANGLEUNIT["degree", 0.0174532925199433, ID["EPSG", 8821]]], PARAMETER["Longitude of false origin", 25, ANGLEUNIT["degree", 0.0174532925199433, ID["EPSG", 8822]]], PARAMETER["Latitude of 1st standard parallel", 56.7, ANGLEUNIT["degree", 0.0174532925199433, ID["EPSG", 8823]]], PARAMETER["Latitude of 2nd standard parallel", 56.7, ANGLEUNIT["degree", 0.0174532925199433, ID["EPSG", 8824]]], PARAMETER["Easting at false origin", 0, LENGTHUNIT["metre", 1, ID["EPSG", 8826]]], PARAMETER["Northing at false origin", 0, LENGTHUNIT["metre", 1, ID["EPSG", 8827]]]], CS[Cartesian, 2], AXIS["(E)", east, ORDER[1], LENGTHUNIT["metre", 1, ID["EPSG", 9001]]], AXIS["(N)", north, ORDER[2], LENGTHUNIT["metre", 1, ID["EPSG", 9001]]], USAGE[AREA["Denmark and surrounding regions"], BBOX[47, -3, 65, 25], SCOPE["Danra reanalysis projection"]]]'
semi_major_axis: 6367470.0
semi_minor_axis: 6367470.0
inverse_flattening: 0.0
reference_ellipsoid_name: 'Sphere'
longitude_of_prime_meridian: 0.0
prime_meridian_name: 'Greenwich'
geographic_crs_name: 'DMI HARMONIE DANRA lambert CRS'
horizontal_datum_name: 'DMI HARMONIE DANRA lambert datum'
projected_crs_name: 'DMI HARMONIE DANRA lambert projection'
grid_mapping_name: 'lambert_conformal_conic'
standard_parallel: [0.0567, 0.0567]
latitude_of_projection_origin: 0.0567
longitude_of_central_meridian: 0.025
false_easting: 0.0
false_northing: 0.0
dim_mapping:
time:
method: rename
Expand Down
27 changes: 26 additions & 1 deletion mllam_data_prep/config.py
Original file line number Diff line number Diff line change
Expand Up @@ -110,6 +110,26 @@ class DimMapping:
name_format: str = field(default=None)


@dataclass
class Projection:
"""
Definition of a single coordinate reference system (CRS) projection that
variables in the dataset refer to.

Attributes
----------
dims : List[str]
The dimensions of the projection, usually ["x", "y"].
attributes : Dict[str, Any]
The attributes that define the projection. The attributes need to adhere to the CF-conventions
and contain either a `crs_wkt` attribute or individual attributes that are specific to the used projection.
If both are given, then the `crs_wkt` attribute will be used to create the projection object.
"""

dims: List[str]
attributes: Dict[str, Any]


@dataclass
class InputDataset:
"""
Expand Down Expand Up @@ -155,10 +175,15 @@ class InputDataset:

path: str
dims: List[str]
variables: Union[List[str], Dict[str, Dict[str, ValueSelection]]]
variables: Union[
List[str],
Dict[str, Dict[str, ValueSelection]],
Dict[str, Dict[str, Dict[str, str]]],
]
dim_mapping: Dict[str, DimMapping]
target_output_variable: str
attributes: Dict[str, Any] = None
projections: Optional[Dict[str, Projection]] = None


@dataclass
Expand Down
52 changes: 52 additions & 0 deletions mllam_data_prep/create_dataset.py
Original file line number Diff line number Diff line change
Expand Up @@ -4,6 +4,7 @@
from pathlib import Path

import numpy as np
import pyproj
import xarray as xr
from loguru import logger
from numcodecs import Blosc
Expand All @@ -12,6 +13,11 @@
from .config import Config, InvalidConfigException
from .ops.loading import load_and_subset_dataset
from .ops.mapping import map_dims_and_variables
from .ops.projection import (
ProjectionInconsistencyWarning,
get_projection_crs,
validate_projection_consistency,
)
from .ops.selection import select_by_kwargs
from .ops.statistics import calc_stats

Expand Down Expand Up @@ -122,13 +128,15 @@ def create_dataset(config: Config):
output_coord_ranges = output_config.coord_ranges

dataarrays_by_target = defaultdict(list)
projections = []

for dataset_name, input_config in config.inputs.items():
path = input_config.path
variables = input_config.variables
target_output_var = input_config.target_output_variable
expected_input_attributes = input_config.attributes or {}
expected_input_var_dims = input_config.dims
dataset_projections = input_config.projections or {}

output_dims = output_config.variables[target_output_var]

Expand All @@ -143,6 +151,10 @@ def create_dataset(config: Config):
dataset_name=dataset_name,
)

# TODO: remove this once grid_mapping is set in config file
for var in ds.data_vars:
ds[var].attrs["grid_mapping"] = "__common__"

leifdenby marked this conversation as resolved.
Show resolved Hide resolved
dim_mapping = input_config.dim_mapping

# check that there is an entry for each arch dimension
Expand Down Expand Up @@ -174,6 +186,28 @@ def create_dataset(config: Config):

da_target.attrs["source_dataset"] = dataset_name

# get the projection information from the dataset and update it with the projection
# information given in the input config
# TODO: read the projection information from the config
projection_crs = get_projection_crs(ds)
if projection_crs is None and dataset_projections:
logger.warning(
f"Projection information not found in dataset {dataset_name}, using projection information from config."
)
projection_crs = {
crs_var: p.attributes for crs_var, p in dataset_projections.items()
}
elif projection_crs is None and not dataset_projections:
logger.warning(
f"No projection information found neither in dataset {dataset_name}, nor in config."
)
observingClouds marked this conversation as resolved.
Show resolved Hide resolved
elif projection_crs is not None and dataset_projections:
logger.warning(
f"Projection information given in {dataset_name} is overwritten by that given in config."
)
if projection_crs is not None:
projections.extend(projection_crs.values())

# only need to do selection for the coordinates that the input dataset actually has
if output_coord_ranges is not None:
selection_kwargs = {}
Expand All @@ -184,6 +218,18 @@ def create_dataset(config: Config):

dataarrays_by_target[target_output_var].append(da_target)

# validate projections across input datasets
if projections:
try:
validate_projection_consistency(projections)
except ProjectionInconsistencyWarning as e:
logger.warning(f"Projection information might be ambiguous: {e}")
projection = pyproj.CRS.from_cf(projections[0])

# TODO: generalize the retrieval of x and y coords
# coords = (dataarrays_by_target[target_output_var][0]['x'], dataarrays_by_target[target_output_var][0]['y'])
# lon, lat = get_latitude_longitude_from_projection(projection, coords)
Comment on lines +229 to +231
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ealerskans this would be my current proposal for you on where and how you could generate the lats and lons from x and y.

This is obviously per projection in each dataset and because we only accept one projection for now, you might just skip this step for all following datasets.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the function is available from from .ops.projection

leifdenby marked this conversation as resolved.
Show resolved Hide resolved

ds = _merge_dataarrays_by_target(dataarrays_by_target=dataarrays_by_target)

# need to drop the encoding so that we can write to zarr with new chunksizes
Expand Down Expand Up @@ -230,6 +276,12 @@ def create_dataset(config: Config):
)
ds["splits"] = da_splits

# add the projection information to the dataset
if projections:
ds["__common__"] = xr.DataArray(0, attrs=projection.to_cf()).astype("int16")
# for var in ["VARIABLES_WITH_PROKJECTION"]:
# ds[var].attrs["grid_mapping"] = "__common__"

ds.attrs = {}
ds.attrs["schema_version"] = config.schema_version
ds.attrs["dataset_version"] = config.dataset_version
Expand Down
Loading