
subset/retrieve observations: add GTSMip C3S locations and data #1121

Open
8 of 12 tasks
veenstrajelmer opened this issue Feb 6, 2025 · 1 comment · May be fixed by #1153

@veenstrajelmer
Collaborator

veenstrajelmer commented Feb 6, 2025

Adding the stations (observation points) from the GTSMip reanalysis dataset on the CDS (C3S project) would be useful for subset/retrieve, and especially for the modelbuilder. Some code to get them by downloading a small dataset from the CDS:

```python
import zipfile

import cdsapi
import dfm_tools as dfmt
import matplotlib.pyplot as plt
import pandas as pd
import xarray as xr

plt.close("all")

# request a small dataset (one month of daily maxima) from the CDS,
# only to obtain the station coordinates
dataset = "sis-water-level-change-timeseries-cmip6"
request = {
    "variable": ["total_water_level"],
    "experiment": "reanalysis",
    "temporal_aggregation": ["daily_maximum"],
    "year": ["1983"],
    "month": ["01"],
}

filepath_zip = "temp.zip"
client = cdsapi.Client()
client.retrieve(dataset, request).download(filepath_zip)

print(f'unzipping "{filepath_zip}"')
with zipfile.ZipFile(filepath_zip, "r") as zip_ref:
    zip_ref.extractall(".")
    # assumes the archive contains a single netcdf file
    # (e.g. reanalysis_waterlevel_dailymax_1983_01_v2.nc)
    filepath_nc = zip_ref.namelist()[0]

# read the station coordinates into a DataFrame
with xr.open_dataset(filepath_nc) as ds:
    df = pd.DataFrame(
        {
            "x": ds.station_x_coordinate.to_numpy(),
            "y": ds.station_y_coordinate.to_numpy(),
        }
    )


def plot_stations(lon_min, lon_max, lat_min, lat_max, res="h"):
    fig, ax = plt.subplots()
    df.plot.scatter(x="x", y="y", marker="x", ax=ax)
    # set the axis limits before adding the coastlines
    ax.set_xlim(lon_min, lon_max)
    ax.set_ylim(lat_min, lat_max)
    dfmt.plot_coastlines(res=res)


# Global
plot_stations(-180, 180, -90, 90, res="l")
# Bonaire
plot_stations(-68.55, -67.9, 11.8, 12.6, res="h")
# Vietnam
plot_stations(105.8, 106.85, 17.75, 18.5, res="h")
```

Resulting station plots for the global domain, Bonaire and Vietnam (figures not preserved).

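So this is useful, but not yet efficient (a full month of data is downloaded only to obtain the station coordinates), and it does not yet include downloading the actual data for the selected stations.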
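The plots above only zoom in on a region; for subset/retrieve the station list itself also needs to be filtered to a bounding box. A minimal pandas sketch, assuming a station table with `x`/`y` columns like `df` in the script and longitudes in the same convention as the bounds (-180 to 180). The helper `subset_stations_bbox` is hypothetical and not part of dfm_tools:

```python
import pandas as pd


def subset_stations_bbox(df, lon_min, lon_max, lat_min, lat_max):
    """Return the stations inside a lon/lat bounding box (hypothetical helper).

    Bounds are inclusive. Boxes crossing the antimeridian are not handled.
    """
    inside = df["x"].between(lon_min, lon_max) & df["y"].between(lat_min, lat_max)
    return df.loc[inside]


# example with made-up coordinates (not real GTSM stations)
df_demo = pd.DataFrame({"x": [-68.3, 106.2, 4.5], "y": [12.1, 18.0, 52.0]})
df_bonaire = subset_stations_bbox(df_demo, -68.55, -67.9, 11.8, 12.6)
print(df_bonaire)
```

Filtering the DataFrame, rather than only setting axis limits, also gives the number of stations per region directly.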

Additional info:

  • this new source can be used in the subset/retrieve notebook
  • xyn file derived in ECMWF notebook
  • the CDS netCDF files (including station names) are also available here: p:\11210221-gtsm-reanalysis\GTSM-ERA5-E_dataset\waterlevel\
  • are the coordinates in the xyn and output files snapped to cell centers, or are they the raw coordinates?
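To check whether the xyn coordinates equal the CDS coordinates or are snapped, a stdlib-only sketch could parse the xyn lines and report the largest distance from each xyn station to its nearest CDS station. Assumptions: xyn lines have the form `x y name` (name optionally quoted), a plain distance in degrees is good enough for a tolerance check, and the helper names are hypothetical:

```python
import math
import shlex


def read_xyn(lines):
    """Parse xyn lines of the form: x y name (name may be quoted)."""
    stations = []
    for line in lines:
        if not line.strip():
            continue  # skip empty lines
        x, y, *name = shlex.split(line)
        stations.append((float(x), float(y), " ".join(name)))
    return stations


def max_coordinate_offset(xyn_stations, cds_coords):
    """Largest distance (degrees) from any xyn station to its nearest CDS point."""
    worst = 0.0
    for x, y, _name in xyn_stations:
        nearest = min(math.hypot(x - cx, y - cy) for cx, cy in cds_coords)
        worst = max(worst, nearest)
    return worst


# made-up example: the second station differs slightly between both sources
xyn_lines = ["-68.30 12.10 'station_a'", "106.20 18.00 'station_b'"]
cds_coords = [(-68.30, 12.10), (106.2001, 18.0)]
offset = max_coordinate_offset(read_xyn(xyn_lines), cds_coords)
print(f"max offset: {offset:.6f} deg")
print("identical" if offset < 1e-6 else "snapped or different")
```

The brute-force nearest-neighbour search is fine for a regional subset; for the full global station list a KD-tree (e.g. `scipy.spatial.cKDTree`) would be much faster.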

Todo:

  • Add the list of stations to the repository: VU-IVM/gtsm3-era5-nrt#23
  • check if the coords in the xyn file are equal to the ones in the CDS dataset
  • decide on a new source name, for instance gtsmip-cds. Keep the option open to add a new version of the CDS dataset, for instance by linking to the gtsm-era5 repository name or to the GTSM version (gtsm3ip?)
  • add new source in dfmt.ssh_catalog_subset() and dfmt.ssh_retrieve_data() in observations.py (download per month/year based on a user defined time-frame)
  • add new source to tests in test_observations.py
  • add this new source to the subset/retrieve notebook
  • consider also subsetting via the CDS (optionally adding names via the xyn file)
  • code width and formatting according to black (without actually calling it)
  • make it possible to retrieve tide/waterlevel/surge via kwargs; also different frequencies, or just 10-minute?
  • JV?: update modelbuilder notebook
  • update branch with commits from main before updating whats-new.md to avoid conflicts
  • update docs/whats-new.md (this is a new feature)
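For downloading per month/year based on a user-defined time frame, the time frame could be split into one CDS request per calendar month, reusing the request keys from the script above. This is a sketch under the assumption that per-month requests are the desired granularity; the function `monthly_requests` is hypothetical, and each dict could be passed to `cdsapi.Client().retrieve(dataset, request)`:

```python
from datetime import date


def monthly_requests(start, end, variable="total_water_level",
                     experiment="reanalysis", aggregation="daily_maximum"):
    """Yield one CDS request dict per calendar month overlapping [start, end]."""
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        yield {
            "variable": [variable],
            "experiment": experiment,
            "temporal_aggregation": [aggregation],
            "year": [f"{year}"],
            "month": [f"{month:02d}"],
        }
        # advance to the next calendar month
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)


requests = list(monthly_requests(date(1983, 11, 15), date(1984, 2, 1)))
print([(r["year"][0], r["month"][0]) for r in requests])
# prints [('1983', '11'), ('1983', '12'), ('1984', '01'), ('1984', '02')]
```

Since `year` and `month` are lists in the original request, grouping several months per request might also be possible; one request per month keeps the downloads small and easy to resume.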

Contributing guidelines:

  • general contributing information is available here (a lot of info, so cherry-pick): https://deltares.github.io/dfm_tools/CONTRIBUTING.html
  • workflow: create a branch from this issue, commit things, open a PR from that branch, request review when the first iteration is done
  • keep track of code quality (analysed on each push to a PR by sonarcloud)
  • make sure to cover new code with tests (analysed by codecov after every successful GitHub pytest workflow run)
  • after a PR is created, each push to the branch will trigger the pytest GitHub workflows; these can be tracked here
@Deltares Deltares deleted a comment from n-aleksandrova Feb 19, 2025
@veenstrajelmer veenstrajelmer changed the title add GTSMip C3S obspoint locations subset/retrieve observations: add GTSMip C3S locations and data Feb 19, 2025
@n-aleksandrova
Collaborator

The gtsm-era5-cds source was added to observations.py, to the retrieve/subset notebook and to test_observations.py. Subsetting of locations from the CSV file with observation points is implemented. Retrieval of the data is almost fully implemented: the data is downloaded, but as a single file that includes all subsetted stations. Exceptions were added to the data retrieval function, because point-based retrieval is not possible for the GTSM data from the CDS.

To be discussed: should we follow the same approach as for the other sources and save the data in one file per station? This would result in a much larger number of files than for the other sources.
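To make the trade-off concrete, a minimal xarray sketch of splitting one downloaded multi-station dataset per station. The dimension name `stations`, the variable name `waterlevel` and the helper `split_per_station` are assumptions for illustration, not confirmed from the CDS files or from dfm_tools:

```python
import numpy as np
import xarray as xr


def split_per_station(ds, station_dim="stations"):
    """Split a multi-station dataset into one in-memory dataset per station."""
    return {i: ds.isel({station_dim: i}) for i in range(ds.sizes[station_dim])}


# small synthetic dataset mimicking one downloaded file (made-up values)
ds = xr.Dataset(
    {"waterlevel": (("time", "stations"), np.zeros((3, 2)))},
    coords={
        "time": np.arange(3),
        "station_x_coordinate": ("stations", [-68.3, 106.2]),
        "station_y_coordinate": ("stations", [12.1, 18.0]),
    },
)
per_station = split_per_station(ds)
print(len(per_station), per_station[1]["station_x_coordinate"].item())
# each entry could be written with per_station[i].to_netcdf(f"station_{i}.nc")
```

Splitting on demand like this keeps one file per download (few files) while still giving per-station access; writing every entry to disk reproduces the one-file-per-station layout of the other sources, at the cost of many files.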

2 participants