
Consider 12-hour offset for CMEMS data #878

Closed
22 tasks done
veenstrajelmer opened this issue Jul 4, 2024 · 0 comments · Fixed by #1088

veenstrajelmer commented Jul 4, 2024

The functions copernicusmarine.subset() and copernicusmarine.open_dataset() always return start-of-interval time samples (e.g. start of hour, day, month, year) because of the underlying ARCO format. Native datasets (retrieved with copernicusmarine.get(), or with opendap before December 2023) use a mix of start-of-interval and center-of-interval timestamps. We had mid-day timestamps when using opendap and now have midnight timestamps, but the actual data is the same. This is documented in https://help.marine.copernicus.eu/en/articles/8656000-differences-between-netcdf-and-arco-formats. In dfm_tools the copernicus opendap server was used to retrieve data until approximately December 2023 (noon timestamps, center-of-interval). After that (v0.18.0 onwards), the copernicusmarine toolbox is used (midnight timestamps, start-of-interval).
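A minimal sketch of the start-of-interval vs. center-of-interval difference, using a toy xarray dataset (the variable name and values are made up, only the time handling matters):

```python
import numpy as np
import pandas as pd
import xarray as xr

# Toy daily-mean dataset with ARCO-style start-of-interval timestamps
# (midnight), as copernicusmarine.open_dataset() returns them.
times = pd.date_range("2015-11-01", periods=3, freq="D")
ds = xr.Dataset({"uo": ("time", np.zeros(3))}, coords={"time": times})

# Shifting the time coordinate by 12 hours labels each daily mean at the
# center of its averaging interval (noon), matching the old opendap
# behaviour. The data values are untouched.
ds_centered = ds.assign_coords(time=ds["time"] + pd.Timedelta(hours=12))

print(ds_centered["time"].values[0])  # 2015-11-01T12:00:00.000000000
```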

Most of the datasets we use are daily means, so consider correcting for this by adding an offset of 12 hours. Frequently used datasets are documented in dfmt.copernicusmarine_get_dataset_id():

def copernicusmarine_get_dataset_id(varkey, date_min, date_max):
    # TODO: maybe get dataset_id from 'copernicusmarine describe --include-datasets --contains <search_token>'
    product = copernicusmarine_get_product(date_min, date_max)
    if varkey in ['bottomT','tob','mlotst','siconc','sithick','so','thetao','uo','vo','usi','vsi','zos']: # for physchem
        # resolution is 1/12 degrees in lat/lon dimension, but a bit more/less in alternating cells
        if product == 'analysisforecast': # forecast: https://data.marine.copernicus.eu/product/GLOBAL_ANALYSISFORECAST_PHY_001_024/description
            if varkey in ['uo','vo']: # anfc dataset is split over multiple urls
                dataset_id = 'cmems_mod_glo_phy-cur_anfc_0.083deg_P1D-m'
            elif varkey in ['so']:
                dataset_id = 'cmems_mod_glo_phy-so_anfc_0.083deg_P1D-m'
            elif varkey in ['thetao']:
                dataset_id = 'cmems_mod_glo_phy-thetao_anfc_0.083deg_P1D-m'
            else:
                dataset_id = 'cmems_mod_glo_phy_anfc_0.083deg_P1D-m'
        else: # reanalysis: https://data.marine.copernicus.eu/product/GLOBAL_MULTIYEAR_PHY_001_030/description
            dataset_id = 'cmems_mod_glo_phy_my_0.083deg_P1D-m'
    elif varkey in ['nppv','o2','talk','dissic','ph','spco2','no3','po4','si','fe','chl','phyc']: # for bio
        # resolution is 1/4 degrees
        if product == 'analysisforecast': # forecast: https://data.marine.copernicus.eu/product/GLOBAL_ANALYSISFORECAST_BGC_001_028/description
            if varkey in ['nppv','o2']:
                dataset_id = 'cmems_mod_glo_bgc-bio_anfc_0.25deg_P1D-m'
            elif varkey in ['talk','dissic','ph']:
                dataset_id = 'cmems_mod_glo_bgc-car_anfc_0.25deg_P1D-m'
            elif varkey in ['spco2']:
                dataset_id = 'cmems_mod_glo_bgc-co2_anfc_0.25deg_P1D-m'
            elif varkey in ['no3','po4','si','fe']:
                dataset_id = 'cmems_mod_glo_bgc-nut_anfc_0.25deg_P1D-m'
            elif varkey in ['chl','phyc']:
                dataset_id = 'cmems_mod_glo_bgc-pft_anfc_0.25deg_P1D-m'
        else: # reanalysis: https://data.marine.copernicus.eu/product/GLOBAL_MULTIYEAR_BGC_001_029/description
            dataset_id = 'cmems_mod_glo_bgc_my_0.25_P1D-m'
    else:
        raise KeyError(f"unknown varkey for cmems: {varkey}")
    return dataset_id

The PUM states that the daily averaged products are centered at noon, not at midnight; this issue restores that behaviour:
[Image: excerpt from the PUM stating that daily averaged products are centered at noon]

Some usecases:

  • downloading data and interpolating it to model boundaries to serve as boundary conditions; in this case it makes sense to move the daily average to noon, since that timestamp is representative for (and in the middle of) the entire day.
  • using the data as validation data for a model; then it is best to also compare against daily averages of the model. With xarray these would most probably end up at midnight as well, so no timeshift is desired. When comparing to instantaneous model values it is slightly more convenient to have the cmems data at midday, but it does not matter much: comparing a daily mean to an instantaneous value at midnight or noon is not accurate anyway.
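To illustrate the second usecase: daily-averaging model output with xarray indeed labels the means at midnight by default, matching the uncorrected ARCO convention. A sketch with synthetic hourly "model output" (the variable name and values are made up):

```python
import numpy as np
import pandas as pd
import xarray as xr

# Two days of synthetic hourly model output
times = pd.date_range("2020-11-01", periods=48, freq="h")
ds = xr.Dataset({"zos": ("time", np.arange(48.0))}, coords={"time": times})

# xarray's resample labels each daily mean at the *start* of the day
# (midnight), the same convention as the uncorrected ARCO daily means,
# so no timeshift is needed before comparing the two.
ds_daily = ds.resample(time="1D").mean()

print(ds_daily["time"].values)
# ['2020-11-01T00:00:00.000000000' '2020-11-02T00:00:00.000000000']
```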

Check performance and behaviour (like file names and extents) with:

import dfm_tools as dfmt

# spatial extents
lon_min, lon_max, lat_min, lat_max = 12.5, 16.5, 34.5, 37

# time extents
date_min = '2015-11-01'
# date_max = '2020-07-31'
date_max = '2015-11-02'

dataset_id = 'cmems_mod_glo_phy_my_0.083deg_P1D-m' # daily means, corrected
dataset_id = 'cmems_mod_glo_phy_my_0.083deg_P1M-m' # monthly means, not corrected, intermediate days are also downloaded as empty files in case of freq="D"
# dataset_id = 'med-cmcc-cur-rean-d' # daily means, corrected
freq = "Y"
# freq = "M"
# freq = "D"

varkey_dict = {'cmems_mod_glo_phy_my_0.083deg_P1D-m':'uo',
               'cmems_mod_glo_phy_my_0.083deg_P1M-m':'so',
               'med-cmcc-cur-rean-d':'vo'}

dfmt.download_CMEMS(varkey=varkey_dict[dataset_id],
                    longitude_min=lon_min, longitude_max=lon_max, latitude_min=lat_min, latitude_max=lat_max,
                    date_min=date_min, date_max=date_max, freq=freq,
                    dir_output=".", overwrite=True, dataset_id=dataset_id)


# import xarray as xr; ds = xr.open_dataset(r"c:\DATA\checkouts\dfm_tools\tests\uo_2015.nc"); print(ds.time)

Todo:

The new implementation:

  • downloads the requested time range (buffered on the outside) in files per day/month/year
  • all files are named after the period, so "2020" for "Y", "2020-11" for "M" and "2020-11-06" for "D". The 12-hour timestamp is not visible in the filenames.
  • consequently, downloading daily means at noon (corrected) for 1 Nov to 2 Nov 2020 with monthly files results in one "2020-10" file with one timestep (31 Oct 12:00) and one "2020-11" file with two timesteps (1 Nov 12:00 and 2 Nov 12:00)
  • when downloading monthly means with daily freq (or yearly means with monthly/daily freq), empty files are created; this was also the case before. This can be avoided by downloading monthly means with monthly or yearly freq.
  • currently, only daily means are corrected with an offset; the yearly/monthly/hourly/3-hourly/6-hourly averaged datasets are not corrected.
  • some products have datasets whose names do not follow the convention. A daily mean dataset called *rean-d is also corrected with 12 hours. This is temporary hardcoding that will be removed in #1090 (remove hardcoded offset for "rean-d" datasets).

Alternative approach
Alternatively, request an argument for copernicusmarine.open_dataset() to return averaged values at either the mid-time or the start-time of the averaging interval. That would completely resolve the complexity around this issue. Also request attributes: at the moment it is not clear from the dataset that the time is not instantaneous but averaged. Check whether insitu timeseries are instantaneous rather than averaged. Requested the new argument and/or metadata via [email protected] on 10-7-2024; the request is registered under ticket [MDSOP-179] and mercator-ocean/copernicus-marine-toolbox#271.

Potential projects: BES>>Malta, EDITO
