Downloading long time series from CMEMS is slow with dfm_tools, even though the actual download happens with a daily frequency. This is probably because, by default, the entire requested dataset is opened first and the daily subsets are then retrieved from it (see dfm_tools/dfm_tools/download.py, lines 216 to 249 in f7e5234).
This example shows that cutting the request up into monthly chunks makes the download much faster than retrieving the full period at once:
dfm_tools-dependent example:

import dfm_tools as dfmt
import pandas as pd

# spatial extents
lon_min, lon_max, lat_min, lat_max = 12.5, 16.5, 34.5, 37

# time extents
date_min = '2017-12-01'
date_max = '2022-07-31'

# make list of start/stop times (tuples) with monthly frequency
# TODO: this approach improves performance significantly
date_range_start = pd.date_range(start=date_min, end=date_max, freq='MS')
date_range_end = pd.date_range(start=date_min, end=date_max, freq='ME')
monthly_periods = [(start, end) for start, end in zip(date_range_start, date_range_end)]

# make list of start/stop times (tuples) to download all at once (but still per day)
# TODO: this is the default behaviour and is slow
# note: this assignment overrides the monthly list above, comment it out to use the monthly chunks
monthly_periods = [(date_min, date_max)]

for period in monthly_periods:
    dfmt.download_CMEMS(varkey='uo',
                        longitude_min=lon_min, longitude_max=lon_max, latitude_min=lat_min, latitude_max=lat_max,
                        date_min=period[0], date_max=period[1],
                        dir_output=".", overwrite=True, dataset_id='med-cmcc-cur-rean-d')
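To put a number on the difference between the two variants, a small timing wrapper can help. The sketch below is only an illustration: `time_download` and `fake_download` are hypothetical names that do not exist in dfm_tools, and the stand-in download function merely sleeps so that the snippet runs without network access.

```python
import time
from typing import Callable, Iterable, Tuple


def time_download(download: Callable[[str, str], None],
                  periods: Iterable[Tuple[str, str]]) -> float:
    """Return the wall-clock seconds needed to call `download` for every period."""
    t0 = time.perf_counter()
    for date_min, date_max in periods:
        download(date_min, date_max)
    return time.perf_counter() - t0


def fake_download(date_min: str, date_max: str) -> None:
    # hypothetical stand-in for a real download call, it only sleeps briefly
    time.sleep(0.01)


if __name__ == "__main__":
    chunks = [("2017-12-01", "2017-12-31"), ("2018-01-01", "2018-01-31")]
    print(f"{time_download(fake_download, chunks):.3f} s")
```

For a real comparison, `fake_download` would be replaced by a small function that forwards its two arguments to `dfmt.download_CMEMS()` as `date_min` and `date_max`, once with the monthly list and once with the single full-period tuple.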
Example without dfm_tools dependency:
import copernicusmarine
import pandas as pd

# spatial extents
longitude_min, longitude_max, latitude_min, latitude_max = 12.5, 16.5, 34.5, 37

# time extents
# be sure to start with 1st of month and end with last of month
# since monthly_periods generator is too simple for other dates
date_min = '2017-12-01'
date_max = '2022-07-31'

# make list of start/stop times (tuples) with monthly frequency
# TODO: this approach improves performance significantly
date_range_start = pd.date_range(start=date_min, end=date_max, freq='MS')
date_range_end = pd.date_range(start=date_min, end=date_max, freq='ME')
monthly_periods = [(start, end) for start, end in zip(date_range_start, date_range_end)]

# make list of start/stop times (tuples) to download all at once (but still per day)
# TODO: this is the default behaviour of dfm_tools and it is slow
# note: this assignment overrides the monthly list above, comment it out to use the monthly chunks
monthly_periods = [(pd.Timestamp(date_min), pd.Timestamp(date_max))]

for period in monthly_periods:
    varkey = 'uo'
    dataset = copernicusmarine.open_dataset(
        dataset_id='med-cmcc-cur-rean-d',
        variables=[varkey],
        minimum_longitude=longitude_min,
        maximum_longitude=longitude_max,
        minimum_latitude=latitude_min,
        maximum_latitude=latitude_max,
        # temporarily convert back to strings because of https://github.com/mercator-ocean/copernicus-marine-toolbox/issues/261
        # TODO: revert, see https://github.com/Deltares/dfm_tools/issues/1047
        start_datetime=period[0].isoformat(),
        end_datetime=period[1].isoformat(),
    )

    freq = "D"  # 1 netcdf file per day
    # loop over the days of the current period only, looping over the full
    # date_min/date_max range here would write empty files for all other months
    period_range = pd.period_range(period[0], period[1], freq=freq)

    for date in period_range:
        date_str = str(date)
        name_output = f'cmems_{varkey}_{date_str}.nc'
        dataset_perperiod = dataset.sel(time=slice(date_str, date_str))
        print(f'xarray writing netcdf file: {name_output}')
        dataset_perperiod.to_netcdf(name_output)
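The comments in the example note that the monthly period list is too simple for time extents that do not start on the first and end on the last day of a month. A possible way around that restriction is sketched below with the standard library only. The function name `monthly_periods` mirrors the variable used above but is otherwise an assumption, not part of dfm_tools or copernicusmarine.

```python
import calendar
from datetime import date


def monthly_periods(date_min: str, date_max: str) -> list[tuple[date, date]]:
    """Split [date_min, date_max] into per-month (start, end) tuples.

    The first and last tuples are clipped to the requested range, so the
    inputs do not have to coincide with month boundaries.
    """
    start = date.fromisoformat(date_min)
    stop = date.fromisoformat(date_max)
    if start > stop:
        raise ValueError("date_min must not be later than date_max")

    periods = []
    current = start
    while current <= stop:
        # last calendar day of the month that `current` falls in
        last_day = calendar.monthrange(current.year, current.month)[1]
        month_end = current.replace(day=last_day)
        periods.append((current, min(month_end, stop)))
        # jump to the first day of the next month
        if current.month == 12:
            current = date(current.year + 1, 1, 1)
        else:
            current = date(current.year, current.month + 1, 1)
    return periods


if __name__ == "__main__":
    for start, end in monthly_periods("2017-12-15", "2018-02-10"):
        print(start.isoformat(), end.isoformat())
```

The tuples hold `datetime.date` objects. Their `.isoformat()` strings, or `pd.Timestamp(...)` wrappers around them, could be used in place of the pandas-based tuples in the loop above.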
Todo:
- service="arco-geo-series", chunk_size_limit=None
- add None to the accepted freqs for download_CMEMS() >> dropped support for None instead
- files from 2018 onwards are significantly smaller, catch this? >> these were empty files. This is not an issue in dfm_tools since it is caught by copernicusmarine_get_dataset_id(): an error is raised if the user requests a period outside of the available time span.
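For the standalone example, which does not go through copernicusmarine_get_dataset_id(), unexpectedly small output files could be flagged after the download. The sketch below is one possible check, not something dfm_tools provides: `find_suspicious_files` is a hypothetical helper, and the default `ratio` of 0.5 is an arbitrary assumption. It compares every file against the largest one rather than against the median, because in the case described above most of the files were empty.

```python
from pathlib import Path


def find_suspicious_files(dir_output: str, pattern: str = "cmems_*.nc",
                          ratio: float = 0.5) -> list[Path]:
    """Return files that are much smaller than the largest matching file."""
    files = sorted(Path(dir_output).glob(pattern))
    if not files:
        return []
    sizes = {f: f.stat().st_size for f in files}
    largest = max(sizes.values())
    # a file counts as suspicious if it is below `ratio` times the largest size
    return [f for f in files if sizes[f] < ratio * largest]
```

Calling `find_suspicious_files(".")` after the download loop would list the daily files that deserve a closer look, for instance by opening them with xarray and checking the length of the time dimension.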