Description
Retrieving data from the ddl with ddlpy is quite slow because of the hardcoded monthly frequency. Each request takes considerable time, even when no data is returned, so a yearly frequency would be much more efficient. However, it is not always possible to retrieve an entire year of 10-minute values when there are many duplicated timesteps. The maximum number of values returned by the ddl is 157681, and this number is sometimes exceeded for a subset of stations, as documented in Rijkswaterstaat/wm-ws-dl#39. That issue focuses on 10-minute WATHTE data only, but other timeseries with higher frequencies or more duplicates may also exceed this number. Even when the limit is not exceeded, the ddl sometimes raises timeout errors. A monthly frequency was therefore a sensible default. However, for water level extremes (roughly four values per day), a yearly frequency will not cause issues and will improve performance significantly, since the request overhead is reduced by a factor of 12. A yearly frequency is also fine for most 10-minute timeseries, but because it can fail for some of them it would require a try-except fallback, so it should not be the default.
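To put the limit of 157681 values in perspective, the arithmetic below counts the unique 10-minute timesteps in a calendar year. It is only a back-of-the-envelope check, not part of ddlpy; the helper name `ten_minute_steps` is hypothetical. The limit turns out to equal three non-leap years of unique 10-minute values plus one, so a single yearly request only exceeds it when the response contains roughly three values per timestep on average, e.g. through duplicated timesteps.

```python
from datetime import datetime, timedelta

# Maximum number of values returned by the ddl per request (value taken from the issue text)
MAX_VALUES = 157681


def ten_minute_steps(year: int) -> int:
    """Hypothetical helper: number of unique 10-minute timesteps in a calendar year."""
    start = datetime(year, 1, 1)
    end = datetime(year + 1, 1, 1)
    # timedelta / timedelta gives a float ratio; the year is an exact multiple of 10 minutes
    return int((end - start) / timedelta(minutes=10))


print(ten_minute_steps(2023))  # 52560 (non-leap year)
print(ten_minute_steps(2024))  # 52704 (leap year)
# The limit equals three non-leap years of unique 10-minute values, plus one
print(3 * ten_minute_steps(2023) + 1 == MAX_VALUES)  # True
```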
Suggestion
Replace the hardcoded dateutil.rrule.MONTHLY with a function argument that also supports dateutil.rrule.YEARLY and other frequencies.
Also add an option to download the entire period at once (freq=None).
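A minimal sketch of the suggested argument, assuming the request period is split into contiguous chunks with `dateutil.rrule`. The function name `date_chunks` and its signature are hypothetical and do not reflect ddlpy's actual internals; it only shows how a `freq` argument (defaulting to `MONTHLY`, with `None` meaning "one request for the whole period") could drive the chunking.

```python
from datetime import datetime
from typing import List, Optional, Tuple

import dateutil.rrule


def date_chunks(
    start: datetime,
    end: datetime,
    freq: Optional[int] = dateutil.rrule.MONTHLY,
) -> List[Tuple[datetime, datetime]]:
    """Hypothetical helper: split [start, end) into contiguous (chunk_start, chunk_end) pairs.

    freq is a dateutil.rrule frequency constant such as MONTHLY or YEARLY;
    freq=None returns the entire period as a single chunk.
    """
    if end <= start:
        return []
    if freq is None:
        return [(start, end)]
    # rrule yields boundaries from `start` up to and including `end` (if it falls on the rule).
    # If start is e.g. the 31st, MONTHLY skips shorter months; chunks then get longer but stay contiguous.
    boundaries = list(dateutil.rrule.rrule(freq, dtstart=start, until=end))
    if boundaries[-1] != end:
        boundaries.append(end)  # close the last, partial chunk
    return list(zip(boundaries[:-1], boundaries[1:]))


start, end = datetime(2000, 1, 1), datetime(2020, 1, 1)
print(len(date_chunks(start, end, dateutil.rrule.MONTHLY)))  # 240
print(len(date_chunks(start, end, dateutil.rrule.YEARLY)))   # 20
print(len(date_chunks(start, end, None)))                    # 1
```

Over these 20 years the yearly setting issues 12 times fewer requests than the monthly default, which is where the reduced overhead comes from.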
Note
This feature should be used with caution: when too large a dataset is requested at once, the response is sometimes empty instead of containing a proper error message (Rijkswaterstaat/wm-ws-dl#40).
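The try-except approach mentioned in the description could look roughly like the sketch below: request one year at a time and, if a yearly request raises, retry that year month by month. The names `split` and `fetch_yearly_with_fallback` are hypothetical, and `fetch` stands in for a single ddl request that is assumed to raise on errors such as timeouts; this is not ddlpy's API.

```python
from datetime import datetime
from typing import Callable, List, Tuple

import dateutil.rrule


def split(start: datetime, end: datetime, freq: int) -> List[Tuple[datetime, datetime]]:
    """Hypothetical helper: contiguous (start, end) pairs at a dateutil.rrule frequency."""
    if end <= start:
        return []
    bounds = list(dateutil.rrule.rrule(freq, dtstart=start, until=end))
    if bounds[-1] != end:
        bounds.append(end)
    return list(zip(bounds[:-1], bounds[1:]))


def fetch_yearly_with_fallback(
    fetch: Callable[[datetime, datetime], list], start: datetime, end: datetime
) -> list:
    """Request per year; if a yearly request raises, retry that year per month."""
    results: list = []
    for year_start, year_end in split(start, end, dateutil.rrule.YEARLY):
        try:
            results.extend(fetch(year_start, year_end))
        except Exception:  # in practice, catch the specific timeout/limit errors instead
            for month_start, month_end in split(year_start, year_end, dateutil.rrule.MONTHLY):
                results.extend(fetch(month_start, month_end))
    return results


# Usage with a simulated fetch that fails for any span longer than 31 days
calls = []


def fake_fetch(s: datetime, e: datetime) -> list:
    calls.append((s, e))
    if (e - s).days > 31:
        raise RuntimeError("simulated: too many values")
    return [s]


data = fetch_yearly_with_fallback(fake_fetch, datetime(2020, 1, 1), datetime(2022, 1, 1))
print(len(data))   # 24 (one item per monthly request)
print(len(calls))  # 26 (2 failed yearly requests + 24 monthly requests)
```

Because of the caveat in the note, an oversized request may come back empty rather than raising, and an exception-based fallback like this one cannot detect that case; an empty result is indistinguishable from a period that genuinely has no data.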