IPF associated reader not able to parse datetimes beyond the year 2261 #1163
Comments
I've done some simple profiling in IPython; this might work at twice the speed:

```
In [1]: %timeit pd.to_datetime(np.datetime64(imod.util.to_datetime("29991231"), "s"))
20.9 μs ± 254 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

In [2]: %timeit pd.to_datetime("20021231", format="%Y%m%d")
40.1 μs ± 445 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
```
On second thought: forget my previous comment. It is faster for individual values, but for arrays it is 10 times slower, as `imod.util.to_datetime` has to be called for each individual element in a list comprehension, whereas `pd.to_datetime` is vectorized. Furthermore, the pandas `DatetimeIndex` used in dataframes and for resampling only supports nanoseconds as its unit, according to the documentation. I've tried quite a few things to see if I could generate one in some other way, but couldn't figure out an easy way to do this. Alternatively, xarray has a `CFTimeIndex`, but this doesn't work with resampling, as xarray uses pandas for that, so it would require implementing some custom logic ourselves.
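For context, the nanosecond limit mentioned above can be seen directly (this is my own illustration, not from the thread): pandas' nanosecond-resolution timestamps top out at 2262-04-11, so a date in the year 2999 overflows when parsed with a format string.

```python
import pandas as pd

# Parsing with a format string targets nanosecond precision, whose
# maximum representable date is 2262-04-11, so the year 2999 overflows.
try:
    pd.to_datetime("29991231", format="%Y%m%d")
    overflowed = False
except pd.errors.OutOfBoundsDatetime:
    overflowed = True

print("overflowed:", overflowed)
```

Exact behaviour may vary between pandas versions, but the nanosecond ceiling for string parsing with `format` is the documented limitation being discussed here.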
Also interesting to note: we've never had a complaint about this with regard to IPFs, so apparently such dates do not occur (much) in practice. In that case, you might be able to replace a date like 29991231 with 22611231 instead (and log/warn). In general, this feels like a very serious issue, but the only workaround that I see is using xarray Datasets instead of pandas DataFrames. That's rather a lot of work. In the meantime, I don't think it's really crippling: if you want to set up a paleo-study with wells, you can do so by using xarray for the time management. Other thoughts: use numpy datetimes instead when reading from the project file (add a bool argument in the IPF reading)? Maybe convert temporarily to xarray to resample.
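The replace-and-warn idea could be sketched roughly like this (`clamp_dates` and its cap date are hypothetical names of mine, not imod API):

```python
import warnings

import numpy as np
import pandas as pd


def clamp_dates(datestrings, cap="22611231"):
    """Hypothetical helper: replace YYYYMMDD dates beyond pandas'
    nanosecond range with a cap date, warning as suggested above."""
    out = []
    for s in datestrings:
        # Parse with NumPy first; day-resolution datetime64 is not
        # limited to the nanosecond range, so year 2999 is fine here.
        if np.datetime64(f"{s[:4]}-{s[4:6]}-{s[6:8]}") > np.datetime64("2261-12-31"):
            warnings.warn(f"date {s} beyond 2261, replaced with {cap}")
            s = cap
        out.append(s)
    return pd.to_datetime(out, format="%Y%m%d")


print(clamp_dates(["20021231", "29991231"]))
```

This keeps the result a plain `DatetimeIndex`, at the cost of silently flattening all far-future sentinel dates onto the same cap value.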
Fixes #1166 and somewhat fixes #1163

# Description
- Moves the ``pd.to_datetime`` logic to a separate place and uses the ``errors="coerce"`` setting, which turns datetimes beyond the year 2261 into ``pd.NaT``. These appear to be consistently ignored in ``resample_timeseries``.
- Truncates well names to a maximum of 16 characters.
- Supports wells defined in a "steady-state" entry in the projectfile.
- Adds test cases for this.

# Checklist
- [x] Links to correct issue
- [ ] Update changelog, if changes affect users
- [x] PR title starts with ``Issue #nr``, e.g. ``Issue #737``
- [x] Unit tests were added
- [ ] **If feature added**: Added/extended example
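The ``errors="coerce"`` behaviour described in this PR can be illustrated with a minimal sketch (my own example values, not taken from the PR itself):

```python
import pandas as pd

# With errors="coerce", out-of-range datetimes become NaT instead of
# raising OutOfBoundsDatetime; NaT entries can then be dropped or
# ignored downstream (e.g. during resampling).
times = pd.to_datetime(["20021231", "29991231"], format="%Y%m%d", errors="coerce")
print(times)
```

The 29991231 entry parses to `NaT`, while in-range dates parse normally.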
I have some fixes in #1169 which make it work for the LHM (one final value in the timeseries was beyond the year 2261). If this issue pops up again, feel free to re-open.
Wells with associated timeseries in iMOD5 regional models incidentally use timestamps like 29991231 to force wells to continue extracting up until the end of the simulation period (assuming no more than ~1000 years are simulated). However, we parse these datetimes with:

When a format is given to `pd.to_datetime` as an argument, it always uses nanoseconds as its unit, and thus it doesn't support datetimes beyond 2261. There is a `unit` argument, but it doesn't work together with the `format` argument.

I think it is better to use the `imod.util.to_datetime` function here, which also checks for `len_date`, so we can remove the if statement in `read_associated`. This returns a `datetime.datetime` object, which we can hopefully convert with pandas to a pandas datetime object with a larger unit.

Somewhat related to: #41
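As a quick illustration of the suggestion above (my own example, not part of the issue text): NumPy datetimes with second resolution cover years well beyond 2261, which is what makes a conversion path through `datetime.datetime`/`datetime64` attractive.

```python
import numpy as np

# Second-resolution datetime64 has a far wider range than pandas'
# nanosecond timestamps, so the year 2999 is representable.
t = np.datetime64("2999-12-31", "s")
print(t)  # → 2999-12-31T00:00:00
```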