Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

IPF associated reader not able to parse datetimes beyond the year 2261 #1163

Closed
Tracked by #1155
JoerivanEngelen opened this issue Aug 20, 2024 · 4 comments
Closed
Tracked by #1155
Assignees

Comments

@JoerivanEngelen
Copy link
Contributor

JoerivanEngelen commented Aug 20, 2024

Wells with associated timeseries in iMOD5 regional models incidentally use timestamps like 29991231 to force wells to continue extracting up until the end of the simulation period (assuming not more than ~1000 years are simulated). However, we parse these datetimes with:

...
def read_associated(path, kwargs={})
...
        len_date = len(df[time_column].iloc[0])
        if len_date == 14:
            df[time_column] = pd.to_datetime(df[time_column], format="%Y%m%d%H%M%S")
        elif len_date == 8:
            df[time_column] = pd.to_datetime(df[time_column], format="%Y%m%d")
...

When given a format to pd.to_datetime as argument, it always uses nanoseconds as units, and thus it doesn't support datetimes beyond 2261. There is a unit argument, but this doesn't work together with the format argument.

I think it is better here to use the imod.util.to_datetime function here, that also checks for len_date, so we can remove the IF statement in read_associated. This returns a datetime.datetime object, which we can hopefully convert with pandas to a pd.datetime object with a larger unit.

Somewhat related to:
#41

@github-project-automation github-project-automation bot moved this to 📯 New in iMOD Suite Aug 20, 2024
@JoerivanEngelen JoerivanEngelen changed the title IPF reader not able to parse datetimes beyond the year 2261 IPF associated reader not able to parse datetimes beyond the year 2261 Aug 20, 2024
@JoerivanEngelen
Copy link
Contributor Author

JoerivanEngelen commented Aug 20, 2024

I've done some simple profiling in IPython, this might work with twice the speed.

In [1]: %timeit pd.to_datetime(np.datetime64(imod.util.to_datetime("29991231"), "s"))
20.9 μs ± 254 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

In [2]: %timeit pd.to_datetime("20021231", format="%Y%m%d")
40.1 μs ± 445 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

@JoerivanEngelen
Copy link
Contributor Author

JoerivanEngelen commented Aug 21, 2024

On second note:

Forget my previous comment, it is faster for individual values, but for arrays it is 10 times slower, as imod.util.to_datetime has to be called for each individual element in a list comprehension, whereas pd.to_datetime is vectorized.

Furthermore, the pandas DateTimeIndex used in dataframes and to do resampling, only supports using nanoseconds as unit I read in the documentation. I've tried quite some things to see if I could generate one in some way, but couldn't figure out an easy way to do this. Alternatively, xarray has a CFTimeIndex, but this doesn't work with resampling, as xarray uses pandas for that, so that would require implementing some custom logic to do that for us.

@Huite
Copy link
Contributor

Huite commented Aug 21, 2024

Also interesting to note: we've never had a complaint about this with regards to IPFs. So apparently such dates do not occur (much) in use. In this case, you might be able to replace a data like 29991231 with 22611231 instead or something (and log/warn).

In general, this feels like a very serious issue. But the only workaround that I see is using xarray Datasets instead of pandas DataFrames. That's rather a lot of work.

In the meantime, I don't think it's really crippling. If you want to setup a paleo-study with wells, you can do so by using xarray for the time management.
You just can't setup a paleostudy nicely going from the project file...

Other thoughts: use numpy datetimes instead when going from project file (add a bool argument in the ipf reading)? Maybe convert temporarily to xarray to resample.

@JoerivanEngelen JoerivanEngelen self-assigned this Aug 21, 2024
@JoerivanEngelen JoerivanEngelen moved this from 📯 New to 🏗 In Progress in iMOD Suite Aug 21, 2024
@JoerivanEngelen JoerivanEngelen moved this from 🏗 In Progress to 🧐 In Review in iMOD Suite Aug 21, 2024
JoerivanEngelen added a commit that referenced this issue Aug 23, 2024
Fixes #1166 and somewhat fixes #1163

# Description
- Moves ``pd.to_datetime`` logic to separate place, use
``errors="coerce"`` setting, which turns datetimes beyond the year 2261
to ``pd.NaT``. These seem to be consequently ignored in the
``resample_timeseries``.
- Truncate well names to max 16 characters
- Support wells defined in "steady-state" entry in the projectfile.
- Add test cases for this.

# Checklist
- [x] Links to correct issue
- [ ] Update changelog, if changes affect users
- [x] PR title starts with ``Issue #nr``, e.g. ``Issue #737``
- [x] Unit tests were added
- [ ] **If feature added**: Added/extended example
@JoerivanEngelen
Copy link
Contributor Author

JoerivanEngelen commented Aug 23, 2024

I have some fixes which make it work for the LHM (one final value in the timeseries beyond the year 2261) in #1169. If this issue pops again, feel free to re-open

@github-project-automation github-project-automation bot moved this from 🧐 In Review to ✅ Done in iMOD Suite Aug 23, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: ✅ Done
Development

No branches or pull requests

2 participants