-
Notifications
You must be signed in to change notification settings - Fork 912
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix/ Allow empty TimeSeries #1359
Conversation
…ciated test to catch the ValueError about the inconrrect dimension of the input DataArray
Codecov ReportBase: 93.88% // Head: 93.84% // Decreases project coverage by
Additional details and impacted files@@ Coverage Diff @@
## master #1359 +/- ##
==========================================
- Coverage 93.88% 93.84% -0.05%
==========================================
Files 78 78
Lines 8523 8513 -10
==========================================
- Hits 8002 7989 -13
- Misses 521 524 +3
Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here. ☔ View full report at Codecov. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me, but:
- I would actually remove the warning (let me know if you think it's a terrible idea)
- We need to slightly adapt the introductory docstring at the top of TimeSeries saying one of the guarantees is that it's non-empty (maybe also in the user guide - to be checked)
- Could you try and see what happens when you print the series, or when you plot it? I expect plot() to crash... we should perhaps support creating an empty figure.
…ed support for plotting empty timeseries, removed the bullet-point about non-emptiness of TimeSeries in the docstring
Done, I was probably being too pessimistic about the users capabilities to debug their code.
I looked at the User Guide, could not find any mention of the non-emptiness.
It was indeed crashing, I added support for plotting empty TimeSeries and made it so that the x-axis is labeled correctly (using the |
For the non-emptiness reference, I was referring to this line. |
Oh I already removed this line based on your previous comment, my comment was just to say that I could not find any reference about the non-emptiness in the user-guide :) |
My bad, I missed it. Thanks! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💯
Maybe I'm misunderstanding the purpose of this PR, but as far as I can tell this doesn't fix the original issue: import darts
import pandas as pd
df = pd.DataFrame({"a": [1, 2, 3, 4, 5]}, index=pd.date_range(start="2022-01-01", end="2022-01-05"))
timeseries = darts.TimeSeries.from_dataframe(df)
print(timestamp[5:6])
...
ValueError: Neither `start` nor `end` can be NaT Full tracebackFile ~/miniconda3/envs/test2/lib/python3.10/site-packages/darts/timeseries.py:4717, in TimeSeries.__getitem__(self, key)
4713 xa_ = self._xa.isel({self._time_dim: key})
4714 _set_freq_in_xa(
4715 xa_
4716 ) # indexing may discard the freq so we restore it...
-> 4717 return self.__class__(xa_)
4718 elif isinstance(key.start, pd.Timestamp) or isinstance(
4719 key.stop, pd.Timestamp
4720 ):
4721 _check_dt()
File ~/miniconda3/envs/test2/lib/python3.10/site-packages/darts/timeseries.py:191, in TimeSeries.__init__(self, xa)
184 self._xa.get_index(self._time_dim).freq = self._freq
186 # We have to check manually if the index is complete. Another way could be to rely
187 # on `inferred_freq` being present, but this fails for series of length < 3.
189 is_index_complete = (
190 len(
--> 191 pd.date_range(
192 self._time_index.min(), self._time_index.max(), freq=self._freq
193 ).difference(self._time_index)
194 )
195 == 0
196 )
198 raise_if_not(
199 is_index_complete,
200 "Not all timestamps seem to be present in the time index. Does "
(...)
204 logger,
205 )
206 else:
File ~/miniconda3/envs/test2/lib/python3.10/site-packages/pandas/core/indexes/datetimes.py:1125, in date_range(start, end, periods, freq, tz, normalize, name, closed, inclusive, **kwargs)
1122 if freq is None and com.any_none(periods, start, end):
1123 freq = "D"
-> 1125 dtarr = DatetimeArray._generate_range(
1126 start=start,
1127 end=end,
1128 periods=periods,
1129 freq=freq,
1130 tz=tz,
1131 normalize=normalize,
1132 inclusive=inclusive,
1133 **kwargs,
1134 )
1135 return DatetimeIndex._simple_new(dtarr, name=name)
File ~/miniconda3/envs/test2/lib/python3.10/site-packages/pandas/core/arrays/datetimes.py:367, in DatetimeArray._generate_range(cls, start, end, periods, freq, tz, normalize, ambiguous, nonexistent, inclusive)
364 end = Timestamp(end)
366 if start is NaT or end is NaT:
--> 367 raise ValueError("Neither `start` nor `end` can be NaT")
369 left_inclusive, right_inclusive = validate_inclusive(inclusive)
370 start, end, _normalized = _maybe_normalize_endpoints(start, end, normalize)
ValueError: Neither `start` nor `end` can be NaT Same thing happens with print(timeseries[pd.Timestamp("2022-01-06"):pd.Timestamp("2022-01-08")])
...
ValueError: Neither `start` nor `end` can be NaT |
Hi! I got a bit too excited and forgot to check if the fix was also working for TimeSeries with DatetimeIndex... Thank you for reporting the bug, I am working on a fix and open a PR by the end of the week. |
Fixes #1279.
Summary
Remove the assert preventing to create an empty
TimeSeries
in the constructor and replaced it by a warning to limit the risk of unexpected behavior for the users. Edited the corresponding test to catch theValueError
about missing dimensions in inputxarray.DataArray
instead of its emptiness.###Other Information
The currents tests pass with these modifications but we might want to add tests or exception in
fit
methods to better handle the empty TimeSeries as they can now occur?