Extend DateTimeFormatCheck to work for multiseries #4300

Merged on Sep 12, 2023 (13 commits)
fixed tests and small changes
MichaelFu512 committed Sep 11, 2023
commit 9f266cfb1325cacc3e35d7c069ab39937316dca6
22 changes: 10 additions & 12 deletions evalml/data_checks/datetime_format_data_check.py
@@ -402,18 +402,17 @@ def validate(self, X, y):
             )
             return messages

-        series_datetime = (
-            [datetime_values] if self.series_id is None else X[self.series_id].unique()
-        )
+        series_datetime = [0] if self.series_id is None else X[self.series_id].unique()
         for series in series_datetime:
             # if multiseries only select the datetimes corresponding to one series
             if is_multiseries:
+                curr_series_df = X[X[self.series_id] == series]
                 if self.datetime_column != "index":
-                    datetime_values = X[X[self.series_id] == series][
-                        self.datetime_column
-                    ].reset_index(drop=True)
+                    datetime_values = curr_series_df[self.datetime_column].reset_index(
+                        drop=True,
+                    )
                 else:
-                    datetime_values = X[X[self.series_id] == series].index
+                    datetime_values = curr_series_df.index

             # Check if the data is monotonically increasing
             no_nan_datetime_values = datetime_values.dropna()
@@ -439,11 +438,10 @@ def validate(self, X, y):
             )
             inferred_freq = ww_payload[0]
             debug_object = ww_payload[1]
-            if inferred_freq is not None:
-                if is_multiseries:
-                    continue
-                else:
-                    return messages
+            if inferred_freq is not None and is_multiseries:
+                continue
+            elif inferred_freq is not None:
+                return messages

             # Check for NaN values
             if len(debug_object["nan_values"]) > 0:
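
The per-series loop in this commit can be sketched in isolation as follows. This is a minimal illustration, not the evalml implementation: the function name `check_series_frequencies` and the column names `date` and `series_id` are hypothetical, and `pandas.infer_freq` stands in for the frequency inference that produces `ww_payload` in the diff. It keeps the two ideas the change introduces: a single placeholder key `[0]` when there is no series id column, and filtering each series once into `curr_series_df` before reading its datetimes.

```python
import pandas as pd


def check_series_frequencies(X, datetime_column, series_id=None):
    """Return {series: (is_monotonic_increasing, inferred_freq)} per series."""
    # Mirror the diff: one placeholder key when there is no series id column.
    series_keys = [0] if series_id is None else X[series_id].unique()
    results = {}
    for series in series_keys:
        if series_id is not None:
            # Filter once and reuse the result, like curr_series_df in the commit.
            curr_series_df = X[X[series_id] == series]
        else:
            curr_series_df = X
        datetime_values = curr_series_df[datetime_column].reset_index(drop=True)
        no_nan = datetime_values.dropna()
        # pandas.infer_freq needs at least 3 values; it is a stand-in here.
        freq = pd.infer_freq(no_nan) if len(no_nan) >= 3 else None
        results[series] = (bool(no_nan.is_monotonic_increasing), freq)
    return results


# Two series stacked in long format, each with five daily timestamps.
dates = pd.date_range("2023-01-01", periods=5, freq="D")
X = pd.DataFrame(
    {
        "date": list(dates) * 2,
        "series_id": ["a"] * 5 + ["b"] * 5,
    }
)
report = check_series_frequencies(X, "date", series_id="series_id")
single = check_series_frequencies(X[X["series_id"] == "a"], "date")
```

Unlike the real check, which continues to the next series or returns `messages` once a frequency is inferred, this sketch records a result for every series so the per-series behaviour is easy to inspect.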