fixing open-ended adjustments error #216

BaptisteVandecrux · 2023-12-18T13:20:50Z

While trying to prescribe an open-ended adjustment, I noticed that it currently causes an error.

When we read the start and end times of adjustments, they receive time zone info because of their ISO format. The AWS xarray dataset does not have time zone info (because of an [xarray limitation](pydata/xarray#3291)), so the time zone info is removed from the adjustment time bounds (l.183-184).

What was missing is the case where the start or end date of an adjustment is blank (meaning an open-start or open-ended bound). In that case we use a timestamp from the AWS dataset, which is already time-zone-naive, and this causes an error later on when we try to remove the time zone info from these same time-zone-naive bounds.
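A minimal pandas sketch of the fix described above, assuming a tz-naive hourly `DatetimeIndex` as a stand-in for the AWS dataset's time coordinate and hypothetical helpers `to_naive_utc`, `is_blank` and `adjustment_bounds` (not pypromice functions): only bounds read from the adjustment file are converted to UTC and stripped of their time zone, while blank bounds fall back to the dataset's first or last timestamp, which is already tz-naive.

```python
import pandas as pd


def to_naive_utc(value: str) -> pd.Timestamp:
    """Parse an ISO 8601 string (possibly with an offset) into a tz-naive UTC Timestamp."""
    return pd.to_datetime(value, utc=True).tz_localize(None)


def is_blank(value) -> bool:
    """Treat NaN, None and empty strings as an open (unspecified) bound."""
    return not isinstance(value, str) or value.strip() == ""


def adjustment_bounds(t0, t1, times: pd.DatetimeIndex):
    """Return tz-naive (start, end) bounds for one adjustment (hypothetical helper).

    Open bounds reuse the dataset's own timestamps, which are already tz-naive,
    so the tz-stripping step is applied only to bounds parsed from strings.
    """
    start = times[0] if is_blank(t0) else to_naive_utc(t0)
    end = times[-1] if is_blank(t1) else to_naive_utc(t1)
    return start, end


# Stand-in for the AWS time coordinate: 48 hourly, tz-naive timestamps (UTC implied)
times = pd.date_range("2023-06-01", periods=48, freq=pd.Timedelta(hours=1))
data = pd.Series(range(48), index=times)

# Open-ended adjustment: t1 is blank
start, end = adjustment_bounds("2023-06-02T00:00:00+00:00", "", times)
print(start, end)                # 2023-06-02 00:00:00 2023-06-02 23:00:00
print(len(data.loc[start:end]))  # 24
```

Keeping the "is this bound blank?" decision ahead of any time zone handling is what avoids running the tz-stripping step on values that never had a time zone.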


ladsmund

The PR looks like it solves the problem.
Note: there is a bug related to non-UTC adjustment timecodes.

I have suggested a simpler solution for the open-ended intervals.
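The thread does not spell out the non-UTC bug. One plausible reading (an assumption, not confirmed in the review) is that calling `tz_localize(None)` directly on a parsed timestamp keeps its local wall-clock time instead of converting it to UTC. The sketch below, with a hypothetical `+02:00` bound, shows the difference and why converting with `utc=True` first matters.

```python
import pandas as pd

# Hypothetical adjustment bound written with a non-UTC offset
stamp = "2023-06-01T12:00:00+02:00"

# Stripping the time zone directly keeps the local wall-clock time (12:00)
wall_clock = pd.to_datetime(stamp).tz_localize(None)

# Converting to UTC first keeps the same instant (10:00 UTC), then strips the tz
as_utc = pd.to_datetime(stamp, utc=True).tz_localize(None)

print(wall_clock)  # 2023-06-01 12:00:00
print(as_utc)      # 2023-06-01 10:00:00
```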

src/pypromice/qc/github_data_issues.py


ladsmund · 2023-12-20T08:20:58Z

src/pypromice/qc/github_data_issues.py

+        adj_info[['t0','t1']] = adj_info[['t0','t1']].astype(object)
+        adj_info.loc[adj_info.t1.isnull()|(adj_info.t1==''), "t1"] = None      
+        adj_info.loc[adj_info.t0.isnull()|(adj_info.t0==''), "t0"] = None


t0 and t1 are already checked in lines 219-222, except for the empty string and np.nan cases.
Consider instead:

if isinstance(t0, str) and t0 != '':
    t0 = pd.to_datetime(t0, utc=True).tz_localize(None)
else:
    t0 = None
if isinstance(t1, str) and t1 != '':
    t1 = pd.to_datetime(t1, utc=True).tz_localize(None)
else:
    t1 = None

Not a big fan of if/else in a for loop; I'd rather prepare the adj_info table before looping through it.
Now I see that I have used part of your suggestion in l.219-222, so I admit the code is currently a bit hybrid.
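A sketch of the "prepare the table first" approach, assuming a hypothetical `adj_info` with string/NaN `t0` and `t1` columns and a tz-naive pandas Series standing in for one AWS variable. The per-value parsing mirrors the suggestion above, but it runs once over the whole table before the adjustment loop, and the results are stored in explicit `object` columns so that missing bounds stay `None` and can be passed straight to `slice`. `parse_bound` and `prepare_bounds` are illustrative names, not pypromice functions.

```python
import numpy as np
import pandas as pd


def parse_bound(value):
    """ISO string -> tz-naive UTC Timestamp; blank/NaN -> None (an open bound)."""
    if isinstance(value, str) and value.strip() != "":
        return pd.to_datetime(value, utc=True).tz_localize(None)
    return None


def prepare_bounds(adj_info: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of adj_info whose t0/t1 hold Timestamps or None (object dtype)."""
    out = adj_info.copy()
    for col in ("t0", "t1"):
        # dtype=object keeps None as None instead of letting pandas turn it into NaN/NaT
        out[col] = pd.Series(
            [parse_bound(v) for v in out[col]], index=out.index, dtype=object
        )
    return out


# Hypothetical adjustment table: blank bounds arrive as NaN or ''
adj_info = pd.DataFrame({
    "t0": ["2023-06-01T06:00:00+00:00", np.nan, ""],
    "t1": [np.nan, "2023-06-01T12:00:00+00:00", "2023-06-01T03:00:00+00:00"],
})

times = pd.date_range("2023-06-01", periods=24, freq=pd.Timedelta(hours=1))
data = pd.Series(1.0, index=times)

# The loop itself no longer needs any if/else on the bounds:
# a None bound makes the slice open on that side
for row in prepare_bounds(adj_info).itertuples():
    print(row.t0, row.t1, len(data.loc[slice(row.t0, row.t1)]))
```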

ladsmund · 2023-12-20T08:25:21Z

src/pypromice/qc/github_data_issues.py

-        adj_info.t1 = pd.to_datetime(adj_info.t1).dt.tz_localize(None)
-
+        # making sure that t0 and t1 columns are object dtype then replacing nan with None
+        adj_info[['t0','t1']] = adj_info[['t0','t1']].astype(object)


I don't understand why you cast the type to object. I suppose the dynamically inferred column types are one of:

- str, if all the values in the column are strings
- float, if all the values in the column are inferred as nan
- object, if the values in the column have different types

I guess that, along with your next comment, it is a matter of personal preference:

I prefer replacing all the missing dates with None in two single-line vectorized calls (l.181-182) rather than having ifs in a for loop. But before I do that, I need to make sure that the t0 and t1 columns can hold None. If a t0 or t1 column has a float dtype and I try to replace its values with None, pandas coerces those None values into their float equivalent, np.nan. This causes problems later, as slice(np.nan, np.nan) fails. When t0 and t1 are of object dtype, replacing some of their values with None keeps them as NoneType.
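A small check of the coercion described above, using a hypothetical table whose `t1` cells were all left blank: pandas infers `float64` for an all-NaN column, and assigning `None` into it stores `NaN` again, whereas after `astype(object)` the same masked assignment (the pattern used in the PR) keeps real `None` values.

```python
import numpy as np
import pandas as pd

# Hypothetical adjustment table in which every t1 cell was left blank
adj_info = pd.DataFrame({"t0": ["2023-06-01T00:00:00+00:00", ""], "t1": [np.nan, np.nan]})
print(adj_info["t1"].dtype)  # float64

# Without the cast: None is coerced back to NaN in a float64 column
float_t1 = adj_info["t1"].copy()
float_t1.loc[float_t1.isnull()] = None
print(float_t1.tolist())  # [nan, nan]

# With the cast: the columns can hold None as NoneType
obj = adj_info[["t0", "t1"]].astype(object)
obj.loc[obj.t1.isnull() | (obj.t1 == ""), "t1"] = None
obj.loc[obj.t0.isnull() | (obj.t0 == ""), "t0"] = None
print(obj["t1"].tolist())  # [None, None]
print(obj["t0"].tolist())  # ['2023-06-01T00:00:00+00:00', None]
```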

BaptisteVandecrux requested a review from ladsmund December 18, 2023 13:21

Update github_data_issues.py

b3baab8

switching copies to deep copies

ladsmund requested changes Dec 19, 2023


src/pypromice/qc/github_data_issues.py (outdated, resolved)

src/pypromice/qc/github_data_issues.py (outdated, resolved)

BaptisteVandecrux and others added 2 commits December 19, 2023 08:40

Update src/pypromice/qc/github_data_issues.py

013a351

Co-authored-by: Mads Christian Lund <[email protected]>

Update github_data_issues.py

4b7cc9e

ladsmund previously approved these changes Dec 19, 2023


Bug fix for all open-ended adjustments

5f2973b

BaptisteVandecrux dismissed ladsmund’s stale review via 5f2973b December 19, 2023 08:45

BaptisteVandecrux and others added 2 commits December 19, 2023 16:04

fix bug with resample and biweekly_upper_range_filter

c19b7c1

improved adjustment logging

5431d19

ladsmund reviewed Dec 20, 2023


BaptisteVandecrux merged commit a7997ef into main Dec 20, 2023
4 checks passed

BaptisteVandecrux deleted the fixing-open-bounds-adjustements branch December 20, 2023 08:38
