Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix wetday frequency correction #174

Merged

Conversation

emileten
Copy link
Contributor

@emileten emileten commented Jan 25, 2022

essential changes :

  • in the core function, instead of using a unique replacement value, use an array of (likely) different values, in process='pre'
  • values strictly below 1 are modified, rather than below or equal to 1, in process='pre'.
  • make tests deterministic (with a seed), smaller and more precise.

secondary changes :

  1. in the core function, directly operate on the data array object within the dataset -- which means I have to introduce a var parameter -- default value is 'pr', but in tests we have other variable names. Need that parameter in the service as well.

@emileten emileten requested review from brews and dgergel January 25, 2022 01:59
@emileten
Copy link
Contributor Author

emileten commented Jan 25, 2022

@dgergel I requested your review for the method check.

@brews, in the computational side. I tested this on both Jupyter and Argo, and this ran within 3 mins using around 15GB of memory (that's because of the huge numpy sample array that I have to create, which has the shape of the GCM data). That fits in ~20GB resources for that step. We might need to increase slightly this number so that the ERA-5 data fits in.

But it will fit in the node and run fast.

@emileten emileten self-assigned this Jan 25, 2022
Copy link
Member

@brews brews left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for cleaning this up, @emileten.

My one suggestion is to rename this new var argument to variable. This makes it consistent with the other functions and methods that grab a variable name. ...I see "var" and I think "variance" but that's an aside....

@emileten
Copy link
Contributor Author

Thanks for cleaning this up, @emileten.

My one suggestion is to rename this new var argument to variable. This makes it consistent with the other functions and methods that grab a variable name. ...I see "var" and I think "variance" but that's an aside....

Thanks @brews, I changed this !

Copy link
Member

@dgergel dgergel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good, thanks @emileten for fixing this bug!

@emileten emileten merged commit a903591 into ClimateImpactLab:main Jan 25, 2022
ds > threshold, np.random.uniform(low=low, high=threshold)
ds[variable] = ds[variable].where(
ds[variable] >= threshold,
np.random.uniform(low=low, high=threshold, size=ds[variable].shape),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the issue is here - changing from np.random.uniform(low, high) to np.random.uniform(low, high, size) changes the returned type from pure python float to ndarray[np.float64]. xarray coerces the former to ds[variable].dtype, but defers to the higher precision data type of the latter

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

wet-day frequency correction uses only one value for each 'dry' data point
4 participants