Support Nulls in Timeseries Generator #8925
python/cudf/cudf/datasets.py
Outdated
end="2000-01-31",
freq="1s",
dtypes=None,
nulls_frequency=0.1,
Should the default be 0, or should we always be generating null data?
Ah yes, to maintain the promise of non-breaking changes, it should default to 0.
Codecov Report
@@ Coverage Diff @@
## branch-21.10 #8925 +/- ##
================================================
- Coverage 10.67% 10.58% -0.10%
================================================
Files 110 116 +6
Lines 18271 19059 +788
================================================
+ Hits 1951 2017 +66
- Misses 16320 17042 +722
Continue to review full report at Codecov.
Thanks @isVoid -- this is great!
@gpucibot merge
This PR adds a `nulls_frequency` argument to the timeseries generator. It is worth noting that the random mask generator under the hood is also tied to the `seed` parameter: if two generation processes use the same seed, they will have the same mask distribution.

Verifying nulls distribution:
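A minimal sketch of the seed-tied mask behavior described above, written with NumPy so it runs without a GPU. It is not cuDF's actual mask generator: `make_null_mask` is a hypothetical helper, and the row count and seeds are arbitrary values chosen for illustration. It checks that two masks built from the same seed are identical and that the observed fraction of nulls is close to the requested `nulls_frequency`.

```python
import numpy as np


def make_null_mask(n_rows, nulls_frequency, seed=None):
    """Hypothetical helper (not part of cuDF): True marks a null position."""
    rng = np.random.RandomState(seed)
    # Each row is independently null with probability `nulls_frequency`.
    return rng.random_sample(n_rows) < nulls_frequency


n_rows = 100_000
mask_a = make_null_mask(n_rows, 0.1, seed=42)
mask_b = make_null_mask(n_rows, 0.1, seed=42)
mask_c = make_null_mask(n_rows, 0.1, seed=7)

# Same seed: the null positions match exactly.
same_seed_equal = bool((mask_a == mask_b).all())
# Different seed: the positions differ, but the overall frequency is similar.
different_seed_equal = bool((mask_a == mask_c).all())
observed_frequency = float(mask_a.mean())
# A frequency of 0 produces no nulls, matching the non-breaking default
# agreed on in the review.
no_nulls = int(make_null_mask(1_000, 0.0, seed=1).sum())

print(same_seed_equal, different_seed_equal, round(observed_frequency, 3), no_nulls)
```

With the real generator, the analogous check would compare the null count of each generated column against `nulls_frequency` times the number of rows.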