Support Nulls in Timeseries Generator #8925

Merged 2 commits on Aug 11, 2021

Contributor

@isVoid isVoid commented Aug 2, 2021

This PR adds a nulls_frequency argument to the timeseries generator. It is worth noting that the random mask generator under the hood is also tied to the seed parameter: if two generation processes use the same seed, they will have the same mask distribution.

Verifying nulls distribution:

>>> import cudf
>>> gdf = cudf.datasets.timeseries(start='2020-02-01', end='2021-02-01', freq='1D', dtypes={"name": "category", "id": int, "x": float, "y": float}, nulls_frequency=0.7, seed=1)
>>> for col in gdf:
...    print(gdf[col].isna().sum() / len(gdf[col]))
0.6920980926430518
0.7111716621253406
0.7138964577656676
0.7029972752043597
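
The seed coupling described above can be sketched in plain Python. This is only an illustration under stated assumptions: make_null_mask is a hypothetical helper, not the function cuDF uses internally, and it draws from the standard-library random module rather than a GPU random generator.

```python
import random


def make_null_mask(n_rows, nulls_frequency, seed=None):
    """Return a list of booleans where True marks a row to be nulled.

    Hypothetical helper for illustration; not cuDF's implementation.
    """
    if not 0 <= nulls_frequency <= 1:
        raise ValueError("nulls_frequency must be in [0, 1]")
    # A private generator keeps the mask tied to the seed and nothing else.
    rng = random.Random(seed)
    return [rng.random() < nulls_frequency for _ in range(n_rows)]


# Two runs with the same seed yield the same mask.
mask_a = make_null_mask(10_000, 0.7, seed=1)
mask_b = make_null_mask(10_000, 0.7, seed=1)

# The observed null fraction should sit close to the requested frequency.
observed = sum(mask_a) / len(mask_a)
```

As in the output above, the observed fraction only approximates nulls_frequency, because each row is drawn independently.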

@isVoid isVoid requested a review from a team as a code owner August 2, 2021 18:40
@github-actions github-actions bot added the Python Affects Python cuDF API. label Aug 2, 2021
@isVoid isVoid changed the title Support nulls in timeseries generator Support Nulls in Timeseries Generator Aug 2, 2021
@isVoid isVoid added 3 - Ready for Review Ready for review by team feature request New feature or request non-breaking Non-breaking change labels Aug 2, 2021
end="2000-01-31",
freq="1s",
dtypes=None,
nulls_frequency=0.1,
Member

Should the default be 0, or should we always generate null data?

Contributor Author

Ah yes, to keep this change non-breaking it should default to 0.
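
A small sketch of why a default of 0 leaves existing callers unaffected. The name with_nulls is a hypothetical stand-in for the null-injection step, not cuDF code, and it works on a plain Python list rather than a cuDF column.

```python
import random


def with_nulls(values, nulls_frequency=0.0, seed=None):
    """Hypothetical stand-in for a generator step that injects nulls.

    With the default nulls_frequency=0.0 no value is replaced, so callers
    that never pass the argument see the same output as before.
    """
    rng = random.Random(seed)
    # rng.random() is in [0.0, 1.0), so the comparison is never true at 0.0.
    return [None if rng.random() < nulls_frequency else v for v in values]


data = [1.5, 2.5, 3.5, 4.5]
unchanged = with_nulls(data)                      # default: no nulls
all_null = with_nulls(data, nulls_frequency=1.0)  # every value replaced
```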

codecov bot commented Aug 3, 2021

Codecov Report

Merging #8925 (698f0a8) into branch-21.10 (18f7c01) will decrease coverage by 0.09%.
The diff coverage is n/a.

@@               Coverage Diff                @@
##           branch-21.10    #8925      +/-   ##
================================================
- Coverage         10.67%   10.58%   -0.10%     
================================================
  Files               110      116       +6     
  Lines             18271    19059     +788     
================================================
+ Hits               1951     2017      +66     
- Misses            16320    17042     +722     
Impacted Files                               Coverage Δ
python/cudf/cudf/__init__.py                 0.00% <ø> (ø)
python/cudf/cudf/core/__init__.py            0.00% <ø> (ø)
python/cudf/cudf/core/column/categorical.py  0.00% <ø> (ø)
python/cudf/cudf/core/column/column.py       0.00% <ø> (ø)
python/cudf/cudf/core/column/lists.py        0.00% <ø> (ø)
python/cudf/cudf/core/column/numerical.py    0.00% <ø> (ø)
python/cudf/cudf/core/column/string.py       0.00% <ø> (ø)
python/cudf/cudf/core/column/struct.py       0.00% <ø> (ø)
python/cudf/cudf/core/dataframe.py           0.00% <ø> (ø)
python/cudf/cudf/core/frame.py               0.00% <ø> (ø)
... and 77 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 5ba3ed5...698f0a8. Read the comment docs.

Member

@quasiben quasiben left a comment

Thanks @isVoid -- this is great!

@quasiben
Member

quasiben commented Aug 3, 2021

@gpucibot merge

@rapids-bot rapids-bot bot merged commit e3f35af into rapidsai:branch-21.10 Aug 11, 2021
@galipremsagar galipremsagar added 5 - Ready to Merge Testing and reviews complete, ready to merge and removed 3 - Ready for Review Ready for review by team labels Aug 11, 2021