[REVIEW] Update orc reader and writer fuzz tests #7357

Merged: 5 commits into rapidsai:branch-0.19 on Feb 11, 2021

Conversation

galipremsagar
Contributor

This PR introduces:

  • Fixes for some of the breakages in the use of pyorc.Struct that the latest pyorc release introduced.
  • Adaptations to the list dtype parameter changes introduced previously.
  • Miscellaneous fixes required for the fuzz tests to run properly.

@galipremsagar galipremsagar added the 3 - Ready for Review, 4 - Needs cuDF (Python) Reviewer, improvement, and non-breaking labels Feb 9, 2021
@galipremsagar galipremsagar self-assigned this Feb 9, 2021
@galipremsagar galipremsagar requested a review from a team as a code owner February 9, 2021 20:24
@github-actions github-actions bot added the Python label Feb 9, 2021
@galipremsagar galipremsagar requested a review from vuule February 9, 2021 20:24
Contributor

@vuule vuule left a comment

LGTM, but it's kind of hard to understand why some changes were made (without the detailed description).

@@ -67,6 +67,19 @@ def generate_input(self):
    dtypes_meta, num_rows, num_cols = _generate_rand_meta(
        self, dtypes_list
    )
    if num_cols == 0:
        """
        If a dataframe has no columns, then pyorc writer will throw

I wondered what the desired behavior is here, i.e. whether ORC as a format supports having no columns.
I think we also have some issues writing empty dataframes to ORC.

Contributor Author

I guess we need to write an empty struct. With pyorc we can do it this way:

>>> import pyorc
>>> import pandas as pd
>>> output = open("sample.orc", "wb")
>>> writer = pyorc.Writer(output, pyorc.Struct())
>>> writer.close()
>>> pd.read_orc('sample.orc')
Empty DataFrame
Columns: []
Index: []

Looks like I also need to make some code changes in this PR. I'll update it.
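
To make the "empty struct" idea concrete, here is a minimal sketch of how a schema string could be derived from column metadata so that zero columns naturally yields an empty struct. The helper `orc_struct_schema` and the `ORC_TYPE_NAMES` mapping are hypothetical names made up for illustration; they are not part of cuDF or pyorc, and the mapping is an assumed, partial subset of dtypes.

```python
# Hypothetical helper for illustration only; not the code in this PR.
# Assumed, partial mapping from pandas-style dtype names to ORC type names.
ORC_TYPE_NAMES = {
    "bool": "boolean",
    "int8": "tinyint",
    "int16": "smallint",
    "int32": "int",
    "int64": "bigint",
    "float32": "float",
    "float64": "double",
    "object": "string",
}


def orc_struct_schema(columns):
    """Build an ORC type description string from (name, dtype) pairs.

    An empty list yields "struct<>", i.e. a struct with no fields.
    """
    fields = []
    for name, dtype in columns:
        if dtype not in ORC_TYPE_NAMES:
            raise ValueError(f"unsupported dtype: {dtype!r}")
        fields.append(f"{name}:{ORC_TYPE_NAMES[dtype]}")
    return "struct<" + ",".join(fields) + ">"


print(orc_struct_schema([("a", "int64"), ("b", "object")]))
print(orc_struct_schema([]))
```

With this shape, the zero-column case needs no special branch when building the schema. Whether a given pyorc version accepts the resulting string for an empty struct in the same way as `pyorc.Struct()` is not verified here.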

@galipremsagar galipremsagar requested a review from vuule February 9, 2021 23:11
@galipremsagar
Contributor Author

LGTM, but it's kind of hard to understand why some changes were made (without the detailed description).

Updated the code changes with comments; this is ready for a re-review.

python/cudf/cudf/_fuzz_testing/orc.py (outdated, resolved)
python/cudf/cudf/_fuzz_testing/orc.py (outdated, resolved)
python/cudf/cudf/_fuzz_testing/tests/fuzz_test_orc.py (outdated, resolved)
python/cudf/cudf/_fuzz_testing/tests/fuzz_test_orc.py (outdated, resolved)
codecov bot commented Feb 10, 2021

Codecov Report

❗ No coverage uploaded for pull request base (branch-0.19@c0282e6).
The diff coverage is n/a.

@@              Coverage Diff               @@
##             branch-0.19    #7357   +/-   ##
==============================================
  Coverage               ?   82.21%           
==============================================
  Files                  ?      100           
  Lines                  ?    16971           
  Branches               ?        0           
==============================================
  Hits                   ?    13953           
  Misses                 ?     3018           
  Partials               ?        0           

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update c0282e6...b5f8165.

Co-authored-by: Ram (Ramakrishna Prabhu) <[email protected]>
Contributor

@vuule vuule left a comment

found a typo, LGTM otherwise

python/cudf/cudf/_fuzz_testing/utils.py (outdated, resolved)
@galipremsagar galipremsagar added the 5 - Ready to Merge label and removed the 4 - Needs cuIO Reviewer label Feb 11, 2021
@galipremsagar
Contributor Author

@gpucibot merge

@vuule vuule removed the 3 - Ready for Review label Feb 11, 2021
@rapids-bot rapids-bot bot merged commit ebe307e into rapidsai:branch-0.19 Feb 11, 2021
Labels: 5 - Ready to Merge, improvement, non-breaking, Python