Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
avro reader integration tests #7156
avro reader integration tests #7156
Changes from 5 commits
0e7547b
f412ab7
2e47499
8f1f842
a63f0a5
c7d18b5
af6a966
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
This file was deleted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is the dataframe shape (1,2)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
expected
andactual
are the same shape. I don't know what shape that should be.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we also have some tests with a large number of rows?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can test a large number of values,. It would be nice to have a test data generator. I see we're generating random values for fuzz testing. Are we able to do that in a deterministic manner so it can be also be used for unit tests?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIRC the data generator optionally takes a seed value; that the output is deterministic for each seed. CC @galipremsagar for pointer to the generator + sample use.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we are discussing having large rows, I'd recommend staying in <30 rows range to not slow down things in pytests by a lot as that would slow down in gpu CI too. If there is a bug that only reproduces for a large column scenarion then we can widen the test coverage for large columns, else I think fuzz tests should take care of large rows testing. For using the dataset generator, here is how we can use it:
Alternatively, There is also an existing API that also returns deterministic data with the same seed values that is widely used across our pytests:
https://github.com/rapidsai/cudf/blob/branch-0.18/python/cudf/cudf/datasets.py#L60
This is much simpler to use and fits the use-case here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we rather just change this test to be a list of values(
cudf_val
be length 5/10) instead of 1 value?