-
Notifications
You must be signed in to change notification settings - Fork 915
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
First pass at adding testing for pylibcudf #15300
Merged
rapids-bot
merged 49 commits into
rapidsai:branch-24.06
from
vyasr:feat/pylibcudf_testing
Apr 1, 2024
Merged
Changes from all commits
Commits
Show all changes
49 commits
Select commit
Hold shift + click to select a range
c9a920a
Update developer guide
vyasr 0243360
Add some basic utilities for testing equality
vyasr 2deb817
Add a function to copy a column
vyasr cf10f9d
Add some tests of copying
vyasr 1a32411
Implement num_rows and num_columns for a table
vyasr 260ec20
Add test of gather and fix a number of minor issues
vyasr 9f2c2ff
Move pylibcudf tests to separate top-level directory
vyasr c7f86af
Add pytest.ini
vyasr fb576ed
Test new directory in pytests
vyasr ad70d26
Fix parameter name
vyasr a4e677d
Implement __eq__ for DataType
vyasr aef8395
Fix C++ doc
vyasr ecf8546
Add remaining tests so that every API is tested at least once
vyasr 6179315
Add missing case
vyasr 678e773
Test exception handling for gather
vyasr 6c67886
Add tests of scatter exception cases
vyasr 02e7d02
Update exceptions thrown by various C++ copying APIs.
vyasr ff0346a
Update all docs
vyasr 639b486
Add tests of the remaining error cases
vyasr 7b23085
Fix typo
vyasr fc9b2c7
Revert C++ changes
vyasr a91c57f
Merge remote-tracking branch 'upstream/branch-24.04' into feat/pylibc…
vyasr db97c3e
Fix compilation
vyasr e28333f
Merge remote-tracking branch 'upstream/branch-24.04' into feat/pylibc…
vyasr 1174d09
Switch testing utilities to use the new interop APIs and make sure th…
vyasr e3a0123
Remove parallelism in CI
vyasr a93681f
Apply suggestions from code review
vyasr 6381212
Fix style
vyasr fd87b0a
Merge remote-tracking branch 'upstream/branch-24.06' into feat/pylibc…
vyasr 593753a
Use cudf_raises wrapper to verify that exceptions are coming from lib…
vyasr 3a482a9
Also move all the pyarrow objects to fixtures
vyasr 9b4fd29
Implement scattering properly with arrow
vyasr 49c6e39
Remove hardcoded values from a few more tests
vyasr b10892e
Also standardize the mask
vyasr 8dcf320
Generalize tests to work for floats
vyasr 911027c
Add tests of strings
vyasr 7cc2310
Add tests of bool
vyasr 45cbc64
Add tests of lists
vyasr 452bf33
Remove skipped tests in favor of simple hardcoded results
vyasr f0c525a
Add tests of struct
vyasr 62337a2
Centralize the struct type
vyasr f83bd66
Some cleanup and reorganization of fixtures and helper functions
vyasr 78759d9
Add a note on testing nullable data
vyasr f94d548
Address PR feedback
vyasr cd27ab1
Merge remote-tracking branch 'upstream/branch-24.06' into feat/pylibc…
vyasr 3327d85
Some updates to the dev guide
vyasr 469b536
Remove erroneous error
vyasr fcd4a98
Address feedback
vyasr be6edb3
Merge branch 'branch-24.06' into feat/pylibcudf_testing
vyasr File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we include a word about testing over a range of types here? The rule I've been following is that if the type is an explicit argument, then all supported types should be tested. However many APIs are supported for a range of types even though the type isn't an argument - scatter works for ints, floats, etc. Do we need to test all of them? Is it okay to just test one?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes absolutely, we should figure out how we want to treat these. I worry that trying to test the Cartesian product of all types will lead to a prohibitively large number of tests, but maybe that's OK as long as we keep the runtime per test minimal? I know you said you've got thousands in your binops PR and that's taking a couple of minutes. Have you tried following the guidelines on fixtures that I wrote up in this PR? If not, can you see if that makes much of a difference?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One compromise I've gone with in the past is to test over a matrix of "kind" rather than the full matrix. For instance,
{"uint64", "int64", "float64", "string", "bool", "List", "Struct"}
. Might save a few orders of magnitude. What do you think?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, that seems like a reasonable compromise to me. That way we are probably guaranteed to hit every distinct code path.
Another way of thinking about it: what we discussed was that pylibcudf should be focused on testing every code path and ensuring that all the logic that isn't testing in libcudf gets tested. All of the internal bits of pylibcudf involved in things like converting columns from libcudf, for example, need to be tested for every type since there is definitely type-specific logic there around converting different types of buffers etc. However, I don't think we need to test every user-facing API for every type because most APIs don't differ at all for different types in pylibcudf, only in libcudf. IOW something like copying a column is identical for string and float columns in pylibcudf up to differences in libcudf.
Thinking this way does require us to be a more cognizant of possible divergences in pylibcudf though, and it opens up an increased possibility for future error if we create new code branches in the future. A possible compromise would be to pick a couple of modules that we know exercise the internals sufficiently and test them for all types, then for others only test a sufficient subset of types.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I went with basically this compromise. The chosen types are now standardized via a session-scope fixture, which should make it much easier to conform to these decisions across the whole test suite.