Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feature/python benchmarking #11125
Feature/python benchmarking #11125
Changes from 22 commits
acdbebb
28ed6a8
7e3db91
af6882f
97c601c
55434af
97bbe3e
0dd018b
7c370f4
83ee6b7
c61ba9f
690577f
1025be4
5db4589
6d57c10
0521038
0d69a1f
b060ec7
ec4eaa7
fe2a3a0
dd01dd4
0183a51
28a6ea6
a9ecdf3
5e5e3d4
b57eee2
5f80061
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does the type of random state used affect performance? If not, do we need all three benchmarks? Or is this really about test coverage (in which case it should be a test, not a benchmark)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The performance does matter for a couple of reasons, mostly around the fact that our sampling implementation is different for sampling rows vs columns and you actually can't use cupy arrays for column sampling (because it needs to happen on the host) but you could use either for sampling rows and there is a meaningful perf difference.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We might want to be explicit here, especially since the value
6
is not parameterized bycols=6
.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we add a comparator? Less than? It takes a different code path in libcudf so might be worth adding.