Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FEA] Improve cudf tests for hash_partition #7819

Open
magnatelee opened this issue Apr 1, 2021 · 2 comments
Open

[FEA] Improve cudf tests for hash_partition #7819

magnatelee opened this issue Apr 1, 2021 · 2 comments
Labels
feature request New feature or request Python Affects Python cuDF API. tests Unit testing for project

Comments

@magnatelee
Copy link
Contributor

magnatelee commented Apr 1, 2021

Is your feature request related to a problem? Please describe.

As cudf::hash_partition now can use a hash function other than Murmur3 or take a custom seed value, I think the tests should be improved for better coverage.

Describe the solution you'd like

As described in this comment, two key properties of cudf::hash_partition are

  1. Output partitions are disjoint; and
  2. Rows having the same key are always assigned the same partition id, even when they appear in different dataframes (when the number of partitions is fixed).

We need to check these properties for some combinations of hash function types and seed values. Property 1 is checked in this Python test, but only with the default hash function and seed value, and I believe there are no tests for property 2 yet.

Additionally, we should remove this test. The test is comparing a hard-coded value with the result of hash_partition for a test input, but this is over-constraining, as hash_partition in general has no guarantee on the order in which the keys appear in the output partitions. If we later decide to change the row_hasher in any way, this test will start to fail.

@magnatelee magnatelee added feature request New feature or request Needs Triage Need team to review and classify labels Apr 1, 2021
@kkraus14 kkraus14 added tests Unit testing for project Python Affects Python cuDF API. and removed Needs Triage Need team to review and classify labels Apr 6, 2021
@github-actions
Copy link

github-actions bot commented May 6, 2021

This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.

@github-actions
Copy link

github-actions bot commented Feb 7, 2022

This issue has been labeled inactive-90d due to no recent activity in the past 90 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature request New feature or request Python Affects Python cuDF API. tests Unit testing for project
Projects
Status: Todo
Development

No branches or pull requests

3 participants