[FEA] Improve cudf tests for hash_partition #7819
Labels: feature request (New feature or request), Python (Affects Python cuDF API.), tests (Unit testing for project)
Is your feature request related to a problem? Please describe.
As `cudf::hash_partition` can now use a hash function other than Murmur3 or take a custom seed value, I think the tests should be improved for better coverage.
Describe the solution you'd like
As described in this comment, `cudf::hash_partition` has two key properties. We need to check these properties for several combinations of hash function type and seed value. Property 1 is checked in this Python test, but only with the default hash function and seed value, and I believe there are no tests for property 2 yet.
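A minimal sketch of what a "hash function × seed" test matrix could look like. It checks one property that any hash partitioning must satisfy: all rows with equal keys land in the same output partition. This is an assumption for illustration and is not necessarily the exact "property 1" from the linked comment. Everything here is hypothetical and pure Python: `identity_hash`, `blake2_hash` and `reference_hash_partition` are stand-ins, not cuDF's hash functions or API. The stand-in returns a gather map plus one start offset per partition, loosely mirroring the partitioned table and offsets that `cudf::hash_partition` produces.

```python
import hashlib
import itertools

# Hypothetical stand-ins for configurable hash functions (NOT cuDF's implementations).
def identity_hash(key: int, seed: int) -> int:
    return (key + seed) & 0xFFFFFFFF

def blake2_hash(key: int, seed: int) -> int:
    digest = hashlib.blake2b(f"{seed}:{key}".encode(), digest_size=4).digest()
    return int.from_bytes(digest, "little")

HASH_FUNCTIONS = {"identity": identity_hash, "blake2": blake2_hash}
SEEDS = [0, 1, 12345]

def reference_hash_partition(keys, num_partitions, hash_fn, seed):
    """Pure-Python stand-in for hash_partition: returns (gather map, start offsets)."""
    buckets = [[] for _ in range(num_partitions)]
    for row, key in enumerate(keys):
        buckets[hash_fn(key, seed) % num_partitions].append(row)
    gather_map = [row for bucket in buckets for row in bucket]
    # Start offset of each partition: 0, len(b0), len(b0) + len(b1), ...
    offsets = list(itertools.accumulate([0] + [len(b) for b in buckets[:-1]]))
    return gather_map, offsets

def equal_keys_colocated(keys, gather_map, offsets):
    """True if every occurrence of a key lands in the same output partition."""
    bounds = offsets + [len(gather_map)]
    partition_of_key = {}
    for p in range(len(offsets)):
        for row in gather_map[bounds[p]:bounds[p + 1]]:
            if partition_of_key.setdefault(keys[row], p) != p:
                return False
    return True

def run_property_matrix(keys, num_partitions):
    """Check the property for every (hash function, seed) combination."""
    results = {}
    for (name, fn), seed in itertools.product(HASH_FUNCTIONS.items(), SEEDS):
        gather_map, offsets = reference_hash_partition(keys, num_partitions, fn, seed)
        results[(name, seed)] = equal_keys_colocated(keys, gather_map, offsets)
    return results

results = run_property_matrix([3, 1, 3, 7, 1, 1, 9], num_partitions=4)
assert all(results.values())
```

In an actual cuDF test, the reference partitioner would be replaced by the real partitioning call, and the explicit loop would more naturally be a `pytest.mark.parametrize` over hash function and seed.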
Additionally, we should remove this test. It compares a hard-coded value against the result of `hash_partition` for a test input, which is over-constraining: `hash_partition` in general makes no guarantee about the order in which keys appear in the output partitions. If we later change the `row_hasher` in any way, this test will start to fail.
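One possible replacement for a hard-coded expected value is a check that depends neither on row order nor on the particular hasher. The sketch below is a hypothetical helper, not part of cuDF. It assumes the output keys column and per-partition start offsets have been copied to host lists, and checks three things: the output is a permutation of the input (compared as a multiset), the offsets are well formed, and no key appears in more than one partition.

```python
from collections import Counter

def split_partitions(values, offsets):
    """Slice a partitioned column into per-partition lists using start offsets."""
    bounds = list(offsets) + [len(values)]
    return [values[bounds[i]:bounds[i + 1]] for i in range(len(offsets))]

def is_valid_hash_partitioning(input_keys, output_keys, offsets):
    """Order- and hasher-agnostic checks on a partitioning result (hypothetical helper)."""
    # Nothing dropped, duplicated, or altered: output is a permutation of the input.
    if Counter(input_keys) != Counter(output_keys):
        return False
    # Offsets must start at 0 and never decrease.
    bounds = list(offsets) + [len(output_keys)]
    if not offsets or offsets[0] != 0 or any(a > b for a, b in zip(bounds, bounds[1:])):
        return False
    # A key may appear in at most one partition (equal keys are co-located).
    seen = set()
    for part in split_partitions(output_keys, offsets):
        keys = set(part)
        if keys & seen:
            return False
        seen |= keys
    return True

keys = [5, 2, 8, 5, 3]
# Two different but equally valid results, e.g. from two different hashers or row orders.
assert is_valid_hash_partitioning(keys, [2, 8, 5, 5, 3], [0, 2])
assert is_valid_hash_partitioning(keys, [3, 5, 5, 8, 2], [0, 3])
# Invalid: key 5 is split across both partitions.
assert not is_valid_hash_partitioning(keys, [5, 2, 8, 5, 3], [0, 3])
```

Because these checks never recompute a hash value, a change to the `row_hasher` that still partitions correctly would not break such a test.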