-
Notifications
You must be signed in to change notification settings - Fork 915
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor distinct with hashset-based algorithms #15984
Refactor distinct with hashset-based algorithms #15984
Conversation
Please post a benchmark showing how the performance is changed. Thanks. |
Benchmark comparision of distinct algorithm with ['static_map.json', 'static_set.json'] distinct[0] Tesla T4
distinct_list[0] Tesla T4
Summary
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
reduce_by_row
is instantiated of hash_reduce_by_row
which is shared across multiple TUs. You should not separate them into multiple functions taking different sets of parameters. That is very difficult to join them again after refactor is done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work!
Good to see the build time also reduced.
I thought that we agreed to keep the signature of |
I'm not sure what you mean by keeping the signature of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Put this on hold since we find more performance optimization to do based on offline discussions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approval and excited waiting for the next refactor work.
Further performance improvements using distinctRef Time = With
|
Type | NumRows | Ref Time | Ref Noise | Cmp Time | Cmp Noise | Diff | %Diff | Status |
---|---|---|---|---|---|---|---|---|
bool | 10000 | 169.117 us | 32.86% | 203.653 us | 1.88% | 34.536 us | 20.42% | FAIL |
bool | 100000 | 190.318 us | 28.92% | 232.877 us | 5.05% | 42.559 us | 22.36% | FAIL |
bool | 1000000 | 294.805 us | 16.77% | 357.525 us | 7.37% | 62.720 us | 21.28% | FAIL |
bool | 10000000 | 1.414 ms | 6.62% | 1.676 ms | 9.63% | 261.566 us | 18.49% | FAIL |
I8 | 10000 | 150.012 us | 2.80% | 187.305 us | 29.75% | 37.293 us | 24.86% | FAIL |
I8 | 100000 | 161.144 us | 4.72% | 195.348 us | 5.06% | 34.204 us | 21.23% | FAIL |
I8 | 1000000 | 283.225 us | 1.66% | 330.493 us | 8.80% | 47.268 us | 16.69% | FAIL |
I8 | 10000000 | 1.418 ms | 2.14% | 1.671 ms | 2.87% | 252.325 us | 17.79% | FAIL |
I32 | 10000 | 149.916 us | 2.75% | 177.923 us | 2.51% | 28.006 us | 18.68% | FAIL |
I32 | 100000 | 165.040 us | 14.86% | 193.024 us | 2.05% | 27.985 us | 16.96% | FAIL |
I32 | 1000000 | 283.267 us | 1.57% | 327.740 us | 1.13% | 44.473 us | 15.70% | FAIL |
I32 | 10000000 | 1.466 ms | 2.07% | 1.721 ms | 1.94% | 255.151 us | 17.41% | FAIL |
I64 | 10000 | 150.541 us | 2.75% | 178.182 us | 4.02% | 27.642 us | 18.36% | FAIL |
I64 | 100000 | 158.679 us | 2.61% | 193.019 us | 2.71% | 34.340 us | 21.64% | FAIL |
I64 | 1000000 | 287.044 us | 2.76% | 345.719 us | 16.33% | 58.675 us | 20.44% | FAIL |
I64 | 10000000 | 1.551 ms | 1.90% | 1.830 ms | 5.21% | 278.500 us | 17.95% | FAIL |
F32 | 10000 | 151.368 us | 5.03% | 179.430 us | 5.58% | 28.062 us | 18.54% | FAIL |
F32 | 100000 | 169.741 us | 0.93% | 210.323 us | 4.70% | 40.582 us | 23.91% | FAIL |
F32 | 1000000 | 531.013 us | 4.36% | 590.247 us | 10.19% | 59.235 us | 11.16% | FAIL |
F32 | 10000000 | 7.422 ms | 0.16% | 7.640 ms | 0.29% | 217.326 us | 2.93% | FAIL |
cudf::timestamp_ms | 10000 | 150.855 us | 7.18% | 179.409 us | 4.51% | 28.554 us | 18.93% | FAIL |
cudf::timestamp_ms | 100000 | 158.212 us | 2.58% | 193.409 us | 2.16% | 35.198 us | 22.25% | FAIL |
cudf::timestamp_ms | 1000000 | 289.823 us | 2.29% | 336.280 us | 1.06% | 46.457 us | 16.03% | FAIL |
cudf::timestamp_ms | 10000000 | 1.581 ms | 1.64% | 1.823 ms | 1.33% | 242.283 us | 15.32% | FAIL |
distinct_list
[0] Tesla T4
Type | null_probability | ColumnSize | Ref Time | Ref Noise | Cmp Time | Cmp Noise | Diff | %Diff | Status |
---|---|---|---|---|---|---|---|---|---|
I32 | 0 | 100000000 | 3.183 ms | 1.17% | 3.666 ms | 0.48% | 483.897 us | 15.20% | FAIL |
I32 | 0.1 | 100000000 | 3.632 ms | 1.40% | 4.199 ms | 0.99% | 567.874 us | 15.64% | FAIL |
cudf::list_view | 0 | 100000000 | 3.512 ms | 0.88% | 3.584 ms | 0.68% | 72.714 us | 2.07% | FAIL |
cudf::list_view | 0.1 | 100000000 | 4.124 ms | 0.91% | 4.206 ms | 1.78% | 81.445 us | 1.97% | FAIL |
Summary
- Total Matches: 28
- Pass (diff <= min_noise): 0
- Unknown (infinite noise): 0
- Failure (diff > min_noise): 28
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Excellent! 🔥 🔥 🔥
Thank you!
/merge |
Description
Refactor distinct algorithm to use
cuco::static_set
.Checklist