Add specialized dispatch to improve occupancy for hash table operations in distinct join #16321

Closed
wants to merge 31 commits into from

@tgujar tgujar commented Jul 19, 2024

Description

Adds a specialized dispatch for distinct hash joins, similar to the one implemented for hash joins in #15700.
Related issue: #15502

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

Benchmark results on A100

# left_anti_join

## [0] NVIDIA A100-PCIE-40GB

|  Key  |  Nullable  |  left_size  |  right_size  |   Ref Time |   Ref Noise |   Cmp Time |   Cmp Noise |      Diff |   %Diff |  Status  |
|-------|------------|-------------|--------------|------------|-------------|------------|-------------|-----------|---------|----------|
|  I32  |     0      |    1000     |     1000     |  90.861 us |      20.59% |  89.249 us |      15.24% | -1.612 us |  -1.77% |   PASS   |
|  I32  |     0      |   100000    |     1000     |  87.558 us |       2.21% |  86.986 us |       1.98% | -0.572 us |  -0.65% |   PASS   |
|  I32  |     0      |  10000000   |     1000     | 886.446 us |       0.19% | 885.025 us |       0.21% | -1.422 us |  -0.16% |   PASS   |
|  I32  |     0      |   100000    |    100000    |  86.035 us |       1.99% |  85.454 us |       1.81% | -0.581 us |  -0.68% |   PASS   |
|  I32  |     0      |  10000000   |    100000    | 653.915 us |       0.26% | 655.172 us |       0.32% |  1.257 us |   0.19% |   PASS   |
|  I32  |     0      |  10000000   |   10000000   |   2.799 ms |       0.09% |   2.801 ms |       0.12% |  2.004 us |   0.07% |   PASS   |
|  I32  |     1      |    1000     |     1000     |  68.102 us |       2.32% |  67.778 us |       2.20% | -0.325 us |  -0.48% |   PASS   |
|  I32  |     1      |   100000    |     1000     |  70.618 us |       2.55% |  70.743 us |       2.74% |  0.125 us |   0.18% |   PASS   |
|  I32  |     1      |  10000000   |     1000     | 441.475 us |       0.36% | 442.285 us |       0.38% |  0.810 us |   0.18% |   PASS   |
|  I32  |     1      |   100000    |    100000    |  71.036 us |       2.09% |  71.382 us |       2.57% |  0.346 us |   0.49% |   PASS   |
|  I32  |     1      |  10000000   |    100000    | 461.234 us |       0.47% | 461.322 us |       0.35% |  0.087 us |   0.02% |   PASS   |
|  I32  |     1      |  10000000   |   10000000   |   1.204 ms |       0.19% |   1.204 ms |       0.17% |  0.461 us |   0.04% |   PASS   |
|  I64  |     0      |    1000     |     1000     |  70.584 us |       2.98% |  68.666 us |       2.56% | -1.918 us |  -2.72% |   FAIL   |
|  I64  |     0      |   100000    |     1000     |  74.031 us |       2.46% |  73.741 us |       2.10% | -0.290 us |  -0.39% |   PASS   |
|  I64  |     0      |  10000000   |     1000     | 618.971 us |       0.31% | 619.490 us |       0.36% |  0.519 us |   0.08% |   PASS   |
|  I64  |     0      |   100000    |    100000    |  86.887 us |       1.85% |  85.190 us |       1.77% | -1.696 us |  -1.95% |   FAIL   |
|  I64  |     0      |  10000000   |    100000    | 672.806 us |       0.35% | 672.620 us |       0.28% | -0.185 us |  -0.03% |   PASS   |
|  I64  |     0      |  10000000   |   10000000   |   2.945 ms |       0.08% |   2.947 ms |       0.11% |  1.202 us |   0.04% |   PASS   |
|  I64  |     1      |    1000     |     1000     |  65.508 us |       3.28% |  64.192 us |       2.48% | -1.316 us |  -2.01% |   PASS   |
|  I64  |     1      |   100000    |     1000     |  70.483 us |       2.29% |  70.188 us |       2.46% | -0.295 us |  -0.42% |   PASS   |
|  I64  |     1      |  10000000   |     1000     | 424.186 us |       0.42% | 423.917 us |       0.43% | -0.269 us |  -0.06% |   PASS   |
|  I64  |     1      |   100000    |    100000    |  73.489 us |       2.45% |  74.126 us |       2.39% |  0.637 us |   0.87% |   PASS   |
|  I64  |     1      |  10000000   |    100000    | 477.040 us |       0.43% | 475.089 us |       0.42% | -1.951 us |  -0.41% |   PASS   |
|  I64  |     1      |  10000000   |   10000000   |   1.235 ms |       0.16% |   1.236 ms |       0.17% |  0.659 us |   0.05% |   PASS   |
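
A note on reading the Status column: the rows in these tables are consistent with a rule that reports FAIL whenever the absolute %Diff exceeds the smaller of Ref Noise and Cmp Noise. That rule is inferred from the data here and is not stated in the PR, so treat the sketch below as an assumption. Under it, FAIL means "the difference is outside the noise band", and a FAIL row with a negative %Diff is a speedup of Cmp over Ref, not a regression. The function names are hypothetical.

```python
def parse_percent(cell: str) -> float:
    """Convert a table cell such as ' -2.72% ' to the float -2.72."""
    return float(cell.strip().rstrip("%"))


def status(ref_noise: str, cmp_noise: str, pct_diff: str) -> str:
    """Return PASS when |%Diff| is within the smaller noise value, else FAIL.

    Assumption: this rule is inferred from the tables, not from the PR text.
    """
    threshold = min(parse_percent(ref_noise), parse_percent(cmp_noise))
    return "PASS" if abs(parse_percent(pct_diff)) <= threshold else "FAIL"


# Two rows from the left_anti_join table (I64, 1000 x 1000).
print(status("2.98%", "2.56%", "-2.72%"))  # non-nullable row -> FAIL
print(status("3.28%", "2.48%", "-2.01%"))  # nullable row -> PASS
```

Both calls reproduce the Status values shown in the table for those rows.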

# left_semi_join

## [0] NVIDIA A100-PCIE-40GB

|  Key  |  Nullable  |  left_size  |  right_size  |   Ref Time |   Ref Noise |   Cmp Time |   Cmp Noise |      Diff |   %Diff |  Status  |
|-------|------------|-------------|--------------|------------|-------------|------------|-------------|-----------|---------|----------|
|  I32  |     0      |    1000     |     1000     |  81.302 us |       2.04% |  81.153 us |       2.07% | -0.149 us |  -0.18% |   PASS   |
|  I32  |     0      |   100000    |     1000     |  87.323 us |       1.76% |  86.329 us |       1.83% | -0.994 us |  -1.14% |   PASS   |
|  I32  |     0      |  10000000   |     1000     | 878.355 us |       0.27% | 876.896 us |       0.22% | -1.458 us |  -0.17% |   PASS   |
|  I32  |     0      |   100000    |    100000    |  85.744 us |       2.09% |  84.968 us |       1.73% | -0.776 us |  -0.90% |   PASS   |
|  I32  |     0      |  10000000   |    100000    | 645.618 us |       0.28% | 647.221 us |       0.29% |  1.603 us |   0.25% |   PASS   |
|  I32  |     0      |  10000000   |   10000000   |   2.792 ms |       0.09% |   2.794 ms |       0.09% |  1.773 us |   0.06% |   PASS   |
|  I32  |     1      |    1000     |     1000     |  67.987 us |       2.26% |  67.615 us |       2.86% | -0.372 us |  -0.55% |   PASS   |
|  I32  |     1      |   100000    |     1000     |  69.538 us |       2.27% |  69.157 us |       1.91% | -0.382 us |  -0.55% |   PASS   |
|  I32  |     1      |  10000000   |     1000     | 426.350 us |       0.54% | 427.259 us |       0.40% |  0.909 us |   0.21% |   PASS   |
|  I32  |     1      |   100000    |    100000    |  70.346 us |       2.42% |  70.494 us |       2.38% |  0.148 us |   0.21% |   PASS   |
|  I32  |     1      |  10000000   |    100000    | 446.008 us |       0.36% | 446.114 us |       0.30% |  0.106 us |   0.02% |   PASS   |
|  I32  |     1      |  10000000   |   10000000   |   1.191 ms |       0.10% |   1.190 ms |       0.20% | -0.100 us |  -0.01% |   PASS   |
|  I64  |     0      |    1000     |     1000     |  70.061 us |       2.37% |  69.812 us |       2.58% | -0.250 us |  -0.36% |   PASS   |
|  I64  |     0      |   100000    |     1000     |  73.619 us |       2.29% |  73.504 us |       2.24% | -0.115 us |  -0.16% |   PASS   |
|  I64  |     0      |  10000000   |     1000     | 611.605 us |       0.27% | 612.211 us |       0.27% |  0.607 us |   0.10% |   PASS   |
|  I64  |     0      |   100000    |    100000    |  86.798 us |       2.27% |  84.953 us |       2.10% | -1.845 us |  -2.13% |   FAIL   |
|  I64  |     0      |  10000000   |    100000    | 665.336 us |       0.23% | 665.804 us |       0.30% |  0.468 us |   0.07% |   PASS   |
|  I64  |     0      |  10000000   |   10000000   |   2.939 ms |       0.07% |   2.939 ms |       0.07% |  0.304 us |   0.01% |   PASS   |
|  I64  |     1      |    1000     |     1000     |  64.898 us |       2.44% |  64.169 us |       2.70% | -0.730 us |  -1.12% |   PASS   |
|  I64  |     1      |   100000    |     1000     |  68.835 us |       2.05% |  69.013 us |       2.39% |  0.177 us |   0.26% |   PASS   |
|  I64  |     1      |  10000000   |     1000     | 409.432 us |       0.35% | 409.412 us |       0.34% | -0.020 us |  -0.00% |   PASS   |
|  I64  |     1      |   100000    |    100000    |  72.624 us |       2.47% |  73.302 us |       2.34% |  0.678 us |   0.93% |   PASS   |
|  I64  |     1      |  10000000   |    100000    | 461.327 us |       0.38% | 461.408 us |       0.42% |  0.080 us |   0.02% |   PASS   |
|  I64  |     1      |  10000000   |   10000000   |   1.221 ms |       0.15% |   1.222 ms |       0.10% |  1.185 us |   0.10% |   PASS   |

# conditional_inner_join

## [0] NVIDIA A100-PCIE-40GB

|  Key  |  Nullable  |  left_size  |  right_size  |   Ref Time |   Ref Noise |   Cmp Time |   Cmp Noise |      Diff |   %Diff |  Status  |
|-------|------------|-------------|--------------|------------|-------------|------------|-------------|-----------|---------|----------|
|  I32  |     0      |    1000     |     1000     |   1.749 ms |       0.12% |   1.748 ms |       0.14% | -0.334 us |  -0.02% |   PASS   |
|  I32  |     0      |   100000    |     1000     |   3.766 ms |       0.05% |   3.766 ms |       0.04% |  0.192 us |   0.01% |   PASS   |
|  I32  |     0      |   100000    |    100000    | 370.819 ms |       0.01% | 370.814 ms |       0.01% | -5.305 us |  -0.00% |   PASS   |
|  I32  |     1      |    1000     |     1000     |   2.290 ms |       0.04% |   2.289 ms |       0.10% | -1.277 us |  -0.06% |   FAIL   |
|  I32  |     1      |   100000    |     1000     |   4.934 ms |       0.08% |   4.930 ms |       0.06% | -4.096 us |  -0.08% |   FAIL   |
|  I32  |     1      |   100000    |    100000    | 487.309 ms |       0.00% | 487.320 ms |       0.00% | 11.830 us |   0.00% |   PASS   |
|  I64  |     0      |    1000     |     1000     |   1.841 ms |       0.08% |   1.833 ms |       0.12% | -8.264 us |  -0.45% |   FAIL   |
|  I64  |     0      |   100000    |     1000     |   3.909 ms |       0.06% |   3.903 ms |       0.09% | -6.165 us |  -0.16% |   FAIL   |
|  I64  |     0      |   100000    |    100000    | 382.522 ms |       0.02% | 382.567 ms |       0.01% | 44.772 us |   0.01% |   PASS   |
|  I64  |     1      |    1000     |     1000     |   2.414 ms |       0.07% |   2.413 ms |       0.12% | -0.822 us |  -0.03% |   PASS   |
|  I64  |     1      |   100000    |     1000     |   5.108 ms |       0.05% |   5.107 ms |       0.02% | -0.606 us |  -0.01% |   PASS   |
|  I64  |     1      |   100000    |    100000    | 504.412 ms |       0.00% | 504.428 ms |       0.00% | 16.108 us |   0.00% |   FAIL   |

# conditional_left_join

## [0] NVIDIA A100-PCIE-40GB

|  Key  |  Nullable  |  left_size  |  right_size  |   Ref Time |   Ref Noise |   Cmp Time |   Cmp Noise |      Diff |   %Diff |  Status  |
|-------|------------|-------------|--------------|------------|-------------|------------|-------------|-----------|---------|----------|
|  I32  |     0      |    1000     |     1000     |   1.750 ms |       0.17% |   1.750 ms |       0.15% | -0.587 us |  -0.03% |   PASS   |
|  I32  |     0      |   100000    |     1000     |   3.770 ms |       0.05% |   3.770 ms |       0.03% | -0.331 us |  -0.01% |   PASS   |
|  I32  |     0      |   100000    |    100000    | 370.814 ms |       0.01% | 370.824 ms |       0.01% | 10.151 us |   0.00% |   PASS   |
|  I32  |     1      |    1000     |     1000     |   2.294 ms |       0.18% |   2.292 ms |       0.09% | -1.361 us |  -0.06% |   PASS   |
|  I32  |     1      |   100000    |     1000     |   4.937 ms |       0.08% |   4.936 ms |       0.08% | -1.205 us |  -0.02% |   PASS   |
|  I32  |     1      |   100000    |    100000    | 487.325 ms |       0.00% | 487.332 ms |       0.01% |  7.077 us |   0.00% |   PASS   |
|  I64  |     0      |    1000     |     1000     |   1.843 ms |       0.12% |   1.835 ms |       0.12% | -8.598 us |  -0.47% |   FAIL   |
|  I64  |     0      |   100000    |     1000     |   3.914 ms |       0.06% |   3.908 ms |       0.10% | -5.224 us |  -0.13% |   FAIL   |
|  I64  |     0      |   100000    |    100000    | 382.534 ms |       0.02% | 382.554 ms |       0.02% | 19.825 us |   0.01% |   PASS   |
|  I64  |     1      |    1000     |     1000     |   2.417 ms |       0.08% |   2.416 ms |       0.14% | -0.663 us |  -0.03% |   PASS   |
|  I64  |     1      |   100000    |     1000     |   5.113 ms |       0.03% |   5.113 ms |       0.04% | -0.178 us |  -0.00% |   PASS   |
|  I64  |     1      |   100000    |    100000    | 504.425 ms |       0.00% | 504.424 ms |       0.01% | -1.298 us |  -0.00% |   PASS   |

# inner_join

## [0] NVIDIA A100-PCIE-40GB

|  Key  |  Nullable  |  left_size  |  right_size  |   Ref Time |   Ref Noise |   Cmp Time |   Cmp Noise |         Diff |   %Diff |  Status  |
|-------|------------|-------------|--------------|------------|-------------|------------|-------------|--------------|---------|----------|
|  I32  |     0      |    1000     |     1000     |  97.271 us |       2.58% |  97.615 us |       2.99% |     0.344 us |   0.35% |   PASS   |
|  I32  |     0      |   100000    |     1000     | 124.120 us |       3.07% | 115.016 us |       2.02% |    -9.104 us |  -7.33% |   FAIL   |
|  I32  |     0      |  10000000   |     1000     |   3.057 ms |       0.37% |   2.143 ms |       0.46% |  -914.120 us | -29.91% |   FAIL   |
|  I32  |     0      |   100000    |    100000    | 143.238 us |       1.42% | 132.162 us |       1.77% |   -11.076 us |  -7.73% |   FAIL   |
|  I32  |     0      |  10000000   |    100000    |   3.638 ms |       0.12% |   2.525 ms |       0.12% | -1113.183 us | -30.60% |   FAIL   |
|  I32  |     0      |  10000000   |   10000000   |   6.688 ms |       0.07% |   5.152 ms |       0.07% | -1535.986 us | -22.96% |   FAIL   |
|  I32  |     1      |    1000     |     1000     | 107.932 us |       4.32% | 106.245 us |       3.76% |    -1.688 us |  -1.56% |   PASS   |
|  I32  |     1      |   100000    |     1000     | 121.986 us |       2.53% | 121.765 us |       2.47% |    -0.221 us |  -0.18% |   PASS   |
|  I32  |     1      |  10000000   |     1000     |   1.978 ms |       0.24% |   1.339 ms |       0.35% |  -639.047 us | -32.31% |   FAIL   |
|  I32  |     1      |   100000    |    100000    | 130.398 us |       3.33% | 130.005 us |       3.51% |    -0.394 us |  -0.30% |   PASS   |
|  I32  |     1      |  10000000   |    100000    |   2.172 ms |       0.23% |   1.472 ms |       0.22% |  -699.714 us | -32.22% |   FAIL   |
|  I32  |     1      |  10000000   |   10000000   |   3.280 ms |       0.17% |   2.258 ms |       0.25% | -1021.737 us | -31.15% |   FAIL   |
|  I64  |     0      |    1000     |     1000     | 102.598 us |       3.99% |  96.301 us |       3.69% |    -6.297 us |  -6.14% |   FAIL   |
|  I64  |     0      |   100000    |     1000     | 129.971 us |       2.60% | 114.988 us |       1.67% |   -14.983 us | -11.53% |   FAIL   |
|  I64  |     0      |  10000000   |     1000     |   3.149 ms |       0.50% |   2.200 ms |       0.49% |  -949.257 us | -30.14% |   FAIL   |
|  I64  |     0      |   100000    |    100000    | 144.244 us |       1.66% | 135.038 us |       3.12% |    -9.207 us |  -6.38% |   FAIL   |
|  I64  |     0      |  10000000   |    100000    |   3.712 ms |       0.12% |   2.569 ms |       0.12% | -1142.628 us | -30.78% |   FAIL   |
|  I64  |     0      |  10000000   |   10000000   |   6.785 ms |       0.06% |   5.218 ms |       0.10% | -1567.131 us | -23.10% |   FAIL   |
|  I64  |     1      |    1000     |     1000     | 107.626 us |       4.25% | 105.057 us |       2.03% |    -2.569 us |  -2.39% |   FAIL   |
|  I64  |     1      |   100000    |     1000     | 124.349 us |       2.91% | 122.234 us |       3.00% |    -2.115 us |  -1.70% |   PASS   |
|  I64  |     1      |  10000000   |     1000     |   2.018 ms |       0.23% |   1.381 ms |       0.21% |  -637.102 us | -31.57% |   FAIL   |
|  I64  |     1      |   100000    |    100000    | 133.511 us |       2.89% | 132.084 us |       2.56% |    -1.427 us |  -1.07% |   PASS   |
|  I64  |     1      |  10000000   |    100000    |   2.226 ms |       0.23% |   1.510 ms |       0.39% |  -716.524 us | -32.19% |   FAIL   |
|  I64  |     1      |  10000000   |   10000000   |   3.349 ms |       0.14% |   2.315 ms |       0.22% | -1033.552 us | -30.86% |   FAIL   |
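
To make the larger differences easier to compare, the Ref Time and Cmp Time cells can be converted into a speedup factor. The sketch below is illustrative only: the helper names are hypothetical, and it works from the rounded times printed in the table, so the recomputed percentage can differ from the %Diff column in the last digit.

```python
def to_microseconds(cell: str) -> float:
    """Parse a time cell such as '3.057 ms' or '124.120 us' into microseconds."""
    value, unit = cell.split()
    scale = {"us": 1.0, "ms": 1_000.0}[unit]
    return float(value) * scale


def speedup(ref_time: str, cmp_time: str) -> float:
    """Ratio of Ref Time to Cmp Time; values above 1 mean Cmp is faster."""
    return to_microseconds(ref_time) / to_microseconds(cmp_time)


def percent_diff(ref_time: str, cmp_time: str) -> float:
    """Relative change of Cmp Time against Ref Time, in percent."""
    ref = to_microseconds(ref_time)
    return 100.0 * (to_microseconds(cmp_time) - ref) / ref


# inner_join, I32, non-nullable, left_size 10000000, right_size 1000.
print(f"{speedup('3.057 ms', '2.143 ms'):.2f}x")       # 1.43x
print(f"{percent_diff('3.057 ms', '2.143 ms'):.1f}%")  # -29.9%
```

For that row, the table's -29.91% corresponds to roughly a 1.43x speedup.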

# left_join

## [0] NVIDIA A100-PCIE-40GB

|  Key  |  Nullable  |  left_size  |  right_size  |   Ref Time |   Ref Noise |   Cmp Time |   Cmp Noise |         Diff |   %Diff |  Status  |
|-------|------------|-------------|--------------|------------|-------------|------------|-------------|--------------|---------|----------|
|  I32  |     0      |    1000     |     1000     |  97.286 us |       2.64% |  97.733 us |       2.88% |     0.447 us |   0.46% |   PASS   |
|  I32  |     0      |   100000    |     1000     | 127.376 us |       3.04% | 114.851 us |       1.96% |   -12.525 us |  -9.83% |   FAIL   |
|  I32  |     0      |  10000000   |     1000     |   3.241 ms |       0.42% |   2.243 ms |       0.45% |  -997.955 us | -30.80% |   FAIL   |
|  I32  |     0      |   100000    |    100000    | 142.978 us |       1.37% | 132.073 us |       1.88% |   -10.905 us |  -7.63% |   FAIL   |
|  I32  |     0      |  10000000   |    100000    |   3.823 ms |       0.39% |   2.626 ms |       0.11% | -1197.748 us | -31.33% |   FAIL   |
|  I32  |     0      |  10000000   |   10000000   |   6.891 ms |       0.06% |   5.283 ms |       0.09% | -1607.503 us | -23.33% |   FAIL   |
|  I32  |     1      |    1000     |     1000     | 107.920 us |       4.07% | 106.214 us |       3.78% |    -1.705 us |  -1.58% |   PASS   |
|  I32  |     1      |   100000    |     1000     | 122.137 us |       2.83% | 121.905 us |       2.34% |    -0.232 us |  -0.19% |   PASS   |
|  I32  |     1      |  10000000   |     1000     |   2.145 ms |       0.24% |   1.465 ms |       0.33% |  -679.381 us | -31.68% |   FAIL   |
|  I32  |     1      |   100000    |    100000    | 134.669 us |       4.29% | 129.665 us |       3.42% |    -5.004 us |  -3.72% |   FAIL   |
|  I32  |     1      |  10000000   |    100000    |   2.326 ms |       0.22% |   1.590 ms |       0.24% |  -736.413 us | -31.65% |   FAIL   |
|  I32  |     1      |  10000000   |   10000000   |   3.466 ms |       0.15% |   2.403 ms |       0.22% | -1062.442 us | -30.66% |   FAIL   |
|  I64  |     0      |    1000     |     1000     | 104.977 us |       2.52% |  97.283 us |       2.83% |    -7.694 us |  -7.33% |   FAIL   |
|  I64  |     0      |   100000    |     1000     | 131.580 us |       1.63% | 115.086 us |       1.88% |   -16.494 us | -12.54% |   FAIL   |
|  I64  |     0      |  10000000   |     1000     |   3.346 ms |       0.46% |   2.309 ms |       0.50% | -1036.767 us | -30.99% |   FAIL   |
|  I64  |     0      |   100000    |    100000    | 147.217 us |       2.94% | 136.990 us |       3.16% |   -10.227 us |  -6.95% |   FAIL   |
|  I64  |     0      |  10000000   |    100000    |   3.903 ms |       0.13% |   2.677 ms |       0.17% | -1225.625 us | -31.40% |   FAIL   |
|  I64  |     0      |  10000000   |   10000000   |   6.990 ms |       0.06% |   5.346 ms |       0.08% | -1644.113 us | -23.52% |   FAIL   |
|  I64  |     1      |    1000     |     1000     | 113.850 us |       5.18% | 105.199 us |       2.44% |    -8.650 us |  -7.60% |   FAIL   |
|  I64  |     1      |   100000    |     1000     | 123.689 us |       2.63% | 122.590 us |       2.40% |    -1.100 us |  -0.89% |   PASS   |
|  I64  |     1      |  10000000   |     1000     |   2.188 ms |       0.27% |   1.509 ms |       0.25% |  -678.680 us | -31.02% |   FAIL   |
|  I64  |     1      |   100000    |    100000    | 135.824 us |       4.21% | 132.758 us |       5.97% |    -3.067 us |  -2.26% |   PASS   |
|  I64  |     1      |  10000000   |    100000    |   2.393 ms |       0.19% |   1.634 ms |       0.30% |  -758.371 us | -31.70% |   FAIL   |
|  I64  |     1      |  10000000   |   10000000   |   3.535 ms |       0.16% |   2.461 ms |       0.19% | -1074.032 us | -30.38% |   FAIL   |

# full_join

## [0] NVIDIA A100-PCIE-40GB

|  Key  |  Nullable  |  left_size  |  right_size  |   Ref Time |   Ref Noise |   Cmp Time |   Cmp Noise |         Diff |   %Diff |  Status  |
|-------|------------|-------------|--------------|------------|-------------|------------|-------------|--------------|---------|----------|
|  I32  |     0      |    1000     |     1000     | 173.354 us |       2.30% | 173.201 us |       2.37% |    -0.153 us |  -0.09% |   PASS   |
|  I32  |     0      |   100000    |     1000     | 174.440 us |       2.29% | 160.824 us |       1.64% |   -13.615 us |  -7.81% |   FAIL   |
|  I32  |     0      |  10000000   |     1000     |   3.934 ms |       0.34% |   2.921 ms |       0.40% | -1013.110 us | -25.75% |   FAIL   |
|  I32  |     0      |   100000    |    100000    | 223.671 us |       1.52% | 212.643 us |       1.60% |   -11.028 us |  -4.93% |   FAIL   |
|  I32  |     0      |  10000000   |    100000    |   3.961 ms |       0.12% |   2.768 ms |       0.12% | -1192.986 us | -30.12% |   FAIL   |
|  I32  |     0      |  10000000   |   10000000   |   7.570 ms |       0.06% |   5.963 ms |       0.10% | -1606.246 us | -21.22% |   FAIL   |
|  I32  |     1      |    1000     |     1000     | 183.325 us |       2.92% | 183.825 us |       2.64% |     0.500 us |   0.27% |   PASS   |
|  I32  |     1      |   100000    |     1000     | 198.772 us |       2.30% | 198.998 us |       2.95% |     0.226 us |   0.11% |   PASS   |
|  I32  |     1      |  10000000   |     1000     |   2.427 ms |       0.21% |   1.727 ms |       0.23% |  -700.032 us | -28.84% |   FAIL   |
|  I32  |     1      |   100000    |    100000    | 214.070 us |       2.81% | 210.944 us |       2.29% |    -3.126 us |  -1.46% |   PASS   |
|  I32  |     1      |  10000000   |    100000    |   2.570 ms |       0.25% |   1.833 ms |       0.26% |  -737.572 us | -28.70% |   FAIL   |
|  I32  |     1      |  10000000   |   10000000   |   3.968 ms |       0.13% |   2.906 ms |       0.21% | -1061.751 us | -26.76% |   FAIL   |
|  I64  |     0      |    1000     |     1000     | 180.608 us |       1.70% | 173.500 us |       2.28% |    -7.108 us |  -3.94% |   FAIL   |
|  I64  |     0      |   100000    |     1000     | 178.070 us |       1.69% | 162.699 us |       1.70% |   -15.371 us |  -8.63% |   FAIL   |
|  I64  |     0      |  10000000   |     1000     |   3.871 ms |       0.42% |   2.822 ms |       0.43% | -1048.670 us | -27.09% |   FAIL   |
|  I64  |     0      |   100000    |    100000    | 227.515 us |       2.01% | 217.257 us |       2.16% |   -10.258 us |  -4.51% |   FAIL   |
|  I64  |     0      |  10000000   |    100000    |   4.036 ms |       0.12% |   2.807 ms |       0.16% | -1229.132 us | -30.45% |   FAIL   |
|  I64  |     0      |  10000000   |   10000000   |   7.672 ms |       0.06% |   6.028 ms |       0.10% | -1644.502 us | -21.43% |   FAIL   |
|  I64  |     1      |    1000     |     1000     | 192.768 us |       3.07% | 181.239 us |       2.30% |   -11.529 us |  -5.98% |   FAIL   |
|  I64  |     1      |   100000    |     1000     | 204.289 us |       2.60% | 199.611 us |       1.94% |    -4.678 us |  -2.29% |   FAIL   |
|  I64  |     1      |  10000000   |     1000     |   2.449 ms |       0.30% |   1.772 ms |       0.24% |  -676.662 us | -27.63% |   FAIL   |
|  I64  |     1      |   100000    |    100000    | 219.494 us |       2.30% | 209.634 us |       2.40% |    -9.860 us |  -4.49% |   FAIL   |
|  I64  |     1      |  10000000   |    100000    |   2.635 ms |       0.19% |   1.876 ms |       0.31% |  -759.309 us | -28.81% |   FAIL   |
|  I64  |     1      |  10000000   |   10000000   |   4.038 ms |       0.12% |   2.964 ms |       0.20% | -1074.184 us | -26.60% |   FAIL   |

# mixed_inner_join

## [0] NVIDIA A100-PCIE-40GB

|  Key  |  Nullable  |  left_size  |  right_size  |   Ref Time |   Ref Noise |   Cmp Time |   Cmp Noise |       Diff |   %Diff |  Status  |
|-------|------------|-------------|--------------|------------|-------------|------------|-------------|------------|---------|----------|
|  I32  |     0      |    1000     |     1000     | 160.030 us |       1.95% | 160.284 us |       2.25% |   0.254 us |   0.16% |   PASS   |
|  I32  |     0      |   100000    |     1000     | 188.648 us |       1.96% | 190.244 us |       2.13% |   1.596 us |   0.85% |   PASS   |
|  I32  |     0      |  10000000   |     1000     |   4.239 ms |       0.39% |   4.287 ms |       0.59% |  47.841 us |   1.13% |   FAIL   |
|  I32  |     0      |   100000    |    100000    | 214.583 us |       1.53% | 215.578 us |       1.65% |   0.995 us |   0.46% |   PASS   |
|  I32  |     0      |  10000000   |    100000    |   5.252 ms |       0.12% |   5.252 ms |       0.11% |  -0.437 us |  -0.01% |   PASS   |
|  I32  |     0      |  10000000   |   10000000   |   9.177 ms |       0.05% |   9.161 ms |       0.07% | -15.788 us |  -0.17% |   FAIL   |
|  I32  |     1      |    1000     |     1000     | 167.511 us |       2.70% | 165.020 us |       2.01% |  -2.491 us |  -1.49% |   PASS   |
|  I32  |     1      |   100000    |     1000     | 185.776 us |       2.23% | 185.687 us |       1.95% |  -0.089 us |  -0.05% |   PASS   |
|  I32  |     1      |  10000000   |     1000     |   2.732 ms |       0.14% |   2.728 ms |       0.15% |  -3.846 us |  -0.14% |   PASS   |
|  I32  |     1      |   100000    |    100000    | 194.742 us |       2.12% | 195.599 us |       2.06% |   0.857 us |   0.44% |   PASS   |
|  I32  |     1      |  10000000   |    100000    |   3.124 ms |       0.19% |   3.120 ms |       0.15% |  -3.314 us |  -0.11% |   PASS   |
|  I32  |     1      |  10000000   |   10000000   |   4.286 ms |       0.10% |   4.266 ms |       0.11% | -19.250 us |  -0.45% |   FAIL   |
|  I64  |     0      |    1000     |     1000     | 174.238 us |       1.74% | 172.663 us |       2.10% |  -1.575 us |  -0.90% |   PASS   |
|  I64  |     0      |   100000    |     1000     | 212.796 us |       2.03% | 209.583 us |       2.66% |  -3.212 us |  -1.51% |   PASS   |
|  I64  |     0      |  10000000   |     1000     |   4.789 ms |       0.50% |   4.724 ms |       0.41% | -65.083 us |  -1.36% |   FAIL   |
|  I64  |     0      |   100000    |    100000    | 226.501 us |       1.38% | 227.054 us |       1.08% |   0.553 us |   0.24% |   PASS   |
|  I64  |     0      |  10000000   |    100000    |   5.527 ms |       0.09% |   5.527 ms |       0.09% |   0.371 us |   0.01% |   PASS   |
|  I64  |     0      |  10000000   |   10000000   |   9.372 ms |       0.06% |   9.356 ms |       0.06% | -16.289 us |  -0.17% |   FAIL   |
|  I64  |     1      |    1000     |     1000     | 160.433 us |       3.63% | 168.567 us |       4.15% |   8.135 us |   5.07% |   FAIL   |
|  I64  |     1      |   100000    |     1000     | 193.922 us |       2.85% | 191.455 us |       2.58% |  -2.467 us |  -1.27% |   PASS   |
|  I64  |     1      |  10000000   |     1000     |   2.817 ms |       0.25% |   2.813 ms |       0.19% |  -4.090 us |  -0.15% |   PASS   |
|  I64  |     1      |   100000    |    100000    | 202.726 us |       2.58% | 198.398 us |       2.81% |  -4.327 us |  -2.13% |   PASS   |
|  I64  |     1      |  10000000   |    100000    |   3.258 ms |       0.16% |   3.252 ms |       0.15% |  -5.346 us |  -0.16% |   FAIL   |
|  I64  |     1      |  10000000   |   10000000   |   4.396 ms |       0.12% |   4.365 ms |       0.09% | -31.525 us |  -0.72% |   FAIL   |

# mixed_left_join

## [0] NVIDIA A100-PCIE-40GB

|  Key  |  Nullable  |  left_size  |  right_size  |   Ref Time |   Ref Noise |   Cmp Time |   Cmp Noise |       Diff |   %Diff |  Status  |
|-------|------------|-------------|--------------|------------|-------------|------------|-------------|------------|---------|----------|
|  I32  |     0      |    1000     |     1000     | 161.267 us |       1.94% | 161.889 us |       3.18% |   0.621 us |   0.39% |   PASS   |
|  I32  |     0      |   100000    |     1000     | 193.725 us |       2.15% | 196.397 us |       1.98% |   2.672 us |   1.38% |   PASS   |
|  I32  |     0      |  10000000   |     1000     |   4.354 ms |       0.42% |   4.405 ms |       0.60% |  50.215 us |   1.15% |   FAIL   |
|  I32  |     0      |   100000    |    100000    | 216.069 us |       2.12% | 220.213 us |       2.36% |   4.143 us |   1.92% |   PASS   |
|  I32  |     0      |  10000000   |    100000    |   5.380 ms |       0.09% |   5.381 ms |       0.12% |   1.244 us |   0.02% |   PASS   |
|  I32  |     0      |  10000000   |   10000000   |   9.342 ms |       0.05% |   9.326 ms |       0.07% | -16.232 us |  -0.17% |   FAIL   |
|  I32  |     1      |    1000     |     1000     | 166.913 us |       2.86% | 165.996 us |       2.52% |  -0.917 us |  -0.55% |   PASS   |
|  I32  |     1      |   100000    |     1000     | 185.537 us |       2.11% | 185.325 us |       2.10% |  -0.211 us |  -0.11% |   PASS   |
|  I32  |     1      |  10000000   |     1000     |   2.812 ms |       0.17% |   2.810 ms |       0.17% |  -1.829 us |  -0.07% |   PASS   |
|  I32  |     1      |   100000    |    100000    | 195.091 us |       1.96% | 195.943 us |       1.83% |   0.853 us |   0.44% |   PASS   |
|  I32  |     1      |  10000000   |    100000    |   3.201 ms |       0.16% |   3.198 ms |       0.15% |  -3.326 us |  -0.10% |   PASS   |
|  I32  |     1      |  10000000   |   10000000   |   4.364 ms |       0.13% |   4.344 ms |       0.10% | -20.540 us |  -0.47% |   FAIL   |
|  I64  |     0      |    1000     |     1000     | 172.998 us |       2.08% | 171.628 us |       1.73% |  -1.370 us |  -0.79% |   PASS   |
|  I64  |     0      |   100000    |     1000     | 214.295 us |       1.65% | 212.494 us |       2.02% |  -1.801 us |  -0.84% |   PASS   |
|  I64  |     0      |  10000000   |     1000     |   4.889 ms |       0.45% |   4.832 ms |       0.48% | -57.071 us |  -1.17% |   FAIL   |
|  I64  |     0      |   100000    |    100000    | 226.001 us |       1.25% | 227.505 us |       1.35% |   1.503 us |   0.67% |   PASS   |
|  I64  |     0      |  10000000   |    100000    |   5.634 ms |       0.09% |   5.634 ms |       0.08% |   0.541 us |   0.01% |   PASS   |
|  I64  |     0      |  10000000   |   10000000   |   9.520 ms |       0.06% |   9.504 ms |       0.07% | -16.152 us |  -0.17% |   FAIL   |
|  I64  |     1      |    1000     |     1000     | 163.356 us |       3.74% | 169.715 us |       4.11% |   6.360 us |   3.89% |   FAIL   |
|  I64  |     1      |   100000    |     1000     | 193.651 us |       2.17% | 191.176 us |       2.69% |  -2.475 us |  -1.28% |   PASS   |
|  I64  |     1      |  10000000   |     1000     |   2.886 ms |       0.18% |   2.884 ms |       0.16% |  -2.189 us |  -0.08% |   PASS   |
|  I64  |     1      |   100000    |    100000    | 200.054 us |       2.56% | 201.958 us |       2.71% |   1.904 us |   0.95% |   PASS   |
|  I64  |     1      |  10000000   |    100000    |   3.334 ms |       0.14% |   3.328 ms |       0.14% |  -5.626 us |  -0.17% |   FAIL   |
|  I64  |     1      |  10000000   |   10000000   |   4.471 ms |       0.12% |   4.439 ms |       0.10% | -32.546 us |  -0.73% |   FAIL   |

# mixed_full_join

## [0] NVIDIA A100-PCIE-40GB

|  Key  |  Nullable  |  left_size  |  right_size  |   Ref Time |   Ref Noise |   Cmp Time |   Cmp Noise |       Diff |   %Diff |  Status  |
|-------|------------|-------------|--------------|------------|-------------|------------|-------------|------------|---------|----------|
|  I32  |     0      |    1000     |     1000     | 243.211 us |       2.75% | 242.023 us |       2.45% |  -1.188 us |  -0.49% |   PASS   |
|  I32  |     0      |   100000    |     1000     | 248.801 us |       1.46% | 248.462 us |       1.70% |  -0.339 us |  -0.14% |   PASS   |
|  I32  |     0      |  10000000   |     1000     |   5.025 ms |       0.38% |   4.928 ms |       0.54% | -97.321 us |  -1.94% |   FAIL   |
|  I32  |     0      |   100000    |    100000    | 302.615 us |       2.39% | 306.433 us |       1.86% |   3.819 us |   1.26% |   PASS   |
|  I32  |     0      |  10000000   |    100000    |   5.533 ms |       0.10% |   5.535 ms |       0.13% |   2.410 us |   0.04% |   PASS   |
|  I32  |     0      |  10000000   |   10000000   |  10.027 ms |       0.04% |  10.012 ms |       0.06% | -15.196 us |  -0.15% |   FAIL   |
|  I32  |     1      |    1000     |     1000     | 251.547 us |       2.50% | 247.515 us |       2.52% |  -4.032 us |  -1.60% |   PASS   |
|  I32  |     1      |   100000    |     1000     | 268.783 us |       2.01% | 268.158 us |       1.61% |  -0.625 us |  -0.23% |   PASS   |
|  I32  |     1      |  10000000   |     1000     |   3.119 ms |       0.22% |   3.091 ms |       0.21% | -28.292 us |  -0.91% |   FAIL   |
|  I32  |     1      |   100000    |    100000    | 281.557 us |       2.55% | 281.213 us |       1.61% |  -0.345 us |  -0.12% |   PASS   |
|  I32  |     1      |  10000000   |    100000    |   3.450 ms |       0.17% |   3.448 ms |       0.18% |  -2.170 us |  -0.06% |   PASS   |
|  I32  |     1      |  10000000   |   10000000   |   4.872 ms |       0.13% |   4.852 ms |       0.11% | -19.853 us |  -0.41% |   FAIL   |
|  I64  |     0      |    1000     |     1000     | 253.461 us |       1.36% | 253.491 us |       2.18% |   0.029 us |   0.01% |   PASS   |
|  I64  |     0      |   100000    |     1000     | 265.404 us |       1.50% | 268.120 us |       1.85% |   2.716 us |   1.02% |   PASS   |
|  I64  |     0      |  10000000   |     1000     |   5.424 ms |       0.39% |   5.355 ms |       0.49% | -68.675 us |  -1.27% |   FAIL   |
|  I64  |     0      |   100000    |    100000    | 312.536 us |       1.45% | 313.309 us |       1.33% |   0.774 us |   0.25% |   PASS   |
|  I64  |     0      |  10000000   |    100000    |   5.782 ms |       0.09% |   5.784 ms |       0.10% |   1.641 us |   0.03% |   PASS   |
|  I64  |     0      |  10000000   |   10000000   |  10.203 ms |       0.05% |  10.191 ms |       0.07% | -11.858 us |  -0.12% |   FAIL   |
|  I64  |     1      |    1000     |     1000     | 244.506 us |       2.61% | 253.241 us |       3.06% |   8.735 us |   3.57% |   FAIL   |
|  I64  |     1      |   100000    |     1000     | 279.486 us |       2.16% | 274.517 us |       2.19% |  -4.970 us |  -1.78% |   PASS   |
|  I64  |     1      |  10000000   |     1000     |   3.159 ms |       0.19% |   3.188 ms |       0.16% |  29.415 us |   0.93% |   FAIL   |
|  I64  |     1      |   100000    |    100000    | 288.049 us |       1.91% | 287.491 us |       2.18% |  -0.558 us |  -0.19% |   PASS   |
|  I64  |     1      |  10000000   |    100000    |   3.584 ms |       0.17% |   3.578 ms |       0.15% |  -6.451 us |  -0.18% |   FAIL   |
|  I64  |     1      |  10000000   |   10000000   |   4.978 ms |       0.15% |   4.949 ms |       0.12% | -29.140 us |  -0.59% |   FAIL   |

# mixed_left_semi_join

## [0] NVIDIA A100-PCIE-40GB

|  Key  |  Nullable  |  left_size  |  right_size  |   Ref Time |   Ref Noise |   Cmp Time |   Cmp Noise |        Diff |   %Diff |  Status  |
|-------|------------|-------------|--------------|------------|-------------|------------|-------------|-------------|---------|----------|
|  I32  |     0      |    1000     |     1000     | 141.934 us |       1.88% | 142.680 us |       1.99% |    0.746 us |   0.53% |   PASS   |
|  I32  |     0      |   100000    |     1000     | 165.387 us |       1.79% | 166.094 us |       2.16% |    0.707 us |   0.43% |   PASS   |
|  I32  |     0      |  10000000   |     1000     |   1.870 ms |       0.13% |   1.873 ms |       0.21% |    2.178 us |   0.12% |   PASS   |
|  I32  |     0      |   100000    |    100000    | 194.246 us |       1.46% | 187.360 us |       1.57% |   -6.886 us |  -3.55% |   FAIL   |
|  I32  |     0      |  10000000   |    100000    |   2.192 ms |       0.14% |   2.182 ms |       0.10% |   -9.673 us |  -0.44% |   FAIL   |
|  I32  |     0      |  10000000   |   10000000   |   6.977 ms |       0.11% |   6.060 ms |       0.07% | -916.461 us | -13.14% |   FAIL   |
|  I32  |     1      |    1000     |     1000     | 159.159 us |       2.26% | 158.990 us |       2.44% |   -0.169 us |  -0.11% |   PASS   |
|  I32  |     1      |   100000    |     1000     | 177.639 us |       1.96% | 177.302 us |       2.25% |   -0.337 us |  -0.19% |   PASS   |
|  I32  |     1      |  10000000   |     1000     |   1.453 ms |       0.29% |   1.427 ms |       0.34% |  -25.970 us |  -1.79% |   FAIL   |
|  I32  |     1      |   100000    |    100000    | 201.778 us |       2.36% | 192.236 us |       2.36% |   -9.542 us |  -4.73% |   FAIL   |
|  I32  |     1      |  10000000   |    100000    |   1.498 ms |       0.37% |   1.486 ms |       0.26% |  -12.424 us |  -0.83% |   FAIL   |
|  I32  |     1      |  10000000   |   10000000   |   4.600 ms |       0.07% |   3.625 ms |       0.09% | -974.543 us | -21.19% |   FAIL   |
|  I64  |     0      |    1000     |     1000     | 143.884 us |       1.93% | 144.360 us |       2.07% |    0.475 us |   0.33% |   PASS   |
|  I64  |     0      |   100000    |     1000     | 169.534 us |       2.48% | 168.538 us |       1.35% |   -0.996 us |  -0.59% |   PASS   |
|  I64  |     0      |  10000000   |     1000     |   2.043 ms |       0.14% |   2.055 ms |       0.17% |   11.345 us |   0.56% |   FAIL   |
|  I64  |     0      |   100000    |    100000    | 205.075 us |       1.72% | 190.720 us |       1.47% |  -14.356 us |  -7.00% |   FAIL   |
|  I64  |     0      |  10000000   |    100000    |   2.344 ms |       0.13% |   2.319 ms |       0.17% |  -25.242 us |  -1.08% |   FAIL   |
|  I64  |     0      |  10000000   |   10000000   |   7.229 ms |       0.07% |   6.349 ms |       0.06% | -880.105 us | -12.18% |   FAIL   |
|  I64  |     1      |    1000     |     1000     | 161.782 us |       2.80% | 160.036 us |       3.14% |   -1.746 us |  -1.08% |   PASS   |
|  I64  |     1      |   100000    |     1000     | 177.773 us |       2.40% | 177.219 us |       1.96% |   -0.554 us |  -0.31% |   PASS   |
|  I64  |     1      |  10000000   |     1000     |   1.394 ms |       0.31% |   1.391 ms |       0.22% |   -2.836 us |  -0.20% |   PASS   |
|  I64  |     1      |   100000    |    100000    | 201.791 us |       2.22% | 193.170 us |       2.39% |   -8.622 us |  -4.27% |   FAIL   |
|  I64  |     1      |  10000000   |    100000    |   1.540 ms |       0.34% |   1.526 ms |       0.28% |  -14.073 us |  -0.91% |   FAIL   |
|  I64  |     1      |  10000000   |   10000000   |   4.698 ms |       0.08% |   3.708 ms |       0.09% | -989.718 us | -21.07% |   FAIL   |

# mixed_left_anti_join

## [0] NVIDIA A100-PCIE-40GB

|  Key  |  Nullable  |  left_size  |  right_size  |   Ref Time |   Ref Noise |   Cmp Time |   Cmp Noise |        Diff |   %Diff |  Status  |
|-------|------------|-------------|--------------|------------|-------------|------------|-------------|-------------|---------|----------|
|  I32  |     0      |    1000     |     1000     | 142.733 us |       2.24% | 143.212 us |       2.33% |    0.479 us |   0.34% |   PASS   |
|  I32  |     0      |   100000    |     1000     | 165.283 us |       1.84% | 165.518 us |       1.36% |    0.235 us |   0.14% |   PASS   |
|  I32  |     0      |  10000000   |     1000     |   1.878 ms |       0.14% |   1.880 ms |       0.17% |    1.920 us |   0.10% |   PASS   |
|  I32  |     0      |   100000    |    100000    | 199.202 us |       6.84% | 187.727 us |       1.40% |  -11.474 us |  -5.76% |   FAIL   |
|  I32  |     0      |  10000000   |    100000    |   2.200 ms |       0.15% |   2.190 ms |       0.11% |   -9.899 us |  -0.45% |   FAIL   |
|  I32  |     0      |  10000000   |   10000000   |   6.984 ms |       0.07% |   6.068 ms |       0.06% | -916.084 us | -13.12% |   FAIL   |
|  I32  |     1      |    1000     |     1000     | 159.451 us |       2.32% | 159.774 us |       2.65% |    0.323 us |   0.20% |   PASS   |
|  I32  |     1      |   100000    |     1000     | 178.824 us |       2.03% | 178.274 us |       2.27% |   -0.551 us |  -0.31% |   PASS   |
|  I32  |     1      |  10000000   |     1000     |   1.464 ms |       0.27% |   1.440 ms |       0.29% |  -24.507 us |  -1.67% |   FAIL   |
|  I32  |     1      |   100000    |    100000    | 202.361 us |       2.23% | 193.803 us |       2.43% |   -8.558 us |  -4.23% |   FAIL   |
|  I32  |     1      |  10000000   |    100000    |   1.510 ms |       0.31% |   1.498 ms |       0.26% |  -11.898 us |  -0.79% |   FAIL   |
|  I32  |     1      |  10000000   |   10000000   |   4.610 ms |       0.08% |   3.637 ms |       0.14% | -972.996 us | -21.10% |   FAIL   |
|  I64  |     0      |    1000     |     1000     | 144.084 us |       1.86% | 145.383 us |       2.32% |    1.299 us |   0.90% |   PASS   |
|  I64  |     0      |   100000    |     1000     | 170.368 us |       2.52% | 168.732 us |       1.56% |   -1.635 us |  -0.96% |   PASS   |
|  I64  |     0      |  10000000   |     1000     |   2.052 ms |       0.14% |   2.062 ms |       0.16% |   10.864 us |   0.53% |   FAIL   |
|  I64  |     0      |   100000    |    100000    | 205.602 us |       1.69% | 191.134 us |       1.33% |  -14.468 us |  -7.04% |   FAIL   |
|  I64  |     0      |  10000000   |    100000    |   2.353 ms |       0.18% |   2.326 ms |       0.10% |  -27.201 us |  -1.16% |   FAIL   |
|  I64  |     0      |  10000000   |   10000000   |   7.236 ms |       0.05% |   6.356 ms |       0.07% | -880.328 us | -12.17% |   FAIL   |
|  I64  |     1      |    1000     |     1000     | 164.037 us |       2.87% | 160.285 us |       2.61% |   -3.752 us |  -2.29% |   PASS   |
|  I64  |     1      |   100000    |     1000     | 181.190 us |       2.51% | 177.684 us |       2.02% |   -3.506 us |  -1.93% |   PASS   |
|  I64  |     1      |  10000000   |     1000     |   1.408 ms |       0.32% |   1.404 ms |       0.26% |   -4.183 us |  -0.30% |   FAIL   |
|  I64  |     1      |   100000    |    100000    | 203.445 us |       2.09% | 192.940 us |       2.39% |  -10.504 us |  -5.16% |   FAIL   |
|  I64  |     1      |  10000000   |    100000    |   1.553 ms |       0.35% |   1.538 ms |       0.31% |  -14.199 us |  -0.91% |   FAIL   |
|  I64  |     1      |  10000000   |   10000000   |   4.709 ms |       0.09% |   3.719 ms |       0.11% | -990.099 us | -21.02% |   FAIL   |

# distinct_inner_join

## [0] NVIDIA A100-PCIE-40GB

|  Key  |  Nullable  |  left_size  |  right_size  |   Ref Time |   Ref Noise |   Cmp Time |   Cmp Noise |        Diff |   %Diff |  Status  |
|-------|------------|-------------|--------------|------------|-------------|------------|-------------|-------------|---------|----------|
|  I32  |     0      |    1000     |     1000     |  72.897 us |       4.53% |  72.500 us |       4.44% |   -0.397 us |  -0.54% |   PASS   |
|  I32  |     0      |   100000    |     1000     |  73.723 us |       2.13% |  74.753 us |       2.39% |    1.031 us |   1.40% |   PASS   |
|  I32  |     0      |  10000000   |     1000     | 975.389 us |       0.12% | 798.210 us |       0.27% | -177.178 us | -18.16% |   FAIL   |
|  I32  |     0      |   100000    |    100000    |  90.265 us |       2.99% |  83.621 us |       2.97% |   -6.644 us |  -7.36% |   FAIL   |
|  I32  |     0      |  10000000   |    100000    | 984.498 us |       0.24% | 757.926 us |       0.23% | -226.571 us | -23.01% |   FAIL   |
|  I32  |     0      |  10000000   |   10000000   |   3.795 ms |       0.11% |   3.388 ms |       0.11% | -406.891 us | -10.72% |   FAIL   |
|  I32  |     1      |    1000     |     1000     |  78.160 us |       4.31% |  72.810 us |       5.91% |   -5.349 us |  -6.84% |   FAIL   |
|  I32  |     1      |   100000    |     1000     |  80.601 us |       5.37% |  79.018 us |       4.68% |   -1.583 us |  -1.96% |   PASS   |
|  I32  |     1      |  10000000   |     1000     | 497.193 us |       0.78% | 378.630 us |       1.07% | -118.563 us | -23.85% |   FAIL   |
|  I32  |     1      |   100000    |    100000    |  84.146 us |       4.03% |  84.917 us |       4.06% |    0.771 us |   0.92% |   PASS   |
|  I32  |     1      |  10000000   |    100000    | 558.863 us |       0.73% | 430.181 us |       0.89% | -128.682 us | -23.03% |   FAIL   |
|  I32  |     1      |  10000000   |   10000000   |   1.229 ms |       0.26% |   1.053 ms |       0.39% | -175.856 us | -14.31% |   FAIL   |
|  I64  |     0      |    1000     |     1000     |  66.922 us |       5.18% |  66.570 us |       2.51% |   -0.351 us |  -0.53% |   PASS   |
|  I64  |     0      |   100000    |     1000     |  73.568 us |       1.94% |  74.024 us |       3.05% |    0.456 us |   0.62% |   PASS   |
|  I64  |     0      |  10000000   |     1000     |   1.016 ms |       0.31% | 838.589 us |       0.22% | -177.703 us | -17.49% |   FAIL   |
|  I64  |     0      |   100000    |    100000    |  88.798 us |       3.96% |  84.272 us |       2.29% |   -4.526 us |  -5.10% |   FAIL   |
|  I64  |     0      |  10000000   |    100000    |   1.009 ms |       0.12% | 783.589 us |       0.35% | -225.247 us | -22.33% |   FAIL   |
|  I64  |     0      |  10000000   |   10000000   |   3.836 ms |       0.11% |   3.432 ms |       0.12% | -404.110 us | -10.54% |   FAIL   |
|  I64  |     1      |    1000     |     1000     |  78.431 us |       4.55% |  70.749 us |       4.79% |   -7.681 us |  -9.79% |   FAIL   |
|  I64  |     1      |   100000    |     1000     |  81.524 us |       5.26% |  77.902 us |       3.40% |   -3.622 us |  -4.44% |   FAIL   |
|  I64  |     1      |  10000000   |     1000     | 515.397 us |       0.80% | 394.449 us |       0.74% | -120.948 us | -23.47% |   FAIL   |
|  I64  |     1      |   100000    |    100000    |  84.532 us |       4.73% |  84.206 us |       4.41% |   -0.325 us |  -0.38% |   PASS   |
|  I64  |     1      |  10000000   |    100000    | 574.291 us |       0.75% | 443.097 us |       0.91% | -131.194 us | -22.84% |   FAIL   |
|  I64  |     1      |  10000000   |   10000000   |   1.255 ms |       0.27% |   1.075 ms |       0.33% | -179.099 us | -14.28% |   FAIL   |

# distinct_left_join

## [0] NVIDIA A100-PCIE-40GB

|  Key  |  Nullable  |  left_size  |  right_size  |   Ref Time |   Ref Noise |   Cmp Time |   Cmp Noise |        Diff |   %Diff |  Status  |
|-------|------------|-------------|--------------|------------|-------------|------------|-------------|-------------|---------|----------|
|  I32  |     0      |    1000     |     1000     |  53.852 us |       2.41% |  51.894 us |       2.45% |   -1.958 us |  -3.64% |   FAIL   |
|  I32  |     0      |   100000    |     1000     |  56.630 us |       2.21% |  55.920 us |       2.55% |   -0.711 us |  -1.26% |   PASS   |
|  I32  |     0      |  10000000   |     1000     | 776.903 us |       0.17% | 649.909 us |       0.31% | -126.994 us | -16.35% |   FAIL   |
|  I32  |     0      |   100000    |    100000    |  71.228 us |       1.83% |  67.705 us |       1.98% |   -3.523 us |  -4.95% |   FAIL   |
|  I32  |     0      |  10000000   |    100000    | 756.068 us |       0.20% | 549.715 us |       0.25% | -206.353 us | -27.29% |   FAIL   |
|  I32  |     0      |  10000000   |   10000000   |   3.483 ms |       0.08% |   3.260 ms |       0.09% | -223.137 us |  -6.41% |   FAIL   |
|  I32  |     1      |    1000     |     1000     |  58.965 us |       4.88% |  56.248 us |       4.42% |   -2.717 us |  -4.61% |   FAIL   |
|  I32  |     1      |   100000    |     1000     |  62.480 us |       6.63% |  58.621 us |       5.62% |   -3.859 us |  -6.18% |   FAIL   |
|  I32  |     1      |  10000000   |     1000     | 322.612 us |       0.81% | 244.058 us |       0.89% |  -78.554 us | -24.35% |   FAIL   |
|  I32  |     1      |   100000    |    100000    |  68.750 us |       4.93% |  67.299 us |       4.99% |   -1.452 us |  -2.11% |   PASS   |
|  I32  |     1      |  10000000   |    100000    | 389.208 us |       1.03% | 297.101 us |       1.36% |  -92.108 us | -23.67% |   FAIL   |
|  I32  |     1      |  10000000   |   10000000   |   1.054 ms |       0.28% | 942.088 us |       0.29% | -111.414 us | -10.58% |   FAIL   |
|  I64  |     0      |    1000     |     1000     |  51.114 us |       2.32% |  51.000 us |       2.70% |   -0.114 us |  -0.22% |   PASS   |
|  I64  |     0      |   100000    |     1000     |  55.930 us |       2.19% |  53.552 us |       2.56% |   -2.378 us |  -4.25% |   FAIL   |
|  I64  |     0      |  10000000   |     1000     | 803.365 us |       0.16% | 678.371 us |       0.23% | -124.994 us | -15.56% |   FAIL   |
|  I64  |     0      |   100000    |    100000    |  69.472 us |       1.88% |  67.794 us |       1.73% |   -1.678 us |  -2.42% |   FAIL   |
|  I64  |     0      |  10000000   |    100000    | 772.276 us |       0.17% | 569.569 us |       0.40% | -202.707 us | -26.25% |   FAIL   |
|  I64  |     0      |  10000000   |   10000000   |   3.516 ms |       0.08% |   3.297 ms |       0.14% | -219.684 us |  -6.25% |   FAIL   |
|  I64  |     1      |    1000     |     1000     |  59.470 us |       6.65% |  56.124 us |       3.75% |   -3.346 us |  -5.63% |   FAIL   |
|  I64  |     1      |   100000    |     1000     |  63.261 us |       6.57% |  57.948 us |       3.85% |   -5.313 us |  -8.40% |   FAIL   |
|  I64  |     1      |  10000000   |     1000     | 340.114 us |       0.78% | 261.181 us |       0.88% |  -78.933 us | -23.21% |   FAIL   |
|  I64  |     1      |   100000    |    100000    |  69.709 us |       4.46% |  67.579 us |       5.30% |   -2.130 us |  -3.06% |   PASS   |
|  I64  |     1      |  10000000   |    100000    | 399.965 us |       1.04% | 307.076 us |       1.34% |  -92.889 us | -23.22% |   FAIL   |
|  I64  |     1      |  10000000   |   10000000   |   1.077 ms |       0.23% | 964.012 us |       0.27% | -113.335 us | -10.52% |   FAIL   |

# Summary

- Total Matches: 312
  - Pass    (diff <= min_noise): 153
  - Unknown (infinite noise):    0
  - Failure (diff > min_noise):  159
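
A note on reading the Status column: the pass/fail rule in the summary can be sketched as below. This is a minimal sketch, assuming the comparison takes the absolute value of `%Diff` and checks it against the smaller of the `Ref Noise` and `Cmp Noise` columns. That assumption matches the rows in the tables above but is inferred from them. The helper name `compare_status` is hypothetical and is not part of any benchmarking tool.

```cpp
#include <algorithm>
#include <cmath>
#include <string>

// Hypothetical helper illustrating the "diff <= min_noise" rule from the summary.
// All arguments are percentages, e.g. -0.17 for "-0.17%".
inline std::string compare_status(double pct_diff, double ref_noise, double cmp_noise)
{
  // The smaller of the two noise figures is the threshold.
  double const min_noise = std::min(ref_noise, cmp_noise);
  // Only the magnitude of the difference matters, not its sign.
  return std::abs(pct_diff) <= min_noise ? "PASS" : "FAIL";
}
```

Under this rule `FAIL` only means that the difference exceeds the measurement noise. A large speedup such as the -21.07% row in `mixed_left_semi_join` is therefore reported as `FAIL` as well.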

copy-pr-bot bot commented Jul 19, 2024

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@github-actions github-actions bot added the libcudf Affects libcudf (C++/CUDA) code. label Jul 19, 2024
@github-actions github-actions bot added the CMake CMake build issue label Jul 19, 2024
Contributor

@bdice bdice left a comment

I know this is a draft PR but I wanted to understand this work since it has come up in a few conversations. I left a few comments, and hope they are helpful.

probing_scheme_type,
cudf::detail::cuco_allocator,
cuco_storage_type>;
using hash_table_type = std::variant<
Contributor

Could we simplify this with something like:

template<typename Comparator>
using static_set_with_comparator = cuco::static_set<cuco::pair<hash_value_type, rhs_index_type>,
                                    cuco::extent<size_type>,
                                    cuda::thread_scope_device,
                                    comparator_adapter<cudf::experimental::row::equality::
                                          strong_index_comparator_adapter<Comparator>>,
                                    probing_scheme_type,
                                    cudf::detail::cuco_allocator,
                                    cuco_storage_type>;

using hash_table_type = std::variant<
    static_set_with_comparator<row_comparator>,
    static_set_with_comparator<row_comparator_no_nested>,
    static_set_with_comparator<row_comparator_no_compound>>;
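
The shape of this suggestion can be tried out in isolation. The following is a minimal sketch with standard-library types only: `toy_set` is a hypothetical stand-in for `cuco::static_set`, the three comparator structs are empty placeholders, and none of the names reflect the actual cudf or cuco definitions.

```cpp
#include <type_traits>
#include <variant>

// Empty placeholders for the three comparator types named in the comment above.
struct row_comparator {};
struct row_comparator_no_nested {};
struct row_comparator_no_compound {};

// Hypothetical stand-in for the set type; only the comparator parameter varies.
template <typename Key, typename Comparator>
struct toy_set {
  Comparator cmp;
};

// Alias template: every parameter except the comparator is fixed in one place.
template <typename Comparator>
using toy_set_with_comparator = toy_set<int, Comparator>;

// The variant then lists one alternative per comparator without repeating the long type.
using toy_hash_table_type = std::variant<toy_set_with_comparator<row_comparator>,
                                         toy_set_with_comparator<row_comparator_no_nested>,
                                         toy_set_with_comparator<row_comparator_no_compound>>;

static_assert(std::variant_size_v<toy_hash_table_type> == 3);
static_assert(std::is_same_v<std::variant_alternative_t<1, toy_hash_table_type>,
                             toy_set<int, row_comparator_no_nested>>);
```

The alias is purely a spelling aid: each alternative is still the same concrete type as the fully written-out form, which the second `static_assert` checks.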

/**
* @brief Compares the specified elements for equality.
*
* is_equality_comparable differs from implementation for std::equality_comparable and considers
* void as and equality comparable type. Thus we need to disable this for when type is void.
Contributor

Suggested change
- * void as and equality comparable type. Thus we need to disable this for when type is void.
+ * void as an equality comparable type. Thus we need to disable this for when type is void.

return std::visit(
[&](auto& comparator) {
return ret_type{
std::in_place_type<
Contributor

Can this use in_place_type_t and drop the ::type?

cuco::static_set<cuco::pair<hash_value_type, rhs_index_type>,
cuco::extent<size_type>,
cuda::thread_scope_device,
typename std::remove_reference<decltype(comparator_adapter)>::type,
Contributor

Suggested change
- typename std::remove_reference<decltype(comparator_adapter)>::type,
+ typename std::remove_reference_t<decltype(comparator_adapter)>,
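
For context, the pattern under review (visiting one variant to construct the matching alternative of another) can be sketched with standard-library types only. Everything named here (`cmp_a`, `cmp_b`, `toy_table`, `make_table`) is hypothetical and is not the cudf code. Two facts the sketch relies on: `std::in_place_type<T>` is a variable template whose type is `std::in_place_type_t<T>`, and `std::remove_reference_t<X>` is an alias for `typename std::remove_reference<X>::type`, so no leading `typename` or trailing `::type` is needed.

```cpp
#include <type_traits>
#include <utility>
#include <variant>

// Hypothetical comparator types.
struct cmp_a {};
struct cmp_b {};

// Hypothetical table type parameterized on its comparator.
template <typename Comparator>
struct toy_table {
  explicit toy_table(Comparator c) : cmp{c} {}
  Comparator cmp;
};

using comparator_variant = std::variant<cmp_a, cmp_b>;
using table_variant      = std::variant<toy_table<cmp_a>, toy_table<cmp_b>>;

// Build the table alternative that corresponds to the active comparator.
inline table_variant make_table(comparator_variant& comparators)
{
  return std::visit(
    [](auto& comparator) {
      // decltype(comparator) is an lvalue reference; strip it to name the table type.
      using comparator_type = std::remove_reference_t<decltype(comparator)>;
      // std::in_place_type selects which alternative to construct in place.
      return table_variant{std::in_place_type<toy_table<comparator_type>>, comparator};
    },
    comparators);
}
```

Every branch of the visitor returns `table_variant`, so the lambda has a single return type as `std::visit` requires.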

[&](auto&& hasher, auto&& hash_table) {
auto const iter = cudf::detail::make_counting_transform_iterator(
0,
build_keys_fn<typename std::remove_reference<decltype(hasher)>::type, rhs_index_type>{
Contributor

Suggested change
- build_keys_fn<typename std::remove_reference<decltype(hasher)>::type, rhs_index_type>{
+ build_keys_fn<typename std::remove_reference_t<decltype(hasher)>, rhs_index_type>{

[&](auto&& hasher, auto&& hash_table) {
auto const iter = cudf::detail::make_counting_transform_iterator(
0,
build_keys_fn<typename std::remove_reference<decltype(hasher)>::type, lhs_index_type>{
Contributor

Suggested change
- build_keys_fn<typename std::remove_reference<decltype(hasher)>::type, lhs_index_type>{
+ build_keys_fn<typename std::remove_reference_t<decltype(hasher)>, lhs_index_type>{

// used to circumvent conflicts between arrays of different types between
// different template instantiations due to the extern specifier.
extern __shared__ char raw_intermediate_storage[];
cudf::ast::detail::IntermediateDataType<has_nulls>* intermediate_storage =
Contributor

Can this be auto since you're reinterpret-casting it to the appropriate type?

cudf::size_type const right_num_rows = right_table.num_rows();
auto const outer_num_rows = left_num_rows;

cudf::size_type outer_row_index = threadIdx.x + blockIdx.x * block_size;
Contributor

We avoid raw threadIdx / blockIdx math. We have utility functions for abstracting this logic and ensuring safe types are used:

static constexpr thread_index_type global_thread_id(thread_index_type thread_id,

See examples:

Contributor Author

I only separated out the kernel instantiations in this PR and copied the previous implementation as-is. I'm unsure whether I should address this here; maybe a follow-up would be better?
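
As background for the index computation discussed in this thread, here is a minimal host-side sketch of why a widened index type is safer than raw 32-bit math. It assumes the concern is wraparound when the block index times the block size exceeds 32 bits, which is one plausible reading of "safe types". The names `toy_thread_index_type`, `toy_global_thread_id`, and `raw_global_thread_id` are hypothetical and are not the cudf utilities mentioned above.

```cpp
#include <cstdint>

// Hypothetical 64-bit index type (an assumption, chosen for illustration).
using toy_thread_index_type = std::int64_t;

// Widen every operand first so the product is computed in 64 bits.
constexpr toy_thread_index_type toy_global_thread_id(toy_thread_index_type thread_id,
                                                     toy_thread_index_type block_id,
                                                     toy_thread_index_type threads_per_block)
{
  return thread_id + block_id * threads_per_block;
}

// The same formula in 32-bit unsigned arithmetic, mimicking raw threadIdx/blockIdx math.
// Unsigned overflow is well defined and wraps modulo 2^32.
constexpr std::uint32_t raw_global_thread_id(std::uint32_t thread_id,
                                             std::uint32_t block_id,
                                             std::uint32_t threads_per_block)
{
  return thread_id + block_id * threads_per_block;
}

// 8388608 blocks of 512 threads is exactly 2^32 threads.
static_assert(toy_global_thread_id(0, 8388608, 512) == 4294967296LL);
static_assert(raw_global_thread_id(0u, 8388608u, 512u) == 0u);
```

In the 32-bit version the first thread of block 8388608 gets index 0, the same as the first thread of block 0, so two threads would process the same row.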

#include <cudf/utilities/span.hpp>

#include <cub/cub.cuh>
#include "mixed_join_kernel_semi_impl.cuh"
Contributor

We should be consistent on including the directory name. This file is next to mixed_join_common_utils.cuh. See the developer guide on includes for more info.

Suggested change
- #include "mixed_join_kernel_semi_impl.cuh"
+ #include "join/mixed_join_kernel_semi_impl.cuh"

*/

#include "join/mixed_join_common_utils.cuh"
#include "mixed_join_kernel_semi_impl.cuh"
Contributor

Suggested change
- #include "mixed_join_kernel_semi_impl.cuh"
+ #include "join/mixed_join_kernel_semi_impl.cuh"

@github-actions github-actions bot added Python Affects Python cuDF API. Java Affects Java cuDF API. cudf.polars Issues specific to cudf.polars labels Jul 26, 2024
@tgujar tgujar changed the base branch from branch-24.08 to branch-24.10 July 26, 2024 21:26
@wence- wence- removed Python Affects Python cuDF API. Java Affects Java cuDF API. cudf.polars Issues specific to cudf.polars labels Jul 29, 2024
@tgujar tgujar force-pushed the distinct-join-occupancy branch from 19944cf to 448b14b Compare August 1, 2024 19:33
Contributor Author

tgujar commented Aug 1, 2024

Thanks for the review! I'm closing this PR because this work has been merged into #15700. Please take a look there for further review.

Labels
CMake CMake build issue libcudf Affects libcudf (C++/CUDA) code.
Projects
Status: Done

4 participants