Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix uniform random key generation #630

Merged
merged 4 commits into from
Oct 31, 2024

Conversation

sleeepyjack
Copy link
Collaborator

Fixes #532

@sleeepyjack sleeepyjack added type: bug Something isn't working P0: Must have Critical feature or bug fix Needs Review Awaiting reviews before merging labels Oct 30, 2024
@sleeepyjack sleeepyjack added P1: Should have Necessary but not critical and removed P0: Must have Critical feature or bug fix labels Oct 30, 2024
@sleeepyjack
Copy link
Collaborator Author

sleeepyjack commented Oct 30, 2024

With this PR, benchmarks with uniform distribution and a multiplicity of 1 are slightly faster:

['dev.json', 'pr.json']
# static_set_insert_uniform_multiplicity

## [0] NVIDIA L40S

|  Key  |  Distribution  |  NumInputs  |  Occupancy  |  Multiplicity  |   Ref Time |   Ref Noise |   Cmp Time |   Cmp Noise |         Diff |   %Diff |  Status  |
|-------|----------------|-------------|-------------|----------------|------------|-------------|------------|-------------|--------------|---------|----------|
|  I32  |    UNIFORM     |  100000000  |     0.5     |       1        |  44.668 ms |       0.09% |  38.967 ms |       0.18% | -5701.014 us | -12.76% |   FAIL   |
|  I32  |    UNIFORM     |  100000000  |     0.5     |       2        |  32.507 ms |       0.14% |  32.398 ms |       0.07% |  -109.818 us |  -0.34% |   FAIL   |
|  I32  |    UNIFORM     |  100000000  |     0.5     |       4        |  24.575 ms |       0.23% |  24.149 ms |       0.30% |  -426.423 us |  -1.74% |   FAIL   |
|  I32  |    UNIFORM     |  100000000  |     0.5     |       8        |  20.208 ms |       0.37% |  19.801 ms |       0.12% |  -407.334 us |  -2.02% |   FAIL   |
|  I32  |    UNIFORM     |  100000000  |     0.5     |       16       |  17.302 ms |       0.14% |  17.279 ms |       0.12% |   -22.946 us |  -0.13% |   FAIL   |
|  I64  |    UNIFORM     |  100000000  |     0.5     |       1        |  47.367 ms |       0.09% |  41.614 ms |       0.16% | -5752.903 us | -12.15% |   FAIL   |
|  I64  |    UNIFORM     |  100000000  |     0.5     |       2        |  34.789 ms |       0.21% |  34.482 ms |       0.24% |  -307.202 us |  -0.88% |   FAIL   |
|  I64  |    UNIFORM     |  100000000  |     0.5     |       4        |  26.511 ms |       0.42% |  25.955 ms |       0.37% |  -555.892 us |  -2.10% |   FAIL   |
|  I64  |    UNIFORM     |  100000000  |     0.5     |       8        |  21.807 ms |       0.47% |  21.381 ms |       0.54% |  -426.110 us |  -1.95% |   FAIL   |
|  I64  |    UNIFORM     |  100000000  |     0.5     |       16       |  18.584 ms |       1.05% |  18.637 ms |       1.17% |    52.149 us |   0.28% |   PASS   |

@sleeepyjack
Copy link
Collaborator Author

Can confirm that this PR actually fixes the issues described in #532:

N=10
  M_EXPECTED=1
  M_MEASURED=1
  CONSECUTIVE_SAMPLES=[ 6 0 4 8 3 7 1 5 9 3 ]

N=10
  M_EXPECTED=5
  M_MEASURED=5
  CONSECUTIVE_SAMPLES=[ 1 0 1 0 1 1 0 1 0 1 ]

N=10
  M_EXPECTED=10
  M_MEASURED=10
  CONSECUTIVE_SAMPLES=[ 0 0 0 0 0 0 0 0 0 0 ]

N=100
  M_EXPECTED=1
  M_MEASURED=1
  CONSECUTIVE_SAMPLES=[ 14 56 97 39 80 22 63 5 46 88 ]

N=100
  M_EXPECTED=5
  M_MEASURED=5
  CONSECUTIVE_SAMPLES=[ 10 18 7 15 3 12 0 8 16 5 ]

N=100
  M_EXPECTED=10
  M_MEASURED=10
  CONSECUTIVE_SAMPLES=[ 6 0 4 8 2 7 1 5 9 3 ]

N=1000
  M_EXPECTED=1
  M_MEASURED=1
  CONSECUTIVE_SAMPLES=[ 444 859 274 689 104 519 934 349 764 179 ]

N=1000
  M_EXPECTED=5
  M_MEASURED=5
  CONSECUTIVE_SAMPLES=[ 131 14 97 180 64 147 30 113 196 79 ]

N=1000
  M_EXPECTED=10
  M_MEASURED=10
  CONSECUTIVE_SAMPLES=[ 70 11 53 95 36 78 19 61 2 44 ]

N=10000
  M_EXPECTED=1
  M_MEASURED=1
  CONSECUTIVE_SAMPLES=[ 5987 138 4289 8440 2591 6742 893 5044 9194 3345 ]

N=10000
  M_EXPECTED=5
  M_MEASURED=5
  CONSECUTIVE_SAMPLES=[ 1583 413 1243 74 904 1734 564 1394 224 1055 ]

N=10000
  M_EXPECTED=10
  M_MEASURED=10
  CONSECUTIVE_SAMPLES=[ 449 864 279 694 109 524 940 355 770 185 ]

N=100000
  M_EXPECTED=1
  M_MEASURED=1
  CONSECUTIVE_SAMPLES=[ 68463 9972 51480 92989 34498 76006 17515 59024 532 42041 ]

N=100000
  M_EXPECTED=5
  M_MEASURED=5
  CONSECUTIVE_SAMPLES=[ 6140 14442 2744 11045 19347 7649 15950 4252 12554 856 ]

N=100000
  M_EXPECTED=10
  M_MEASURED=10
  CONSECUTIVE_SAMPLES=[ 7722 1873 6023 174 4325 8476 2627 6778 929 5079 ]

N=1000000
  M_EXPECTED=1
  M_MEASURED=1
  CONSECUTIVE_SAMPLES=[ 86976 502062 917148 332235 747321 162408 577494 992581 407667 822753 ]

N=1000000
  M_EXPECTED=5
  M_MEASURED=5
  CONSECUTIVE_SAMPLES=[ 122884 5902 88919 171936 54954 137971 20988 104006 187023 70040 ]

N=1000000
  M_EXPECTED=10
  M_MEASURED=10
  CONSECUTIVE_SAMPLES=[ 10231 51740 93248 34757 76265 17774 59283 791 42300 83809 ]

auto num_unique_keys = cuda::std::max<size_t>(
1ull,
static_cast<size_t>(
cuda::std::ceil(static_cast<double>(num_) / static_cast<double>(dist_.value))));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can probably remove

__host__ __device__ constexpr T int_div_ceil(T dividend, U divisor) noexcept

and use cuda::std::ceil instead

@sleeepyjack sleeepyjack merged commit 69817e2 into NVIDIA:dev Oct 31, 2024
18 checks passed
@sleeepyjack sleeepyjack deleted the fix/uniform-keygen branch October 31, 2024 00:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Needs Review Awaiting reviews before merging P1: Should have Necessary but not critical type: bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[BUG]: Random key generator does not produce keys according to input parameters
2 participants