-
Notifications
You must be signed in to change notification settings - Fork 310
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[ENH] Add new methods of generating bounded random integers #1979
Comments
Have you looked for raft/rng.hpp? Won't uniformInt (https://github.com/rapidsai/raft/blob/branch-22.02/cpp/include/raft/random/rng.hpp#L118) work for this? |
That method in RAFT is currently based on method 1 which is not ideal. On top of that, I'm really looking for a device API rather than host API. I've created a corresponding issue on RAFT: rapidsai/raft#406 |
This issue has been labeled |
This is a partial fix for #1979. Specifically, given `N = out-deg(v)` and a random number `r ∈ [0,1]`, one must obtain the equivalent discrete `index ∈ {0,1,...,N-1}`. Previous implementation used an upper bound `ubound = N-1` and a linear interpolation. As the issue above mentioned that approach creates problems near the (lower) boundary. The fix uses a better bound, namely `ubound = N` and the discrete transformation: `index = floor(r >= 1.0 ? N-1 : r*N)`. Attached Mathematica plots show the graphs for, say, `N = 13` and `N=17`. ![N=13_cropped](https://user-images.githubusercontent.com/37386037/159776745-13c72963-a426-46e2-975f-feedab6bbbb6.png) ![N=17_uniform_sampling](https://user-images.githubusercontent.com/37386037/159775015-203f4442-e2c7-4422-968e-e76807ec9639.png) This fix is not high priority for release 22-04, and can be included in the 22-06 release. Also, not all of the concerns formulated in the issue above are addressed by this PR. For example a uniform random generator callable from device is not yet available, but there are plans to perhaps expose something like that in `raft`. Authors: - Andrei Schaffer (https://github.com/aschaffer) Approvers: - Chuck Hastings (https://github.com/ChuckHastings) URL: #2153
This issue has been labeled |
This has been solved with #434 and rapidsai/raft#513 |
Describe the solution you'd like
There have been recent developments in generating bounded random integers, in particular see swiftlang/swift#39143
This should be added to RAFT and then used in cugraph (in particular algorithms like random walks), for the reasons outlined in context below.
Additional context
There are generally two "simple" methods of generating bounded random integers:
]0, 1]
) and multiply by~bound
, then center this range and cast to integer.Method 1 is slow (due to modulo) and can have fairly high bias towards the lower values in
[0, bound[
since thebound
generally does not divide the number of possible integers generated.Method 2 seems better for lower values of
bound
but there are a number of issues:bound
, otherwise we don't get the right range, but curand may generate 1, sofloor(x * bound)
may generate the valuebound
. We either have to dofloor((x - std::numeric_limits<float>::epsilon()) * bound)
orfloor(x * bound)
then check forbound
and subtract 1 introducing slight bias towards the largest value. Multiplying bybound - 1
orbound - 0.5
always offsets the range with some values having more or less bias.bound
, we run into similar (if not bigger) problems with bias than the modulo-based method: with 32-bit float, there are only 2^24 possible values in]0, 1]
. That's only ~17M. For a large enough bound, we don't have enough floating point numbers to evenly distribute over the possible outputs: for example for a bound of 10M, some integers will get hit twice but some only once meaning that some integers will be 50% more likely to be drawn than others.bound
are small (as long as they are much smaller than 2^53). But generating uniform floating point numbers is slow itself and 64-bit floats are very slow, plus the arithmetic needed afterwards are slower, too.Examples of issues with floating-point centering:
[current method in random walks]
floor(x * (bound - 1))
: this has a very low likelihood to generate the valuebound - 1
. Forbound := 3
, this means2
is only generated with a probability of ~2^-24[centered method]:
floor(x * (bound - 1) + 0.5)
: Forbound := 3
, this would transform the range to ]0.5, 2.5] meaning that 1 is twice more likely to be generated than both 0 and 2.The text was updated successfully, but these errors were encountered: