[ENH] Add new methods of generating bounded random integers #1979

MatthiasKohl · 2021-12-03T17:42:14Z

Describe the solution you'd like
There have been recent developments in generating bounded random integers, in particular see swiftlang/swift#39143

This should be added to RAFT and then used in cugraph (in particular algorithms like random walks), for the reasons outlined in context below.

Additional context
There are generally two "simple" methods of generating bounded random integers:

Generate some number of random bits (usually 32 or 64), interpret as unsigned integer and take modulo bound.
Generate random uniform floating-point number (curand generates in interval ]0, 1]) and multiply by ~bound, then center this range and cast to integer.

Method 1 is slow (due to modulo) and can have fairly high bias towards the lower values in [0, bound[ since the bound generally does not divide the number of possible integers generated.
Method 2 seems better for lower values of bound but there are a number of issues:

It is actually not that easy to get right, since we need to center correctly: we have to multiply by bound, otherwise we don't get the right range, but curand may generate 1, so floor(x * bound) may generate the value bound. We either have to do floor((x - std::numeric_limits<float>::epsilon()) * bound) or floor(x * bound) then check for bound and subtract 1 introducing slight bias towards the largest value. Multiplying by bound - 1 or bound - 0.5 always offsets the range with some values having more or less bias.
For large values of bound, we run into similar (if not bigger) problems with bias than the modulo-based method: with 32-bit float, there are only 2^24 possible values in ]0, 1]. That's only ~17M. For a large enough bound, we don't have enough floating point numbers to evenly distribute over the possible outputs: for example for a bound of 10M, some integers will get hit twice but some only once meaning that some integers will be 50% more likely to be drawn than others.
We could solve the problems above by always generating 64-bit floats for which most possible values of bound are small (as long as they are much smaller than 2^53). But generating uniform floating point numbers is slow itself and 64-bit floats are very slow, plus the arithmetic needed afterwards are slower, too.

Examples of issues with floating-point centering:
[current method in random walks] floor(x * (bound - 1)) : this has a very low likelihood to generate the value bound - 1. For bound := 3, this means 2 is only generated with a probability of ~2^-24
[centered method]: floor(x * (bound - 1) + 0.5) : For bound := 3, this would transform the range to ]0.5, 2.5] meaning that 1 is twice more likely to be generated than both 0 and 2.

The text was updated successfully, but these errors were encountered:

seunghwak · 2021-12-03T19:10:10Z

Have you looked for raft/rng.hpp?

Won't uniformInt (https://github.com/rapidsai/raft/blob/branch-22.02/cpp/include/raft/random/rng.hpp#L118) work for this?

MatthiasKohl · 2021-12-03T20:12:34Z

That method in RAFT is currently based on method 1 which is not ideal. On top of that, I'm really looking for a device API rather than host API. I've created a corresponding issue on RAFT: rapidsai/raft#406

github-actions · 2022-01-02T21:04:56Z

This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.

This is a partial fix for #1979. Specifically, given `N = out-deg(v)` and a random number `r ∈ [0,1]`, one must obtain the equivalent discrete `index ∈ {0,1,...,N-1}`. Previous implementation used an upper bound `ubound = N-1` and a linear interpolation. As the issue above mentioned that approach creates problems near the (lower) boundary. The fix uses a better bound, namely `ubound = N` and the discrete transformation: `index = floor(r >= 1.0 ? N-1 : r*N)`. Attached Mathematica plots show the graphs for, say, `N = 13` and `N=17`. ![N=13_cropped](https://user-images.githubusercontent.com/37386037/159776745-13c72963-a426-46e2-975f-feedab6bbbb6.png) ![N=17_uniform_sampling](https://user-images.githubusercontent.com/37386037/159775015-203f4442-e2c7-4422-968e-e76807ec9639.png) This fix is not high priority for release 22-04, and can be included in the 22-06 release. Also, not all of the concerns formulated in the issue above are addressed by this PR. For example a uniform random generator callable from device is not yet available, but there are plans to perhaps expose something like that in `raft`. Authors: - Andrei Schaffer (https://github.com/aschaffer) Approvers: - Chuck Hastings (https://github.com/ChuckHastings) URL: #2153

github-actions · 2022-04-02T21:05:08Z

This issue has been labeled inactive-90d due to no recent activity in the past 90 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed.

MatthiasKohl · 2022-04-06T16:29:00Z

This has been solved with #434 and rapidsai/raft#513

MatthiasKohl added the ? - Needs Triage Need team to review and classify label Dec 3, 2021

MatthiasKohl mentioned this issue Dec 3, 2021

[FEA] RNG device interface rapidsai/raft#406

Closed

github-actions bot added the inactive-30d label Jan 2, 2022

aschaffer mentioned this issue Mar 23, 2022

Enhancement on uniform random sampling of indices near zero. #2153

Merged

github-actions bot added the inactive-90d label Apr 2, 2022

MatthiasKohl closed this as completed Apr 6, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[ENH] Add new methods of generating bounded random integers #1979

[ENH] Add new methods of generating bounded random integers #1979

MatthiasKohl commented Dec 3, 2021

seunghwak commented Dec 3, 2021

MatthiasKohl commented Dec 3, 2021

github-actions bot commented Jan 2, 2022

github-actions bot commented Apr 2, 2022

MatthiasKohl commented Apr 6, 2022

[ENH] Add new methods of generating bounded random integers #1979

[ENH] Add new methods of generating bounded random integers #1979

Comments

MatthiasKohl commented Dec 3, 2021

seunghwak commented Dec 3, 2021

MatthiasKohl commented Dec 3, 2021

github-actions bot commented Jan 2, 2022

github-actions bot commented Apr 2, 2022

MatthiasKohl commented Apr 6, 2022