Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ENH] Add new methods of generating bounded random integers #1979

Closed
MatthiasKohl opened this issue Dec 3, 2021 · 5 comments
Closed

[ENH] Add new methods of generating bounded random integers #1979

MatthiasKohl opened this issue Dec 3, 2021 · 5 comments
Labels
? - Needs Triage Need team to review and classify

Comments

@MatthiasKohl
Copy link
Contributor

Describe the solution you'd like
There have been recent developments in generating bounded random integers, in particular see swiftlang/swift#39143

This should be added to RAFT and then used in cugraph (in particular algorithms like random walks), for the reasons outlined in context below.

Additional context
There are generally two "simple" methods of generating bounded random integers:

  1. Generate some number of random bits (usually 32 or 64), interpret as unsigned integer and take modulo bound.
  2. Generate random uniform floating-point number (curand generates in interval ]0, 1]) and multiply by ~bound, then center this range and cast to integer.

Method 1 is slow (due to modulo) and can have fairly high bias towards the lower values in [0, bound[ since the bound generally does not divide the number of possible integers generated.
Method 2 seems better for lower values of bound but there are a number of issues:

  1. It is actually not that easy to get right, since we need to center correctly: we have to multiply by bound, otherwise we don't get the right range, but curand may generate 1, so floor(x * bound) may generate the value bound. We either have to do floor((x - std::numeric_limits<float>::epsilon()) * bound) or floor(x * bound) then check for bound and subtract 1 introducing slight bias towards the largest value. Multiplying by bound - 1 or bound - 0.5 always offsets the range with some values having more or less bias.
  2. For large values of bound, we run into similar (if not bigger) problems with bias than the modulo-based method: with 32-bit float, there are only 2^24 possible values in ]0, 1]. That's only ~17M. For a large enough bound, we don't have enough floating point numbers to evenly distribute over the possible outputs: for example for a bound of 10M, some integers will get hit twice but some only once meaning that some integers will be 50% more likely to be drawn than others.
  3. We could solve the problems above by always generating 64-bit floats for which most possible values of bound are small (as long as they are much smaller than 2^53). But generating uniform floating point numbers is slow itself and 64-bit floats are very slow, plus the arithmetic needed afterwards are slower, too.

Examples of issues with floating-point centering:
[current method in random walks] floor(x * (bound - 1)) : this has a very low likelihood to generate the value bound - 1. For bound := 3, this means 2 is only generated with a probability of ~2^-24
[centered method]: floor(x * (bound - 1) + 0.5) : For bound := 3, this would transform the range to ]0.5, 2.5] meaning that 1 is twice more likely to be generated than both 0 and 2.

@MatthiasKohl MatthiasKohl added the ? - Needs Triage Need team to review and classify label Dec 3, 2021
@seunghwak
Copy link
Contributor

Have you looked for raft/rng.hpp?

Won't uniformInt (https://github.com/rapidsai/raft/blob/branch-22.02/cpp/include/raft/random/rng.hpp#L118) work for this?

@MatthiasKohl
Copy link
Contributor Author

That method in RAFT is currently based on method 1 which is not ideal. On top of that, I'm really looking for a device API rather than host API. I've created a corresponding issue on RAFT: rapidsai/raft#406

@github-actions
Copy link

github-actions bot commented Jan 2, 2022

This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.

rapids-bot bot pushed a commit that referenced this issue Mar 24, 2022
This is a partial fix for #1979. 

Specifically, given `N = out-deg(v)` and a random number `r ∈ [0,1]`, one must obtain the equivalent discrete 
`index ∈ {0,1,...,N-1}`. Previous implementation used an upper bound `ubound = N-1` and a linear interpolation. As the issue above mentioned that approach creates problems near the (lower) boundary. 

The fix uses a better bound, namely `ubound = N` and the discrete transformation: `index = floor(r >= 1.0 ? N-1 : r*N)`.

Attached Mathematica plots show the graphs for, say, `N = 13` and `N=17`.

![N=13_cropped](https://user-images.githubusercontent.com/37386037/159776745-13c72963-a426-46e2-975f-feedab6bbbb6.png)

![N=17_uniform_sampling](https://user-images.githubusercontent.com/37386037/159775015-203f4442-e2c7-4422-968e-e76807ec9639.png)

This fix is not high priority for release 22-04, and can be included in the 22-06 release. Also, not all of the concerns formulated in the issue above are addressed by this PR. For example a uniform random generator callable from device is not yet available, but there are plans to perhaps expose something like that in `raft`.

Authors:
  - Andrei Schaffer (https://github.com/aschaffer)

Approvers:
  - Chuck Hastings (https://github.com/ChuckHastings)

URL: #2153
@github-actions
Copy link

github-actions bot commented Apr 2, 2022

This issue has been labeled inactive-90d due to no recent activity in the past 90 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed.

@MatthiasKohl
Copy link
Contributor Author

This has been solved with #434 and rapidsai/raft#513

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
? - Needs Triage Need team to review and classify
Projects
None yet
Development

No branches or pull requests

2 participants