Rng::gen_range() is slowest with power-of-2 input sizes? #1145

elidupree · 2021-07-15T15:50:51Z

I was just fiddling around with some RNG stuff when I noticed something that looks pretty odd:

These are time-per-call for calls to rng.gen_range(0..n) (you can see the value of n at the end of each name written on the vertical axis), sampled using Criterion. You'd think that power-of-2 range sizes would be the fastest to generate, since they can theoretically be done as just a single bitmask, with no rejections. But instead, they are the slowest!

I'm not deeply familiar with RNG algorithms, but I took a quick look through the code, and I wonder if this line from sample_single_inclusive is the culprit. Maybe it was intended to use (range-1).leading_zeros() rather than range.leading_zeros()? I haven't grokked what the line is doing, but that would be something that affects exactly power-of-2 sizes differently than their neighbors.

To test, this, I confirmed that Uniform::new(0, n).sample(&mut rng) does not have this pattern, and in fact, is faster than rng.gen_range(0..n) on my computer in almost every case I tested (but especially much faster on power-of-2 sizes):

Is it possible that the sample_single_inclusive code is out of date compared to optimizations that have been made more recently in sample_inclusive?

The text was updated successfully, but these errors were encountered:

vks · 2021-07-15T16:10:23Z

That's a keen observation, thanks! Definitely something we need to investigate. The results you are showing are from the benchmarks in the repository you linked? I would like to run them on my machine as well.

It almost looks like gen_range is never faster? Maybe we should consider replacing it with Uniform internally.

(About your experiments with generating only as many bits as necessary: This approach is indeed popular in some academic publications. We also have some discussion in #1014 and #1055. The problem is that it is tricky to make fast without compromising performance when generating u64, see #1014.)

elidupree · 2021-07-15T16:15:58Z

yeah, you should be able to just pull and run cargo bench. I've also just pushed an alternate branch with my code to run the benchmarks using Uniform.

Reading those issues now, I bet they'll be interesting!

vks · 2021-07-15T16:32:14Z

Thanks! The results on my machine are consistent with yours.

TheIronBorn · 2021-07-15T20:33:19Z

This is an artifact of the leading_zeros threshold approximation. I believe this should fix it:
(range << (range - 1).leading_zeros()).wrapping_sub(1)

It passes a 16-bit bias test:

(1..=u16::max_value()).into_par_iter().for_each(|range| {
        let zone = (range << ((range - 1).leading_zeros())).wrapping_sub(1);
        let mut map = vec![0; range as usize];
        for r in 0..=u16::max_value() {
            let m = r as u32 * range as u32;
            let h = (m >> 16) as u16;
            let l = m as u16;
            if l <= zone {
                map[h as usize] += 1;
            }
        }
        // must be uniform for every value
        assert!(map.iter().all(|&x| x == map[0]), "{:?}", range);
    });

TheIronBorn · 2021-07-15T20:58:40Z

Hang on, it makes others worse. Like range = 1

TheIronBorn · 2021-07-15T21:15:02Z

Note that the bitmask method has nearly overlapping rejection probabilities but is better on edge cases like this

dhardy · 2021-07-16T09:50:11Z

See also #951 and #985 by @jongiddy.

TheIronBorn · 2021-07-16T19:57:17Z

Note also that O'neill found that shortcutting the modulo-wide-mul method is actually fastest. See "All-Ranges Benchmark Results" for "Debiased Int Mult (t-opt/m-opt)" vs "Bitmask" (which should have similar speed to our method, perhaps even faster thanks to no multiplication)

dhardy · 2021-07-18T08:42:00Z

So, would @TheIronBorn like to make a PR?

TheIronBorn · 2021-07-18T22:48:14Z

Sure I’ll look into it

dhardy · 2023-02-03T14:36:53Z

I ran @elidupree's benchmark for the new RNGs evaluated in #1196. Summary: all Canon variants do not display poor behaviour except for (1 << 62) + 1.

Results are similar for other RNGs.

dhardy · 2023-02-03T17:21:57Z

That isn't to say that these algorithms handle all specially picked values well:

Values are:

(1 << 31) + 1 (good perf)
(1 << 55) + 1 (good)
1 << 62 (middle)
(1 << 62) + 1 (middle)
(1 << 63) - 1 (good)
1 << 63 (poor)
(1 << 63) + 1 (poor)

So special cases still lead to poor performance (no surprise), but we're at least a lot better than the "Old" (current) algorithm.

TheIronBorn mentioned this issue Jul 27, 2021

switch sample_single int to O'neill's modulo method & update gen_rang… #1154

Closed

TheIronBorn mentioned this issue Dec 6, 2022

WIP: Investigate Canon's uniform-integer sampling method #1196

Closed

dhardy mentioned this issue Feb 17, 2023

Uniform sampling: use Canon's method #1287

Merged

dhardy closed this as completed Mar 24, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Rng::gen_range() is slowest with power-of-2 input sizes? #1145

Rng::gen_range() is slowest with power-of-2 input sizes? #1145

elidupree commented Jul 15, 2021 •

edited

Loading

vks commented Jul 15, 2021

elidupree commented Jul 15, 2021

vks commented Jul 15, 2021

TheIronBorn commented Jul 15, 2021

TheIronBorn commented Jul 15, 2021

TheIronBorn commented Jul 15, 2021 •

edited

Loading

dhardy commented Jul 16, 2021 •

edited

Loading

TheIronBorn commented Jul 16, 2021

dhardy commented Jul 18, 2021

TheIronBorn commented Jul 18, 2021

dhardy commented Feb 3, 2023

dhardy commented Feb 3, 2023

Rng::gen_range() is slowest with power-of-2 input sizes? #1145

Rng::gen_range() is slowest with power-of-2 input sizes? #1145

Comments

elidupree commented Jul 15, 2021 • edited Loading

vks commented Jul 15, 2021

elidupree commented Jul 15, 2021

vks commented Jul 15, 2021

TheIronBorn commented Jul 15, 2021

TheIronBorn commented Jul 15, 2021

TheIronBorn commented Jul 15, 2021 • edited Loading

dhardy commented Jul 16, 2021 • edited Loading

TheIronBorn commented Jul 16, 2021

dhardy commented Jul 18, 2021

TheIronBorn commented Jul 18, 2021

dhardy commented Feb 3, 2023

dhardy commented Feb 3, 2023

elidupree commented Jul 15, 2021 •

edited

Loading

TheIronBorn commented Jul 15, 2021 •

edited

Loading

dhardy commented Jul 16, 2021 •

edited

Loading