Tracking Issue for `{u8,i8,...}::isqrt` #116226

FedericoStra · 2023-09-28T10:29:47Z

Feature gate: #![feature(isqrt)]

This is a tracking issue for the functions {u8,u16,u32,u64,i128,usize}::isqrt and {i8,i16,i32,i64,i128,isize}::{isqrt,checked_isqrt}, which compute the integer square root, addressing issue #89273.

Public API

For every suffix N among 8, 16, 32, 64, 128 and size, the feature isqrt introduces the methods

const fn uN::isqrt() -> uN;
const fn iN::isqrt() -> iN;
const fn iN::checked_isqrt() -> Option<iN>;

Expand to see the full API

const fn u8::isqrt() -> u8;
const fn i8::isqrt() -> i8;
const fn i8::checked_isqrt() -> Option<i8>;


const fn u16::isqrt() -> u16;
const fn i16::isqrt() -> i16;
const fn i16::checked_isqrt() -> Option<i16>;


const fn u32::isqrt() -> u32;
const fn i32::isqrt() -> i32;
const fn i32::checked_isqrt() -> Option<i32>;


const fn u64::isqrt() -> u64;
const fn i64::isqrt() -> i64;
const fn i64::checked_isqrt() -> Option<i64>;


const fn u128::isqrt() -> u128;
const fn i128::isqrt() -> i128;
const fn i128::checked_isqrt() -> Option<i128>;


const fn usize::isqrt() -> usize;
const fn isize::isqrt() -> isize;
const fn isize::checked_isqrt() -> Option<isize>;

Steps / History

Forum discussion history:
- https://users.rust-lang.org/t/u32-integer-square-root/86909/2?u=scottmcm
- https://users.rust-lang.org/t/integer-square-root-algorithm/13529/5?u=scottmcm
Implementation: Add "integer square root" method to integer primitive types #116176
Final comment period (FCP)¹
Stabilization PR

Unresolved Questions

None yet.

https://std-dev-guide.rust-lang.org/feature-lifecycle/stabilization.html ↩

The text was updated successfully, but these errors were encountered:

ChaiTRex · 2023-09-29T02:38:30Z

Once #116176 is merged, I would like to submit improved tests and compiler optimization hints for signed integers with tighter upper bounds than with the corresponding unsigned integers.

Separately, I'm going to try to use f32 and f64 to speed up isqrt, as I think that, up to 64-bit integers, the square root function on those might be faster and any errors in the outputs can be adjusted for. This would have the downsides of making the functions non-const and multiplying the written implementations, as doing this effectively would require different code for different bit sizes (for example, u32 could use f64's 53-bit mantissa with no errors, while u64 would need to use f64 with an error adjustment).

ChaiTRex · 2023-09-30T21:44:59Z

In a benchmarking repository, I've implemented some methods based on floating-point instructions. I get the following speed improvements with AMD Ryzen 9 5900X:

Speed of original and floating-point methods

original_i8             time:   [10.921 ns 10.963 ns 11.010 ns]
floating_i8             time:   [9.7668 ns 9.8591 ns 9.9453 ns]

original_u8             time:   [9.5130 ns 9.5649 ns 9.6401 ns]
floating_u8             time:   [5.3333 ns 5.3488 ns 5.3632 ns]

original_i16            time:   [12.231 ns 12.257 ns 12.289 ns]
floating_i16            time:   [8.5368 ns 8.5592 ns 8.5828 ns]

original_u16            time:   [12.858 ns 12.875 ns 12.896 ns]
floating_u16            time:   [5.3525 ns 5.3536 ns 5.3549 ns]

original_i32            time:   [11.941 ns 11.945 ns 11.950 ns]
floating_i32            time:   [7.5067 ns 7.5106 ns 7.5152 ns]

original_u32            time:   [15.381 ns 15.420 ns 15.478 ns]
floating_u32            time:   [3.8586 ns 3.9171 ns 3.9841 ns]

original_i64            time:   [19.061 ns 19.064 ns 19.068 ns]
floating_i64            time:   [7.2079 ns 7.2427 ns 7.2740 ns]

original_u64            time:   [28.456 ns 28.502 ns 28.549 ns]
floating_u64            time:   [5.8539 ns 5.8567 ns 5.8601 ns]

original_i128           time:   [130.68 ns 130.93 ns 131.15 ns]
floating_i128           time:   [14.931 ns 14.996 ns 15.134 ns]

original_u128           time:   [248.76 ns 248.81 ns 248.86 ns]
floating_u128           time:   [17.346 ns 17.380 ns 17.410 ns]

Edit: Rebenchmarked because I used the Karatsuba square root algorithm for 128-bit integers to get a major speedup.

ryanavella · 2024-01-17T04:41:15Z

@ChaiTRex I noticed in your benchmarking repository that you've commented out (what I presume was previously) a lookup table implementation for u8 and i8, how did that perform?

I benchmarked a few naive lookup tables, and at least on my machine (ThinkPad T490, i5-8365U) the results look comparable, arguably even competitive.

Benchmarks

floating+karatsuba_i8 time: [10.002 ns 10.008 ns 10.015 ns]
floating+karatsuba_u8 time: [4.8382 ns 4.8987 ns 4.9682 ns]
floating_i8           time: [11.109 ns 11.162 ns 11.207 ns]
floating_u8           time: [6.2175 ns 6.2430 ns 6.2621 ns]
karatsuba_i8         time: [9.8523 ns 9.9603 ns 10.052 ns]
karatsuba_u8         time: [5.0738 ns 5.1621 ns 5.2458 ns]
original_i8          time: [13.251 ns 13.406 ns 13.539 ns]
original_u8          time: [10.680 ns 10.696 ns 10.709 ns]

lut16_i8             time: [11.032 ns 11.033 ns 11.034 ns]
lut16_u8             time: [7.9575 ns 7.9678 ns 7.9778 ns]
lut32_i8             time: [11.323 ns 11.328 ns 11.334 ns]
lut32_u8             time: [6.7746 ns 6.8290 ns 6.8980 ns]
lut64_i8             time: [10.113 ns 10.150 ns 10.214 ns]
lut64_u8             time: [5.8406 ns 5.8711 ns 5.9037 ns]
lut256_i8            time: [10.007 ns 10.026 ns 10.051 ns]
lut256_u8            time: [4.9602 ns 4.9606 ns 4.9611 ns]

Here are my unsigned implementations for what I'm currently calling lut16, lut32, lut64, and lut256, which offer different tradeoffs between table size and branchiness:

(note: for the signed benchmarks I delegated to the unsigned implementation, on the assumption that we don't want to bloat up binaries with redundant lookup tables)

lut16

impl UnsignedIsqrt for u8 {
    fn isqrt(self) -> Self {
        const LUT: [u8; 16] = [
            3, 5, 6, 7,
            8, 9, 10, 11,
            11, 12, 13, 13,
            14, 14, 15, 15,
        ];
        let mut est = LUT[(self >> 4) as usize];
        let mut square = est * est;
        if square > self {
            square -= 2*est - 1;
            est -= 1;
            if square > self {
                square -= 2*est - 1;
                est -= 1;
                if square > self {
                    // square -= 2*est - 1;
                    est -= 1;
                }
            }
        }
        est
    }
}

lut32

impl UnsignedIsqrt for u8 {
    fn isqrt(self) -> Self {
        const LUT: [u8; 32] = [
            2, 3, 4, 5, 6, 6, 7, 7,
            8, 8, 9, 9, 10, 10, 10, 11,
            11, 11, 12, 12, 12, 13, 13, 13,
            14, 14, 14, 14, 15, 15, 15, 15,
        ];
        let mut est = LUT[(self >> 3) as usize];
        let mut square = est * est;
        if square > self {
            square -= 2*est - 1;
            est -= 1;
            if square > self {
                // square -= 2*est - 1;
                est -= 1;
            }
        }
        est
    }
}

lut64

impl UnsignedIsqrt for u8 {
    fn isqrt(self) -> Self {
        const LUT: [u8; 64] = [
            1, 2, 3, 3, 4, 4, 5, 5,
            5, 6, 6, 6, 7, 7, 7, 7,
            8, 8, 8, 8, 9, 9, 9, 9,
            9, 10, 10, 10, 10, 10, 11, 11,
            11, 11, 11, 11, 12, 12, 12, 12,
            12, 12, 13, 13, 13, 13, 13, 13,
            13, 14, 14, 14, 14, 14, 14, 14,
            15, 15, 15, 15, 15, 15, 15, 15,
        ];
        let est = LUT[(self >> 2) as usize];
        if est * est > self {
            est - 1
        } else {
            est
        }
    }
}

lut256

impl UnsignedIsqrt for u8 {
    fn isqrt(self) -> Self {
        const LUT: [u8; 256] = [
            0, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3,
            4, 4, 4, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 5, 5, 5,
            5, 5, 5, 5, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6,
            6, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7,
            8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8,
            8, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9,
            9, 9, 9, 9, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 
            10, 10, 10, 10, 10, 10, 10, 10, 10, 11, 11, 11, 11, 11, 11, 11, 
            11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 
            12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 
            12, 12, 12, 12, 12, 12, 12, 12, 12, 13, 13, 13, 13, 13, 13, 13, 
            13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 
            13, 13, 13, 13, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 
            14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 
            14, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 
            15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 
        ];
        LUT[self as usize]
    }
}

ChaiTRex · 2024-01-25T00:23:03Z

@ryanavella

@ChaiTRex I noticed in your benchmarking repository that you've commented out (what I presume was previously) a lookup table implementation for u8 and i8, how did that perform?

I didn't include either the table or libgmp versions because of copyright concerns.

table uses fred_sqrt from part 3c of Paul Hsieh's Square Root page (note that the 256 u8 table there contains fixed-point square roots where the integer part is the most significant four bits). This table is then used for u32's isqrt with fred_sqrt edited by me to use its repetitiveness to significantly reduce the number of branches.

Benchmarks, including commented out implementations

original_i8             time:   [8.8528 ns 8.8812 ns 8.9134 ns]
floating_i8             time:   [7.1974 ns 7.2120 ns 7.2323 ns]
floating+karatsuba_i8   time:   [6.6171 ns 6.6359 ns 6.6577 ns]
karatsuba_i8            time:   [6.8114 ns 6.8362 ns 6.8623 ns]
table_i8                time:   [6.6069 ns 6.6273 ns 6.6469 ns]
libgmp_i8               time:   [6.7207 ns 6.7242 ns 6.7282 ns]

original_u8             time:   [7.3317 ns 7.3425 ns 7.3522 ns]
floating_u8             time:   [4.6880 ns 4.7037 ns 4.7192 ns]
floating+karatsuba_u8   time:   [2.5223 ns 2.5303 ns 2.5377 ns]
karatsuba_u8            time:   [3.1860 ns 3.1993 ns 3.2164 ns]
table_u8                time:   [2.5434 ns 2.5499 ns 2.5565 ns]
libgmp_u8               time:   [2.5312 ns 2.5382 ns 2.5450 ns]

original_i16            time:   [9.9607 ns 9.9781 ns 9.9970 ns]
floating_i16            time:   [6.8501 ns 6.8723 ns 6.8935 ns]
floating+karatsuba_i16  time:   [6.8666 ns 6.8890 ns 6.9088 ns]
karatsuba_i16           time:   [7.5343 ns 7.5497 ns 7.5659 ns]
table_i16               time:   [7.3142 ns 7.3333 ns 7.3528 ns]
libgmp_i16              time:   [7.4119 ns 7.4263 ns 7.4410 ns]

original_u16            time:   [10.318 ns 10.332 ns 10.349 ns]
floating_u16            time:   [6.5575 ns 6.5711 ns 6.5827 ns]
floating+karatsuba_u16  time:   [3.2735 ns 3.2822 ns 3.2902 ns]
karatsuba_u16           time:   [4.6550 ns 4.6614 ns 4.6684 ns]
table_u16               time:   [3.7317 ns 3.7352 ns 3.7390 ns]
libgmp_u16              time:   [3.6700 ns 3.6795 ns 3.6894 ns]

original_i32            time:   [11.113 ns 11.138 ns 11.160 ns]
floating_i32            time:   [6.6519 ns 6.6616 ns 6.6697 ns]
floating+karatsuba_i32  time:   [6.5736 ns 6.5880 ns 6.6057 ns]
karatsuba_i32           time:   [8.2693 ns 8.2823 ns 8.3002 ns]
table_i32               time:   [7.2038 ns 7.2214 ns 7.2417 ns]
libgmp_i32              time:   [7.2301 ns 7.2389 ns 7.2476 ns]

original_u32            time:   [13.647 ns 13.680 ns 13.719 ns]
floating_u32            time:   [2.8695 ns 2.8776 ns 2.8846 ns]
floating+karatsuba_u32  time:   [2.7922 ns 2.8011 ns 2.8093 ns]
karatsuba_u32           time:   [7.5492 ns 7.6069 ns 7.6858 ns]
table_u32               time:   [4.7263 ns 4.7436 ns 4.7614 ns]
libgmp_u32              time:   [4.4509 ns 4.4625 ns 4.4740 ns]

original_i64            time:   [19.217 ns 19.287 ns 19.371 ns]
floating_i64            time:   [9.1175 ns 9.1324 ns 9.1488 ns]
floating+karatsuba_i64  time:   [9.3864 ns 9.4072 ns 9.4309 ns]
karatsuba_i64           time:   [13.713 ns 13.737 ns 13.770 ns]
table_i64               time:   [11.280 ns 11.285 ns 11.290 ns]
libgmp_i64              time:   [11.359 ns 11.420 ns 11.488 ns]

original_u64            time:   [28.746 ns 28.808 ns 28.886 ns]
floating_u64            time:   [5.7362 ns 5.7439 ns 5.7525 ns]
floating+karatsuba_u64  time:   [5.8237 ns 5.8340 ns 5.8452 ns]
karatsuba_u64           time:   [18.599 ns 18.628 ns 18.674 ns]
table_u64               time:   [12.631 ns 12.656 ns 12.686 ns]
libgmp_u64              time:   [12.943 ns 12.947 ns 12.951 ns]

original_i128           time:   [129.46 ns 129.99 ns 130.48 ns]
floating_i128           time:   [16.033 ns 16.112 ns 16.224 ns]
floating+karatsuba_i128 time:   [15.891 ns 15.918 ns 15.948 ns]
karatsuba_i128          time:   [21.446 ns 21.527 ns 21.597 ns]
table_i128              time:   [17.982 ns 18.018 ns 18.052 ns]
libgmp_i128             time:   [17.814 ns 17.832 ns 17.851 ns]

original_u128           time:   [250.32 ns 251.21 ns 252.13 ns]
floating_u128           time:   [20.254 ns 20.288 ns 20.333 ns]
floating+karatsuba_u128 time:   [19.654 ns 19.725 ns 19.804 ns]
karatsuba_u128          time:   [30.799 ns 30.853 ns 30.915 ns]
table_u128              time:   [23.780 ns 23.824 ns 23.873 ns]
libgmp_u128             time:   [23.687 ns 23.777 ns 23.864 ns]

It should be noted that libgmp was somewhat faster than table last time I checked, but now they're essentially equivalent.

Here are my unsigned implementations for what I'm currently calling lut16, lut32, lut64, and lut256, which offer different tradeoffs between table size and branchiness:

Since the concern is reducing the amount of memory used, one other concern is code size:

Implementation	Table size	Code size	Total size
`lut16`	16	0x4a = 74	90
`lut32`	32	0x3d = 61	93
`lut64`	64	0x20 = 32	96
`lut256`	256	0x10 = 16	272

Since each value in your tables takes up four bits, two of them can fit in a u8, so maybe that can help as well.

It should be noted that lut256 can probably be inlined into one memory access with a small computation of the address.

tfpf · 2024-02-04T06:18:22Z

Is there an estimate on when this might become available in stable?

ryanavella · 2024-02-04T07:41:54Z

Is there an estimate on when this might become available in stable?

I'm not sure if these are blocking stabilization necessarily, but these are some outstanding questions that I'd like to see answered:

Whether we even want to provide a panicking API for signed integers (it is kind of redundant with x.checked_sqrt().unwrap())
Whether to use floating point acceleration (and if so, a proof that it yields the same results regardless of FP optimizations and FPU quirks)
How to detect hardware floating point support (Needed: cfg(target_feature) support for hardware floating point detection. #64514)
What kind of tradeoff is acceptable between code size, table size (if applicable), and speed

Also, we could always use more testers and more benchmarks.

tfpf · 2024-02-16T19:07:20Z

My two cents, as a relatively new user (started using Rust just a year ago), who is absolutely not an expert in compiler/library design.

x.ilog2(), which has been stabilised, panics if x <= 0. (I assume that's not the same as 'having a panicking API'?) Whatever is true of it should also be true of x.isqrt() when x < 0.
From a cursory glance at musl (used on lightweight OSes like Alpine), it appears to not use lookup tables. Perhaps the same could be done here?
- A 64-bit system could be assumed to be not terribly memory-constrained, so full use of lookup tables might be considered.

ryanavella · 2024-02-16T20:08:52Z

From a cursory glance at musl (used on lightweight OSes like Alpine), it appears to not use lookup tables. Perhaps the same could be done here?

Maybe I'm reading the musl sources wrong, but I don't see an implementation for integer square root? I did find this lookup table used for (floating-point) sqrt. Keep in mind that this implementation is probably only relevant to platforms without hardware floating point.

tfpf · 2024-02-16T20:36:07Z

Nice find; I didn’t check thoroughly. Yes, they don’t have an integer square root function, but I was trying to see whether they used any lookup tables at all, since musl is a relatively new library (compared to how old C is) comparable in age to Rust. (Probably not fair to compare a library and a language, though.)

ChaiTRex · 2024-02-18T03:52:09Z

@ryanavella With regard to your questions:

Whether we even want to provide a panicking API for signed integers (it is kind of redundant with x.checked_sqrt().unwrap())

I agree with what @tfpf said about maintaining consistency with the behavior of ilog functions.

Whether to use floating point acceleration (and if so, a proof that it yields the same results regardless of FP optimizations and FPU quirks)

I think that, for a first release, it's not necessary to have floating point support, as a decently fast, non-FP (because some targets have no FP at all or slow FP square root) method for u8 and the Karatsuba method for everything else will be pretty fast.

As far as a proof, a rough one is available in the 34th comment of this linked discussion.

If it matters, I also looked into the rounding mode used and found that Rust uses and requires the default rounding mode and that the default rounding mode for IEEE-754 is "round-to-nearest-ties-to-even".

What kind of tradeoff is acceptable between code size, table size (if applicable), and speed

It would be nice if the s and z optimization settings were available as a cfg flag so that, just by the developer choosing in the standard way whether to optimize for speed or size, the isqrt implementations would then be able to use that to choose between the smallest code+table size or the fastest available implementation.

That and FP-detection may make thoroughly testing it difficult, however.

Maybe I'm reading the musl sources wrong, but I don't see an implementation for integer square root? I did find this lookup table used for (floating-point) sqrt.

I'd caution that licensing issues should be considered before using or starting from and modifying musl's code so that the two licenses for the Rust standard library (including who's given credit for the code) are all that's needed to use the code.

ChaiTRex · 2024-02-18T04:03:19Z

In my last comment in this thread, I pointed out a discussion I had started on internals.rust-lang.org.

I also started a discussion on the Rust Zulip, where I was informed that core::intrinsics::const_eval_select can be used to take two implementations of isqrt, one const fn and a faster one that can't be const fn (like here with the floating point operations) and meld them into a combined const fn that uses the fast version except in a const context.

The const fn version could use a large table to speed itself up without affecting the executable, as that version is only used at compile time.

tgross35 · 2024-05-23T01:30:03Z

Using floating point operations to speed up integer operations seems very out of scope for std, especially starting here - we don't do this on any other methods, possible loss of precision isn't nice, and dealing with different behavior on softfloat targets also isn't ideal. Methods that accept lossy calculations in exchange for performance are a cool thing to have in the ecosystem though - have you considered publishing a faster isqrt (and others that would benefit) as a crate? (the name fast-int isn't taken yet :) ).

These haven't yet been around a year but seem uncontroversial. Putting up a stabilization PR is easy, anyone interested could probably do that to get the discussion ball rolling.

ChaiTRex · 2024-05-23T16:26:16Z

possible loss of precision isn't nice

The floating point methods give precise results. This can be seen by checking the square root of all the perfect squares ((n * n).isqrt() == n) and all the perfect squares minus one (((n * n) - 1).isqrt() == n - 1) as well as checking 0 and MAX, which has been done exhaustively for u8 through u64. u128 uses the correct results of u64 along with the Karatsuba square root algorithm, which uses integer operations only.

A proof that it works can also be found in a comment in the internals forum.

I do think that this should be stabilized with the Karatsuba methods with no floating point, as it appears that there is a lot of work involved in getting the floating point stuff working.

tgross35 · 2024-05-23T16:43:13Z

u128 uses the correct results of u64 along with the Karatsuba square root algorithm, which uses integer operations only.

This is specifically the case I was referring to, but I guess it is covered if there is a fallback.

You could make a PR to change to Karatsubat at any time if that shows improvement, changing the internal implementation is independent of stabilization.

kennytm · 2024-06-03T03:51:34Z

Could we have NonZero<uN>::isqrt() -> NonZero<uN> too 🤔

Add `isqrt` to `NonZero<uN>` Implements [rust-lang#70887 (comment)](rust-lang#116226 (comment)), with the following signature: ```rust impl NonZero<uN> { const fn isqrt(self) -> Self; } ``` Unintended benefits include one fewer panicking branch in `ilog2` for LLVM to optimize away, and one fewer `assume_unchecked` as `NonZero` already does that. The fast path for `self == 1` is dropped, but the current implementation is very slow anyways compared to hardware. Performance improvements can always come later. (I didn't add the function to `NonZero<iN>`, since _every_ existing `NonZero` method is non-panicking, and it might be nice to leave it that way.)

Rollup merge of rust-lang#126199 - ivan-shrimp:nonzero_isqrt, r=tgross35 Add `isqrt` to `NonZero<uN>` Implements [rust-lang#70887 (comment)](rust-lang#116226 (comment)), with the following signature: ```rust impl NonZero<uN> { const fn isqrt(self) -> Self; } ``` Unintended benefits include one fewer panicking branch in `ilog2` for LLVM to optimize away, and one fewer `assume_unchecked` as `NonZero` already does that. The fast path for `self == 1` is dropped, but the current implementation is very slow anyways compared to hardware. Performance improvements can always come later. (I didn't add the function to `NonZero<iN>`, since _every_ existing `NonZero` method is non-panicking, and it might be nice to leave it that way.)

Improved `checked_isqrt` and `isqrt` methods ### Improved tests of `isqrt` and `checked_isqrt` implementations * Inputs chosen more thoroughly and systematically. * Checks that `isqrt` and `checked_isqrt` have equivalent results for signed types, either equivalent numerically or equivalent as a panic and a `None`. * Checks that `isqrt` has numerically-equivalent results for unsigned types and their `NonZero` counterparts. ### Added benchmarks for `isqrt` implementations ### Greatly sped up `checked_isqrt` and `isqrt` methods * Uses a lookup table for 8-bit integers and then the Karatsuba square root algorithm for larger integers. * Includes optimization hints that give the compiler the exact numeric range of results. ### Feature tracking issue `isqrt` is an unstable feature tracked at rust-lang#116226. <details><summary>Benchmarked improvements</summary> ### Command used to benchmark ./x bench library/core -- int_sqrt ### Before benchmarks: num::int_sqrt::i128::isqrt 439591.65/iter +/- 6652.70 num::int_sqrt::i16::isqrt 5302.97/iter +/- 160.93 num::int_sqrt::i32::isqrt 62999.11/iter +/- 2022.05 num::int_sqrt::i64::isqrt 125248.81/iter +/- 1674.43 num::int_sqrt::i8::isqrt 123.56/iter +/- 1.87 num::int_sqrt::isize::isqrt 125356.56/iter +/- 1017.03 num::int_sqrt::non_zero_u128::isqrt 437443.75/iter +/- 3535.43 num::int_sqrt::non_zero_u16::isqrt 8604.58/iter +/- 94.76 num::int_sqrt::non_zero_u32::isqrt 62933.33/iter +/- 517.30 num::int_sqrt::non_zero_u64::isqrt 125076.38/iter +/- 11340.61 num::int_sqrt::non_zero_u8::isqrt 221.51/iter +/- 1.58 num::int_sqrt::non_zero_usize::isqrt 136005.21/iter +/- 2020.35 num::int_sqrt::u128::isqrt 439014.55/iter +/- 3920.45 num::int_sqrt::u16::isqrt 8575.08/iter +/- 148.06 num::int_sqrt::u32::isqrt 63008.89/iter +/- 803.67 num::int_sqrt::u64::isqrt 125088.09/iter +/- 879.29 num::int_sqrt::u8::isqrt 230.18/iter +/- 2.04 num::int_sqrt::usize::isqrt 125237.51/iter +/- 4747.83 ### After benchmarks: num::int_sqrt::i128::isqrt 105184.89/iter +/- 1171.38 num::int_sqrt::i16::isqrt 1910.26/iter +/- 78.50 num::int_sqrt::i32::isqrt 34260.34/iter +/- 960.84 num::int_sqrt::i64::isqrt 45939.19/iter +/- 2525.65 num::int_sqrt::i8::isqrt 22.87/iter +/- 0.45 num::int_sqrt::isize::isqrt 45884.17/iter +/- 595.49 num::int_sqrt::non_zero_u128::isqrt 106344.27/iter +/- 780.99 num::int_sqrt::non_zero_u16::isqrt 2790.19/iter +/- 53.43 num::int_sqrt::non_zero_u32::isqrt 33613.99/iter +/- 362.96 num::int_sqrt::non_zero_u64::isqrt 46235.42/iter +/- 429.69 num::int_sqrt::non_zero_u8::isqrt 31.78/iter +/- 0.75 num::int_sqrt::non_zero_usize::isqrt 46208.75/iter +/- 375.27 num::int_sqrt::u128::isqrt 106385.94/iter +/- 1649.95 num::int_sqrt::u16::isqrt 2747.69/iter +/- 28.72 num::int_sqrt::u32::isqrt 33627.09/iter +/- 475.68 num::int_sqrt::u64::isqrt 46182.29/iter +/- 311.16 num::int_sqrt::u8::isqrt 33.10/iter +/- 0.30 num::int_sqrt::usize::isqrt 46165.00/iter +/- 388.41 </details>

Improved `checked_isqrt` and `isqrt` methods ### Improved tests of `isqrt` and `checked_isqrt` implementations * Inputs chosen more thoroughly and systematically. * Checks that `isqrt` and `checked_isqrt` have equivalent results for signed types, either equivalent numerically or equivalent as a panic and a `None`. * Checks that `isqrt` has numerically-equivalent results for unsigned types and their `NonZero` counterparts. ### Added benchmarks for `isqrt` implementations ### Greatly sped up `checked_isqrt` and `isqrt` methods * Uses a lookup table for 8-bit integers and then the Karatsuba square root algorithm for larger integers. * Includes optimization hints that give the compiler the exact numeric range of results. ### Feature tracking issue `isqrt` is an unstable feature tracked at rust-lang#116226. <details><summary>Benchmarked improvements</summary> ### Command used to benchmark ./x bench library/core -- int_sqrt ### Before benchmarks: num::int_sqrt::i128::isqrt 439591.65/iter +/- 6652.70 num::int_sqrt::i16::isqrt 5302.97/iter +/- 160.93 num::int_sqrt::i32::isqrt 62999.11/iter +/- 2022.05 num::int_sqrt::i64::isqrt 125248.81/iter +/- 1674.43 num::int_sqrt::i8::isqrt 123.56/iter +/- 1.87 num::int_sqrt::isize::isqrt 125356.56/iter +/- 1017.03 num::int_sqrt::non_zero_u128::isqrt 437443.75/iter +/- 3535.43 num::int_sqrt::non_zero_u16::isqrt 8604.58/iter +/- 94.76 num::int_sqrt::non_zero_u32::isqrt 62933.33/iter +/- 517.30 num::int_sqrt::non_zero_u64::isqrt 125076.38/iter +/- 11340.61 num::int_sqrt::non_zero_u8::isqrt 221.51/iter +/- 1.58 num::int_sqrt::non_zero_usize::isqrt 136005.21/iter +/- 2020.35 num::int_sqrt::u128::isqrt 439014.55/iter +/- 3920.45 num::int_sqrt::u16::isqrt 8575.08/iter +/- 148.06 num::int_sqrt::u32::isqrt 63008.89/iter +/- 803.67 num::int_sqrt::u64::isqrt 125088.09/iter +/- 879.29 num::int_sqrt::u8::isqrt 230.18/iter +/- 2.04 num::int_sqrt::usize::isqrt 125237.51/iter +/- 4747.83 ### After benchmarks: num::int_sqrt::i128::isqrt 105184.89/iter +/- 1171.38 num::int_sqrt::i16::isqrt 1910.26/iter +/- 78.50 num::int_sqrt::i32::isqrt 34260.34/iter +/- 960.84 num::int_sqrt::i64::isqrt 45939.19/iter +/- 2525.65 num::int_sqrt::i8::isqrt 22.87/iter +/- 0.45 num::int_sqrt::isize::isqrt 45884.17/iter +/- 595.49 num::int_sqrt::non_zero_u128::isqrt 106344.27/iter +/- 780.99 num::int_sqrt::non_zero_u16::isqrt 2790.19/iter +/- 53.43 num::int_sqrt::non_zero_u32::isqrt 33613.99/iter +/- 362.96 num::int_sqrt::non_zero_u64::isqrt 46235.42/iter +/- 429.69 num::int_sqrt::non_zero_u8::isqrt 31.78/iter +/- 0.75 num::int_sqrt::non_zero_usize::isqrt 46208.75/iter +/- 375.27 num::int_sqrt::u128::isqrt 106385.94/iter +/- 1649.95 num::int_sqrt::u16::isqrt 2747.69/iter +/- 28.72 num::int_sqrt::u32::isqrt 33627.09/iter +/- 475.68 num::int_sqrt::u64::isqrt 46182.29/iter +/- 311.16 num::int_sqrt::u8::isqrt 33.10/iter +/- 0.30 num::int_sqrt::usize::isqrt 46165.00/iter +/- 388.41 </details> Tracking Issue for {u8,i8,...}::isqrt rust-lang#116226 try-job: test-various

Rollup merge of rust-lang#128166 - ChaiTRex:isqrt, r=tgross35 Improved `checked_isqrt` and `isqrt` methods ### Improved tests of `isqrt` and `checked_isqrt` implementations * Inputs chosen more thoroughly and systematically. * Checks that `isqrt` and `checked_isqrt` have equivalent results for signed types, either equivalent numerically or equivalent as a panic and a `None`. * Checks that `isqrt` has numerically-equivalent results for unsigned types and their `NonZero` counterparts. ### Added benchmarks for `isqrt` implementations ### Greatly sped up `checked_isqrt` and `isqrt` methods * Uses a lookup table for 8-bit integers and then the Karatsuba square root algorithm for larger integers. * Includes optimization hints that give the compiler the exact numeric range of results. ### Feature tracking issue `isqrt` is an unstable feature tracked at rust-lang#116226. <details><summary>Benchmarked improvements</summary> ### Command used to benchmark ./x bench library/core -- int_sqrt ### Before benchmarks: num::int_sqrt::i128::isqrt 439591.65/iter +/- 6652.70 num::int_sqrt::i16::isqrt 5302.97/iter +/- 160.93 num::int_sqrt::i32::isqrt 62999.11/iter +/- 2022.05 num::int_sqrt::i64::isqrt 125248.81/iter +/- 1674.43 num::int_sqrt::i8::isqrt 123.56/iter +/- 1.87 num::int_sqrt::isize::isqrt 125356.56/iter +/- 1017.03 num::int_sqrt::non_zero_u128::isqrt 437443.75/iter +/- 3535.43 num::int_sqrt::non_zero_u16::isqrt 8604.58/iter +/- 94.76 num::int_sqrt::non_zero_u32::isqrt 62933.33/iter +/- 517.30 num::int_sqrt::non_zero_u64::isqrt 125076.38/iter +/- 11340.61 num::int_sqrt::non_zero_u8::isqrt 221.51/iter +/- 1.58 num::int_sqrt::non_zero_usize::isqrt 136005.21/iter +/- 2020.35 num::int_sqrt::u128::isqrt 439014.55/iter +/- 3920.45 num::int_sqrt::u16::isqrt 8575.08/iter +/- 148.06 num::int_sqrt::u32::isqrt 63008.89/iter +/- 803.67 num::int_sqrt::u64::isqrt 125088.09/iter +/- 879.29 num::int_sqrt::u8::isqrt 230.18/iter +/- 2.04 num::int_sqrt::usize::isqrt 125237.51/iter +/- 4747.83 ### After benchmarks: num::int_sqrt::i128::isqrt 105184.89/iter +/- 1171.38 num::int_sqrt::i16::isqrt 1910.26/iter +/- 78.50 num::int_sqrt::i32::isqrt 34260.34/iter +/- 960.84 num::int_sqrt::i64::isqrt 45939.19/iter +/- 2525.65 num::int_sqrt::i8::isqrt 22.87/iter +/- 0.45 num::int_sqrt::isize::isqrt 45884.17/iter +/- 595.49 num::int_sqrt::non_zero_u128::isqrt 106344.27/iter +/- 780.99 num::int_sqrt::non_zero_u16::isqrt 2790.19/iter +/- 53.43 num::int_sqrt::non_zero_u32::isqrt 33613.99/iter +/- 362.96 num::int_sqrt::non_zero_u64::isqrt 46235.42/iter +/- 429.69 num::int_sqrt::non_zero_u8::isqrt 31.78/iter +/- 0.75 num::int_sqrt::non_zero_usize::isqrt 46208.75/iter +/- 375.27 num::int_sqrt::u128::isqrt 106385.94/iter +/- 1649.95 num::int_sqrt::u16::isqrt 2747.69/iter +/- 28.72 num::int_sqrt::u32::isqrt 33627.09/iter +/- 475.68 num::int_sqrt::u64::isqrt 46182.29/iter +/- 311.16 num::int_sqrt::u8::isqrt 33.10/iter +/- 0.30 num::int_sqrt::usize::isqrt 46165.00/iter +/- 388.41 </details> Tracking Issue for {u8,i8,...}::isqrt rust-lang#116226 try-job: test-various

ChaiTRex · 2024-10-06T23:14:49Z

Stabilization report

API summary

i*: n.isqrt() (panics if n is negative) and n.checked_isqrt() (None if n is negative) (accepted by @dtolnay without ACP)
u*: n.isqrt() (accepted by @dtolnay without ACP)
NonZeroU*: n.isqrt() (ACP accepted)
all added methods are const fn
fairly speedy implementations of those methods (available in the latest nightly compiler)

`const fn` concerns

If a better implementation comes along that can't be const fn, const_eval_select can be used to combine the better implementation with the current implementation to maintain const fn.

Experience report

These projects use the isqrt feature (note that only five pages of results are allowed, so there may be more uses):

Uses of #![feature(isqrt)]
Uses of .isqrt()
Uses of .checked_isqrt() (right now, there appear to be very few uses that aren't rust-lang/rust forks)

Standard libraries of programming languages

The following languages have an integer square root function in the standard library: Chapel, Common Lisp, Crystal, Java (BigInteger), Julia, Maple, Python 3, Racket, Ruby, SageMath, Scheme, and Tcl.

Rust crates

integer-sqrt (6.3 million downloads of all versions; GitHub search)
num-bigint (133 million downloads of all versions)
rug (590 thousand downloads of all versions)

Other languages

GMP library

Implementation history

Enhancement request made during Rust 1.55: [ER] isqrt() function for integral values #89273
isqrt and checked_isqrt methods added to primitive types during 1.74 nightly: Add "integer square root" method to integer primitive types #116176
ACP accepted for isqrt method on unsigned NonZero types during 1.80 nightly: ACP: isqrt for NonZero<uN> libs-team#391
isqrt method added to unsigned NonZero types during 1.81 nightly: Add isqrt to NonZero<uN> #126199
Sped up all relevant methods during 1.82 nightly: Improved checked_isqrt and isqrt methods #128166

Performance

The current implementation is based on a lookup table for 8-bit integers and "unrolled" recursive Karatsuba square root for larger types. The performance is decent, but it can be beaten with f32::sqrt and f64::sqrt for 16-bit through 64-bit integers (64 bit needs a slight adjustment to correct for imprecision). Even 128-bit integers could be sped up by using Karatsuba square root combined with the 64-bit floating point-based algorithm.

It isn't done that way because:

Rust has no runtime detection of fast floating point hardware square root instructions (software-based floating point square roots are likely to be slower than the current implementation).
Rust has no way to allow the Linux kernel and similar projects to forbid floating point arithmetic and to communicate that to core library code via a cfg setting or similar.

Ignoring floating point implementations, there are also faster algorithms that use only integer operations (for example the GMP library has faster implementations). I'm not a lawyer, but I believe that porting those to Rust would be considered a derivative work, and thus require adherence to the original code's license. Since the Rust standard library appears to be dual licensed under Apache 2.0 and MIT and since GMP is licensed under both LGPL 2 and LGPL 3, we'd have to complicate our licensing situation to port the GMP code.

Is there anything else that needs to be taken care of before the isqrt feature is stabilized?

jieyouxu · 2024-10-06T23:27:34Z

cc @rust-lang/wg-const-eval for constness of isqrt above.

RalfJung · 2024-10-07T16:22:23Z

Please prepare the stabilization PR, the diff required to make that build will then tell us whether anything interesting is going on on the const side.

ChaiTRex · 2024-10-08T04:35:31Z

@RalfJung Stabilization PR is now at #131391.

m-ou-se · 2024-10-15T15:13:56Z

@rfcbot merge

See #116226 (comment)

rfcbot · 2024-10-15T15:13:58Z

Team member @m-ou-se has proposed to merge this. The next step is review by the rest of the tagged team members:

No concerns currently listed.

Once a majority of reviewers approve (and at most 2 approvals are outstanding), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

See this document for info about what commands tagged team members can give me.

rfcbot · 2024-10-18T08:54:23Z

🔔 This is now entering its final comment period, as per the review above. 🔔

rfcbot · 2024-10-28T08:54:55Z

The final comment period, with a disposition to merge, as per the review above, is now complete.

As the automated representative of the governance process, I would like to thank the author for their work and everyone else who contributed.

This will be merged soon.

Rollup merge of rust-lang#131391 - ChaiTRex:isqrt, r=scottmcm,tgross35 Stabilize `isqrt` feature Stabilizes the `isqrt` feature. FCP is incomplete. Closes rust-lang#116226

[1] has been stabilized so we no longer need to enable it. [1]: rust-lang/rust#116226

Stabilize `isqrt` feature Stabilizes the `isqrt` feature. FCP is incomplete. Closes rust-lang#116226

FedericoStra added C-tracking-issue Category: An issue tracking the progress of sth. like the implementation of an RFC T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. labels Sep 28, 2023

FedericoStra mentioned this issue Sep 28, 2023

Add "integer square root" method to integer primitive types #116176

Merged

ChaiTRex mentioned this issue Oct 3, 2023

Needed: cfg(target_feature) support for hardware floating point detection. #64514

Open

This was referenced Jun 9, 2024

Add isqrt to NonZero<uN> #126199

Merged

ACP: isqrt for NonZero<uN> rust-lang/libs-team#391

Closed

ChaiTRex mentioned this issue Jul 25, 2024

Improved checked_isqrt and isqrt methods #128166

Merged

ChaiTRex mentioned this issue Oct 6, 2024

FCP WIP ChaiTRex/rust#1

Closed

ChaiTRex mentioned this issue Oct 8, 2024

Stabilize isqrt feature #131391

Merged

workingjubilee added the I-libs-api-nominated Nominated for discussion during a libs-api team meeting. label Oct 8, 2024

Amanieu removed the I-libs-api-nominated Nominated for discussion during a libs-api team meeting. label Oct 22, 2024

rustbot mentioned this issue Oct 28, 2024

Tracking issue for release notes of #116226: Tracking Issue for {u8,i8,...}::isqrt #132257

Open

3 tasks

bors closed this as completed in 81d885b Oct 28, 2024

tgross35 added a commit to tgross35/compiler-builtins that referenced this issue Oct 30, 2024

Remove the unneeded isqrt feature gate

02e939b

[1] has been stabilized so we no longer need to enable it. [1]: rust-lang/rust#116226

apiraino removed the to-announce Announce this issue on triage meeting label Oct 31, 2024

flip1995 pushed a commit to flip1995/rust that referenced this issue Nov 7, 2024

Rollup merge of rust-lang#131391 - ChaiTRex:isqrt, r=scottmcm,tgross35

303ab51

Stabilize `isqrt` feature Stabilizes the `isqrt` feature. FCP is incomplete. Closes rust-lang#116226

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Tracking Issue for `{u8,i8,...}::isqrt` #116226

Tracking Issue for `{u8,i8,...}::isqrt` #116226

FedericoStra commented Sep 28, 2023 •

edited by compiler-errors

Loading

ChaiTRex commented Sep 29, 2023 •

edited

Loading

ChaiTRex commented Sep 30, 2023 •

edited

Loading

ryanavella commented Jan 17, 2024

ChaiTRex commented Jan 25, 2024 •

edited

Loading

tfpf commented Feb 4, 2024

ryanavella commented Feb 4, 2024

tfpf commented Feb 16, 2024

ryanavella commented Feb 16, 2024

tfpf commented Feb 16, 2024 •

edited

Loading

ChaiTRex commented Feb 18, 2024 •

edited

Loading

ChaiTRex commented Feb 18, 2024 •

edited

Loading

tgross35 commented May 23, 2024

ChaiTRex commented May 23, 2024

tgross35 commented May 23, 2024

kennytm commented Jun 3, 2024

ChaiTRex commented Oct 6, 2024 •

edited

Loading

jieyouxu commented Oct 6, 2024

RalfJung commented Oct 7, 2024

ChaiTRex commented Oct 8, 2024

m-ou-se commented Oct 15, 2024

rfcbot commented Oct 15, 2024 •

edited by joshtriplett

Loading

rfcbot commented Oct 18, 2024

rfcbot commented Oct 28, 2024

Tracking Issue for {u8,i8,...}::isqrt #116226

Tracking Issue for {u8,i8,...}::isqrt #116226

Comments

FedericoStra commented Sep 28, 2023 • edited by compiler-errors Loading

Public API

Steps / History

Unresolved Questions

Footnotes

ChaiTRex commented Sep 29, 2023 • edited Loading

ChaiTRex commented Sep 30, 2023 • edited Loading

ryanavella commented Jan 17, 2024

ChaiTRex commented Jan 25, 2024 • edited Loading

tfpf commented Feb 4, 2024

ryanavella commented Feb 4, 2024

tfpf commented Feb 16, 2024

ryanavella commented Feb 16, 2024

tfpf commented Feb 16, 2024 • edited Loading

ChaiTRex commented Feb 18, 2024 • edited Loading

ChaiTRex commented Feb 18, 2024 • edited Loading

tgross35 commented May 23, 2024

ChaiTRex commented May 23, 2024

tgross35 commented May 23, 2024

kennytm commented Jun 3, 2024

ChaiTRex commented Oct 6, 2024 • edited Loading

Stabilization report

API summary

const fn concerns

Experience report

Standard libraries of programming languages

Rust crates

Other languages

Implementation history

Performance

jieyouxu commented Oct 6, 2024

RalfJung commented Oct 7, 2024

ChaiTRex commented Oct 8, 2024

m-ou-se commented Oct 15, 2024

rfcbot commented Oct 15, 2024 • edited by joshtriplett Loading

rfcbot commented Oct 18, 2024

rfcbot commented Oct 28, 2024

Tracking Issue for `{u8,i8,...}::isqrt` #116226

Tracking Issue for `{u8,i8,...}::isqrt` #116226

FedericoStra commented Sep 28, 2023 •

edited by compiler-errors

Loading

ChaiTRex commented Sep 29, 2023 •

edited

Loading

ChaiTRex commented Sep 30, 2023 •

edited

Loading

ChaiTRex commented Jan 25, 2024 •

edited

Loading

tfpf commented Feb 16, 2024 •

edited

Loading

ChaiTRex commented Feb 18, 2024 •

edited

Loading

ChaiTRex commented Feb 18, 2024 •

edited

Loading

ChaiTRex commented Oct 6, 2024 •

edited

Loading

`const fn` concerns

rfcbot commented Oct 15, 2024 •

edited by joshtriplett

Loading