bad performance of byte and integer to_string() vs str literal to_string() #73533
Comments
Oh, looks like we don't know if the Line 193 in 033013c
Looking at which Line 210 in 033013c
I also don't think we can parse 4 characters at a time for
I have a rough plan but it isn't quite clear.
Maybe we could also try porting the Go version in https://golang.org/src/strconv/itoa.go?s=1028:1051#L24, and then see which one is the fastest implementation.
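For comparing candidate implementations, here is a minimal safe-Rust sketch of the digit-pair lookup idea that `DEC_DIGITS_LUT` in `src/libcore/fmt/num.rs` is built around. It is not the libcore code and not a port of the Go routine; the function name `format_u64` and the fixed 20-byte buffer are assumptions made for illustration.

```rust
/// The two ASCII digits of every value in 0..=99, laid out back to back.
const DEC_DIGITS_LUT: &[u8; 200] = b"0001020304050607080910111213141516171819\
      2021222324252627282930313233343536373839\
      4041424344454647484950515253545556575859\
      6061626364656667686970717273747576777879\
      8081828384858687888990919293949596979899";

/// Formats `n` in decimal, producing two digits per division step.
/// `format_u64` is a hypothetical name used only in this sketch.
fn format_u64(mut n: u64) -> String {
    // u64::MAX has 20 decimal digits, so 20 bytes are always enough.
    let mut buf = [0u8; 20];
    let mut curr = buf.len();

    // Peel off the two lowest digits at a time and copy them from the table.
    while n >= 100 {
        let d = ((n % 100) as usize) * 2;
        n /= 100;
        curr -= 2;
        buf[curr..curr + 2].copy_from_slice(&DEC_DIGITS_LUT[d..d + 2]);
    }

    // At this point `n < 100`: emit either one final pair or a single digit.
    if n >= 10 {
        let d = (n as usize) * 2;
        curr -= 2;
        buf[curr..curr + 2].copy_from_slice(&DEC_DIGITS_LUT[d..d + 2]);
    } else {
        curr -= 1;
        buf[curr] = b'0' + n as u8;
    }

    // Only ASCII digits were written, so the slice is valid UTF-8.
    std::str::from_utf8(&buf[curr..])
        .expect("buffer holds ASCII digits only")
        .to_owned()
}
```

Because it uses checked slice indexing rather than raw pointers, this version is likely slower than the `unsafe` one; it is meant as a readable baseline to benchmark against, not as a replacement.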
I have sort of a patch but I don't know how long the benchmark will run with
I wrote a patch to use a different stack buffer size for each integer type; I am not sure if this will improve formatting performance.
Patch
diff --git a/src/libcore/fmt/num.rs b/src/libcore/fmt/num.rs
index 7d77e33d743..1457da8c667 100644
--- a/src/libcore/fmt/num.rs
+++ b/src/libcore/fmt/num.rs
@@ -187,10 +187,10 @@ static DEC_DIGITS_LUT: &[u8; 200] = b"0001020304050607080910111213141516171819\
8081828384858687888990919293949596979899";
macro_rules! impl_Display {
- ($($t:ident),* as $u:ident via $conv_fn:ident named $name:ident) => {
+ ($($t:ident),* as $u:ident via $conv_fn:ident named $name:ident max $max_len:literal) => {
fn $name(mut n: $u, is_nonnegative: bool, f: &mut fmt::Formatter<'_>) -> fmt::Result {
- // 2^128 is about 3*10^38, so 39 gives an extra byte of space
- let mut buf = [MaybeUninit::<u8>::uninit(); 39];
+ // `$max_len` is enough decimal digits for the largest value of the types handled here
+ let mut buf = [MaybeUninit::<u8>::uninit(); $max_len];
let mut curr = buf.len() as isize;
let buf_ptr = MaybeUninit::first_ptr_mut(&mut buf);
let lut_ptr = DEC_DIGITS_LUT.as_ptr();
@@ -198,28 +198,30 @@ macro_rules! impl_Display {
// SAFETY: Since `d1` and `d2` are always less than or equal to `198`, we
// can copy from `lut_ptr[d1..d1 + 1]` and `lut_ptr[d2..d2 + 1]`. To show
// that it's OK to copy into `buf_ptr`, notice that at the beginning
- // `curr == buf.len() == 39 > log(n)` since `n < 2^128 < 10^39`, and at
- // each step this is kept the same as `n` is divided. Since `n` is always
- // non-negative, this means that `curr > 0` so `buf_ptr[curr..curr + 1]`
- // is safe to access.
+ // `curr == buf.len() == $max_len > log(n)` since `n < 10^$max_len`,
+ // and at each step this is kept the same as `n` is divided. Since `n`
+ // is always non-negative, this means that `curr > 0` so
+ // `buf_ptr[curr..curr + 1]` is safe to access.
unsafe {
// need at least 16 bits for the 4-characters-at-a-time to work.
assert!(crate::mem::size_of::<$u>() >= 2);
// eagerly decode 4 characters at a time
- while n >= 10000 {
- let rem = (n % 10000) as isize;
- n /= 10000;
-
- let d1 = (rem / 100) << 1;
- let d2 = (rem % 100) << 1;
- curr -= 4;
-
- // We are allowed to copy to `buf_ptr[curr..curr + 3]` here since
- // otherwise `curr < 0`. But then `n` was originally at least `10000^10`
- // which is `10^40 > 2^128 > n`.
- ptr::copy_nonoverlapping(lut_ptr.offset(d1), buf_ptr.offset(curr), 2);
- ptr::copy_nonoverlapping(lut_ptr.offset(d2), buf_ptr.offset(curr + 2), 2);
+ if $max_len > 4 {
+ while n >= 10000 {
+ let rem = (n % 10000) as isize;
+ n /= 10000;
+
+ let d1 = (rem / 100) << 1;
+ let d2 = (rem % 100) << 1;
+ curr -= 4;
+
+ // We are allowed to copy to `buf_ptr[curr..curr + 3]` here since
+ // otherwise `curr < 0`. But then `n` was originally at least `10000^10`
+ // which is `10^40 > 2^128 > n`.
+ ptr::copy_nonoverlapping(lut_ptr.offset(d1), buf_ptr.offset(curr), 2);
+ ptr::copy_nonoverlapping(lut_ptr.offset(d2), buf_ptr.offset(curr + 2), 2);
+ }
}
// if we reach here numbers are <= 9999, so at most 4 chars long
@@ -440,10 +442,10 @@ macro_rules! impl_Exp {
#[cfg(any(target_pointer_width = "64", target_arch = "wasm32"))]
mod imp {
use super::*;
- impl_Display!(
- i8, u8, i16, u16, i32, u32, i64, u64, usize, isize
- as u64 via to_u64 named fmt_u64
- );
+ impl_Display!(i8, u8 as u64 via to_u64 named fmt_u8 max 3);
+ impl_Display!(i16, u16 as u64 via to_u64 named fmt_u16 max 5);
+ impl_Display!(i32, u32 as u64 via to_u64 named fmt_u32 max 10);
+ impl_Display!(i64, u64, usize, isize as u64 via to_u64 named fmt_u64 max 20);
impl_Exp!(
i8, u8, i16, u16, i32, u32, i64, u64, usize, isize
as u64 via to_u64 named exp_u64
@@ -453,11 +455,13 @@ mod imp {
#[cfg(not(any(target_pointer_width = "64", target_arch = "wasm32")))]
mod imp {
use super::*;
- impl_Display!(i8, u8, i16, u16, i32, u32, isize, usize as u32 via to_u32 named fmt_u32);
- impl_Display!(i64, u64 as u64 via to_u64 named fmt_u64);
+ impl_Display!(i8, u8 as u32 via to_u32 named fmt_u8 max 3);
+ impl_Display!(i16, u16 as u32 via to_u32 named fmt_u16 max 5);
+ impl_Display!(i32, u32, isize, usize as u32 via to_u32 named fmt_u32 max 10);
+ impl_Display!(i64, u64 as u64 via to_u64 named fmt_u64 max 20);
impl_Exp!(i8, u8, i16, u16, i32, u32, isize, usize as u32 via to_u32 named exp_u32);
impl_Exp!(i64, u64 as u64 via to_u64 named exp_u64);
}
-impl_Display!(i128, u128 as u128 via to_u128 named fmt_u128);
+impl_Display!(i128, u128 as u128 via to_u128 named fmt_u128 max 39);
impl_Exp!(i128, u128 as u128 via to_u128 named exp_u128);

@lzutao How can I quickly find out whether it is faster?
I may build a stage1 rustc and run your benchmark, then compare against the results from master rustc.
@lzutao How do you do that? I use
The hard-coded number is not bad, but I would go with creating a
I do *: with
But Rust does not have numeric limits, right? So I need to add something like
Yeah, my plan is to add that trait to Rust. But you are free to hardcode that number in the macro
My mistake
Btw, why did you want to run bench on
@lzutao I am thinking of trying out the benchmark first and then tweaking it, and only later adding it as a trait, but I don't know if a trait can carry a value. Still, doing everything you describe isn't very useful to me if it turns out not to be faster.
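On whether a trait can carry a value: associated constants do this. A minimal sketch, assuming a hypothetical trait named `MaxDecimalDigits` (nothing by that name exists in the standard library) and covering only the unsigned types:

```rust
/// Hypothetical trait for this sketch; not part of the standard library.
trait MaxDecimalDigits {
    /// Number of decimal digits in the largest value of the type.
    const MAX_DIGITS: usize;
}

// Implements the trait for each listed type with the given digit count.
macro_rules! impl_max_decimal_digits {
    ($($t:ty => $n:expr),* $(,)?) => {
        $(impl MaxDecimalDigits for $t {
            const MAX_DIGITS: usize = $n;
        })*
    };
}

// The same numbers the patch passes after `max`.
impl_max_decimal_digits! {
    u8 => 3,
    u16 => 5,
    u32 => 10,
    u64 => 20,
    u128 => 39,
}

/// Generic code can read the constant through the type parameter.
fn digit_capacity<T: MaxDecimalDigits>() -> usize {
    T::MAX_DIGITS
}
```

One caveat: on stable Rust, an associated constant of a generic parameter cannot be used as an array length, so for sizing the stack buffer a macro literal is still the simpler route.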
Because I want to test out if
EDIT: @lzutao By the way, it's
It is one brick of the procedure. You might be interested in reading the source code of the to_string function in C++.
Thanks, that's very helpful. I was looking for that here and there but I could not find it.
@lzutao But how should I proceed with this? The method you specified does not seem to work for libcore; it fails with a bunch of errors. My idea is to split the work into two parts: first find out whether this speeds up integer-to-string conversion, then work out the numeric limits you mentioned. I don't know yet if it is faster, but I will try running the benchmark here, although it will probably take a long time.
You shouldn't have to. That is my very long-term, rough plan, and I may have to write an RFC/demo to convince the compiler team.
Probably writing a micro benchmark to compare C++
I tried running a benchmark and here are the results. Summary of
But still, I don't know why the results differ so much, maybe because it's a compile farm.
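For a quick local comparison that does not need a full rustc build, a rough timing harness on the installed toolchain can look like the following. This is only a sketch: `time_it` and `run_comparison` are hypothetical names, and wall-clock loops like this are much noisier than a proper benchmark framework.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Calls `f` `iters` times and returns the total elapsed wall-clock time.
/// `time_it` is a hypothetical helper for this sketch.
fn time_it<R>(iters: u32, mut f: impl FnMut() -> R) -> Duration {
    let start = Instant::now();
    for _ in 0..iters {
        // `black_box` keeps the optimizer from discarding the result.
        black_box(f());
    }
    start.elapsed()
}

/// Times `to_string()` on a byte, a 64-bit integer, and a str literal.
fn run_comparison(iters: u32) -> [(&'static str, Duration); 3] {
    [
        ("u8", time_it(iters, || black_box(255u8).to_string())),
        ("u64", time_it(iters, || black_box(u64::MAX).to_string())),
        ("str", time_it(iters, || black_box("255").to_string())),
    ]
}
```

Build with optimizations (for example `--release`) and repeat the run several times; on a shared machine the numbers can vary a lot between runs, so only consistent differences are meaningful.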
Based on #73462
Not sure if we can do much about integer to string, but I bet we can do something for byte to string.
(note that the char benchmark is not updated with the latest specialization update by @lzutao)
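To illustrate what a byte-specific path could look like, here is a minimal sketch that allocates once and pushes at most three digits without going through the formatting machinery. The function name `u8_to_string` is hypothetical, and whether this is actually faster would need to be measured.

```rust
/// Converts a byte to its decimal string with a single allocation.
/// `u8_to_string` is a hypothetical name used only in this sketch.
fn u8_to_string(n: u8) -> String {
    // A u8 has at most three decimal digits.
    let mut s = String::with_capacity(3);
    let mut n = n;
    if n >= 10 {
        if n >= 100 {
            // Hundreds digit.
            s.push((b'0' + n / 100) as char);
            n %= 100;
        }
        // Tens digit (may be '0', as in 105).
        s.push((b'0' + n / 10) as char);
        n %= 10;
    }
    // Ones digit.
    s.push((b'0' + n) as char);
    s
}
```

The sketch can be checked against the existing implementation by comparing `u8_to_string(n)` with `n.to_string()` for every `n` in `0..=255`.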