Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use checked arithmetic by default #9469

Closed
brson opened this issue Sep 24, 2013 · 23 comments
Closed

Use checked arithmetic by default #9469

brson opened this issue Sep 24, 2013 · 23 comments
Labels
A-codegen Area: Code generation
Milestone

Comments

@brson
Copy link
Contributor

brson commented Sep 24, 2013

There's a lot of concern that having +, -, and * overflow by default is incorrect - it's a source of many security vulnerabilities. Let's try changing the default to being checked and measure the impact on performance and code size.

There are some open questions:

  • what happens on overflow? Probably fail!. It could also raise a condition, but having codegen raise conditions is a big step to take, and we're not sure we like conditions.
  • how do you turn it off? Either with unchecked blocks, explicit methods, or additional types.

Nominating well defined.

@thestinger
Copy link
Contributor

You can try using it via the checked arithmetic traits I implemented, and there's a huge overhead in terms of both performance and code size. LLVM is almost never capable of removing the checks, just like it rarely actually removes array bounds checks. It makes whole passes like loop and scalar vectorization useless.

Failing on overflow doesn't make the incorrect code any more correct, since a failure is still going to be a bug in 99% of programs. I feel strongly that the correct solution is to make integer literals not default to a signed fixed-size integer, and making generic literals work for big integers.

The most common integer type for application logic could be the one proposed in #8635.

@thestinger
Copy link
Contributor

Here's an optimized IR comparison, to give an idea of the code generation when it can't eliminate the check. In these examples, it can't eliminate it because there are inputs to a function (it must inline deep enough to see the values or range of values, to have a hope of eliminating them):

fn foo(x: u32, y: u32) -> u32 {
    x + y
}
; Function Attrs: nounwind readnone uwtable
define i32 @_ZN3foo17h73f0a05a6ca431af4v0.0E({ i64, %tydesc*, i8*, i8*, i8 }* nocapture readnone, i32, i32) #2 {
"function top level":
  %3 = add i32 %2, %1
  ret i32 %3
}

Checked operation, calling fail on overflow:

fn bar(x: u32, y: u32) -> u32 {
    x.checked_add(&y).unwrap()
}
; Function Attrs: uwtable
define i32 @_ZN3bar17h73f0a05a6ca431av4v0.0E({ i64, %tydesc*, i8*, i8*, i8 }* nocapture readnone, i32, i32) #3 {
"function top level":
  %3 = alloca %str_slice, align 8
  %4 = alloca %str_slice, align 8
  %5 = call { i32, i1 } @llvm.uadd.with.overflow.i32(i32 %1, i32 %2) #1
  %6 = extractvalue { i32, i1 } %5, 1
  br i1 %6, label %match_else.i, label %_ZN6option6Option6unwrap21h1283b974be2d27c1lBau4v0.0E.exit

match_else.i:                                     ; preds = %"function top level"
  %7 = bitcast %str_slice* %3 to i8*
  call void @llvm.lifetime.start(i64 16, i8* %7)
  %8 = bitcast %str_slice* %4 to i8*
  call void @llvm.lifetime.start(i64 16, i8* %8)
  %9 = getelementptr inbounds %str_slice* %3, i64 0, i32 0
  store i8* getelementptr inbounds ([44 x i8]* @str1262, i64 0, i64 0), i8** %9, align 8
  %10 = getelementptr inbounds %str_slice* %3, i64 0, i32 1
  store i64 43, i64* %10, align 8
  %11 = getelementptr inbounds %str_slice* %4, i64 0, i32 0
  store i8* getelementptr inbounds ([46 x i8]* @str1263, i64 0, i64 0), i8** %11, align 8
  %12 = getelementptr inbounds %str_slice* %4, i64 0, i32 1
  store i64 45, i64* %12, align 8
  call void @"_ZN3sys28FailWithCause$__extensions__9fail_with20hdb4c44d01ce41167qaF4v0.8E"({}* sret undef, { i64, %tydesc*, i8*, i8*, i8 }* undef, %str_slice* %3, %str_slice* %4, i64 323)
  unreachable

_ZN6option6Option6unwrap21h1283b974be2d27c1lBau4v0.0E.exit: ; preds = %"function top level"
  %13 = extractvalue { i32, i1 } %5, 0
  %14 = bitcast %str_slice* %3 to i8*
  call void @llvm.lifetime.start(i64 16, i8* %14)
  %15 = bitcast %str_slice* %4 to i8*
  call void @llvm.lifetime.start(i64 16, i8* %15)
  call void @llvm.lifetime.end(i64 16, i8* %14)
  call void @llvm.lifetime.end(i64 16, i8* %15)
  ret i32 %13
}

@thestinger
Copy link
Contributor

Note that the function is also no longer marked readnone/nounwind so this bubbles up into the rest of the code and results in many more invoke instructions and landing pads being generated along with the call being viewed as impure.

@jfager
Copy link
Contributor

jfager commented Sep 24, 2013

I agree that if safety is the motivation using bigints by default makes more sense. Failing on overflow doesn't seem that much better than wrapping in day-to-day development. But it seems like using something like GMP is a better option than a rust impl - why fight that battle again?

I'd be disappointed if unchecked arithmetic required anything more than just specifying a fixed-width int type. I like Rust because it balances safety and control. Making unchecked arithmetic hard feels like it goes too far over to the safety side of the fence.

Fwiw this is the approach Haskell takes.

@thestinger
Copy link
Contributor

I don't there should should be a default at all, just generic literals and the choice to use what's appropriate by not falling back to int. There are cases when you want a big integer, and others where you want a flexible type expanding to one only as necessary.

@pnkfelix
Copy link
Member

cc me

@catamorphism
Copy link
Contributor

Accepted for well-defined

@UtherII
Copy link

UtherII commented Oct 13, 2013

About how to turning checked arithmetic off, would it be possible to use unsafe block but allowing to make them more specific about unsafe operations.

For example something like:

    unsafe "int_overflow" {
        // integer overflows don't fail, others unsafe operations are still forbidden
    }
    unsafe "ffi", "raw_ptr"{
        // ffi and dereferencing raw pointers allowed, others unsafe operations are still forbidden
    }

@Thiez
Copy link
Contributor

Thiez commented Oct 13, 2013

Overflow is not unsafe when you explicitly want it to occur, so I don't think it makes sense to wrap it in an 'unsafe' block.

Perhaps some new types that implement the arithmetic traits but perform checked arithmetic? Or a Checked<T> wrapper? We already have the Checked_<Op> traits defined on int.

@thestinger
Copy link
Contributor

I think a type-based solution is best. Signed/unsigned integer types using the checked operations and expanding to a big integer on overflow would make sense. I don't think there's much benefit to a type throwing a failure on an overflow, because that's still going to be a bug in the application.

Something to note is that the checked operations aren't implemented in hardware on every architecture. We need to start using compiler-rt to provide cross-platform support. A notable example is the lack of 64-bit checked multiplication on i686.

@pnkfelix
Copy link
Member

I am surprised by the statements that a fail-on-overflow is not much different from the semantic errors you get from modular-arithmetic wrapping; I think that a proper fail! on overflow is a very different development experience than an under- or overflowed value then propagating somewhere else in the application.

(But do not interpret the above as a vote against bigints; I do agree they make the most sense in a number of contexts, though I'm not yet sure how best to incorporate them.)

@thestinger what are your thoughts about how to handle underflow for an unsigned integer type? I assume you would want at least that to fail! ?

@thestinger
Copy link
Contributor

@pnkfelix: Underflow would still expand to a big integer, I was thinking it might make sense performance-wise to have coverage over different ranges before expanding to a big integer. It's probably enough to just have a single type based on int using an enum.

I don't really see much point in having an unsigned big integer type like we have in extra and I can't find precedence for it in other languages/libraries.

@pnkfelix
Copy link
Member

@thestinger Ah, I see. I had thought you had meant that an unsigned fixed-width uint would overflow to an unsigned big uint. But now I interpret your suggestion as saying that uint should over- and underflow to signed integer.

@thestinger
Copy link
Contributor

@pnkfelix: I don't want to change the semantics of the built-in integers from what they are now, but rather introduce a new type expanding to a big integer when necessary. There isn't a need to have an unsigned version, it was a silly thought.

@jfager
Copy link
Contributor

jfager commented Oct 15, 2013

@pnkfelix It is a different developer experience (fail-fast vs. fail-mysteriously-at-some-point) and if those were the only two options with no other tradeoffs fail-fast would definitely be the preferable one. But they're both still runtime failures. For either approach, your code has to be aware of and prevent overflow conditions in order to not blow up.

@thestinger's suggestion seems to make way more sense. You write code blissfully ignoring overflow issues, and overflow issues never bite you. And if you want overflow or to not pay the cost of overflow checking, then you can just go back to using standard fixed-width integer types and accept the tradeoff.

@thestinger
Copy link
Contributor

If we implemented #6023 and #4169, alternate types would be as first-class as the built-in integers (excluding constant expressions) and there wouldn't be a push from the language to use the fixed-size ones. There would be no default, as all integer types would be equal.

@tiffany352
Copy link

It'd be nice if the type system could compute the values that are valid for any given function that don't cause overflow and limit it to those, but that would require dependant types.

As an example, you could have a function which adds two numbers, and the type system would figure out that A+B can never exceed UINT_MAX, which would be expressed through the type system and enforced at each invocation. If you took one number from user input, and added it to a constant foo, the integer read from user input could only be as large as UINT_MAX-foo without returning None

@thestinger
Copy link
Contributor

Simple example:

extern mod extra;
use std::unstable::intrinsics::{abort, u32_mul_with_overflow};
use extra::test::BenchHarness;

#[inline(never)]
fn control(xs: &mut [u32]) {
    for x in xs.mut_iter() {
        *x *= 5;
    }
}

#[inline(never)]
fn check(xs: &mut [u32]) {
    for x in xs.mut_iter() {
        unsafe {
            let (y, o) = u32_mul_with_overflow(*x, 5);
            if o {
                abort()
            }
            *x = y;
        }
    }
}

#[inline(never)]
fn check_libstd(xs: &mut [u32]) {
    for x in xs.mut_iter() {
        *x = x.checked_mul(&5).unwrap();
    }
}

#[bench]
fn bench_control(b: &mut BenchHarness) {
    b.iter(|| {
        let mut xs = [0, ..1000];
        control(xs)
    });
}

#[bench]
fn bench_check(b: &mut BenchHarness) {
    b.iter(|| {
        let mut xs = [0, ..1000];
        check(xs)
    });
}

#[bench]
fn bench_check_libstd(b: &mut BenchHarness) {
    b.iter(|| {
        let mut xs = [0, ..1000];
        check_libstd(xs)
    });
}

--opt-level=2

test bench_check        ... bench:      1085 ns/iter (+/- 28)
test bench_check_libstd ... bench:      1082 ns/iter (+/- 38)
test bench_control      ... bench:       349 ns/iter (+/- 12)

--opt-level=3

test bench_check        ... bench:      1080 ns/iter (+/- 14)
test bench_check_libstd ... bench:      1177 ns/iter (+/- 16)
test bench_control      ... bench:       350 ns/iter (+/- 11)

Ouch. It becomes a larger slowdown multiplier when you add more operations to the loop too. Since it's increasing the code size a lot, it will bloat the instruction cache too.

@thestinger
Copy link
Contributor

We will be slower than Java and Haskell on numeric loops. Benchmarks of Rust will look absolutely terrible unless you're already an expert with the language and know the default integers are slow. Ignoring scalar/loop optimizations on integers in a language claiming to be aimed at replacing C++ just isn't going to work.

@brson
Copy link
Contributor Author

brson commented Feb 19, 2014

Nominating for removal from milestone.

@hatahet
Copy link
Contributor

hatahet commented Feb 19, 2014

Would it be feasible to have some CheckedNum trait, with corresponding numeric classes implementing it, and overloading the numerical operators?

@pnkfelix
Copy link
Member

there is a lot of open-research to do on this topic, so we cannot make it a high priority for 1.0, which means we cannot make it something we turn on my default.

If we want a separate opt-in trait for checked arithmetic, or a compiler flag, etc, then those are things to consider, but they should be in a separate bug.

Closing as wontfix.

@R030t1
Copy link

R030t1 commented Aug 24, 2020

A method to handle arithmetic correctness as a language feature would be extremely valuable. A program aborting due to an arithmetic error is usually better than the program silently continuing with data corruption.

It may be convenient to write this off as not worth it by virtue of being intractable at Rust's level, but finally pointing out it is a massive headache to have incomplete arithmetic units on processors may finally get somewhere. (Yes, I'm aware of the overflow flags present on basically every architecture, but that is different than generating an interrupt.)

"We Need Hardware Traps for Integer Overflows" https://blog.regehr.org/archives/1154

djkoloski pushed a commit to djkoloski/rust that referenced this issue Sep 21, 2022
Fix FormatArgsExpn parsing of FormatSpec positions

Woops, forgot visitors don't walk themselves

Fixes rust-lang#9468

r? `@giraffate`

changelog: none
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-codegen Area: Code generation
Projects
None yet
Development

No branches or pull requests

10 participants