Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RefCell complains two mutable borrows present when only one exist on armv7-unknown-linux-gnueabihf #40593

Closed
MJDSys opened this issue Mar 17, 2017 · 24 comments · Fixed by #40779
Assignees
Labels
A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. P-high High priority regression-from-stable-to-beta Performance or correctness regression from stable to beta. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@MJDSys
Copy link
Contributor

MJDSys commented Mar 17, 2017

When running the program at: https://is.gd/XHVVMt on my Chromebook (an Asus C201p) with the latest nightly compiler, I get the following error (RUST_BACKTRACE=1):

thread 'main' panicked at 'already borrowed: BorrowMutError', /checkout/src/libcore/result.rs:860
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
stack backtrace:
   0: <core::result::Result<T, E>>::expect
             at /checkout/src/libcore/result.rs:761
   1: <core::cell::RefCell<T>>::borrow_mut
             at /checkout/src/libcore/cell.rs:740
   2: cell_test::main
             at ./src/main.rs:9

This doesn't occur on the latest stable compiler, and started sometime around March 12. This doesn't happen on my AMD 64bit desktop, nor on playground. Both debug and release builds are affected.

Code:

use std::cell::RefCell;

fn main() {
    let m = RefCell::new(5);
    {
        m.borrow_mut();
    }
    {
        m.borrow_mut();
    }
}

(note the extra scopes don't affect the behaviour.)

@MJDSys MJDSys changed the title RefCell complains two mutable borrows present when one exist on armv7-unknown-linux-gnueabihf RefCell complains two mutable borrows present when only one exist on armv7-unknown-linux-gnueabihf Mar 17, 2017
@pgerber
Copy link
Contributor

pgerber commented Mar 17, 2017

This may be related to rust-openssl/issues/597.

It would appear like dropping of values is broken in the nightly and beta toolchain:

This works:

let mutex = Mutex::new(());
let mut guard = None;
guard = Some(mutex.lock().unwrap());
drop(guard.take().unwrap());
mutex.try_lock().unwrap();     // locks as expected

but this fails:

let mutex = Mutex::new(());
let mut guard = None;
guard = Some(mutex.lock().unwrap());
drop(guard.take());      // unwrap() removed
mutex.try_lock().unwrap();    // panics "WouldBlock"

This seem a rather new issue, in nightly-2017-03-01 it still worked. (May also work in few of the newer versions, haven't had time to try all of them.)

@pgerber
Copy link
Contributor

pgerber commented Mar 17, 2017

Tried a few more nightly snapshots nightly-2017-03-04 is the first affected version, nightly-2017-03-03 works flawlessly.

@MJDSys
Copy link
Contributor Author

MJDSys commented Mar 17, 2017

@pgerber Where can you find the nightly snapshots? I tried downloading them with rustup.rs, but it said older ones couldn't be found.

@pgerber
Copy link
Contributor

pgerber commented Mar 17, 2017

@MJDSys This worked for me:

$ rustup toolchain install nightly-2017-03-03
$ rustup run nightly-2017-03-03 cargo …

@arielb1
Copy link
Contributor

arielb1 commented Mar 17, 2017

#40133 looks like the main diff. between the 2 versions. Self-assigning.

@arielb1 arielb1 added I-wrong regression-from-stable-to-beta Performance or correctness regression from stable to beta. labels Mar 17, 2017
@arielb1 arielb1 self-assigned this Mar 17, 2017
@sfackler sfackler added the T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. label Mar 17, 2017
@arielb1
Copy link
Contributor

arielb1 commented Mar 19, 2017

@MJDSys

I can't reproduce this on 5d0be0d (03-11) on debian qemu. Could you upload an affected --emit=asm output?

@arielb1
Copy link
Contributor

arielb1 commented Mar 19, 2017

The first example is optimized out completely on my machine (using xargo run --target=armv7-unknown-linux-gnueabihf --release), so it is unlikely to have issues.

@pgerber
Copy link
Contributor

pgerber commented Mar 19, 2017

I get BorrowMutError on my machine too. Here's the ASM output: main.s. Compiled using 4853584 (2017-03-18).

@MJDSys
Copy link
Contributor Author

MJDSys commented Mar 19, 2017

@arielb1 Did you run that on an armv7 machine, or on a x86(_64) machine? It seems something really weird is happening in the compiler.

I'm working with a simpler example (see end of this post). A compiler on x86_64 produces a MIR representation that seems to make sense (dropping the option, and then the inner type). However with an armv7 compiler, it doesn't drop the option, but tries to drop the T (while inside the option). I'm not familiar with what the correct representation is, but that is the big difference I see on armv7.

And just to be clear, I need to use a native compiler to see that difference. Cross-compiling the source file does not produce the difference.

If it would help, I'll try to post the different outputs from MIR comparing before/after March 4 between x86_64 and armv7. I also don't think it's an issue in the MIR optimization passes, as when I dump the MIR using -Z dump-mir=all, that difference is always present.

I'm doing all my testing right now with debug builds, but my initial testing before reporting the bug happened with both debug and release.

New example:

#[deny(warnings)]

struct T{}

impl Drop for T {
        fn drop(&mut self) {
                //println!("dropped");
                panic!(false);
        }
}

fn main() {
    {
        Some(T{});
    }
}

@arielb1
Copy link
Contributor

arielb1 commented Mar 19, 2017

I am using an ARM compiler (rustc 1.17.0-nightly (b1e3176 2017-03-03) aka the Mar 4 nightly) in an x86_64 machine using qemu, which seems not to reproduce the bug. I don't have an ARM machine handy right now,

@arielb1
Copy link
Contributor

arielb1 commented Mar 19, 2017

Ah. The problem was that I was using the arm-unknown-linux-gnueabi nightlies instead of the armv7-unknown-linux-gnueabi nightlies. Only the armv7 nightlies are broken/miscompiled.

Correct "at-typeck" MIR looks like this:

// MIR for `main`
// node_id = 18
// pass_name = TypeckMir
// disambiguator = before

fn main() -> () {
    let mut _0: ();                      // return pointer
    let mut _1: std::option::Option<T>;
    let mut _2: T;
    let mut _3: ();

    bb0: {
        StorageLive(_2);                 // scope 0 at xfail.rs:14:14: 14:17
        _2 = T;                          // scope 0 at xfail.rs:14:14: 14:17
        _1 = std::option::Option<T>::Some(_2,); // scope 0 at xfail.rs:14:9: 14:18
        drop(_1) -> [return: bb3, unwind: bb2]; // scope 0 at xfail.rs:14:9: 14:18
    }

    bb1: {
        resume;                          // scope 0 at xfail.rs:12:1: 16:2
    }

    bb2: {
        drop(_2) -> bb1;                 // scope 0 at xfail.rs:14:18: 14:18
    }

    bb3: {
        drop(_2) -> bb4;                 // scope 0 at xfail.rs:14:18: 14:18
    }

    bb4: {
        StorageDead(_2);                 // scope 0 at xfail.rs:14:18: 14:18
        _0 = ();                         // scope 0 at xfail.rs:13:5: 15:6
        return;                          // scope 0 at xfail.rs:16:2: 16:2
    }
}
rustc.node18.TypeckMir.before.mir (END)

Incorrect "at-typeck" MIR looks like this:

// MIR for `main`
// node_id = 18
// pass_name = TypeckMir
// disambiguator = before

fn main() -> () {
    let mut _0: ();                      // return pointer
    let mut _1: std::option::Option<T>;
    let mut _2: T;
    let mut _3: ();

    bb0: {
        StorageLive(_2);                 // scope 0 at xfail.rs:14:14: 14:17
        _2 = T;                          // scope 0 at xfail.rs:14:14: 14:17
        _1 = std::option::Option<T>::Some(_2,); // scope 0 at xfail.rs:14:9: 14:18
        drop(_2) -> bb1;                 // scope 0 at xfail.rs:14:18: 14:18
    }

    bb1: {
        StorageDead(_2);                 // scope 0 at xfail.rs:14:18: 14:18
        _0 = ();                         // scope 0 at xfail.rs:13:5: 15:6
        return;                          // scope 0 at xfail.rs:16:2: 16:2
    }
}

@arielb1
Copy link
Contributor

arielb1 commented Mar 19, 2017

Looking at direct builder output:

BAD:

// MIR for `main`
// node_id = 18
// pass_name = SimplifyCfg
// disambiguator = initial-before

fn main() -> () {
    let mut _0: ();                      // return pointer
    let mut _1: std::option::Option<T>;
    let mut _2: T;
    let mut _3: ();

    bb0: {
        StorageLive(_2);                 // scope 0 at xfail.rs:14:14: 14:17
        _2 = T;                          // scope 0 at xfail.rs:14:14: 14:17
        _1 = std::option::Option<T>::Some(_2,); // scope 0 at xfail.rs:14:9: 14:18
        drop(_2) -> bb3;                 // scope 0 at xfail.rs:14:18: 14:18
    }

    bb1: {
        resume;                          // scope 0 at xfail.rs:12:1: 16:2
    }

    bb2: {
        drop(_2) -> bb1;                 // scope 0 at xfail.rs:14:18: 14:18
    }

    bb3: {
        StorageDead(_2);                 // scope 0 at xfail.rs:14:18: 14:18
        _0 = ();                         // scope 0 at xfail.rs:13:5: 15:6
        goto -> bb4;                     // scope 0 at xfail.rs:16:2: 16:2
    }

    bb4: {
        return;                          // scope 0 at xfail.rs:16:2: 16:2
    }
}

GOOD:

// MIR for `main`
// node_id = 18
// pass_name = SimplifyCfg
// disambiguator = initial-before

fn main() -> () {
    let mut _0: ();                      // return pointer
    let mut _1: std::option::Option<T>;
    let mut _2: T;
    let mut _3: ();

    bb0: {
        StorageLive(_2);                 // scope 0 at xfail.rs:14:14: 14:17
        _2 = T;                          // scope 0 at xfail.rs:14:14: 14:17
        _1 = std::option::Option<T>::Some(_2,); // scope 0 at xfail.rs:14:9: 14:18
        drop(_1) -> [return: bb3, unwind: bb2]; // scope 0 at xfail.rs:14:9: 14:18
    }

    bb1: {
        resume;                          // scope 0 at xfail.rs:12:1: 16:2
    }

    bb2: {
        drop(_2) -> bb1;                 // scope 0 at xfail.rs:14:18: 14:18
    }

    bb3: {
        drop(_2) -> bb4;                 // scope 0 at xfail.rs:14:18: 14:18
    }

    bb4: {
        StorageDead(_2);                 // scope 0 at xfail.rs:14:18: 14:18
        _0 = ();                         // scope 0 at xfail.rs:13:5: 15:6
        goto -> bb5;                     // scope 0 at xfail.rs:16:2: 16:2
    }

    bb5: {
        return;                          // scope 0 at xfail.rs:16:2: 16:2
    }
}

@arielb1
Copy link
Contributor

arielb1 commented Mar 20, 2017

debugging. It seems that after #40178, type contents is miscompiled somehow.

@MJDSys
Copy link
Contributor Author

MJDSys commented Mar 20, 2017

@arielb1 I've tried bisecting the problematic change and came up with the same commit causing the issue, but at the same time reverting the change on top of master (4853584) didn't fix it. I'll keep trying poke things here.

@arielb1
Copy link
Contributor

arielb1 commented Mar 20, 2017

Reverting things did fix things for me. This is however, a miscompile, so problems are to be expected.

@MJDSys
Copy link
Contributor Author

MJDSys commented Mar 20, 2017

Did you revert it on top of master, or on top of the nightly build from March 4th?

@arielb1
Copy link
Contributor

arielb1 commented Mar 20, 2017

On top of a559452

@arielb1
Copy link
Contributor

arielb1 commented Mar 21, 2017

Looks like an LLVM bug, unless we have some "hidden" UB somewhere:

For some reason, this IR:

...
  %128 = and i32 %_8.sroa.0.0.copyload.i.i, 8, !dbg !1314345
  %129 = icmp ne i32 %128, 0, !dbg !1314336
  %130 = and i64 %accum.sroa.0.0.i.i.lcssa, -33, !dbg !1314349
  %.accum.sroa.0.0.i.i = select i1 %129, i64 %130, i64 %accum.sroa.0.0.i.i.lcssa, !dbg !1314350
...
  %134 = call zeroext i1 @_ZN5rustc2ty6AdtDef8has_dtor17hfb2c4f039e32f892E(%"ty::AdtDef"* noalias nonnull readonly dereferenceable(32) %98, %"ty::context::TyCtxt"* noalias nocapture nonnull dereferenceable(8) %arg16), !dbg !1314329
  %135 = or i64 %.accum.sroa.0.0.i.i, 32, !dbg !1314354
  %res.sroa.0.1 = select i1 %134, i64 %135, i64 %.accum.sroa.0.0.i.i, !dbg !1314329

Lowers the last call/or/select as

src/librustc/ty/contents.rs:222
  272b70:       ebf7edab        bl      6e224 <_ZN5rustc2ty6AdtDef8has_dtor17hfb2c4f039e32f892E@plt>
src/librustc/ty/contents.rs:226
  272b74:       e59d1014        ldr     r1, [sp, #20]
_ZN11collections3vec8{{impl}}54deref<core::option::Option<rustc::hir::def_id::DefId>>E():
src/libcollections/vec.rs:1546
  272b78:       e5912404        ldr     r2, [r1, #1028] ; 0x404
_ZN11collections3vec8{{impl}}54index<core::option::Option<rustc::hir::def_id::DefId>>E():
src/libcollections/vec.rs:1426
  272b7c:       e3520032        cmp     r2, #50 ; 0x32
  272b80:       9a0000a1        bls     272e0c <_ZN5rustc2ty8contents48_$LT$impl$u20$rustc..ty..TyS$LT$$u27$tcx$GT$$GT$13type_contents5tc_ty17h8a564cc8cbe4b939E+0x59c>
_ZN5rustc2ty8contents8{{impl}}13type_contents5tc_tyE():
src/librustc/ty/contents.rs:222
  272b84:       e7c56290        bfi     r6, r0, #5, #1

And a bfi is not a very good select. Oops.

@MJDSys
Copy link
Contributor Author

MJDSys commented Mar 21, 2017

@arielb1 Could you post the entire file, or give me steps to generate/investigate it locally? I'm new to playing with rustc's codebase and would like to take a deeper look/follow along at home.

@arielb1
Copy link
Contributor

arielb1 commented Mar 22, 2017

@MJDSys

This looks like an LLVM bug, not a rustc bug.

@MJDSys
Copy link
Contributor Author

MJDSys commented Mar 22, 2017

@arielb1 That's fine, I'm just trying to learn more about the stack. If you have a reference for how you got that, I'm happy to read that as well.

@arielb1
Copy link
Contributor

arielb1 commented Mar 22, 2017

Minified LLVM bug - will report to LLVM after seeing the status on 4.0:

; ModuleID = '<stdin>'
source_filename = "rustc.cgu-0.rs"
target datalayout = "e-m:e-p:32:32-i64:64-v128:64:128-a:0:32-n32-S64"
target triple = "armv7-unknown-linux-gnueabihf"

define fastcc i64 @test_function(i64, i1 zeroext, i1 zeroext) {
entry-block:
  %cleared = and i64 %0, -3
  %maybe_cleared = select i1 %1, i64 %cleared, i64 %0

  %set = or i64 %maybe_cleared, 2
  %result = select i1 %2, i64 %set, i64 %maybe_cleared
  ret i64 %result
}

@arielb1
Copy link
Contributor

arielb1 commented Mar 22, 2017

Reported as http://bugs.llvm.org/show_bug.cgi?id=32379.

@brson brson added the A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. label Mar 23, 2017
@nikomatsakis
Copy link
Contributor

triage: P-high

@rust-highfive rust-highfive added the P-high High priority label Mar 23, 2017
arielb1 added a commit to arielb1/rust that referenced this issue Mar 23, 2017
bors added a commit that referenced this issue Mar 24, 2017
update LLVM with fix for PR32379

Fixes #40593.

The "root" codegen bug fixed here is that, when generating ARM code, unpatched LLVM 3.9/3.9.1 miscompiles bit operations in rare circumstances - this can cause user code compiled via LLVM (through both `rustc` and `clang`) to subtly return incorrect results - for more details, see the test in this PR or in the LLVM rare report.

One effect of that LLVM bug is that `rustc` 1.17 (and possibly other versions) is miscompiled on ARM. The code generated by a miscompiled `rustc` lacks destructor calls in many circumstances.

Users who run an affected/miscompiled `rustc` - 1.17 or above - on an ARM build machine will be affected by the (fairly blatant) missing destructor bug, regardless of the target architecture (this includes the official `1.17.0-beta.1`, `1.17.0-beta.2`, and some official 1.17/1.18 nightlies).

Users who use an affected LLVM (that's any unpatched LLVM 3.9/3.9.1), whether through `rustc` (in any version that supports 3.9 - that's 1.12 or above) or through `clang`, who compile code to an ARM target architecture might be affected by the (fairly hard to hit) bit operation bug, regardless of the build machine.

Distributors and user who want to compile rustc using their own LLVM should apply the [patch](llvm-mirror/llvm@cdc303e) to avoid miscompilations.

r? @alexcrichton
Beta-nominating because regression (rustc 1.16 is not blatantly miscompiled). This also picks a fix for the (MSVC-affecting) PR29151.
alexcrichton pushed a commit to alexcrichton/rust that referenced this issue Apr 1, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. P-high High priority regression-from-stable-to-beta Performance or correctness regression from stable to beta. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants