Creating a reference with `std::mem::zeroed` generates a SIGILL #52898

koute · 2018-07-31T07:39:16Z

This program:

fn main() {
    let value: &u8 = unsafe { ::std::mem::zeroed() };
}

when compiled with opt-level higher than zero generates a SIGILL. Switching to std::mem::uninitialized works around the issue.

I've been bitten by this in generic code where I have a field with ManuallyDrop<T> (basically I use it as an Option with manually managed null state).

This is possibly related to #52875

Affected versions:
rustc 1.29.0-nightly (874dec2 2018-07-21)
rustc 1.29.0-nightly (54628c8 2018-07-30)
rustc 1.27.2 (58cc626 2018-07-18)

Not affected versions:
rustc 1.26.0 (a775680 2018-05-07)

This doesn't look architecture specific as x86_64-unknown-linux-gnu, mips64-unknown-linux-gnuabi64 and armv7-unknown-linux-musleabihf are all affected.

The text was updated successfully, but these errors were encountered:

ishitatsuyuki · 2018-07-31T07:53:05Z

As soon as you instantiate a type (including references) with a stale value it will immediately cause UB. This is not really actionable (mem::uninitialized itself is not very sound, there are proposals to use unions instead).

Mark-Simulacrum · 2018-07-31T11:46:05Z

Yes, this code is UB - references must be non-null.

ghost · 2018-08-04T20:09:52Z

I can see how things like mem::uninitialized and mem::zeroed should generally be discouraged in application code, but I find it surprising that a language marketed toward kernel writers is making it increasingly difficult to operate directly on memory. This caused a widely used library of mine to stop working, and I'm having trouble thinking of a fix that doesn't require a major rewrite that breaks backwards compatibility. Isn't the whole point of unsafe to isolate undefined behavior when it can't be reasonably avoided?

Mark-Simulacrum · 2018-08-04T20:35:01Z

You can use *const T or *mut T instead to operate on memory directly; if you've left a value zeroed or unintialized it wouldn't be safe to permit access to it: the raw pointer types do that, whereas the reference 'type' (&) does not.

If you want to, you can also wrap them in a struct to provide safe access, perhaps panicking if the value hasn't yet been initialized.

If you have suggestions as to how to improve ergonomics or clarity here (docs or through language changes) we'd be happy to hear them.

ghost · 2018-08-04T20:45:38Z

What about when working with C function pointers? I expected these to behave more like raw pointers, but it's causing this same SIGILL. Example:

let x: unsafe extern "C" fn () = mem::zeroed();

Mark-Simulacrum · 2018-08-04T20:52:53Z

You should be using Option<...> around the function pointer, and initializing it to None (equivalent to mem::zeroed, I believe, but safer).

The nomicon notes:

The most common type that takes advantage of the nullable pointer optimization is Option<T>, where None corresponds to null. So Option<extern "C" fn(c_int) -> c_int> is a correct way to represent a nullable function pointer using the C ABI (corresponding to the C type int (*)(int)).

ghost · 2018-08-04T21:08:24Z

At the time that I wrote the x11-dl crate, it didn't seem reasonable to expect users to check if a function pointer is None every time they use one when the crate ensures that they're all non-null (or Some(_)) before making them available. I was using mem::zeroed and a loop to initialize very large structs full of function pointers as an optimization that improved build times, execution times, and code size, and prevented a stack overflow from occurring on some systems. It would be reasonable if this SIGILL had existed from the start, but it's a breaking change that's very difficult to work around.

koute · 2018-08-04T21:35:48Z

One extra thing which I'd like to emphasize here is that this is an incredibly silly footgun when dealing with generic code. Yes, accessing something that is mem::zeroed or mem::uninitialized should definitely trigger undefined behavior (I don't think anyone is disputing that) and should not be done, however merely creating such a value without doing anything with it should not give the compiler a blanket permission to maliciously turn my program into a single ud2 instruction.

In fact, if the official stance is that initializing T& using mem::uninitialized() is undefined behavior (Based on what @ishitatsuyuki said it might...?) then this (written in safe Rust) program should trigger undefined behavior:

use std::collections::HashMap;

fn main() {
    let mut map: HashMap< (), &u32 > = HashMap::new();
    map.insert( (), &123 );
    for (_, v) in map.drain() {
        println!( "{}", v );
    }
}

Here's how the Drain iterator is defined:

impl<'a, K, V> Iterator for Drain<'a, K, V> {
    // ...
    #[inline]
    fn next(&mut self) -> Option<(SafeHash, K, V)> {
        self.iter.next().map(|raw| {
            unsafe {
                self.table.as_mut().size -= 1;
                let (k, v) = ptr::read(raw.pair());
                (SafeHash { hash: ptr::replace(&mut *raw.hash(), EMPTY_BUCKET) }, k, v)
            }
        })
    }
    // ...
}

And here's ptr::read:

#[inline]
#[stable(feature = "rust1", since = "1.0.0")]
pub unsafe fn read<T>(src: *const T) -> T {
    let mut tmp: T = mem::uninitialized();
    copy_nonoverlapping(src, &mut tmp, 1);
    tmp
}

So what's effectively doing is this:

let (k, v): ((), &'static u32) = mem::uninitialized();

In general my assumptions were always:

As long as the value (any value) is never accessed having it initialized using mem::uninitialized is safe.
The mem::zeroed is exactly the same as mem::uninitialized, except the underlying memory is zeroed.

The docs for mem::uninitialized currently do sound like it's fine to use mem::uninitialized as long as the value isn't accessed in any way.

The docs for mem::zeroed are more spartan; they do however suggest a similarity to mem::uninitialized. If there are cases where mem::uninitialized doesn't trigger UB and mem::zeroed does then I think that definitely should be documented!

Mark-Simulacrum · 2018-08-04T21:38:02Z

I'll reopen this and cc people who are working on uninitalized/zeroed.

cc @RalfJung @rust-lang/wg-unsafe-code-guidelines

sfackler · 2018-08-04T21:57:48Z

uninitialized is very different from zeroed in this context. A null reference is explicitly a value that doesn't exist, and creating one is undefined behavior.

ghost · 2018-08-04T22:13:14Z

@sfackler Uninitialized memory is also assumed to be a value that doesn't exist, and both are equally undefined behavior. If anything, zeroed memory is more likely to trigger a SIGSEGV or equivalent on most platforms, which is far better than with uninitialized memory, where this is only the best case scenario.

More importantly, as @koute mentioned, this zeroed memory is not at any point being accessed in these use cases. It's being overwritten before any use, which should pretty much eliminate any undefined behavior. Regardless, there would be no point in providing mem::uninitialized or mem::zeroed or unsafe at all if such code is forbidden.

Why not just rewrite these functions to

unsafe fn uninitialized<T> () -> T {
    panic!("this used to be allowed but isn't anymore because ???")
}

unsafe fn zeroed<T> () -> T {
    panic!("this used to be allowed but isn't anymore because ???")
}

sfackler · 2018-08-04T22:33:00Z

Uninitialized memory is also assumed to be a value that doesn't exist, and both are equally undefined behavior.

No, that is not correct. Uninitialized memory exists all over the place! The memory returned by malloc is uninitialized. Uninitialized memory has an undefined value.

Regardless, there would be no point in providing mem::uninitialized or mem::zeroed or unsafe at all if such code is forbidden.

All such code is not forbidden.

ghost · 2018-08-04T22:37:21Z

Uninitialized memory is also assumed to be a value that doesn't exist, and both are equally undefined behavior.

No, that is not correct. Uninitialized memory exists all over the place! The memory returned by malloc is uninitialized. Uninitialized memory has an undefined value.

But mem::uninitialized tricks the compiler into believing that an uninitialized value is initialized, which is very much undefined behavior, and is the entire point of having the function in the first place.

sfackler · 2018-08-04T22:39:35Z

But mem::uninitialized tricks the compiler into believing that an uninitialized value is initialized

There is no tricking going on. The compiler is well aware that the value is uninitialized.

norru · 2018-08-04T22:49:07Z

@sfackler, @daggerbot - while the argument is sound, at the moment we have a significant amount of projects for which nightly is unusable - not mentioning that there might be pressure to push this change to stable.

Would it be reasonable to revert the change that caused this behaviour and then continue the discussion until a proper solution is found?

ghost · 2018-08-04T22:51:19Z

There is no tricking going on. The compiler is well aware that the value is uninitialized.

If this were the case, then the following code would fail to build:

fn do_a_thing (_: &u8) {}

fn main () {
    unsafe {
        let x: u8 = std::mem::uninitialized();
        do_a_thing(&x);
    }
}

But it doesn't, because the compiler thinks x is initialized.
As a counter example:

fn do_a_thing (_: &u8) {}

fn main () {
    let x: u8;
    do_a_thing(&x);
}

This does not build because of an error stating use of possibly uninitialized 'x'.

sfackler · 2018-08-04T22:55:47Z

The documentation for mem::uninitialized has some extensive discussion about what you can and can not do with a value returned by it.

The explicit purpose of mem::uninitialized is to bypass that check. It is an unsafe function, and the entire purpose of unsafe code is to handle things that the compiler cannot. An unsafe block says "trust me, I know what I'm doing". If you don't actually know what you're doing, then that's on you.

sfackler · 2018-08-04T23:02:50Z

All you need to do to make this code valid is to use mem::uninitialized rather than mem::zeroed:
https://github.com/erlepereira/x11-rs/blob/49aef6df158c9ca38a8450670984f1971686d4ef/x11-dl/src/link.rs#L54

ghost · 2018-08-04T23:02:51Z

Did you not read the above comments? The value is being replaced with a valid value before it's ever accessed. Roughly equivalent example.

let mut x: u8 = mem::uninitialized();
x = 0; // not uninitialized anymore, and therefore not really undefined behavior
do_a_thing(&x);

Edit: The docs say that mem::zeroed is equivalent to mem::uninitialized followed by changing all bytes to zero. If this is not what it does, then the docs are unclear as well as the whole purpose of the function.

sfackler · 2018-08-04T23:04:18Z

That example is totally fine and well defined. This, on the other hand, is still UB:

let mut x: &'static u8 = mem::zeroed();
x = &0;
do_a_thing(&x);

mem::zeroed is not the same thing as mem::uninitialized.

ghost · 2018-08-04T23:10:31Z

let mut x: &'static u8 = mem::zeroed();
x = &0;
do_a_thing(&x);

What implementation detail makes uninitialized any more acceptable than zeroed in this example? Are references anything more than just safe pointers?

ghost · 2018-08-04T23:36:42Z

@kevinmehall pointed out an explanation:

If LLVM is told a type is not going to be zero, and code explicitly writes a zero to it, it can assume that code is unreachable, and compile it into an abort to reduce code size.

So it looks like mem::zeroed() is not at all equivalent to mem::uninitialized() followed by memset(), as the docs suggest. My reasoning for using zeroed was that if there was a logic error in my code, I would rather read from address 0, which raises SIGSEGV on most (not all, but most) systems, which is better than an undefined location, where there could (at worst) be malicious third party code.

Edit: It sounds like this optimization may have been triggered by LLVM if rustc recently switched to a later version. Either the docs should be updated to indicate what mem::zeroed really does, or the function should be changed to behave as the docs indicate.

sfackler · 2018-08-04T23:40:53Z

References are defined to always point to a valid location, but in particular that location is never null. let mut x: &'static u8 = mem::zeroed(); explicitly creates a value that the compiler assumes cannot exist. Similarly, tenum Void {} let x: Void = mem::uninitialized(); is bad - it's creating an instance of a type that has no valid instances.

@rkruppe mentioned on the lang channel of the rust Discord that I may be wrong about uninitialized being valid in this case - can an optimizer select a value of 0 for an undefined value that's tagged as nonzero?

sfackler · 2018-08-04T23:58:38Z

So it looks like mem::zeroed() is not at all equivalent to mem::uninitialized() followed by memset(), as the docs suggest.

Why can't it be equivalent to that? memset is understood by LLVM IIRC and it would realize that the value has been zeroed.

ghost · 2018-08-05T00:40:51Z

In most cases, it would appear on the surface that it is equivalent, but in some cases (such as in x11-dl), it gets optimized down to an instruction that raises SIGILL, which should be mentioned in the docs. This may just be an implementation detail in LLVM that was never intended in rustc, but it was the cause of some breaking changes. If I had used mem::uninitialized instead and subsequently zeroed the value manually, the signal would not have been raised.

sfackler · 2018-08-05T00:58:21Z

If I had used mem::uninitialized instead and subsequently zeroed the value manually, the signal would not have been raised.

Why would it not have been raised? We instruct LLVM to generate an abort inside of unreachable code, and code that invokes UB (like zeroing a non-zeroable value) is assumed to be unreachable.

ghost · 2018-08-05T01:46:24Z

It was because of the implied memset that LLVM thought the code was unreachable. If I were to zero or otherwise change the value in a way that LLVM doesn't recognize as a simple "zero this value," and therefore not trigger the optimization, then the SIGILL wouldn't occur.

As you said above, mem::uninitialized was the correct choice in this case. I erroneously used mem::zeroed as a precaution in case I forgot to initialize any of the struct fields, but no value was intended to be zero by the time it reached the library user.

One of these functions is only required because constructing these enormous structs normally (e.g. MyStruct { field_a: 1, field_b: 2, ... }) results in slow compile times (if it succeeds in compiling at all, which it didn't ealier today), slow execution times, and huge generated code. So I had to use either uninitialized or zeroed and initialize each field in a loop instead.

Before Rust I primarily used C where it's common to memset structs with lots of fields to zero before setting fields individually because accidentally dereferencing a null pointer usually fails more gracefully than an uninitialized pointer. Because this is possible (and often recommended) in C, it seems like it should be permitted in unsafe Rust as well unless the documentation for the functions in question states otherwise.

retep998 · 2019-07-18T17:41:11Z

just pointers to C functions

Were those function pointers wrapped in Option? If not, then the value zero is a trap representation for them.

alercah · 2019-07-18T18:27:30Z

With `uninitialized`, it won't matter if `Option` is used. Function pointers are similar to reference types and to the best of my knowledge subject to the same requirements, in particular that they must be aligned. I don't think it's valid to create an uninitialized one---you'd need to store them as raw pointers (`*const ()` on platforms that support it as there are apparently no raw function pointers?) to avoid these issues. The compiler changing how it compiles code with UB isn't considered a breaking change: there are no guarantees whatsoever about how code with UB will behave. It's possibly a mistake that it's poorly documented that misaligned function pointers are UB (even the current version of the UCG doesn't seem to address it), but that seems to me a docs bug, not a compiler bug, especially given that it is documented that they cannot be null.

…

On Thu., Jul. 18, 2019, 13:42 Peter Atashian, ***@***.***> wrote: just pointers to C functions Were those function pointers wrapped in Option? If not, then the value zero is a trap representation for them. — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#52898?email_source=notifications&email_token=AE7AOVLBMNWNS4UFYYRA5DDQACTPVA5CNFSM4FM7AGW2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD2JHZNY#issuecomment-512916663>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AE7AOVOC2QOBLEOOEX5AEPTQACTPVANCNFSM4FM7AGWQ> .

alercah · 2019-07-18T18:29:33Z

Actually, I just thought of something else: even if misaligned function pointers aren't well documented as UB, replacing `zeroed` with `uninitialized` is never going to be correct---the compiler can happily assume that the value returned from `uninitialized` is going to be all-zeroes, which is enough to trigger UB on its own.

…

On Thu., Jul. 18, 2019, 14:27 Alexis Hunt, ***@***.***> wrote: With `uninitialized`, it won't matter if `Option` is used. Function pointers are similar to reference types and to the best of my knowledge subject to the same requirements, in particular that they must be aligned. I don't think it's valid to create an uninitialized one---you'd need to store them as raw pointers (`*const ()` on platforms that support it as there are apparently no raw function pointers?) to avoid these issues. The compiler changing how it compiles code with UB isn't considered a breaking change: there are no guarantees whatsoever about how code with UB will behave. It's possibly a mistake that it's poorly documented that misaligned function pointers are UB (even the current version of the UCG doesn't seem to address it), but that seems to me a docs bug, not a compiler bug, especially given that it is documented that they cannot be null. On Thu., Jul. 18, 2019, 13:42 Peter Atashian, ***@***.***> wrote: > just pointers to C functions > > Were those function pointers wrapped in Option? If not, then the value > zero is a trap representation for them. > > — > You are receiving this because you were mentioned. > Reply to this email directly, view it on GitHub > <#52898?email_source=notifications&email_token=AE7AOVLBMNWNS4UFYYRA5DDQACTPVA5CNFSM4FM7AGW2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD2JHZNY#issuecomment-512916663>, > or mute the thread > <https://github.com/notifications/unsubscribe-auth/AE7AOVOC2QOBLEOOEX5AEPTQACTPVANCNFSM4FM7AGWQ> > . >

ghost · 2019-07-18T22:36:25Z

While I agree that mem::zeroed and mem::uninitialized should be avoided, this is a breaking change. Code that used to work fine doesn't work anymore. Code that others are relying on. That's a breaking change. It seems odd that these functions exist in the standard library at all since they're now basically useless. There's no reason they have to crash a program, and they didn't before, but for some reason, somebody decided they now do.

If I had written this library today, I would have known to look for a better solution, but unfortunately I wrote it ~5 years ago or so just after Rust 1.0 was released. What can possibly be done from my end? Yank the library and tell the users they'll have to write a new one? Redesign and rewrite it and force everyone to rewrite their usage anyway? What will happen next time there's an unnecessary change that breaks people's code other than wish I had used a stable language, rather than some experimental new language that gets falsely touted as being reliable?

alercah · 2019-07-18T23:43:42Z

You were doing something you weren't allowed to do. Previously, the compiler didn't notice, but now it does. You're only getting caught now, but your code was incorrect all the way back to Rust 1.0.

That the bug was only encountered now is unfortunate, but compiler changes aren't the only thing that could break this code. It might break if you run it on a different platform. It might break if you compile with different flags, even in an older version of Rust. It might break if you use the code in just the wrong way that nobody has yet, again even in an older version of Rust. The compiler changed in some way which broke the specific examples complained of, but there is not and never was a guarantee that your code was operating correctly. Indeed, it may well only have appeared to be operating correctly but actually had more subtle bugs. It may have been corrupting memory in a way which you didn't notice. It may have led the compiler to, in some cases, optimize away a null pointer check that ought to have been there, causing subtle misbehaviour. Even on versions of the compiler where this "worked", there is no guarantee that it was in fact doing what you thought it was.

You say you wish you had used a "stable language". Would you consider C to be stable? You'd run into the same issues there; nothing to do with language stability. Chris Lattner (who wrote LLVM) wrote a series of blog posts about UB back in 2011 (I linked the second post, because it's the most relevant here). I don't know of any C or C++ compiler that guarantees that code performing undefined behaviour will not change how it behaves: the major ones (clang, GCC, MSVC) certainly do exactly the opposite (except where they have extensions defining behaviour in some cases where the standards would call it UB).

What should you do? You should stop shipping a library that is insecure and buggy. I don't know anything about your code so I can't tell you how to accomplish that. Ideally you can change the types. If they're in the public interface, and you can make appropriate changes without changing your public interface, go do that. Failing all else, make dummy functions that panic when called and initialize to point to them---presumably you don't care about what the actual values were, so using trapping dummies is fine. And you'll get an extra benefit of improved security in case something does try to call these values: trapping rather than setting the instruction pointer to an effectively random address.

Edit: and if for some reason that doesn't work, do the only thing you can do: make a breaking change and tell all your users that they have a security bug and need to adjust. It's the right thing to do---so much so that security bugs are an accepted reason to break backwards compatibility by many major libraries and compilers.

ghost · 2019-07-19T00:51:04Z

x11-dl is just a language binding for a C library that exposes the C functions as a struct of function pointers. These structs have a private lib field, which is just a *mut c_void with a destructor, and public function pointers. Because Rust (at least last time I really used it) can't box values without first passing them through the stack, there were stack overflows at runtime on some systems when loading the libraries, not to mention insane compile times (around 20 min on my old laptop). So as a workaround, the struct was put down uninitialized (previously zeroed), the lib field was carefully set, and the function pointers were initialized in a loop.

Would you consider C to be stable? You'd run into the same issues there; nothing to do with language stability.

Nothing that this library does would be an issue in C or any other systems language that I'm aware of. In fact, I've worked with code many times before in C and C++ that does almost exactly the same thing without issue (like SDL does after it decides whether to load the X11 or Wayland libraries). If Rust is intended to interface with C code, it's doesn't seem right that it's even more unpredictable than C when doing so. Unsafe code should mean "you have the power to shoot yourself in the foot," not "we'll go ahead and shoot your foot for you." All the library is trying to do is initialize a big struct, and the compiler keeps coming up with new ways to make this impossible every 6 months.

alercah · 2019-07-19T01:23:16Z

Because Rust (at least last time I really used it) can't box values without first passing them through the stack, there were stack overflows at runtime on some systems when loading the libraries, not to mention insane compile times (around 20 min on my old laptop). So as a workaround, the struct was put down uninitialized (previously zeroed), the lib field was carefully set, and the function pointers were initialized in a loop.

Ahh. I can see how you arrived at this. There's currently no good language facility for doing this, other than on a field-by-field basis, especially for function pointers. But there are other workarounds which don't rely on UB. For instance, you could add another struct where each field is just a *const (), and initialize the entire thing, and then take a pointer to it and cast it to a pointer to the struct that has the initialized pointer types. Both structs would need to be #[repr(C)], but then this would be guaranteed to work on any platform that guarantees function-object pointer conversions (which include POSIX). And it wouldn't rely on UB, so it would be guaranteed to work without breaking.

Nothing that this library does would be an issue in C or any other systems language that I'm aware of.

Unsafe code should mean "you have the power to shoot yourself in the foot," not "we'll go ahead and shoot your foot for you."

I think you're conflating two things here, namely "what is UB" and "what are the consequences of UB". You're complaining that the compiler has gone and shot your foot, but that's not true. You already shot yourself in the foot. It just took a long time for you to feel the pain. The equivalent situation in C to the one at hand, as far as breaking changes goes, is not having a pointer to function with an uninitialized value. C permits that; Rust does not. An equivalent in C might be signed integer overflow: it's not obviously UB, it often happens to work, and most of the bugs that arise are due to the compiler optimizing around it rather than an explicit trap or obvious misbehaviour. Additionally, it's defined behaviour in Rust (although it depends on debug/release mode), so a Rust program can use signed overflow predictably and reliably, while a C program that has it is inherently buggy, regardless of whether it happens to work sometimes.

RalfJung · 2019-07-19T09:13:35Z

@daggerbot

Unfortunately, if the official response to breaking changes is "lol not a bug", then there doesn't appear to be any possible workaround

What will happen next time there's an unnecessary change that breaks people's code

You are making several unfair and unfounded assumptions here. First of all you seem to think this change was unnecessary. I am not sure why you think that; the change that likely triggered this is #62150 in which we were able to remove a very crazy intrinsic and define it in plain Rust instead. That's a great improvement in terms of maintainability of the compiler and also the language spec (once such a thing exists). Second you seem to think that we knew the change in question would have any kind of fall-out. Again that's not correct, we had do idea. This PR does not change what any of the involved functions actually does: mem::uninitialized still leaves memory uninitialized, and mem::zeroed still returns zeroed memory. However, it leads to slightly different paths being taken in the optimizer, so if code has UB, its behavior can change. This is unavoidable as a part of compiler development.

And finally, it is also not correct that we don't care about when our changes do break libraries, even if those libraries have undefined behavior. But if programmers response to undefined behavior is "lol not a bug", there is no basis for a constructive discussion.

UB is a contract between the programmer and the compiler (and also, somehow, a contract between the programmer and the compiler developers), and if the programmer does not fulfill their side of the contract, the other party will not fulfill their side of the contract either.

So, with that out of the way, maybe we can have a constructive discussion?

What happens here is the equivalent of a program doing an out-of-bounds memory access. With one compiler version that might "work" and just return some garbage value, with the next compiler version it might lead to a SEGFAULT because now the access points to an unmapped page. It is basically impossible for a compiler to guarantee that incorrect programs (i.e., programs having UB) keep "working". Any time anything is changed in the optimizer or the implementation of some method used by unsafe code, that can lead to unsafe code with UB "breaking" (I put this in quotes because the code was, in some sense, already broken -- it had UB). I invite you to look at #62150 and tell me what's "wrong" about it and how we could even have foreseen that this would have any kind of fall-out.

Note that this kind of thing happens in C all the time. Some program has UB, a compiler update breaks it. In a language where programmers can cause UB, it is unfortunately practically impossible to guarantee any stability for programs with UB. We would have to basically stop developing the language as any seemingly harmless change can have pretty much arbitrary effects.

What can possibly be done from my end?

We could try to work together to find out if and where your code has undefined behavior. If it has no undefined behavior, there is definitely a compiler bug here. If it has undefined behavior, we can try to find out if that can be avoided. If it cannot, that's in some sense a "language bug" -- there probably should be a way to do what you want to do without causing UB, but current Rust just does not support that. Then we can work on trying to fix that language bug.

But given that you seem to be willing to use mem::zeroed, I think UB can be avoided. We just have to find where in your library the UB is happening.

With uninitialized, it won't matter if Option is used

That's not entirely correct; Option<fn()> is still way better than fn. It is still UB in principle but much less likely to cause miscompilations in practice.

More importantly though, mem::zeroed is correct for Option<fn()>. So @daggerbot if you have function pointers and can live with mem::zeroed, wrapping them in an Option should work for you. Could you point to where, in your code, the structs you want to zero-initialize are defined?

nikic · 2019-07-19T09:42:19Z

I invite you to look at #62150 and tell me what's "wrong" about it and how we could even have foreseen that this would have any kind of fall-out.

I'm going to take you up on this invitation and say that doing uninit().assume_init() is on a rather different level than just leaving a value uninitialized. It's a thin line to walk, but from the perspective of LLVM initializing something to undef and then overwriting with something valid will (with some caveats) be usually well-defined. On the other hand, taking an undef value and immediately asserting that it has certain properties (as assume_init() will likely do through the use of return attributes if nothing else) will immediately be optimized to unreachable.

It's still UB on the Rust language level in either case, but it seems like this change goes against how these intrinsics have been used in practice for a long time and it might make sense to delay this kind of change until they have been deprecated for a while.

RalfJung · 2019-07-19T09:53:52Z

doing uninit().assume_init() is on a rather different level than just leaving a value uninitialized. It's a thin line to walk, but from the perspective of LLVM initializing something to undef and then overwriting with something valid will (with some caveats) be usually well-defined. On the other hand, taking an undef value and immediately asserting that it has certain properties (as assume_init() will likely do through the use of return attributes if nothing else) will immediately be optimized to unreachable.

assume_init does not assert initializedness towards in LLVM in any way that mem::uninitialized would not already do (namely by having T as return type). So I do not understand in which sense you think this makes a difference for LLVM. The old mem::uninitialized literally took an undef and immediately returned it at some type T, so if that's enough for LLVM to insert a trap, why would anything get worse now?

Are you saying it would help if we, e.g. changed MaybeUninit::into_inner (which is used to implement assume_init) to do a ptr::read instead of accessing the field directly?

RalfJung · 2019-07-20T08:45:03Z

Seems like we are having something of a parallel discussion at amethyst/amethyst#1808 (comment).

RalfJung · 2020-02-29T13:43:29Z

I am going to close this:

SIGILL on zeroed references is not a bug, it is UB and deliberately so.
The UB in the library that triggered this discussion has been fixed.
The rustc PR that causes the SIGILL to be emitted for the old version of the library has been reverted for now (but we plan to revert the revert; there should have been enough time to transition by now).
We now have a lint that statically detects some cases of this UB, and we will soon get a dynamic check as well, which should help developers find issues such as zeroed references.

implement zeroed and uninitialized with MaybeUninit This is the second attempt of doing such a change (first PR: rust-lang#62150). The last change [got reverted](rust-lang#63343) because it [caused](rust-lang#62825) some [issues](rust-lang#52898 (comment)) in [code that incorrectly used these functions](AltF02/x11-rs#99). Since then, the [problematic code has been fixed](AltF02/x11-rs#101), and rustc [gained a lint](rust-lang#63346) that is able to detect many misuses of these functions statically and a [dynamic check that panics](rust-lang#66059) instead of causing UB for some incorrect uses. Fixes rust-lang#62825

Mark-Simulacrum closed this as completed Jul 31, 2018

norru mentioned this issue Aug 1, 2018

Illegal instruction in nightly-x86_64-unknown-linux-gnu unchanged - rustc 1.29.0-nightly (866a71325 2018-07-29) AltF02/x11-rs#90

Closed

Mark-Simulacrum reopened this Aug 4, 2018

bschwind mentioned this issue Jul 19, 2019

#90 Seems to have come back again AltF02/x11-rs#99

Closed

RalfJung mentioned this issue Jul 19, 2019

[BUG] Multiple SIGILL occurences with nightly opt-level=1 amethyst/amethyst#1808

Closed

RalfJung mentioned this issue Jul 20, 2019

re-land #62150 (Implement mem::{zeroed,uninitialized} in terms of MaybeUninit) #62825

Closed

RalfJung closed this as completed Feb 29, 2020

RalfJung mentioned this issue Mar 11, 2020

implement zeroed and uninitialized with MaybeUninit #69922

Merged

RalfJung mentioned this issue Jun 19, 2020

New informational categorization: "unsound" rustsec/advisory-db#300

Closed

5225225 mentioned this issue Jul 3, 2022

Make mem::uninitialized inaccessible in future editions #98862

Open

RalfJung mentioned this issue Jul 11, 2022

Revert #94158, "Apply noundef metadata to loads of types that do not permit raw init" #98966

Closed

dwattttt mentioned this issue Oct 3, 2023

Replace some mem::zeroed uses with idiomatic/safer approaches TimMisiak/dbgrs#6

Merged

Creating a reference with std::mem::zeroed generates a SIGILL #52898

Creating a reference with std::mem::zeroed generates a SIGILL #52898

Comments

koute commented Jul 31, 2018

ishitatsuyuki commented Jul 31, 2018

Mark-Simulacrum commented Jul 31, 2018

ghost commented Aug 4, 2018 • edited by ghost Loading

Mark-Simulacrum commented Aug 4, 2018

ghost commented Aug 4, 2018 • edited by ghost Loading

Mark-Simulacrum commented Aug 4, 2018

ghost commented Aug 4, 2018

koute commented Aug 4, 2018

Mark-Simulacrum commented Aug 4, 2018

sfackler commented Aug 4, 2018

ghost commented Aug 4, 2018 • edited by ghost Loading

sfackler commented Aug 4, 2018

ghost commented Aug 4, 2018 • edited by ghost Loading

sfackler commented Aug 4, 2018

norru commented Aug 4, 2018

ghost commented Aug 4, 2018

sfackler commented Aug 4, 2018 • edited Loading

sfackler commented Aug 4, 2018

ghost commented Aug 4, 2018 • edited by ghost Loading

sfackler commented Aug 4, 2018

ghost commented Aug 4, 2018

ghost commented Aug 4, 2018 • edited by ghost Loading

sfackler commented Aug 4, 2018

sfackler commented Aug 4, 2018

ghost commented Aug 5, 2018

sfackler commented Aug 5, 2018

ghost commented Aug 5, 2018 • edited by ghost Loading

retep998 commented Jul 18, 2019

alercah commented Jul 18, 2019 via email

alercah commented Jul 18, 2019 via email

ghost commented Jul 18, 2019

alercah commented Jul 18, 2019 • edited Loading

ghost commented Jul 19, 2019

alercah commented Jul 19, 2019

RalfJung commented Jul 19, 2019 • edited Loading

nikic commented Jul 19, 2019

RalfJung commented Jul 19, 2019 • edited Loading

RalfJung commented Jul 20, 2019

RalfJung commented Feb 29, 2020

Creating a reference with `std::mem::zeroed` generates a SIGILL #52898

Creating a reference with `std::mem::zeroed` generates a SIGILL #52898

ghost commented Aug 4, 2018 •

edited by ghost

Loading

ghost commented Aug 4, 2018 •

edited by ghost

Loading

ghost commented Aug 4, 2018 •

edited by ghost

Loading

ghost commented Aug 4, 2018 •

edited by ghost

Loading

sfackler commented Aug 4, 2018 •

edited

Loading

ghost commented Aug 4, 2018 •

edited by ghost

Loading

ghost commented Aug 4, 2018 •

edited by ghost

Loading

ghost commented Aug 5, 2018 •

edited by ghost

Loading

alercah commented Jul 18, 2019 •

edited

Loading

RalfJung commented Jul 19, 2019 •

edited

Loading

RalfJung commented Jul 19, 2019 •

edited

Loading