Skip to content
This repository has been archived by the owner on Apr 5, 2024. It is now read-only.

Compiler optimizations considered harmful in safe function in "unsafe module" #40

Closed
RalfJung opened this issue Jun 6, 2017 · 20 comments

Comments

@RalfJung
Copy link
Member

RalfJung commented Jun 6, 2017

Consider the following contrived module:

mod strange_unsafe {
    use std::ptr;

    struct StrangeUnsafe {
        magic: i32, // invariant:  This is always 42
        strange: *mut StrangeUnsafe, // this points back to ourselves (we only ever occur in a Box, so that's possible)
    }
    
    impl StrangeUnsafe {
        fn check(s: *mut StrangeUnsafe) {
            assert_eq!(unsafe { (*s).magic }, 42);
        }
    }
    
    pub struct PubStrangeUnsafe(Box<StrangeUnsafe>);
    
    impl PubStrangeUnsafe {
        pub fn new() -> Self {
            let mut s = Box::new(StrangeUnsafe { magic: 42, strange: ptr::null_mut() });
            let addr = &mut *s as *mut _;
            s.strange = addr;
            PubStrangeUnsafe(s)
        }
        
        pub fn do_sth(&mut self) {
            self.0.magic = 13;
            self.0.magic = 42;
            StrangeUnsafe::check(self.0.strange);
        }
    }
}

Playpen

What is strange or troublesome about this? do_sth looks like a perfectly safe function -- it's not marked unsafe and doesn't contain an unsafe block -- so we may think that we can do the usual optimizations that we think we can do in safe functions. However, if the compiler decides to move the store of 42 down below the function call, the module's invariant is violated in a way that the module can observe. That's bad.

One may say the actual cause of the trouble here is the fact that check is not marked as unsafe -- after all, that function doesn't actually work with any odd raw pointer. However, check is internal, and it seems to be a long stretch to rely on everyone annotating their purely internal helper functions properly.

Cc @nikomatsakis @arielb1

@strega-nil
Copy link

@RalfJung that's undefined behavior - you're aliasing mutably with a reference. The subobject is referenced by self, and you write to it - thereby saying "I promise that from this point forward, I am the only pointer to this that shall be read or written to, until I stop being used." Therefore, this is invalid in both unsafe and safe code. You're breaking the invariants of references.

@strega-nil
Copy link

At least, that's my opinion of it.

@RalfJung
Copy link
Member Author

RalfJung commented Jun 6, 2017

That's one possible answer, but I feel that could actually rule out real-world unsafe code. Also, specifying exactly when "I stop being used" is happening doesn't seem an easy task. Once you are saying that it matters which of multiple "identical" pointers you are using to access a given location, that opens a whole can of worms.

@RalfJung
Copy link
Member Author

RalfJung commented Jun 6, 2017

Also, saying just "from this point forward" would not permit moving loads/stores up across function calls. Seems like an access should have an effect "backwards in time"?

@strega-nil
Copy link

@RalfJung I need to re-write down the wording that I used. Basically, in this case, since neither pointer is derived-from the other, it is known that they do not alias.

@strega-nil
Copy link

and yes, it does open a can of worms - it's very important to open this can of worms. What about this example is different from the (definitely UB) example following?

fn check(y: *mut i32) {
  assert!("", unsafe { *y } == 42);
}

fn foo(x: &mut i32, y: *mut i32) {
  x = 0;
  x = 42;
  check(y);
}

let mut int = 0;
let ptr = &mut int as *mut _;
foo(&mut int, ptr);

@RalfJung
Copy link
Member Author

RalfJung commented Jun 6, 2017

The fact that my example involves an "unsafe type" that has its own invariant attached to it -- an invariant that makes a naive interpretation of my code work fine -- could make a difference.
It just seems really daring to me to rule out such code, because I expect things like that to easily arise when writing a safe abstraction involving raw pointers that do alias with some of the references that are handed to the outside world.

Imagine some form of tree where every node has a raw pointer back to the root. Some public method in this library could take the entire data structure via &mut, and then invoke operations on the tree that do involve the raw pointers. That should be possible, and it shouldn't be too hard to get it right. In such a library, I don't think we want to optimize loads or stores across function calls, that just sounds too dangerous.

@strega-nil
Copy link

@RalfJung it may have to be a special case - pointers that live inside the object you have a reference to.

@nikomatsakis
Copy link
Contributor

(FWIW, I've written code that is very close to this in Rayon -- but not the same. In particular, the code had a "self-link" of this kind, but that self-link is an Arc<T>. This creates an (intentional) reference cycle that gets broken at some point by clearing the self-link, allowing the arc to be collected.)

@nikomatsakis
Copy link
Contributor

This is a good example though that gets right at the question of what the boundary for unsafe ought to be. I really feel like we need a way to make that boundary clearer -- this might be unsafe fields, I'm not sure -- and that it is inevitable that such boundaries will play a role in optimization. I still favor the idea that the boundary for unsafety follows the privacy rules, and that we add some sort of level (unchecked) which means "there may be safety invariants, but all the types have their usual meanings". This would be suitable for e.g. <[T]>::get_unchecked or String::from_utf8_unchecked.

@strega-nil
Copy link

@nikomatsakis I still disagree that unsafe should make anything special - all code should have the same invariants, and, imo, unsafe should only be used as a lint. Otherwise, I think people will be confused that their code works in unsafe, but not in safe.

@RalfJung
Copy link
Member Author

RalfJung commented Jun 9, 2017

But it won't even compile in safe -- the compiler will complain that you are performing unsafe operations.

@strega-nil
Copy link

@RalfJung I'm talking about stuff that's in your example above.

@RalfJung
Copy link
Member Author

RalfJung commented Jun 9, 2017

So am I. If we say that unsafe makes the entire surrounding module special, then you won't have that problem where something silently stops working if you make it safe. It will instead stop compiling.

@strega-nil
Copy link

@RalfJung that both seems like a massive pessimization, and only moves the problem. Now it's "if someone creates an implementation module".

@RalfJung
Copy link
Member Author

RalfJung commented Jun 9, 2017

It's not "just" moving the problem -- being in a separate module makes the privacy rules kick in.

@strega-nil
Copy link

strega-nil commented Jun 9, 2017

@RalfJung that doesn't really matter for internal modules? You can have intermodule invariants.

@RalfJung
Copy link
Member Author

RalfJung commented Jun 9, 2017

Hm, yeah, you can. Maybe we require some annotation if that's what you do? Not sure. I don't claim I have all the answers. ;)

Do you have an example of a real crate doing something like this?

@strega-nil
Copy link

@RalfJung no, but I don't write very much unsafe code. Maybe something like rayon? None of this seems like super unreasonable code.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants