Removing allocations is technically unsound in OOM (out-of-memory) situations #328
I've since improved upon the example:

```rust
fn bad(limit: usize) {
    if limit == 0 {
        return;
    }
    let mut x: usize = limit;
    let r = &mut x;
    bad(limit - 1);
    if *r != limit {
        panic!();
    }
    return;
}

fn main() {
    bad(usize::MAX);
    println!("unreachable");
}
```

The interesting things about it are that 1) it never does anything to inspect the values of the pointers, and 2) all of the values in the locals are genuinely live when the recursion bottoms out, so no amount of liveness analysis will get us out of this. |
My original idea for a way to solve this was to not require that addresses of memory in different allocations be unique in general, but to magically guarantee that this is indeed the case for any two allocations that are both live and have at least one exposed pointer (via `expose_addr`). Consider:

```rust
let p1 = alloca(8);
let p2 = alloca(8);
if abs_diff(p1.addr(), p2.addr()) < 8 {
    // Dead code, but removing it is unsound
    p1.expose_addr();
    p2.expose_addr();
    println!("Unreachable");
}
if abs_diff(p1.addr(), p2.addr()) < 8 {
    println!("Unreachable");
}
dealloc(p1);
dealloc(p2);
```

This means that even if we are willing to give up unconditional address uniqueness (which I think we have to, given the example above), it is still entirely unclear how to make this work. |
I've written up a proposal for a fix here: https://hackmd.io/@2S4Crel_Q9OwC_vamlwXmw/HkQaMnB49 |
At a high level, it seems like that proposal moves the OOM to the point where the address gets exposed. So, doesn't this example still have the same problem?

```rust
let x = malloc(isize::MAX);
let y = malloc(isize::MAX);
let z = malloc(isize::MAX);
if x.expose_addr() != 0 && y.expose_addr() != 0 && z.expose_addr() != 0 {
    unreachable!();
}
free(x);
free(y);
free(z);
```

It should be possible to remove this. |
(After talking with @JakobDegen on Zulip.) Indeed this doesn't entirely solve the problem of removing dead allocations -- one can still not remove dead allocations that are being exposed. However, that sounds like a big step forwards! Most allocations are not being exposed, so removing them can be justified by this model. (And if we keep something like [...].) Basically, if we can carry the [...]. The downside is that [...].

What remains somewhat fuzzy to me is that the address gets picked when the allocation is created, but if it gets exposed then that address may later fail to be disjoint and the program is doomed to OOM. Basically, it seems impossible for a programmer to prove, under this model, that their program does not go OOM. However, these kinds of models usually have to allow any allocation to non-deterministically OOM anyway, so... that doesn't really make things any worse, I think? Certainly some use-cases do need to prove absence of OOM, but they need a different model to begin with. I wonder if there are good ways to relate the two...

Now I wonder: what goes wrong if we just remove the "addresses must be disjoint" part entirely, without replacing it by some condition on exposure? |
My third example was intended to show that this is potentially problematic:

```rust
// May these get overlapping addresses?
let p = alloca(8); *p = 0;
let q = alloca(8); *q = 0;
let a = p.expose_addr();
q.expose_addr();
let x = a as *mut usize;
*x = 5;
dbg!((*p, *q));
```

Legal behaviors for this program that users definitely expect would seem to be for it to OOM, or for it to print `(5, 0)`. |
That's a fair point. Basically this boils down to: the angelic choice that is magically made should only be allowed to affect UB, not observable behavior. |
I think I just realized there is a problem with this scheme: it breaks code like the `PartialEq` impl for slices, which compares pointers as a fast path. |
I think there are even more problems than that. For example, how does this work with FFI? We don't guarantee that this code:

```rust
extern "C" {
    fn foo(a: &u8, b: &u8);
}
let x = 0;
let y = 0;
unsafe { foo(&x, &y) };
```

will pass distinct pointers to FFI. This was not a concern in other situations, since in those cases the compiler had to assume that FFI code does the worst crimes it possibly could, but here the compiler has no such concerns. This can be fixed by saying something akin to: on FFI calls, we find all the pointers reachable from the arguments/globals and expose them. This is ugly though, and I would absolutely love to have a model that does it better. I just don't know one. |
I don't see why things would be any different for an FFI function than for a Rust function. |
Oh, yeah, nevermind. This probably doesn't work then, which means we probably have to scrap a lot of this model. Really I think the most important thing we need is some way to address the annoying circular reasoning you can do in Example 2. I don't know how to go about it though. |
I thought about this some more, and can make the issue somewhat more concrete. For slightly weird reasons, I do think this is actually an issue specifically with external functions (such as FFI). Consider:

```rust
fn bar() {
    foo(&1);
}

#[inline(never)]
fn foo(a: &i32) {
    (a as *const i32).expose_addr();
}
```

A naive implementation of the Rust AM under my model would require us to include a check in [...].

This kind of model almost works, except when you go and make `foo` an external function. I don't see a way to fix this. As I said above, at this point I'm mostly just trying to think of a brand new idea. |
I think this is why I was saying that it needs to be a compiler obligation that the OOM case of `expose_addr` never actually happens. We can therefore lift the exposure into the caller. You can therefore optimize [...].
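Concretely, "lifting the exposure into the caller" might look something like the following sketch (hypothetical code using the `expose_addr` name from this discussion; `opaque_use` is an assumed external function, and the comments describe the intended obligation rather than any current guarantee):

```rust
extern "C" {
    // Assumed external function that may do arbitrary things with the address.
    fn opaque_use(p: *const i32);
}

fn caller() {
    let x = 42;
    let p: *const i32 = &x;
    // The caller discharges the "must already be exposed" obligation itself,
    // so the callee's compiled code does not need to contain the exposure
    // (or the OOM check associated with it).
    let _ = p.expose_addr();
    unsafe { opaque_use(p) };
}
```
|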
You could certainly have such a requirement for your compiler, but I don't see how we can in general enforce this requirement on other compilers - if we try and write a rule that says they must obey it, then we essentially once more end up with this being a part of the AM, which has problems as we've seen |
I'm not sure I follow. The model has all code which is linked together cooperating under the rules of the Rust AM, which is perhaps a little self-centered but should be fine. Different compilers communicating across FFI just need to make pessimistic assumptions about the other side - in this case, that FFI calls require all allocations to already be exposed when they are passed in, meaning that the caller should call `expose_addr` on the pointers it passes in. |
In what sense is this a pessimistic assumption? Of course the compiler has to assume that the other side may emit [...]. Now if we instead make a rule that says "you must [...]" |
I mean the second thing: "you (the downstream compiler) must [...]". If you add the [...] |
I don't see how this works. When you talk about this sort of obligation, who is this obligation to, where is it written down, and in terms of what is it defined? It seems like you're suggesting that this obligation is just in the language itself and is defined in the spec, but the spec has no notion of stage 1 or stage 2. The stages are a useful distinction when talking about the internal reasoning a compiler can do; I don't think we can add a rule in terms of them. |
The AM would just have a rule about overall behavior, and pass right through any FFI calls. The compiler produces a number of compilation artifacts like .rlib files, and these can have additional compiler-specific invariants on them, as long as the final linked executable satisfies the AM rules. In this case, the obligation to expose allocations on stage 2 artifacts and not on stage 1 artifacts is a property of these compiler-specific intermediate files, as part of the compiler's implementation of linking pursuant to the rules of the AM. The linking rules are still somewhat under the control of the compiler, since it is only the linked executable that has AM dynamic semantics. So for instance the exact repr(Rust) calling convention does not need to be part of the spec, even though it is important for linking. |
I mean, this is how it works right now (more or less), but that's not really enough. For example, this would make it unsound for xLTO to remove dead locals. More generally though, the problem is that it's not possible to implement the AM without imposing additional rules. Right now, in order for two Rust implementations to be able to link to each other, the only things they need to agree on are the AM parameters - layout, calling convention, things like that. But with this model in place, even with all AM parameters agreed upon, the side trying to compile [...] |
I don't see the problem. At the actual assembly level, "all addresses passed across [standard-calling-convention] function call boundaries must be exposed" is naturally true. After all, all objects that still exist at the assembly level must have unique addresses that don't overlap with other simultaneously-live objects. Meanwhile, the LLVM IR level is at least one place where the compiler currently removes unused allocations. (I think it's currently the only place where it does so.) Assuming we want the compiler to continue to be able to perform that optimization, "all addresses passed across calls must be exposed" must be false. So why not posit that, in the current implementation, LLVM IR is stage 1 and assembly is stage 2? Then the 'additional rule' only applies to assembly, where it's automatically satisfied. And LTO is no longer a special case, since that just involves LLVM's optimizer performing the same optimizations across compilation units that it normally performs within a single compilation unit, which are presumptively valid. |
Let me try and present this a different way. Say that I am a Rust compiler looking to emit a static library corresponding to the following Rust crate:

```rust
pub extern "C" fn foo(a: &i32) {
    (a as *const i32).expose_addr();
}
```

Specifically, the goal is for me to emit code that has behavior identical to that of the abstract machine - that is what it means to compile Rust, after all. But that is not possible. In order to compile this, I must impose additional requirements on top of the rules associated with the language. |
Such a Rust compiler could emit two versions of the function: [...]
Compare this to the following function:

```rust
pub extern "C" fn bar(a: &i32) {
    (a as *const i32).addr();
}
```
Of course, the compiler can choose to only emit stage 2 artifacts if it wants, or to not include this extra bit of interprocedural analysis data in the stage 1 artifact and always require exposure. In either case the loss is that inlining across codegen units may result in unnecessary exposure of locals. In fact, the stage 1 artifact need not contain an explicit annotation with the exposure analysis. If it is an [...] |
Regarding Ralf's point about PartialEq on slices, I think we already have the necessary API to do the right thing here: [...]
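Roughly, the kind of fast path in question might look like the following sketch (not the actual standard library implementation; `ptr::eq` is used here only as a stand-in for whichever comparison API is meant, since a negative answer merely falls back to the element-wise path and so cannot affect the result):

```rust
use core::ptr;

fn slices_eq<T: PartialEq>(a: &[T], b: &[T]) -> bool {
    if a.len() != b.len() {
        return false;
    }
    // If this comparison spuriously answers "not equal" under the proposed
    // model, we only lose the shortcut, never correctness.
    if ptr::eq(a.as_ptr(), b.as_ptr()) {
        return true;
    }
    a.iter().zip(b).all(|(x, y)| x == y)
}
```
|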
Where is this rule coming from? If "[...]":

```rust
extern "C" fn bar(a: &i32) -> i32 {
    *a
}

fn foo() {
    bar(&2);
}
```

In any case, I don't think I'm making my point clear here, so I'll probably stop arguing it. Maybe someone else can explain it more elegantly than me. |
(I'm using "calling convention" in a somewhat loose way to also include all the machine preconditions for soundness, not just data layout. I don't think we've miscommunicated on this point but it seems worth calling out.) Let me elaborate on the weakest precondition thing I mentioned in the previous post. The weakest precondition of a piece of code Relating this to the present case, if we have a non-opaque function (i.e. we have code for it), then there are essentially two ways call it: we can actually emit a call instruction, or we can inline the code. The minimum requirements we need to satisfy the AM execution we are after is the weakest precondition of the contained code, while the construction of the function as a callable thing has ensured that the function's calling convention is a stronger than this weakest precondition, so calling the function also satisfies the AM requirements as long as the calling convention is upheld in the call. So if we find a call to an
This is all to say that " |
I am not saying this cannot be fixed, but I think it is too much of a footgun.
Don't we ever remove unused MIR locals during MIR transformation? I would think we do. That is removing an unused allocation.
The systematic way to think about this is that the memory model changes when translating (for example) from LLVM IR to whatever the next IR is (I heard terms like SelectionDAG but I don't actually know the LLVM pipeline). I don't quite see the FFI problem here -- when doing FFI you have to declare on which level you are doing the linking (are you linking two programs in LLVM IR, or in assembly), and the answers to various questions depend on that, but it should all be coherent from what I can tell. The most tricky bit is how exactly to formalize the condition that when you call [...]. Basically:

NB ("no behavior") means it is the compiler's responsibility to prove that this never happens -- this is the exact opposite of UB. When arguing for correctness of the translation to a memory model without overlapping allocations, we can usually assume that the source program has no UB, but here we have to prove that the source program has no NB. But this is easy; in this translation we pick addresses to never overlap (we trigger OOM if they would overlap), so we are good. |
We definitely do, in [...].

Wrt everything else, @RalfJung and @digama0: I'm still not getting my point across properly, since I agree with everything that you said but still believe there is a problem. So let me instead try getting closer to a point where I can explain what is going on by asking a question. Consider these two snippets:

**Version Easy**

```rust
extern "C" {
    fn foo_a(x: i32) -> i32;
}

pub fn bar_a() {
    unsafe { foo_a(5) };
}
```

**Version Hard**

```rust
extern "C" {
    fn foo_b(x: *mut i32);
}

pub fn bar_b() {
    let mut x = 5;
    let p: *mut i32 = &mut x as *mut _;
    unsafe { foo_b(p) };
}
```

There has already been quite a bit of discussion about FFI that asked questions of the form "what are the obligations that the foreign function has to uphold?" I'm instead talking about "what are the obligations that the Rust implementation must uphold wrt these function calls?" This is tricky, because we cannot just define this "within the AM" the way we define what [...].

How does the language around the answer to the previous question look in the spec? Below I'm going to describe a way that I imagine this could be defined, but you are free (and encouraged!) to completely ignore my description if it makes answering this question easier. To emphasize: I am by no means advocating for what I wrote below. I am only trying to explain the model that I have in my head.

**How I thought this could work**

One way I believe this could be defined is by defining a map of AM states to what I'll call "hardware states." The best way to explain this, I think, is by giving an example of the language that defines the behavior of the [...]: When evaluating a call, if the target of that function call has the type of a function declared in an `extern` block, [...]
This is well and good for [...].

4a. Additionally, for every currently live AM allocation, the contents of that allocation are stored to the hardware memory with the same address.

This is enough to guarantee that dereferencing the pointer has the desired behavior. Unfortunately though, it brings back the problem of "removing unused allocations is unsound," because now this is being miscompiled:

**Snippet D**

```rust
fn bad(limit: usize) {
    if limit == 0 {
        println!("Bottomed out"); // Pretend it's just an `extern` function
        return;
    }
    let mut x: usize = limit;
    let r = &mut x;
    bad(limit - 1);
    if *r != limit {
        panic!();
    }
    return;
}

fn main() {
    bad(usize::MAX);
    println!("unreachable");
}
```

The cause of this is obviously that the phrase "for every currently live AM allocation" is too broad; most of these allocations can't be accessed by the extern function after all. So we could instead use a clause like this:

4b. For every particular byte in AM memory: If there exists an execution of the [...]

This kind of clause is what I meant above by [...].

This recovers the soundness of the optimization in Snippet D, because the allocations do not have to be spilled and so we do not have to deal with finite memory. However, now that I've written all this out, this last version actually doesn't sound so bad. It's mostly workable -- [...] |
Sorry if this has already been mentioned or doesn't make sense, but what about the following set of rules?

1. Any allocation attempt can report OOM.
2. Running out of sufficient contiguous, aligned, available address space is UB.

The key idea is that you can never prove that an allocation has to fail or has to succeed, and therefore they can be added or removed freely if unused. For instance, the example of two allocations of usize::MAX bytes would not diverge, but instead exhibit UB.
|
Running out of memory is possible and even easy in safe code, so we can't just make it UB. And even if we could, I don't think we'd want to, as it's far too easy to do accidentally. Accidental infinite loops, which before were just logic bugs that were likely to cause OOMs, would now be full UB. |
This comment is assuming the title of this issue is still an accurate description of the problem, so if things have moved on then ignore me.
Think about how integer overflow is handled: we guarantee that either there is a panic or the result is wrapped, but we don't specify which (at least, not at the language level). In the same way, can't we just say that "running out of memory" will either report OOM or succeed, but not specify which? Of course the compiler is only allowed to make it succeed if all other constraints are still upheld.
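To make that reading concrete with a small (hypothetical) example in the spirit of the earlier snippets, both of the following outcomes would be allowed for this program, and safe code cannot rule either one out:

```rust
fn main() {
    // Under the proposed reading, either:
    //  - this allocation reports OOM (e.g. the process aborts), or
    //  - the implementation makes it "succeed", which it may do here by
    //    noticing the buffer is never observed and not materializing it.
    let big = vec![0u8; usize::MAX / 2];
    drop(big); // never read, so eliding it upholds all other constraints
    println!("done");
}
```
|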
Yeah, this is the version of things that I settled on as well. Allocations don't disappear when they are deallocated, they just get a switch flipped that says they're dead. At allocation time, forced allocations are only guaranteed to be non-overlapping with other live forced allocations. Importantly,

```rust
p.force_addr();
foo(); // non-diverging
```

can be re-ordered, as whether the allocation is forced was already fixed when it was created. The other minor detail then is that in the code for Option 3 above, [...] |
Note that this is exactly the same as what happens in a flat memory model. Speaking a little more generally, the intuition that I have for what we're saying is something like this: if the compiler removes the allocation at compile time, then it can pretend that the allocation is non-overlapping with all others - both for the purpose of dodging OOMs and also for the purpose of pointer comparison. This kind of approach should generally agree with people's intuition - "if I don't ask where a pointer is, then it's non-overlapping with other pointers" vaguely makes sense, and so functions like [...] |
It is.
If one of the constraints that needs to be upheld is "allocations are non-overlapping" then the stack smashing program is a miscompilation and that's really bad. But if it's not a miscompilation, then we somehow have to explain what it means to have overlapping allocations in the AM and... well, that's what the rest of this thread has been about |
Ah, your [...]. OTOH the weirdness will only affect OOB pointers, which is a good sign.
Hm yeah that makes sense.
This now relies on "forced or not" already being fixed when the allocation is created (which it is, thanks to angelic non-determinism; this is basically a prophecy variable), but -- I see, good point. |
Indeed, but this is even necessary to make pointer equality pure. It would not be great if it depended on some mutable external state (such as whether the allocation has been forced yet).
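To illustrate the purity point with a small sketch (hypothetical code; the only claim is that two evaluations of the same comparison must agree):

```rust
fn check(p: *const u8, q: *const u8) {
    let first = p == q;
    // ... arbitrary non-diverging code in between ...
    let second = p == q;
    // If whether an allocation is "forced" could change between these two
    // comparisons, and equality depended on it, this could fail.
    assert_eq!(first, second);
}
```
|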
Something I stumbled across is that allocation planes might also be useful for the global allocation API boundary. It's off topic w.r.t. OOM, since removing side effects of calls to [...]. The global allocation boundary launders the [...] |
I like thinking of them as "realspace" pointers (which point to real memory) and "hyperspace" pointers. Does this also work for [...]?

The last thing is that we need some guarantee that prevents hyperspace pointers from ending up as keys in a [...].

For ABI calls - I like to think that when a Rust compiler picks a function, it has to nondeterministically pick between doing an "ABI" call or an "AM" call. If the compiler does an "ABI" call, it must ensure that all memory that the function can access has the right values, which probably implies that e.g. all parameters must be realspace pointers and must have a stable non-[...]. I see a call as [...]
|
Merging allocations, i.e. converting [...].

In the following program, under the proposed semantics for pointers and pointer equality, overflow is impossible, because [...]:

```rust
let mut counter: usize = 0;
loop {
    let a = malloc(1);
    let b = malloc(1);
    if a.offset(1) != b {
        break;
    }
    counter = counter.checked_add(1).expect("managed to allocate more objects than address space");
}
```

That said, I think this can be fairly easily fixed, and IMO the result is a simpler model. Instead of allocations (and pointers) having a boolean forced flag, they can have an integer allocation domain, where domain 0 corresponds to forced and all other domains are non-forced. Then, pointers are just an address paired with an allocation domain:

```rust
pub struct AllocDomain(Int);

struct Allocation {
    domain: AllocDomain,
    // ...
}

impl Memory for FixedMemory {
    type Provenance = AllocDomain /* , AllocId, Tag, etc. in Stacked or Tree Borrows */;

    fn allocate(&mut self, size: Size, align: Align) -> NdResult<Pointer<AllocDomain>> {
        // ...
        // Non-deterministically choose whether to OOM
        if pick(/* ... */, |is_oom: bool| true)? {
            return Err(/* ... */);
        }
        // Non-deterministically choose a domain for the allocation to live in
        let domain = pick(/* ... */, |domain: AllocDomain| true)?;
        let addr = pick(/* ... */, |addr: Int| {
            // ...
            if self.allocations.any(|a| a.live && a.domain == domain && a.overlaps(addr, size)) {
                return false;
            }
            // If all tests pass, we are good!
            true
        })?;
        // ...
        ret(Pointer { addr, provenance: Some(domain) })
    }
}

impl Pointer<AllocDomain> {
    fn force_addr(self) -> Nondet<()> {
        if let Some(domain) = self.provenance {
            if domain.0 != 0 {
                // No behavior
                pick(/* ... */, |_| false)?;
            }
        }
        ret(())
    }
}
```

All behaviors of the forced-or-not semantics are allowed in this model (by making every non-forced domain contain exactly one allocation), but not too many are added. In particular: [...]
|
I have not seen this idea of replacing AllocIDs by "domains" containing multiple allocations before -- interesting. I assume you will say that when lowering to assembly, the chosen domain will be 0 for all allocations, justifying that we only compare addresses in assembly? And forcing happens on ptr2int casts? |
Yes, exactly. The previous discussion suggested a few other places that might require forcing in addition to ptr2int casts: [...]
|
This was the model suggested here originally. We eventually got rid of it in favor of the current model because we felt that this version was simpler. That being said, the example at the start of your message is super interesting. I'll need to think about that some more... |
I think it is not quite the same model? The one that was proposed here involved both an AllocId and a "domain" for this allocation to be placed in. The newly proposed model does away with per-allocation IDs entirely.
However, I feel like that cannot entirely work... When doing proofs about optimizations, we often rely on the unique fresh ID of each allocation. Now the ID is fresh only in some executions. We don't want to resolve that non-determinism early, so the argument that nobody else is using my private allocations seems a lot harder to make now.
|
Ah, good to know. I did see comments about multiple domains (that's where I got the name "domains" from), but not knowing how to see previous versions of the HackMD, I assumed it was more in line with the boolean forced-or-not version, special-casing the forced domain and such like. My bad.
Working this out, I'm now convinced that a dense total/linear order is (slightly) better. This matches my intuition that unbounded dense linear orders are particularly simple orders, and the theory of UDLOs is easily decidable, which is convenient for model checking.
Consider the following program. If domains are integers with the normal integer ordering, this cannot loop forever. If domains form a UDLO, however, it can.
```rust
let lo = malloc(1);
let hi = malloc(1);
// lo and hi have the same address but different domains
assert!(lo < hi);
assert!(lo.addr() == hi.addr());
loop {
    // mid has the same address as lo and hi and a domain in between
    let mid = malloc(1);
    assert!(lo < mid && mid < hi);
}
```

The optimization that removes an unused allocation is justified by choosing some domain that will never be used otherwise in the program, allowing overlap and requiring rewriting all [...]. Unfortunately, on the example above, this optimization introduces the possibility of an infinite loop, meaning it is invalid for integer-ordered domains. If [...]. So for the sake of removing unused allocations in the presence of pointer comparison, I think domains should form a UDLO. (This really is a very minor issue, though. Most pointers aren't ordered, and this optimization, even though it is incorrect under integer domains, can only introduce infinite loops where every finite prefix appears in the original program.) |
To be clear, I was still assuming that something like Tree Borrows would be used to declare out-of-bounds accesses and use-after-frees illegal, so there would be some provenance info unique to the allocation, possibly including the per-allocation ID. I was just focusing on the definition of pointer equality and the details needed for removing unused allocations. |
What a wonderful example. Things like this are why I keep coming back to this repo. To be honest, I wasn't really expecting to make pure Ord work in this kind of a model, but I don't immediately see a problem. The trade-off that I am thinking about, though, is whether it's worth it. This has the potential to surprise users, and my instinct is that pure Ord on pointers is not so important that this is the hill we need to die on. On the other hand, I also can't think of any way a user would discover this except by reading the spec, so it might not actually be an issue. |
Observationally, programmers already find a lot of details about the pointer comparison semantics surprising, so if we add another gotcha but in doing so make the rules about pointers more overall coherent (and thus "more understandable, even if the first brush violates your intuitions"), I think it's a win. |
@gereeter I don't follow which orders you are talking about -- how exactly are the various orders you are referring to defined?
You said earlier they should be ordered lexicographically (first domain, then address), which would make [...]. Do compilers even remove allocations if their addresses are compared with other pointers?
The code doesn't check for 'same address' though? To me it seems like the source of trouble here is that you assumed there was a "lowest" domain, but then also assumed the domains are integers, and obviously there is no smallest integer, so that is just not coherent. I would have expected the optimization to simply not kick in if there are any such comparisons; that means the compiler can choose any domain. |
Yes? https://godbolt.org/z/zfzPhMPfo

```rust
pub fn uwu(x: *const u8) -> bool {
    let y = 1u8;
    if x == &y as *const u8 {
        true
    } else {
        false
    }
}
```

```llvm
define noundef zeroext i1 @_ZN7example3uwu17h0b9a63f97a9ab0f7E(ptr nocapture noundef readnone %x) unnamed_addr #0 !dbg !6 {
  ret i1 false, !dbg !11
}
```

For completeness, [...] |
With [...] |
Not directly, as far as I know, when ordering is involved. The integer vs. UDLO quibble is an extremely minor side note, and my justification for preferring e.g. rationals was based on a hypothetical optimization. That said, compilers do optimize pointer order comparisons, and can indirectly remove such allocations by first optimizing away the comparison. (For example, in [...])
No, first address, then domain. I think my wording ("comparing domains when addresses are equal") was a bit confusing. To be extra precise, I'm considering two cases, namely pointers as [...] |
Oh, I got the field order wrong, that explains my confusion.
Well, those two are equivalent, so this doesn't tell us a lot about the structure of [...] |
Some relevant discussion: rust-lang/miri#3450 (comment) |
The following program might be considered to be guaranteed to run out of memory, but LLVM actually compiles it in a way that it does not:
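```c
int *x = malloc(SIZE_MAX);
int *y = malloc(SIZE_MAX);
if (x && y) {
    printf("This is unreachable. I could deref a NULL pointer and this program would still be fine.");
}
```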
It is not clear how to formally justify this.
@JakobDegen also has an example for recursion exhausting memory via stack allocations and being similarly optimized away.