Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Temporary Scopes/Arenas to formalize lifetimes of local variables #308

Open
arielb1 opened this issue Aug 28, 2016 · 3 comments
Open

Temporary Scopes/Arenas to formalize lifetimes of local variables #308

arielb1 opened this issue Aug 28, 2016 · 3 comments
Labels
S-not-opsem Despite being in this repo, this is not primarily a T-opsem question T-types Things that should be owned by the types team

Comments

@arielb1
Copy link

arielb1 commented Aug 28, 2016

Not sure this is the right place to put this, but @ubsan asked for it:

Semantic divergences from current rustc are in bold. Unresolved questions are (XXX: question).

Arenas

In the code examples here, assume this struct has been declared:

struct NoisyDrop(&'static str);
impl Drop for NoisyDrop {
    fn drop(&mut self) {
        println!("dropping {}", self.0);
    }
}

Arenas are used to ensure orderly and deterministic destruction of local variables and temporaries (XXX: buffer temporaries created for order-of-evaluation).

During its execution, a Rust function manages a stack of arenas, pushing new arenas to it and popping them.

Newly-created locals and temporary locations are always allocated at the end of an arena, but the arena need not be the topmost arena on the stack, as in example (1).

When an arena is popped, the locations within the arena are destroyed in the reverse order of their allocation. However, parts of a location that are in a deinitialized state are not
destroyed, as in example (2).

Lifetime and region checking treats the destruction of an arena as through each location was destroyed separately, in order, but subject to implementation limitations.

The location allocated for a local/temporary (the "alloca") is valid from the moment it is allocated until just after the value inside it is destroyed when it is popped, as through it was an &move reference.

To simplify implementation, each arena in Rust can contain only a sequence of locations
whose type and size are known at compile-time. Of course, this does not imply that an arena
is stored in that order in memory.

NOTE: arenas are sometimes called "destruction scopes", but in rustc destruction scopes do not include binding arenas, so that term would be confusing. (XXX: Bikeshed!)

The arena tree

The arenas in a function are structured as a tree defined by the program's structure:

  • When a function begins, the function's root arena is created and pushed. That arena
    is popped when the function exits.
  • The following expression places push an expression arena when control enters them and pop it when it exits them:
    • The initializer of a let-statement, if any
    • The expression of an expression statement, edit 2017-12-25 with or without a semicolon
    • The body of a match arm
    • The guard of a match arm, if any
    • The RHS of a logical && or || expression
    • The condition of an if or while expression
  • The following block places push a block arena when control enters them and pop it when it leaves them:
    • The body of a function
    • The body of a loop or while loop
    • The arms of an if-expression.
  • Each block, match guard, or match arm pushes a binding arena when it is entered (XXX: do we want after? Does it matter?). The arena is popped along with the statement's containing block's.

Remember that if let, while let and for are defined by their desugaring in terms of loop
and match.

Observe that the tail expression of a block is under the block's binding arena but none of the other arenas

Before a parent arena is popped, all of its children are popped in LIFO order.

Local variables

Local variables, or bindings (XXX: AFAICT the terms are used interchangeably in rustc - do we want to change this?) created by a let-statement are allocated into the statement's containing block's binding arena. The local variables are allocated before the statement's initializer executes.

Local variables created by argument buffers (see example (3)) and argument bindings are allocated in the function's root arena. The order in which they are allocated is that each argument's patterns are allocated followed by the argument's buffer, from left to right.

Local variables created by a match arm are allocated twice - once when the guard is executed, within the guard's binding arena (which will not call a destructor because bindings in guards must either be Copy or ref mut t, but will release the binding's storage) (XXX: pending final status of borrowck in guards) and once within the arm's binding arena (XXX: this differs a bit from the current behavior, but the current behavior is unsound and not something I would like to keep - cc @eddyb @nikomatsakis).

Temporaries

Temporaries that are created by a temporary lexpr borrow-extended by a let-statement are allocated within that let-statement's containing block's binding arena. Other temporaries are allocated into the topmost non-binding arena on the stack when they are created (see Example (1)).

If the pattern of a let-statement contains a by-ref binding, the root lexpr of the let-statement's expression is borrow-extended (see Example (4)).

If an address-of expression is distinguished subexpression of a let-statement, the root lexpr of the address-of expression's subexpression is borrow-extended (see Example (5)).

The following expressions are distinguished subexpression of a let-statement:

  • The expression of that let-statement
  • The subexpression of a address-of, cast, or parentheses that is a distinguished subexpression of that let-statement.
  • The fields (XXX: but not the FRU base) of a dictionary expression, tuple expression, or array expression (XXX: but not repeat expression) that is a distinguished subexpression of that let-statement.
  • The tail of a block expression that is a distinguished subexpression of that let-statement.
    (XXX: this is just documenting the current implementation. Should we do better - see the merged but unimplemented RFC: Better temporary lifetimes (so e.g. .as_slice() works) rfcs#66?).

In some other cases that have yet to be documented (XXX: document them).

No other temporaries are borrow-extended (e.g. type annotations do not matter - see rust-lang/rust#36082).

Example (1) - basic arenas

fn basic() {
    let x = NoisyDrop("local `x`");
    (NoisyDrop("temporary"), ()).1
}

Here there are 4 arenas:

root arena
    - function block arena
        - function block binding arena
            - `let` expression arena 

Here, the local x is allocated in the binding arena, but the (NoisyDrop("temporary"), ()) temporary skips the binding arena and is instead allocated into the function's block arena,
and is destroyed after the NoisyDrop("localx").

Example (2) - conditional drop

fn condi_drop(init_flag: bool, fini_flag: bool) {
    let (init, fini);
    fini = NoisyDrop("fini");
    if init_flag {
        init = NoisyDrop("init");
    }
    if !fini_flag {
        std::mem::forget(fini);
    }
}

Here, the locations of the locals init and fini are allocated from the same arena
in that order. Therefore, when the function is exited, fini is destroyed followed
by init.

At least, that is the case if both are initialized - when both flags are true. If
init_flag is false, then init is never initialized and therefore not destroyed, and
if fini_flag is false, then fini is deinitialized without its destructor being run
by mem::forget, and therefore it is not destroyed.

Example (3) - argument buffers

fn argument_buf((x1,_,z1): (NoisyDrop, NoisyDrop, NoisyDrop),
                (x2,_,z2): (NoisyDrop, NoisyDrop, NoisyDrop)) {
}

fn main() {
    argument_buf((NoisyDrop("x1"), NoisyDrop("y1"), NoisyDrop("z1")),
                 (NoisyDrop("x2"), NoisyDrop("y2"), NoisyDrop("z2")),);
}

Here, the first argument's bindings x1 and z1 are allocated first, followed by the first argument's buffer (which contains the triple (NoisyDrop("x1"), NoisyDrop("y1"), NoisyDrop("z1"))). Then, the same happens for the second argument.

Afterwards, all but the middle 2 fields of the buffers are moved to the argument bindings, so only the middle NoisyDrop's destructor is called when the buffer is destroyed.

Everything is allocated onto the function's root arena and dropped in reverse order, leading to

dropping y2
dropping z2
dropping x2
dropping y1
dropping z1
dropping x1

being printed

Example (4) - borrow extension by ref pattern

fn main() {
    let ref foo = (NoisyDrop("extended 1"), NoisyDrop("extended 2")).0;
    let _ = NoisyDrop("external");
}

The root lexpr of (NoisyDrop("extended 1"), NoisyDrop("extended 2")).0 is the lexpr (NoisyDrop("extended 1"), NoisyDrop("extended 2")), which is extended to the binding arena and therefore dropped after the unextended let.

This prints

dropping external
dropping extended 1
dropping extended 2

Example (5) - borrow extension by designated expression

fn main() {
    let _x : (&_, NoisyDrop) = (
        {&[&[NoisyDrop("extended 1")]]},
        NoisyDrop((&NoisyDrop("unextended"), "extended 2").1)
    );
    drop(NoisyDrop("external"));
}

This is a rather complicated case. The arena tree is

root arena
    - function block arena
        - function block binding arena
            - let expression arena
            - second statement expression arena

First, storage for _x is allocated in the block binding arena. Next, the first 2 address-of expressions are distinguished subexpressions of the let-statement, so the 2 arrays are allocated in that binding arena too. The third address-of expression is inside a function call, so it is not a distinguished subexpression, and is allocated within the let expression arena.

After the let-statement completes, the let expression arena is dropped, printing "dropping unextended".
Then, the second statement prints "dropping external".

Afterwards, the binding arena is dropped in reverse order of pushing - first the temporary is dropped, printing "dropping extended 1", and the _x is dropped, printing "dropping extended 2".

The overall output is

dropping unextended
dropping external
dropping extended 1
dropping extended 2
@arielb1
Copy link
Author

arielb1 commented Aug 28, 2016

cc @nikomatsakis @pnkfelix

@nikomatsakis
Copy link
Contributor

@arielb1 when you asked me "what should we do about arenas", were you referring to this issue? and if so, I guess you were referring to the various XXX points?

I find this to be a reasonably nice way to formalize temporary scopes. I am not sure where we have latitude to change the current behavior at this point; the behavior of tail expressions can certainly be confusing though. I also don't have a better alternative just now. =)

@arielb1
Copy link
Author

arielb1 commented Sep 18, 2016

Sure.

This is compatible with today's situation except for rust-lang/rust#36082, which I think we can call "just a bug".

@RalfJung RalfJung transferred this issue from rust-lang/rust-memory-model Nov 29, 2021
@RalfJung RalfJung changed the title Temporary Scopes/Arenas Temporary Scopes/Arenas to formalize lifetimes of local variables Nov 29, 2021
@JakobDegen JakobDegen added S-not-opsem Despite being in this repo, this is not primarily a T-opsem question T-types Things that should be owned by the types team labels Apr 25, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-not-opsem Despite being in this repo, this is not primarily a T-opsem question T-types Things that should be owned by the types team
Projects
None yet
Development

No branches or pull requests

3 participants