Generalized generative constructor initializer code. #3002

Open
lrhn opened this issue Apr 17, 2023 · 9 comments
Labels
feature Proposed language feature that solves one or more problems

Comments

@lrhn
Member

lrhn commented Apr 17, 2023

This proposes a generalization of object initialization, which allows more powerful and expressive computations during initialization, while still maintaining a separation between code running before an object has been fully initialized (no access to this) and after (the current constructor body).

Motivation

The current syntax for initializing instance variables in non-redirecting generative constructors, the "initializer list", is very restricted in what it can express.
The only allowance is assigning the result of a single expression to one instance variable. If two fields need to share a value in any way, say one containing a stream-controller, and another a value depending on the stream of that controller, it cannot be expressed in a single constructor.

The immediate workaround is to use a factory constructor which does all the computation, and then calls a private generative constructor which just initializes fields with pre-computed values.
If a public generative constructor is needed, another workaround is to use a forwarding generative constructor which creates the shared object, and pass it to another constructor, which can then refer to it through the parameter, like a kind of "let" constructor using constructor chaining.
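For concreteness, here is a minimal sketch of both workarounds as they can be written in Dart today. The class names `FactoryChannel` and `ForwardingChannel` are made up for illustration and are not part of the proposal.

```dart
import 'dart:async';

// Workaround 1: a factory constructor does the computation, then calls a
// private generative constructor that only stores pre-computed values.
class FactoryChannel<T> {
  final StreamController<T> _controller;
  final Stream<T> stream;

  factory FactoryChannel() {
    final controller = StreamController<T>();
    return FactoryChannel._(controller, controller.stream);
  }

  FactoryChannel._(this._controller, this.stream);

  void add(T value) => _controller.add(value);
  Future<void> close() => _controller.close();
}

// Workaround 2: a public generative constructor forwards the shared object
// to a private constructor, which binds it to a parameter ("let" by chaining).
class ForwardingChannel<T> {
  final StreamController<T> _controller;
  final Stream<T> stream;

  ForwardingChannel() : this._(StreamController<T>());

  ForwardingChannel._(StreamController<T> controller)
      : _controller = controller,
        stream = controller.stream;

  void add(T value) => _controller.add(value);
  Future<void> close() => _controller.close();
}

Future<void> main() async {
  final channel = ForwardingChannel<int>();
  final received = channel.stream.toList();
  channel.add(1);
  channel.add(2);
  await channel.close();
  print(await received); // [1, 2]
}
```

Both versions need a second constructor whose only job is to give the shared controller a name.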

Proposal

Allow initialization of instance fields to happen inside the constructor body, as well as in an initializer list.
To do that, the super-constructor invocation is allowed to be moved into the constructor body as well.

The grammar is changed such that a non-redirecting generative constructor with:

  • No initializer list, or
  • an initializer list with no super constructor invocation

can be followed by a constructor body block which can contain at most one super constructor invocation as a top-level "statement" in the block.

If such a constructor contains zero super constructor invocations, one is inserted automatically at the latest possible place where it would be allowed in the body block.

A super constructor invocation statement has the form super(args); or super.name(args);. That is, the same syntax as the entry in the initializer list, followed by a semicolon.

All existing syntax remains valid. A constructor with no body block, just a ;, will still get its super constructor invocation appended to the initializer list. A constructor with a super constructor invocation in the initializer list will work exactly like today.

The behavior of such a constructor is that:

  • All code in the body block prior to the super constructor invocation is initialization code.
  • Initialization code can be statements, but access to this is restricted as follows:
    • Only this.x is allowed, where x is an instance variable of the current class.
      Any other use of this is a compile-time error.
    • No other access to the instance or instance members otherwise. That includes through super.foo() invocations.
    • Unqualified variables, x, resolving to instance variables are allowed, as equivalent to this.x. Those will be
      shadowed by parameters or other local variables as normal (unlike initializer lists which allow x = x instead
      of this.x = x).
    • Every instance variable of the current class is temporarily given a "definitely/possibly assigned" property,
      similar to local variables.
    • Every variable already initialized by an initializer expression, initializing formal or initializer list entry is definitely assigned.
    • Every other instance variable is definitely unassigned.
    • While executing the initialization code, such variables may become definitely assigned if they're, well,
      definitely assigned to, using the same rules as for definite assignment of local variables.
    • Non-final, definitely unassigned final, and potentially unassigned late final variables may be assigned to.
    • Definitely assigned, or potentially assigned and late, variables may be read. (This is new!)
    • At the super constructor invocation, all non-nullable instance variables must be definitely assigned.
  • The super constructor invocation then chains to the superclass object initialization as normal.
    • When it returns, the object is fully initialized and the following code may reference this freely.
    • All instance variables revert to being just instance variables, with no "potentially/definitely assigned" properties.
      Because they're all either definitely assigned or late.
      (Obviously an implementation can remember information for optimizations,
      like knowing that a late variable is definitely assigned.)
    • Local variables defined prior to the super constructor invocation are still in scope and can be accessed.
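The "definitely/possibly assigned" states used above are the ones Dart already tracks for local variables. The following sketch runs in Dart today and only illustrates those existing rules on locals; it does not use the proposed syntax, and `describe` is a made-up function.

```dart
String describe(int n) {
  final String kind; // Definitely unassigned at this point.
  late final String note; // Late: reads are allowed while only potentially assigned.

  if (n.isEven) {
    kind = 'even';
    note = 'divisible by two';
  } else {
    kind = 'odd';
  }

  // `kind` is definitely assigned on every path, so it can be read freely.
  // `note` is only potentially assigned; reading it is checked at run time,
  // so it is only read on the path that assigned it.
  return n.isEven ? '$kind ($note)' : kind;
}

void main() {
  print(describe(4)); // even (divisible by two)
  print(describe(3)); // odd
}
```

Under the proposal, instance variables of the current class would follow the same bookkeeping until the super constructor invocation.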

If neither the initializer list, nor the constructor body block, contains a super constructor invocation,
an invocation of super() is inserted as late as possible.

If there is no constructor body block, it's inserted at the end of the initializer list as normal.
If there is a constructor body block, it's inserted at the latest possible point in that block, which means just before the first statement of the block which references this or super in a way that is not allowed in the initialization code. If there is no such statement, the super() constructor invocation is inserted at the end of the constructor body block.

That is, behavior changes only when the constructor body block contains a super constructor invocation, which is entirely new syntax, or when the constructor contains no super constructor invocation at all. In the latter case, the super constructor invocation may move to later in the body, so some local computation may now happen before it. That only changes the order of computations; the computations should not affect each other unless they do so through global state.
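As a reference point, this sketch records the order in which things run today, using a global log as the "global state" that could observe a reordering. The names `log`, `note`, `Base` and `Derived` are illustrative only.

```dart
final log = <String>[];

int note(String message, int value) {
  log.add(message);
  return value;
}

class Base {
  Base() {
    log.add('Base body');
  }
}

class Derived extends Base {
  final int x;

  // Today the implicit super() sits at the end of the initializer list,
  // so the superclass body runs before this constructor's body.
  Derived() : x = note('Derived initializer list', 1) {
    log.add('Derived body');
  }
}

void main() {
  Derived();
  print(log); // [Derived initializer list, Base body, Derived body]
}
```

With the insertion rule described above, the body of `Derived` does not use this, so the implicit super() would presumably land at the end of the body block and 'Derived body' would be logged before 'Base body'.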

A const constructor must still not have a body, which restricts them to the existing initializer list and no statement control flow.

Consequences

With this change, you never need more than one constructor to construct an object.
You can still have multiple constructors, doing different things, but you never need to add an extra private constructor just to do more complicated computation before initialization.

Closures

I conspicuously avoided mentioning closures.
If one creates a closure in the initialization code of a constructor body block, which references an instance variable, and then calls the closure after the super constructor invocation, what happens?

Preferably it should just work as if the closure had always referenced the same instance variable. But it's not unreasonable that the object doesn't yet exist during initialization, and the this.x variables are really place-holder local variables that are being initialized, and only stored into an object later, when it's been allocated.

Maybe that issue solves itself, if creating a closure containing an instance variable will always treat the variable as potentially, but not definitely, unassigned, so any attempt to read it will fail. An attempt to write might be valid, though, if the variable isn't final.

The most direct solution is to say that it "just works", but that may force a specific implementation approach onto back-ends, where the object is always allocated first, and variables during initialization are backed by the object instance's memory slots.

In most cases, it just won't matter, because capturing instance variables during initialization is incredibly rare, and reusing the closure afterwards is even rarer. And if the compiler can optimize the remaining constructors, a few de-optimized cases won't be a problem.
(But maybe accidentally capturing becomes a bigger issue if we start allowing more kinds of code, like doing someList.any((x) => x.name == inputName) where inputName is an already initialized non-final instance variable.
We can't directly see that this closure won't escape to be called after object initialization, so we may need to treat the inputName field less efficiently during instantiation. But if inputName is final, we can choose to just close over the value, not the variable, which must be definitely assigned already for the code to even be valid.)
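The difference between closing over a variable and closing over a value can be seen with plain local variables in Dart today. This is only an analogy for the field case discussed above, and `captureDemo` is a made-up name.

```dart
(bool, bool) captureDemo() {
  var inputName = 'alice';
  final names = ['alice', 'bob'];

  // Captures the variable: later writes are visible to the closure.
  bool matchesVariable(String name) => name == inputName;

  // Captures a value: the final copy can never change under the closure.
  final snapshot = inputName;
  bool matchesValue(String name) => name == snapshot;

  inputName = 'carol';

  return (names.any(matchesVariable), names.any(matchesValue));
}

void main() {
  print(captureDemo()); // (false, true)
}
```

For a final field, both closures would behave the same, which is why capturing only the value is a valid choice there.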

I'd suggest that when accessing instance fields during initialization is allowed, we should also allow closing over them, and then take whatever hit it costs us if someone does that.

An alternative is to not allow reading initialized variables, only allow writing to them. It's slightly less ergonomic, but it's what we do today, and it isn't too bad.

Variants and extensions

Don't allow reading initialized instance fields

Instead of allowing you to read this.x in initializer code if it's already definitely assigned, we just don't allow that.
The only valid use of this.x is to assign to it.
That also means that capturing an instance variable is less likely to happen. You have to capture a write, this.x = v, which only makes sense if the variable is non-final or late (because otherwise the closure itself forces this.x to be potentially assigned, in case the closure is called more than once).

It's still possible to refer to local variables with the same value, it just requires changing:

Foo(args) : _controller = StreamController<T>() {
   _stream = _controller.stream;
}

to

Foo(args) {
   var controller = StreamController<T>();
   _controller = controller;
   _stream = controller.stream;
}

Which isn't bad.
(Or go all-in on brevity and do:

Foo(args) {
   _stream = (_controller = StreamController<T>()).stream;
}

)
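For comparison, the closest thing that compiles today is to mark the fields late, which allows the same body-only initialization at the cost of run-time initialization checks. This is a sketch with an assumed class shape, not code from the proposal.

```dart
import 'dart:async';

class Foo<T> {
  // `late final` lets the constructor body assign these exactly once.
  late final StreamController<T> _controller;
  late final Stream<T> _stream;

  Foo() {
    final controller = StreamController<T>();
    _controller = controller;
    _stream = controller.stream; // Reads the local, not the field.
  }

  Stream<T> get stream => _stream;
  void add(T value) => _controller.add(value);
  Future<void> close() => _controller.close();
}

Future<void> main() async {
  final foo = Foo<String>();
  final received = foo.stream.toList();
  foo.add('a');
  await foo.close();
  print(await received); // [a]
}
```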

"Factory" generative constructors.

Sometimes you don't want to expose a public generative constructor, because you don't want people to subclass your class through that constructor.
With Dart 3, you can make the class final or interface to prevent that entirely, but if you want to push subclassing to use a specific constructor, and still expose another constructor for creating instances, you'd have to make the other constructor a factory constructor.

We could allow you to write factory on a generative constructor:

Foo(args) factory : initializerList {body}
Foo(args) factory {body}
Foo(args) factory : initializerList;
Foo(args) factory;

The factory modifier is put after the constructor parameters, because putting it in front will make the second line above conflict with the existing factory syntax of factory Foo(args) { body-returning-value }.

The effect would be that this particular constructor cannot be used as a super-constructor by subclasses (maybe only "outside of the same library", like other access modifiers).
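The situation this variant addresses looks roughly like the following in Dart today: one generative constructor meant for subclasses, and a separate factory constructor for creating instances. `Widget`, `Button` and `forSubclass` are hypothetical names.

```dart
class Widget {
  final String id;

  // Generative constructor that subclasses are expected to call.
  Widget.forSubclass(this.id);

  // Factory constructor for creating instances. A factory can never be the
  // target of a super constructor invocation.
  factory Widget(String name) => Widget.forSubclass('widget:$name');
}

class Button extends Widget {
  Button(String name) : super.forSubclass('button:$name');
  // Button(String name) : super(name); // Compile-time error: factory target.
}

void main() {
  print(Widget('a').id); // widget:a
  print(Button('b').id); // button:b
}
```

With the suggested modifier, the unnamed constructor could stay generative and still be unavailable to subclasses.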

Initializer list blocks

Rather than, or in addition to, moving initialization into the constructor body, we could allow code blocks inside the initializer list.

Foo(args) : this.z = z, { 
   initializer code
}, this.w = w { 
   body code
}

Each initializer list block will be treated the same as the initializer code inside the body proper. It can initialize instance variables, and access ones already initialized earlier in the initializer list.
At the end of it, some instance variables will have been definitely or potentially assigned, and that carries forward to the rest of the initializer list, and the body initializer code, if any.

Local variables in initializer list blocks are not visible in later initializers.
Unless we want them to be.

The syntax is a little hard to read, e.g., Foo(args): {initblock}, {initblock} {bodyblock}. The separation between initializer block and body block is hard to read. This readability issue is the primary reason why the proposal doesn't try to split initialization code into its own block.

@lrhn lrhn added the feature Proposed language feature that solves one or more problems label Apr 17, 2023
@munificent
Member

I really like this proposal. I have heard from many users over the years that the constructor initializer syntax is one of the most unintuitive parts of the language. It only really makes sense if you have C++ experience and the set of people who do is not exactly growing these days.

Top-level super calls

can be followed by a constructor body block which can contain at most one super constructor invocation as a top-level "statement" in the block.

I think requiring this to be at the top level would be annoyingly restrictive. I can see users wanting to write:

SomeClass(bool b) {
  if (b) {
    super('some', 'stuff');
  } else {
    super('different', 'things');
  }
}

Given that we're already basing the proposal around definite assignment analysis, I think a natural way to model this is to consider the super constructor call as "initializing this". It's like this is a final variable that gets initialized by calling the superclass constructor. At the beginning of the constructor, this is definitely unassigned. A super constructor call definitely assigns this. It's a compile error to:

  • Call a superclass constructor when this is already definitely or potentially assigned. So you can't call super() more than once along a code path.
  • Exit the body of the constructor without this being definitely assigned. So every code path has to initialize it.
  • Access this or any instance member at a point where this isn't definitely assigned.
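The three rules mirror what Dart already enforces for a final local variable. In this runnable sketch the local `self` merely stands in for this, and each assignment stands in for a super constructor call; none of it is proposed syntax.

```dart
String build(bool b) {
  final String self; // Plays the role of `this`: definitely unassigned here.

  if (b) {
    self = 'some stuff'; // Stands in for super('some', 'stuff').
  } else {
    self = 'different things'; // Stands in for super('different', 'things').
  }

  // self = 'again'; // Compile-time error: `self` might already be assigned.

  return self; // Allowed: definitely assigned on every path.
}

void main() {
  print(build(true)); // some stuff
  print(build(false)); // different things
}
```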

Closures

Closures are nasty, as always. I'd be inclined to just say that you can't close over an instance member at all until after the superclass constructor call.

@eernstg
Member

eernstg commented Apr 18, 2023

I understand the desire to have a more expressive language available for the initialization phase of construction, but I'm worried about the non-homogeneous semantics. In particular, any pre-super access to an instance variable declared in the current class would be similar to an access to a local variable, but every other access to that instance variable (at any location with access to this in that class body, not just in constructors after super) would be a getter or setter invocation, hence possibly running arbitrary code in an overriding declaration.

For example:

class A {
  final int i;
  A() {
    i = 0;
    print(i); // Prints '0'.
    super();
    print(i); // Same. No wait, if `this is B` then it throws!
  }
}

class B extends A {
  int get i => throw "Not the same as reading the instance variable like a local variable";
}

The fact that super() can be inserted implicitly somewhere in the body of the constructor makes it even harder to reason about the code.

In short, I don't think this kind of semantics is particularly readable, maintainable, debuggable, etc.
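The virtual-getter half of that example can be checked in Dart today without any new syntax. This sketch uses a non-throwing override and a hypothetical helper method `readI`.

```dart
class A {
  final int i;
  A() : i = 0;

  // Inside the class, reading `i` is a getter invocation that a subclass
  // can override.
  int readI() => i;
}

class B extends A {
  @override
  int get i => 42; // Overrides the getter induced by A's field.
}

void main() {
  print(A().readI()); // 0
  print(B().readI()); // 42
}
```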

We could consider a different way to get a similar level of expressive power, but maintaining the current semantics:

class A {
  final int i, j; // Declare two instance variables to show that it works with more than one.

  // Use a pattern assignment in the initializer list to set all instance variables in one step
  // (assumes that https://github.com/dart-lang/language/issues/2774 has been accepted).
  // Use a function literal returning a record to provide all the values in one step.
  // Then call `super()` as usual at the end of the initializer list.
  A(int arg): (i, j) = ((){
    ... // No access to `this`, but otherwise all of Dart.
    return (arg, arg + 1); // Return tuple for pattern assignment to `(i, j)`.
  }()), super() {
    ... // Normal constructor body code. No special rules.
  }
}

I'm not suggesting that anyone would be really happy about writing code in this style (even though it could actually be used exactly as shown if we assume #2774), but it could serve as a starting point for a mechanism whose semantics is as shown in the code above, and whose syntax is a non-redundant and readable abbreviation thereof.

For example, we could simply consider using a block expression:

class A {
  final int i, j;

  A(int arg): (i, j) = {
    ... // Initialization code.
    return (arg, arg + 1);
  }, super() {
    ... // Normal constructor body code.
  }
}

A block expression is similar to a function literal with no formal parameters, but it is always executed (we just run the code when it is reached, there is never a function object). We might well want block expressions anyway, so why not use them here, together with pattern assignment.

Return expressions in the block expression must fit the context (in this case: they must return an (int, int) record such that it is assignable to (i, j)), and it must be guaranteed that the block doesn't complete normally (that is: it must return, it can't reach the end of the block), but otherwise there is nothing special about the code in that block expression.

Of course, we can mix and match these approaches: If we need to declare local variables and perform arbitrary computations in order to initialize some instance variables (i, j) then we can use a block expression for that, but if we have several other instance variables or assertions that we want to put in the initializer list then that's also possible, of course, as in a = e1, b = e2, assert(c), (i, j) = { ... }, super(...) {}.
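Until pattern assignment in initializer lists or block expressions exist, roughly the same shape can be written today with a record-returning static helper and a private constructor. The helper `_compute` and the private constructor `A._` are illustrative names.

```dart
class A {
  final int i, j;

  // The public constructor forwards the computed record.
  A(int arg) : this._(_compute(arg));

  // The private constructor unpacks the record in an ordinary initializer list.
  A._((int, int) values)
      : i = values.$1,
        j = values.$2;

  // No access to `this`, but otherwise all of Dart.
  static (int, int) _compute(int arg) {
    final doubled = arg * 2;
    return (doubled, doubled + 1);
  }
}

void main() {
  final a = A(3);
  print('${a.i} ${a.j}'); // 6 7
}
```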

@jakemac53
Contributor

jakemac53 commented Apr 18, 2023

I am not a fan of the injection of the implicit super() call at places in the middle of the constructor, especially considering the comment from @eernstg . I would just require it to be explicit somewhere.

I think if you combine the explicit super with the mental model of definite assignment for this, then the scope changing after that super call is acceptable, but I am not sure how you would write to a field if you can't access this?

@lrhn
Member Author

lrhn commented Apr 19, 2023

The difference is that a this.x = something; assignment is not really accessing this, it's just accessing the uninitialized x field on what will later become this. It's just a special syntax for initializing a pre-this memory slot.

But it is a point where the syntax gets confusing, and why I am more worried about Erik's point that this.x before super() and this.x after super() are completely different things. One is accessing the underlying memory cell directly, the other is a virtual invocation that can do anything.
I'd solve it by disallowing this.x reads before super(). Use a local variable if you need to access the value, rather than first writing to this.x and then reading this.x back.
It would be occasionally useful, but if we have initialization code, we have local variables, which is at least as powerful.

So: Before super(), the only allowed use of this is in assignments to instance fields of the same class, this.x = ....
And you can't close over those, because we need to know whether the field is initialized or not. (Unless the field is nullable and non-final, then we don't need to know, and then abstracting over assignment might just be fine.)

And Bob's idea of considering this as equivalent to an initially unassigned final local variable, which gets definitely assigned by the super() call, makes sense.
I'm not sure it gives that much flexibility, because if this ever becomes potentially assigned/unassigned, you've likely lost any chance of ever recovering from that. The calls must happen in parallel branches.
But it does allow choosing between two different super-class constructors without having to have two constructors yourself, which is potentially nice (say a subclass of DateTime which calls DateTime or DateTime.utc depending on a parameter).

That also makes me less worried about implicitly inserting the super() call, because there isn't that much difference between before and after, as long as you don't access this after. You can always insert the super() yourself if you want it to happen at a particular time. Otherwise it happens at the end of the constructor body, or if that would be invalid, just before the first statement containing a use of this which is not initializing.

It's true that a statement-expression/block-expression with a record result, and pattern assignments in initializer lists, gives almost the same behavior. It doesn't allow local variables surviving across the super() call, but other than that, you can do any statement based computation, and end with all the values needed to initialize some fields.
It's a nice feature, I'd want it for other things, but I still think making the constructor initialization more approachable can be a goal of its own, and adding more complexity into the initializer list, instead of moving things out of it, isn't necessarily good for that goal.

@jakemac53
Contributor

That also makes me less worried about implicitly inserting the super() call, because there isn't that much difference between before and after, as long as you don't access this after.

This still feels way too magic to me, and it provides very little value. Requiring it to be explicit makes the code more readable/understandable. For instance, how does the debugging story work here? All of a sudden you just get launched into the super constructor? It means you can get exceptions in between synchronous lines of user code etc. (and in general, arbitrary code can run between user-visible statements). All of that is pretty horrible IMO.

@lrhn
Member Author

lrhn commented Apr 19, 2023

We could say that if there is no super invocation, a super() is inserted at the end of the body. If you want/need it anywhere else, you need to write it.

But that changes behavior relative to the current behavior, where the super() is inserted into the initializer list.

Or we could insert it there, and if you want the constructor body to do initialization, you have to write a super call in the body. But that's annoying if it's just calling super(), like a class extending Object, exactly where we currently allow you to not write anything.

@jakemac53
Contributor

I would feel a lot more comfortable with it only being injected at the end.

@rakudrama
Member

Is there something fundamentally new here, or is it 'just syntax'? Can a generalized generative constructor be implemented by rewriting it to a combination of existing constructors? I think closure scope possibly can't.
Multiple super-calls might also be difficult as a redirecting generative constructor redirects to a single target, but multiple super-calls could target different super-constructors.

One thing I have wanted but is not addressed here is the initialization of cycles, both self-cycles (this._head = this) and mutual cycles. One can use late but it is hard to optimize the check away.
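The late-based approach mentioned here looks like this today. `Ring` and `Peer` are made-up classes, and each read of a late field carries the initialization check that is hard to optimize away.

```dart
class Ring {
  // Self-cycle: `this` is only available in the constructor body,
  // so the field has to be late.
  late final Ring head;

  Ring() {
    head = this;
  }
}

class Peer {
  // Mutual cycle: neither object can receive the other at construction time.
  late final Peer other;

  static (Peer, Peer) linked() {
    final a = Peer();
    final b = Peer();
    a.other = b;
    b.other = a;
    return (a, b);
  }
}

void main() {
  final ring = Ring();
  print(identical(ring.head, ring)); // true

  final (a, b) = Peer.linked();
  print(identical(a.other.other, a)); // true
  print(identical(b.other, a)); // true
}
```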

@lrhn
Member Author

lrhn commented Apr 20, 2023

Most likely, now that we have records, everything can be rewritten into something we can do today.
Say:

class Foo {
  final int v1;
  final String v2;
  Foo(args) {
    // initBlock assigns to this.v1, this.v2
    super(super1);
    // post-`super`-constructor-block
  }
}

could become

class Foo {
  final int v1;
  final String v2;

  Foo(args) : this._(_computeValues(args));

  Foo._(({int v1, String v2, bool $super1, void Function(Foo) $init}) values)
      : this.v1 = values.v1, this.v2 = values.v2,
        super(values.$super1) {
    values.$init(this);
  }

  static ({int v1, String v2, bool $super1, void Function(Foo) $init}) _computeValues(args) {
    final int $this_v1;
    final String $this_v2;
    bool $super1;
    // initialization block, with `this.v1 = ...` replaced by `$this_v1 = ...` and `super(value)` with `$super1 = value`.
    return (v1: $this_v1, v2: $this_v2, $super1: $super1, $init: (Foo self) {
      // post-`super()`-constructor-code, with `self` instead of `this`, and using `self.v1` to access fields.
    });
  }
  ///...
}

Where it gets a little tricky is Bob's idea to allow choosing dynamically between super-constructors.
That's not possible today, the generative constructor call sequence cannot be changed based on values.
It requires a factory constructor.
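A sketch of that factory-based workaround, with hypothetical classes `Clock` and `LoggedClock` standing in for the DateTime example mentioned earlier in the thread:

```dart
class Clock {
  final bool isUtc;
  Clock.local() : isUtc = false;
  Clock.utc() : isUtc = true;
}

class LoggedClock extends Clock {
  // One generative constructor per super constructor, because the chain of
  // generative constructor calls is fixed at compile time.
  LoggedClock._local() : super.local();
  LoggedClock._utc() : super.utc();

  // The run-time choice has to live in a factory constructor.
  factory LoggedClock({required bool utc}) =>
      utc ? LoggedClock._utc() : LoggedClock._local();
}

void main() {
  print(LoggedClock(utc: true).isUtc); // true
  print(LoggedClock(utc: false).isUtc); // false
}
```

Because the public constructor is a factory, subclasses of `LoggedClock` cannot make the same choice through it.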

Another possible issue is local variables accessed across the super invocation.
Because the "post-super()-constructor-code" is moved inside a function, captured variables may change whether they can be promoted, and may lose types of interest and promotion chains across that call.

That problem is in the semantic details of the rewrite, not something we definitely can't find a rewrite that solves. We just have to be very careful.

5 participants