Should there be an expression syntax for accessing positional record fields? #2388

Closed
leafpetersen opened this issue Aug 5, 2022 · 23 comments · Fixed by #2422
Assignees
Labels
records Issues related to records.

Comments

@leafpetersen
Member

leafpetersen commented Aug 5, 2022

The current record proposal provides no notation for accessing a single positional field:

Positional fields are not exposed as getters. Record patterns in pattern matching can be used to access a record's positional fields.

There are a couple of implications of this.

First, I think this means that there's significant asymmetry between named and positional fields, in that you can access a named field directly on the object, but you can't do so for a positional field. So you have a constant sized syntax for reading a single named field, but the only syntax for reading a positional field requires reading all of the fields (out into a pattern match) which is fairly verbose (even if you just use _ for all of the fields you don't care about).

Second, there's a semantic asymmetry in that you can presumably access named fields dynamically (and hence write code that is polymorphic over records with named fields) but not positional fields:

void printX(Record record) {
  print((record as dynamic).x);
}

Should we provide a way to read a single positional field? e.g. record.0 etc?
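A minimal sketch of the verbosity gap described above, assuming Dart 3.0 or later. The function names are hypothetical. Note that the getters that eventually shipped are 1-based (`$1`, `$2`, ...), unlike the 0-based `record.0` floated here.

```dart
// Reads the third positional field by destructuring the whole record.
// Every other field needs a wildcard, so the pattern grows with the arity.
int thirdViaPattern((int, int, int, int) r) {
  final (_, _, third, _) = r;
  return third;
}

// Reads the same field with a positional getter (assumption: Dart 3 syntax,
// where `$3` is the third field because numbering starts at 1).
int thirdViaGetter((int, int, int, int) r) => r.$3;

void main() {
  final r = (10, 20, 30, 40);
  print(thirdViaPattern(r));
  print(thirdViaGetter(r));
}
```

Both functions return the same value; only the getter form stays constant in size as the record gains fields.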

cc @munificent @lrhn @eernstg @stereotype441 @natebosch @jakemac53

@leafpetersen leafpetersen added the patterns Issues related to pattern matching. label Aug 5, 2022
@munificent
Member

I would like to come up with a positional field accessor expression syntax, yes. I don't have a design yet and I don't think it's essential so the current proposal just says there isn't one.

The proposal initially said each one got a named getter like field0, field1, etc. But that runs into annoying/dumb problems around what if you try to have a named field with that same name? So to keep things simpler and avoid coming up with some solution for those collisions, I just took positional field getters out.

@stereotype441
Member

Crazy random idea: what if the syntax for positional field accessors is simply the [] operator applied to an integer literal?

E.g.:

(int, String) x = (3, 'foo');
var b = x[0]; // b has static type `int`
var c = x[1]; // c has static type `String`
var d = x[2]; // Static error: no operator [2] in (int, String)
var e = x[0 + 1]; // Static error: no operator[] in (int, String)

The CFE could desugar accesses like x[0] into property gets using a name that would otherwise be invalid (e.g. %field0), so there's no possibility of conflict with existing fields.

Probably a terrible idea, but just thought I'd throw it out there :)

@rakudrama
Member

@stereotype441 I think the positional access syntax should work when the receiver has static type dynamic to exactly the same degree as it will for named fields.

If we decide that we must implement toString to show the names of the fields then we have enough metadata to implement a read-only asMap() view of the record, so we could implement a general indexer, and use your suggestion of special static typing rules for a literal or constant index value when the receiver is a known record type.

However, I would rather not have that metadata in the compiled program.
Unlike function types, the names are not needed for type checks (the subtype needing a superset of the named parameters).

@eernstg
Member

eernstg commented Aug 8, 2022

I tend to prefer @munificent's first proposal (using names like field0 .. fieldN for the positional components, although we might of course use a different specific name than field...). There could be name clashes, but that's not a breaking change because there are no records now. I'm sure we can think of a naming scheme which is reasonably readable, and unlikely to clash with names that developers actually want to use with named components.

This allows the mechanism to be consistent and convenient:

With respect to parsing, and comparing with the alternative myRecord.0: We wouldn't need to handle special member names like 0, 1, ... in the parser. For instance, could 1.5 be an attempt to look up a positional record component in 1? What if we introduce implicit constructor invocations that allow us to turn 1 into a record?

Comparing with myRecord[0]: In a dynamic invocation myDynamic[someExpression], would we enforce the constraint that someExpression must be an integer literal, or at least a constant expression? If we do support iterating over all positional components of a record using try ... (myRecord as dynamic)[i++] ... catch, shouldn't we also support iterating over all named components? Why not all named members of instances of classes? ;-)

With respect to the practical value of accessing positional components using normal getters: Developers can use r.field2 in the middle of an expression. It might be quite inconvenient to have to introduce a pattern matching construct at that point.

Dynamic invocations: It seems likely that we can support dynamic invocations, even if the runtime uses a more compact representation for positional components and their names than they do for named components.

There would be other corner cases, for example: It would probably not be possible to introduce a non-trivial noSuchMethod of a record type, but the one in Object would at least have a meaningful memberName to print (comparing again: #0 is not a symbol).

@lrhn
Member

lrhn commented Aug 8, 2022

I've suggested record[0] before (can't find where). It works, and the main issue is that it looks like the index operator, but is actually a special record syntax which requires a constant operand (doesn't have to be a literal, any constant will work).

Last I was discussing this, I gravitated towards liking .0 better. It's not syntax which otherwise exists (except as part of a double literal, 1.0) and the lexer handles that ambiguity already - 1.0 is a double literal, if you don't want it to be a single token, you need parentheses (or maybe just spaces, that could get ugly?).

So

var r = (42, 37, foo: 87);
print("(${r.0}, ${r.1}, ${r.foo})");

would work.

We can make it work for dynamic invocations too, it'll just fail if the target is not a record with that many positional elements.
Since the grammar is specific to record member access, we won't be introducing a way to do dynamic record lookup by going through dynamic, like [0] would: Record r = ...; var nth = (r as dynamic)[n];.

That too is a reason for me to not allow [0]. I do not want runtime introspection. If a compiler recognizes that nobody ever uses the .1 field of a (int, int, int, foo: int) record, it should be allowed to optimize it away. Even doing (o as dynamic).1 will probably void that optimization. Doing (o as dynamic)[n] is much more likely to happen (e.g. when people are parsing JSON).
That's also another reason I don't want toString to be clever, and would prefer it just returning Instance of Record. Having it include all fields means not being able to optimize fields away!

I wouldn't make (o as dynamic).0 hit noSuchMethod on o, since .0 is not an object member at all. (But then, I'd be fine with not hitting noSuchMethod for any member which isn't part of the interface of o to begin with.)

(We can even, in some hypothetical future, choose to allow classes to declare positional members, named 0, 1, etc., if we want to. They must be consecutive and start at 0. I don't have a syntax ... yet!)

@leafpetersen
Member Author

I'm inclined to agree with the analysis from @lrhn above.

@Levi-Lesches

I tend to prefer @munificent's first proposal (using names like field0 .. fieldN for the positional components... I'm sure we can think of a naming scheme which is reasonably readable, and unlikely to clash with names that developers actually want to use with named components.

I think reserving names like field0 is a good thing because if someone starts writing named fields that are essentially just "1, 2, 3...", then those fields don't need to be named -- they might as well be positional. Users would (probably) get a warning saying their field names are conflicting with the positional identifiers and they'd be able to think about whether they really need named fields. Then if they really want, they can change to something else like myFirstField.

Then you get all the benefits of myRecord.0 without the unusual grammar. Even the dynamic lookup is no different than doing (o as dynamic).namedField.

@munificent
Member

We discussed this in the language meeting today. We agree that some expression syntax for accessing positional fields is important. (In particular, I find compelling Leaf's point that, without an expression syntax, accessing the nth field requires a pattern of at least n subpatterns, which can be very verbose.)

We haven't settled on a syntax. A few options we're considering (most already mentioned here):

record.0, record.1, etc.

This would be a new syntax. We'd treat each of these like separate operators and not a single "positional field" operator that takes an index as an argument since we need separate return types for each field. Lexically, we'd treat . and the integer as separate tokens, but the parser would treat them as a single conceptual unit.

The . and a receiver before it are both required. Inside an extension on a record type, you could not simply use 0 as an implicit self send to access the zero-th field!

Pros:

  • It's extremely terse, basically as short as you can get.
  • It can't collide with any named field names.

Cons:

  • It's new syntax, which is always fairly costly in terms of complexity and implementation effort.
  • It makes it harder to ever support any future syntax that allows multiple adjacent expressions, since that would now become ambiguous with an identifier expression followed by a double literal. (Adjacent expressions are already hard to support because -, [, and ( all have both prefix and infix expression forms.)

record[0], record[1], etc.

In other words, reuse the existing subscript operator syntax. But, in order to handle the heterogeneous types of the fields, we require the index to be an integer value known at compile time.

Pros:

  • No new syntax.
  • Can allow constant expressions to refer to field indexes in addition to integer literals.

Cons:

  • Potentially confusing to users that the index expression must be a constant expression.

record.field0, record.field1, etc.

Just come up with some prefix like field.

Pros:

  • No new syntax or static semantics. It's just auto-generated getters.

Cons:

  • field is pretty verbose.
  • Have to deal with collisions with named fields using the same name. For records themselves, this isn't really a problem—just don't do that. But if/when we want to be able to spread records to argument lists, we may encounter parameter lists that have named parameters that do collide with these.

record.$0, record.$1, etc.

Like the previous suggestion but using $ as the prefix, which is already a valid Dart identifier.

Pros:

  • No new syntax or static semantics. It's just auto-generated getters.
  • Shorter than field.

Cons:

  • Could still technically collide, though the odds of there being named parameters named $0, etc. are quite slim.
  • Looks weird in string interpolations. Though you would almost always be using the braced form of interpolation anyway, since the record you're accessing the field on is unlikely to be this. "some string ${record.$0}" isn't that hard to read.

Still an open discussion.

@munificent
Member

A topic we haven't discussed yet that I think could affect this decision is code that is polymorphic over tuple arity. Right now, there's no plan to be able to write code that works with a record type and is generic over how many positional fields the record has.

I suspect that kind of use case will come up. For example, one of the approaches to handle awaiting records (#2321) is defining a set of core library functions like:

Future<(T1, T2)> wait2<T1, T2>(
    Future<T1> future1, Future<T2> future2) {
  ...
}

Future<(T1, T2, T3)> wait3<T1, T2, T3>(
    Future<T1> future1, Future<T2> future2, Future<T3> future3) {
  ...
}

Having to define separate functions for each arity up to some arbitrary maximum is pretty tedious. C++ introduced a notion of parameter pack to allow templates to write code that can be more flexibly generic over this kind of boilerplate.

We could probably tackle this in Dart just using macros. But if we want more graceful support for this kind of code, we might want to support variadic generics and a way to build records and record types out of the corresponding type parameter lists.

If we do that, then the code working with those generic records may need to access positional fields in an abstracted way. I think the record[n] syntax could handle that fairly gracefully, but a syntax that bakes integer literals into identifiers less so.

I'm not sure if this is an important constraint (there are many many open questions of how variadic generics would work), but I wanted to put it out there.
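For illustration, one way the elided `wait2` body above might be written, assuming Dart 3.0 or later. This is a sketch, not the core library's implementation: it leans on the existing `Future.wait`, which is homogeneous, so the element types are erased to `Object?` and restored with casts. Every additional arity would need another hand-written copy, which is the boilerplate the comment describes.

```dart
// Hypothetical fixed-arity helper: awaits two futures of different types
// and returns their results as a positional record.
Future<(T1, T2)> wait2<T1, T2>(Future<T1> future1, Future<T2> future2) async {
  // Future.wait needs a single element type, so use Object? here and
  // cast each result back to its own type when building the record.
  final results = await Future.wait<Object?>([future1, future2]);
  return (results[0] as T1, results[1] as T2);
}

Future<void> main() async {
  final pair = await wait2(Future.value(1), Future.value('one'));
  // Positional getters in shipped Dart 3 are 1-based.
  print(pair.$1);
  print(pair.$2);
}
```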

@natebosch
Member

In #2388 (comment) a syntax allowing runtime introspection is listed as a con.

In #2388 (comment) runtime introspection is listed as a potential future enhancement.

Do we need to separately figure out where we land on this before choosing a syntax?

@munificent
Member

In #2388 (comment) runtime introspection is listed as a potential future enhancement.

In that comment, I'm not necessarily assuming that some kind of variadic generics would rely on runtime introspection. I would definitely prefer that it get expanded statically at compile time, though that certainly raises lots of questions. It may be that the right answer is to lean on macros for this.

Do we need to separately figure out where we land on this before choosing a syntax?

Not necessarily. I think we can pick whatever syntax we need for this and it won't entirely paint us into a corner if we later want to be able to write code that's polymorphic over record arity.

@Levi-Lesches

I personally like $0, it indicates "zero" while being clear it's a language-provided construct, whereas field0 looks like a human-written getter.

Inside an extension on a record type, you could not simply use 0 as an implicit self send to access the zeroth field!

It would also look a little weird to see this, but I can see people getting used to it.

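import 'dart:math' as math;

extension on (num, num) {
  // Dart has no ** operator, so the squares are written as products.
  double get distanceToOrigin => math.sqrt($0 * $0 + $1 * $1);
}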

@lrhn
Member

lrhn commented Aug 18, 2022

All the suggested syntaxes work as selectors, so they can be used with null-aware access (r?.0, r?[0], r?.$0) and cascades (r..0.action()..1.action(), etc.). That's good.

The .0 can have a parsing problem if chained: r.0.0 will tokenize as "identifier, dot, double-literal". We can probably work around that (special casing the tokenization of a double literal after a .), but it's an extra complication.
If you do dynamic r = (1, 2) as dynamic; print(r.0);, should it work? It can, and for consistency, it probably should.
If you then do dynamic r = MyClass() as dynamic; print(r.0);, what should happen? Should it call MyClass.noSuchMethod, or just fail like print(!r) would, being another non-overridable operator not supported on the value? (Not entirely the same: !r fails because ! introduces a bool context, and the implicit downcast to bool fails. There is no implicit downcast to a specific record type for r.0, and records with at least one positional element do not have a shared supertype which supports .0.)

The r[0] syntax interacts badly with dynamic access. On a record, it's a special operator, like .0, and must have a constant number. If you do dynamic r = (1, 2) as dynamic; print(r[0]);, should it work, or should it fail to find operator[]?
The record operation is not the index operator (operator[]) of an object, the typing is different, instead it's a special operator per index value, so the dynamic invocation will likely fail.
What if it was print(r[fib(1)]), a non-constant value? I'd say that must not work, because otherwise we've introduced functionality that's only available through dynamic invocations.

The $0/field0 names work with dynamic invocations. They're not special in any way.
You can inspect a record to find its number of positional elements (up to a limit set by your source) by trying to dynamically read $0, $1, ... $999 until it throws. That's OK. It won't tell you the named elements, and if you know it's a tuple (no named elements), you could just try casting to (Object?, Object?, ..., Object?) instead and see if that worked.
This is definitely the solution with the least amount of new moving parts. Not the prettiest, but likely completely serviceable.
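A small sketch of the selector behaviour mentioned at the top of this comment, assuming Dart 3.0 or later, where the `$` getters shipped with 1-based numbering (`$1`, `$2`, ...). The function names are hypothetical. Because the getters are ordinary identifiers, null-aware access and chaining into nested records need no special tokenization.

```dart
// Formats a possibly-null pair using null-aware positional access.
String describe((int, String)? r) => 'first=${r?.$1}, second=${r?.$2}';

// Chained access into a nested record: second field, then its first field,
// then that record's second field.
int innerCount((Object?, ((String, int), double)) r) => r.$2.$1.$2;

void main() {
  print(describe((3, 'foo'))); // first=3, second=foo
  print(describe(null)); // first=null, second=null
  print(innerCount((null, (('', 3), 3.14)))); // 3
}
```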

@sgrekhov
Copy link
Contributor

The .0 can have a parsing problem if chained: r.0.0 will tokenize as "identifier, dot, double-literal". We can probably work around that (special casing the tokenization of a double literal after a .), but it's an extra complication.

In this case the following code became possible

extension on double {
  Record call() {
    return (foo, (("", 3), 3.14));
  }
}

extension on Record {
  double operator* (double other) => 3.14;
}

void foo() {
  print("foo");
}

main() {
  3.14().0();
  3.14().1.0 * 1.1;
  1.1 * 3.14().1.1;
}

@lrhn
Member

lrhn commented Aug 18, 2022

True, and if we then extend .0 member access to classes, say as operator 0 () => ..., then it'll probably be only seconds before we see things like:

 var ipv4 = ip.192.168.0.1;

or

var time = T.11.13.25.pm;

or similar shenanigans.

Let's ... not do that then.

(Not an entirely new possibility, it just looks better than T(11)(13)(25).pm or T[11][13][25].pm.)

(I probably wouldn't allow .0 on something of type Record, it needs a real record type which guarantees that the 0 field exists, but

extension on (Object?, ((String, int), double)) {
  ...
}

should work)

@natebosch
Member

I like $0

@munificent
Member

munificent commented Aug 18, 2022

OK, the parsing and readability problems of .0 have convinced me that's the wrong path.

Another problem with [] is that it could collide with the normal subscript operator:

extension RecordSubscript on (int, int) {
  int operator [](int index) => 3;
}

main() {
  (1, 2)[0];
}

Should this print 1 or 3? Is it an error to define a [] operator on a record type? What if you define it as an extension on a supertype of records?

It seems like no one likes field0. That leaves $0. I'm OK with it. If no one complains (@leafpetersen @eernstg @jakemac53 @stereotype441 @kallentu), I'll add that to the proposal.

@munificent munificent changed the title from "Records: Should there be a field access notation for positional fields?" to "Should there be an expression syntax for accessing positional record fields?" Aug 18, 2022
@munificent munificent added records Issues related to records. and removed patterns Issues related to pattern matching. labels Aug 18, 2022
@mmcdon20

Would it be possible to use the name of a positional field if a name is provided in the type?

(int x, int y) position = (5, 10);
print(position.x); 

@Levi-Lesches

I believe the point of having positional fields is to deliberately not expose the names of the fields in an API, similar to how you can't pass positional arguments by their names in function calls.

@mmcdon20

I believe the point of having positional fields is to deliberately not expose the names of the fields in an API, similar to how you can't pass positional arguments by their names in function calls.

If you have a record with type (int x, int y) you would still create the record by passing in the values according to position rather than by name.

But once those arguments are passed in, you then refer to positional arguments by their names within the definitions of functions. Their names do not affect the signature of the function, but having a name to refer to them by is more convenient than referring to them by their position.

I don't think using the names for positional fields of records in this way would be that much different. The names would not affect the record's shape, and you would be able to provide whatever names you want for the positional fields.

(int latitude, int longitude) getPosition() {
  return (5, 10);
}
print(getPosition().latitude); // okay

(int x, int y) position = getPosition(); // providing different names for positional fields is okay
print(position.x); // okay
print(position.latitude); // error

print(position == getPosition()); // true

There is the downside that changing the name of a positional field would be a breaking change. Not sure if there are other downsides/potential issues that I am missing.

@lrhn
Member

lrhn commented Aug 19, 2022

No complaints, go with $0 etc.

We could, if we wanted to, define $0 as a magical extension method. We could do that for named fields too.

That is, we could act as if the platform libraries exposed an infinite set of unnameable and unhidable extensions, one for each record type (tree-shaken to one for each record-shape which exists in the program), so that for (_, x: _) we have:

extension $$CantTouchThis$$<R, T> on (R, x: T) {
  R get $0 => switch (this) { case (it, x: _) => it; case _ => throw WatError("unreachable"); };
  T get x => switch (this) { case (_, x: it) => it; case _ => throw WatError("unreachable"); };
}

(But much more efficient, obviously!)

Then imperative record destructuring will only work at the static type of a record.
You wouldn't be able to get to record fields through dynamic dispatch.

I think the only advantage of doing so, is that we won't complicate dynamic invocations any further, and require retaining run-time information needed to perform such dynamic field gets.

@munificent
Member

Would it be possible to use the name of a positional field if a name is provided in the type?

No, the positional field name is not part of the record's type. You could have multiple positional records that have different names for a given positional field and all are considered to have the same type and are freely assignable to each other. That means there's no reliable way to know which position a given field name should correspond to.
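A short sketch of that point, assuming Dart 3.0 or later, where positional getters are 1-based. The function `namesAreNotPartOfTheType` is a hypothetical name; `getPosition` follows the earlier example in this thread. The names written in a record type annotation are documentation only, so two annotations with different names denote the same type, and access goes by position.

```dart
(int latitude, int longitude) getPosition() => (5, 10);

bool namesAreNotPartOfTheType() {
  // Different positional names, same record type: the assignment is allowed.
  (int x, int y) position = getPosition();
  // Access is by position; `position.x` would be a compile-time error.
  return position == getPosition() && position.$1 == 5 && position.$2 == 10;
}

void main() {
  print(namesAreNotPartOfTheType()); // true
}
```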

@natebosch
Member

You wouldn't be able to get to record fields through dynamic dispatch.

I'd consider this a positive feature.

munificent added a commit that referenced this issue Aug 19, 2022
- Support constant records. Fix #2337.
- Support empty and one-positional-field records. Fix #2386.
- Re-add support for positional field getters Fix #2388.
- Specify the behavior of `toString()`. Fix #2389.
- Disambiguate record types in `on` clauses. Fix #2406.
munificent added a commit that referenced this issue Aug 25, 2022
* Address a bunch of records issues.

- Support constant records. Fix #2337.
- Support empty and one-positional-field records. Fix #2386.
- Re-add support for positional field getters Fix #2388.
- Specify the behavior of `toString()`. Fix #2389.
- Disambiguate record types in `on` clauses. Fix #2406.

* Clarify the iteration order of fields in `==`.

* Copy-edit the sections on const records and canonicalization.

There should be no meaningful changes. I just:

- Fixed some misspellings.
- Used Markdown style consistent with the rest of the doc.
- Re-worded things to, I hope, read a little more naturally.
- Removed the parenthetical on identical() in a const context because
  that felt a little too academic.

* Leave the order that positional fields are checked in == unspecified.

* Clarify that positional fields are not sugar for named fields.

Specify the evaluation order of fields.