Update grammar for parser unification #750

ehuss · 2020-02-09T19:02:23Z

Also, #722 has a mistake where visibility is allowed on a macro invocation for a trait item. This is not allowed, and should have been written more like InherentImplItem is.

ehuss · 2020-02-09T19:04:35Z

@Centril, the recent changes like this and rust-lang/rust#66183 and rust-lang/rust#62550 and rust-lang/rust#68764 and rust-lang/rust#68788 have been making it much harder to document the grammar. There is now a very abstract syntax that is getting further removed from the actual grammar for the language that is relevant to most people (i.e., the grammar for what people actually write). Do you have suggestions on how we can document this in a simple way?

I'd be loathe to create two grammars (the parsed one and the semantic one), since they mostly overlap. I also don't like adding huge sections that explain what the restrictions are. There are now many restrictions on the grammar that are hard to express and document. Many portions now have to be singled out as "this is not accepted, because it's unstable" or "because it is shared with this other thing", or "because that's just the way it is".

The AssocItem change is a good example, where there is a parse-time restriction on edition-based trait items. I think the old grammar handled this well, but now...I have no idea how to express that. I don't want to add parameters to the grammar. It could describe the 2015 grammar, and then include a note somewhere about what is rejected in parsing 2018. But this is piling on the complexity of how to define the grammar.

Even documenting subslice patterns, which I consider an important change, is very difficult because there are deep changes to the pattern grammar, and then it needs to be qualified with all the new restrictions. I'm at a loss now as to how to move forward.

Centril · 2020-02-10T11:54:06Z

I think it's harder because we are using the wrong tools, and have been doing so for a long time. We're documenting the reference under the illusion that Rust is in fact one language, but it is not. In observable terms, we have at minimum two languages, one syntactic and one semantic (and I think you're underestimating how much of AST validation we're not specifying at the moment even before the changes you've linked). But even so, I don't think it makes sense to e.g. document eventual borrow checking rules on something resembling HIR or HAIR; those can only be reasonably explained by something graph-like, e.g. like MIR (we did throw out ast_borrowck for a reason).

I also think it's not a good idea to document each concept, e.g. "if expressions" thoroughly from parsing to dynamic semantics. For a precise definition of the language, it's better to order explanations as a pipeline, taking one IR and transforming it down to another, as that is how any actual implementation, including for an interpreter, would be structured, even if it was a reference implementation in e.g. Agda. Then, in each part of the pipeline, you only explain what is needed in that pipe. This is what e.g. SML's spec does (see e.g. "bare language"), http://sml-family.org/sml97-defn.pdf, and CakeML goes even further, https://cakeml.org/.

So yes, I think we should give into having two "grammars" (where the latter is abstract syntax, not concrete) especially because a real formal specification without it has no hope of success. Transforming from the former to the latter can be explained using a denotational style (we can translate this fairly easily to inference rules as well for a more type-checking like style):

elab_slice_pats : [Syn.Pat] -> ([Abs.Pat], Option Abs.Pat, [Abs.Pat])
elab_slice_pats ([[ .. ]] :: ps) = ([], Some Wild, map elab_pat ps)
elab_slice_pats ([[ b @ .. ]] :: ps) = ([], Some (e_binding b Wild), map elab_pat ps)
elab_slice_pats (p :: ps) = (elab_pat p :: pre, mid, suf)
  where (pre, mid, suf) = elab_slice_pats ps

elab_pat : Syn.Pat -> Abs.Pat
elab_pat [[ .. ]] = error -- propagates implicitly; or we use monads if needed.
elab_pat [[ [ p0, ..., pn ] ]] = elab_slice_pats [[ [ p0, ..., pn] ]]
... -- more cases

Aside from introducing more syntactic shorthands, this is the tersest definition I can think of (way shorter than the actual Rust code, but it is still formal, and nearly executable). My initial versions of the slice patterns report had something like this. We can type set this appropriately and add comments where interesting things, e.g. not just identical structural transformations, happen. Auxiliary definitions can be defined and we can explain the syntactic conventions first.

However, you asked for a simple way, which I'm going to interpret as "don't rock the boat". To avoid doing so, I suppose we could add a chapter for "restrictions on syntax" where we say in a "declarative" way which productions would result in errors:

visit Trait => item_cx := Assoc(Trait) -- Implicitly following stack discipline.
visit Impl => item_cx := Assoc(Impl)
visit Statement | Module | Crate => item_cx := Free
visit ExternBlock => item_cx := Extern

error
  when item_cx == Extern
  on FunctionItem f where f.body != ";"

This is a sort of made up DSL for how AST validation behaves (not exactly a clean approach in my view; it's very much "validate" instead of "parse").

petrochenkov · 2020-02-10T17:28:16Z

FWIW, C++ standard has a syntax section that lists the grammar ignoring all the semantic restrictions (e.g. with quailfier* meaning an arbitrary list of qualifiers).
The restrictions are then described in various places in the language specification sections (e.g. these qualifiers are incompatible, or this qualifier must be unique).

ehuss · 2020-02-13T19:51:36Z

Thanks for the detailed response. I really liked the preface to the sml definition, it is very relevant. I'll contemplate on this in the background, but it is unlikely I'll make progress soon.

and I think you're underestimating how much of AST validation we're not specifying

I'm well aware that the reference grammar is very under-specified (and flat out wrong in some places). I was hoping wg-grammar would lead towards fixing things, but it seems like there isn't enough momentum.

My general feeling is that it may be too difficult to make progress on a high-quality, precise language spec using only volunteer contributors. It takes an uncommon set of skills, and there just doesn't seem to be the incentives and interest to do it. What you wrote sounds very interesting, but I want to keep our expectations calibrated with what can realistically be done given the involvement we have now (which is close to zero).

ehuss added the A-grammar Area: Syntax and parsing label Feb 9, 2020

ehuss changed the title ~~Update grammar for new AssocItem~~ Update grammar for parser unification Feb 26, 2020

ehuss mentioned this issue Jan 11, 2021

Update grammar for parser unification. #927

Merged

ehuss closed this as completed in #927 Jan 13, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Update grammar for parser unification #750

Update grammar for parser unification #750

ehuss commented Feb 9, 2020

ehuss commented Feb 9, 2020

Centril commented Feb 10, 2020 •

edited

Loading

petrochenkov commented Feb 10, 2020

ehuss commented Feb 13, 2020

Update grammar for parser unification #750

Update grammar for parser unification #750

Comments

ehuss commented Feb 9, 2020

ehuss commented Feb 9, 2020

Centril commented Feb 10, 2020 • edited Loading

petrochenkov commented Feb 10, 2020

ehuss commented Feb 13, 2020

Centril commented Feb 10, 2020 •

edited

Loading