Editorial: Eliminate order-disambiguation from Annex B Pattern-grammar

tc39 · Apr 21, 2022 · 05c15a5
1 parent 918e0e9
Showing 1 changed file with 19 additions and 12 deletions.
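For orientation, the overlapping alternatives that this change disambiguates with explicit lookaheads can be observed from script. The following is a minimal sketch, not part of the commit: it assumes a host that implements the Annex B pattern grammar (such as Node.js or a web browser), and the names `throwsSyntaxError` and `probes` are hypothetical helpers introduced only for illustration.

```javascript
// Illustrative probes only; `throwsSyntaxError` and `probes` are hypothetical names.
// Assumes a host implementing the Annex B pattern grammar (e.g. Node.js or a browser).
function throwsSyntaxError(source, flags) {
  try {
    new RegExp(source, flags);
    return false;
  } catch (e) {
    return e instanceof SyntaxError;
  }
}

const probes = {
  // `\` [lookahead == `c`]: `\c` with no ControlLetter after it matches a backslash, then `c`.
  loneBackslashC: /^\c$/.test("\\c"),
  // `c` ControlLetter: `\cJ` denotes U+000A.
  controlLetter: /^\cJ$/.test("\n"),
  // ExtendedPatternCharacter: a `{` that does not begin a braced quantifier is a literal.
  literalBrace: /^a{$/.test("a{"),
  // InvalidBracedQuantifier: `{1}` with nothing to repeat is an early error.
  invalidBracedQuantifier: throwsSyntaxError("{1}", ""),
  // DecimalEscape is a backreference only when enough capturing groups exist...
  backreference: /^(a)\1$/.test("aa"),
  // ...otherwise `\1` is read as a LegacyOctalEscapeSequence (U+0001).
  legacyOctal: /^\1$/.test("\u0001"),
  // None of these extensions apply with [UnicodeMode]: a lone `\c` is rejected under `u`.
  unicodeModeRejects: throwsSyntaxError("\\c", "u"),
};

console.log(probes);
```

On a host that follows Annex B, every property of `probes` is expected to be `true`. The grammar edits in the diff are editorial: they aim to express the same outcomes through lookahead restrictions instead of through the order of alternatives.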
diff --git a/spec.html b/spec.html
@@ -766,7 +766,7 @@ <h1>Grammar Notation</h1>
           `9`
       </emu-grammar>
       <p>If the phrase &ldquo;[empty]&rdquo; appears as the right-hand side of a production, it indicates that the production's right-hand side contains no terminals or nonterminals.</p>
-      <p>If the phrase &ldquo;[lookahead = _seq_]&rdquo; appears in the right-hand side of a production, it indicates that the production may only be used if the token sequence _seq_ is a prefix of the immediately following input token sequence. Similarly, &ldquo;[lookahead &isin; _set_]&rdquo;, where _set_ is a finite nonempty set of token sequences, indicates that the production may only be used if some element of _set_ is a prefix of the immediately following token sequence. For convenience, the set can also be written as a nonterminal, in which case it represents the set of all token sequences to which that nonterminal could expand. It is considered an editorial error if the nonterminal could expand to infinitely many distinct token sequences.</p>
+      <p>If the phrase &ldquo;[lookahead = _seq_]&rdquo; appears in the right-hand side of a production, it indicates that the production may only be used if the token sequence _seq_ is a prefix of the immediately following input token sequence. Similarly, &ldquo;[lookahead &isin; _set_]&rdquo;, where _set_ is a finite nonempty set of token sequences, indicates that the production may only be used if some element of _set_ is a prefix of the immediately following token sequence. For convenience, the set can also be written as a nonterminal, in which case it represents the set of all token sequences to which that nonterminal could expand. It is considered an editorial error if the nonterminal could expand to infinitely many distinct token sequences. [Need to loosen these restrictions for some of the productions in <emu-xref href="#sec-regular-expressions-patterns"></emu-xref>: maybe allow specification of _set_ via sentential form (not just a nonterminal), and allow infinite-but-regular sets.]</p>
       <p>These conditions may be negated. &ldquo;[lookahead &ne; _seq_]&rdquo; indicates that the containing production may only be used if _seq_ is <em>not</em> a prefix of the immediately following input token sequence, and &ldquo;[lookahead &notin; _set_]&rdquo; indicates that the production may only be used if <em>no</em> element of _set_ is a prefix of the immediately following token sequence.</p>
       <p>As an example, given the definitions:</p>
       <emu-grammar type="definition" example>
@@ -46946,7 +46946,7 @@ <h2>Syntax</h2>
 
     <emu-annex id="sec-regular-expressions-patterns">
       <h1>Regular Expressions Patterns</h1>
-      <p>The syntax of <emu-xref href="#sec-patterns"></emu-xref> is modified and extended as follows. These changes introduce ambiguities that are broken by the ordering of grammar productions and by contextual information. When parsing using the following grammar, each alternative is considered only if previous production alternatives do not match.</p>
+      <p>The syntax of <emu-xref href="#sec-patterns"></emu-xref> is modified and extended as follows.</p>
       <p>This alternative pattern grammar and semantics only changes the syntax and semantics of BMP patterns. The following grammar extensions include productions parameterized with the [UnicodeMode] parameter. However, none of these extensions change the syntax of Unicode patterns recognized when parsing with the [UnicodeMode] parameter present on the goal symbol.</p>
       <h2>Syntax</h2>
       <emu-grammar type="definition">
@@ -46976,13 +46976,13 @@ <h2>Syntax</h2>
 
         ExtendedAtom[N] ::
           `.`
-          `\` AtomEscape[~UnicodeMode, ?N]
-          `\` [lookahead == `c`]
+          `\` [lookahead &lt;! {`b`, `B`}] AtomEscape[~UnicodeMode, ?N]
+          `\` [lookahead == `c`] [lookahead &lt;! `c` ControlLetter]
           CharacterClass[~UnicodeMode]
           `(` GroupSpecifier[~UnicodeMode] Disjunction[~UnicodeMode, ?N] `)`
           `(` `?` `:` Disjunction[~UnicodeMode, ?N] `)`
           InvalidBracedQuantifier
-          ExtendedPatternCharacter
+          [lookahead &lt;! InvalidBracedQuantifier] ExtendedPatternCharacter
 
         InvalidBracedQuantifier ::
           `{` DecimalDigits[~Sep] `}`
@@ -46994,40 +46994,47 @@ <h2>Syntax</h2>
 
         AtomEscape[UnicodeMode, N] ::
           [+UnicodeMode] DecimalEscape
-          [~UnicodeMode] DecimalEscape [> but only if the CapturingGroupNumber of |DecimalEscape| is &le; CountLeftCapturingParensWithin(the |Pattern| containing |DecimalEscape|)]
+          [~UnicodeMode] ConstrainedDecimalEscape
           CharacterClassEscape[?UnicodeMode]
-          CharacterEscape[?UnicodeMode, ?N]
+          [+UnicodeMode] CharacterEscape[?UnicodeMode, ?N]
+          [~UnicodeMode] [lookahead &lt;! ConstrainedDecimalEscape] CharacterEscape[?UnicodeMode, ?N]
           [+N] `k` GroupName[?UnicodeMode]
 
+        ConstrainedDecimalEscape ::
+          DecimalEscape [> but only if the CapturingGroupNumber of |DecimalEscape| is &le; CountLeftCapturingParensWithin(the |Pattern| containing |DecimalEscape|)]
+
         CharacterEscape[UnicodeMode, N] ::
           ControlEscape
           `c` ControlLetter
           `0` [lookahead &notin; DecimalDigit]
           HexEscapeSequence
           RegExpUnicodeEscapeSequence[?UnicodeMode]
           [~UnicodeMode] LegacyOctalEscapeSequence
-          IdentityEscape[?UnicodeMode, ?N]
+          [lookahead &lt;! HexEscapeSequence] [lookahead &lt;! RegExpUnicodeEscapeSequence] IdentityEscape[?UnicodeMode, ?N]
 
         IdentityEscape[UnicodeMode, N] ::
           [+UnicodeMode] SyntaxCharacter
           [+UnicodeMode] `/`
           [~UnicodeMode] SourceCharacterIdentityEscape[?N]
 
         SourceCharacterIdentityEscape[N] ::
-          [~N] SourceCharacter but not `c`
-          [+N] SourceCharacter but not one of `c` or `k`
+          [~N] SourceCharacter but not one of `0` `1` `2` `3` `4` `5` `6` `7` `c` `f` `n` `r` `t` `v` `d` `s` `w` `D` `S` `W`
+          [+N] SourceCharacter but not one of `0` `1` `2` `3` `4` `5` `6` `7` `c` `f` `n` `r` `t` `v` `d` `s` `w` `D` `S` `W` `k`
+          `or`
+          [~N] SourceCharacter but not one of OctalDigit or ControlEscape or CharacterClassEscape or `c`
+          [+N] SourceCharacter but not one of OctalDigit or ControlEscape or CharacterClassEscape or `c` or `k`
 
         ClassAtomNoDash[UnicodeMode, N] ::
           SourceCharacter but not one of `\` or `]` or `-`
           `\` ClassEscape[?UnicodeMode, ?N]
-          `\` [lookahead == `c`]
+          `\` [lookahead == `c`] [lookahead &lt;! `c` ClassControlLetter] [lookahead &lt;! `c` ControlLetter]
 
         ClassEscape[UnicodeMode, N] ::
           `b`
           [+UnicodeMode] `-`
           [~UnicodeMode] `c` ClassControlLetter
           CharacterClassEscape[?UnicodeMode]
-          CharacterEscape[?UnicodeMode, ?N]
+          [lookahead != `b`] CharacterEscape[?UnicodeMode, ?N]
 
         ClassControlLetter ::
           DecimalDigit