From 05c15a5c17fa57ad58ee1197d8bc9e24240490de Mon Sep 17 00:00:00 2001
From: Michael Dyck <jmdyck@ibiblio.org>
Date: Sun, 27 Jun 2021 10:54:12 -0400
Subject: [PATCH] Editorial: Eliminate order-disambiguation from Annex B
 Pattern-grammar

---
 spec.html | 31 +++++++++++++++++++------------
 1 file changed, 19 insertions(+), 12 deletions(-)
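
Reviewer note (illustrative only, not part of the change): the sketch below
is one way to sanity-check the idea behind the ExtendedAtom hunk, namely that
the explicit `[lookahead <! InvalidBracedQuantifier]` guard selects the same
alternative that the removed "earlier alternative wins" rule used to select.
The two regular expressions are assumed stand-ins for InvalidBracedQuantifier
and ExtendedPatternCharacter, and the helper names are hypothetical; none of
this comes from spec.html.

```python
import re

# Assumed model of InvalidBracedQuantifier: `{` digits `}`, `{` digits `,}`,
# or `{` digits `,` digits `}`.
INVALID_BRACED_QUANTIFIER = re.compile(r"\{[0-9]+(?:,[0-9]*)?\}")

# Assumed model of ExtendedPatternCharacter: any single character other than
# ^ $ \ . * + ? ( ) [ |
EXTENDED_PATTERN_CHARACTER = re.compile(r"[^\^$\\.*+?()\[|]")


def matching_alternatives(text, pos, guarded):
    """Return the names of the modelled alternatives usable at `pos`.

    With guarded=False the two alternatives are tested independently, so
    both may be reported. With guarded=True, ExtendedPatternCharacter is
    rejected whenever an InvalidBracedQuantifier is a prefix of the
    remaining input, which is what the lookahead restriction expresses.
    """
    found = []
    braced = INVALID_BRACED_QUANTIFIER.match(text, pos)
    if braced:
        found.append("InvalidBracedQuantifier")
    if EXTENDED_PATTERN_CHARACTER.match(text, pos):
        if not (guarded and braced):
            found.append("ExtendedPatternCharacter")
    return found


def ordered_choice(text, pos):
    """Old rule: take the first listed alternative that matches, if any."""
    found = matching_alternatives(text, pos, guarded=False)
    return found[0] if found else None


if __name__ == "__main__":
    for sample in ["{1}", "{1,}", "{1,2}", "{", "{a}", "{,2}", "a", "("]:
        guarded = matching_alternatives(sample, 0, guarded=True)
        print(repr(sample), ordered_choice(sample, 0), guarded)
```

In this model the guarded grammar never leaves more than one usable
alternative, and that alternative agrees with the ordered choice. The model
also hints at why the note added to the Grammar Notation section asks for
"infinite-but-regular" sets: DecimalDigits is unbounded, so
InvalidBracedQuantifier does not expand to finitely many token sequences.
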
diff --git a/spec.html b/spec.html
index 16f6376797..8788ceafb1 100644
--- a/spec.html
+++ b/spec.html
@@ -766,7 +766,7 @@ <h1>Grammar Notation</h1>
           `9`
       </emu-grammar>
       <p>If the phrase &ldquo;[empty]&rdquo; appears as the right-hand side of a production, it indicates that the production's right-hand side contains no terminals or nonterminals.</p>
-      <p>If the phrase &ldquo;[lookahead = _seq_]&rdquo; appears in the right-hand side of a production, it indicates that the production may only be used if the token sequence _seq_ is a prefix of the immediately following input token sequence. Similarly, &ldquo;[lookahead &isin; _set_]&rdquo;, where _set_ is a finite nonempty set of token sequences, indicates that the production may only be used if some element of _set_ is a prefix of the immediately following token sequence. For convenience, the set can also be written as a nonterminal, in which case it represents the set of all token sequences to which that nonterminal could expand. It is considered an editorial error if the nonterminal could expand to infinitely many distinct token sequences.</p>
+      <p>If the phrase &ldquo;[lookahead = _seq_]&rdquo; appears in the right-hand side of a production, it indicates that the production may only be used if the token sequence _seq_ is a prefix of the immediately following input token sequence. Similarly, &ldquo;[lookahead &isin; _set_]&rdquo;, where _set_ is a finite nonempty set of token sequences, indicates that the production may only be used if some element of _set_ is a prefix of the immediately following token sequence. For convenience, the set can also be written as a nonterminal, in which case it represents the set of all token sequences to which that nonterminal could expand. It is considered an editorial error if the nonterminal could expand to infinitely many distinct token sequences. [Need to loosen these restrictions for some of the productions in <emu-xref href="#sec-regular-expressions-patterns"></emu-xref>: maybe allow specification of _set_ via sentential form (not just a nonterminal), and allow infinite-but-regular sets.]</p>
       <p>These conditions may be negated. &ldquo;[lookahead &ne; _seq_]&rdquo; indicates that the containing production may only be used if _seq_ is <em>not</em> a prefix of the immediately following input token sequence, and &ldquo;[lookahead &notin; _set_]&rdquo; indicates that the production may only be used if <em>no</em> element of _set_ is a prefix of the immediately following token sequence.</p>
       <p>As an example, given the definitions:</p>
       <emu-grammar type="definition" example>
@@ -46946,7 +46946,7 @@ <h2>Syntax</h2>
 
     <emu-annex id="sec-regular-expressions-patterns">
       <h1>Regular Expressions Patterns</h1>
-      <p>The syntax of <emu-xref href="#sec-patterns"></emu-xref> is modified and extended as follows. These changes introduce ambiguities that are broken by the ordering of grammar productions and by contextual information. When parsing using the following grammar, each alternative is considered only if previous production alternatives do not match.</p>
+      <p>The syntax of <emu-xref href="#sec-patterns"></emu-xref> is modified and extended as follows.</p>
       <p>This alternative pattern grammar and semantics only changes the syntax and semantics of BMP patterns. The following grammar extensions include productions parameterized with the [UnicodeMode] parameter. However, none of these extensions change the syntax of Unicode patterns recognized when parsing with the [UnicodeMode] parameter present on the goal symbol.</p>
       <h2>Syntax</h2>
       <emu-grammar type="definition">
@@ -46976,13 +46976,13 @@ <h2>Syntax</h2>
 
         ExtendedAtom[N] ::
           `.`
-          `\` AtomEscape[~UnicodeMode, ?N]
-          `\` [lookahead == `c`]
+          `\` [lookahead &lt;! {`b`, `B`}] AtomEscape[~UnicodeMode, ?N]
+          `\` [lookahead == `c`] [lookahead &lt;! `c` ControlLetter]
           CharacterClass[~UnicodeMode]
           `(` GroupSpecifier[~UnicodeMode] Disjunction[~UnicodeMode, ?N] `)`
           `(` `?` `:` Disjunction[~UnicodeMode, ?N] `)`
           InvalidBracedQuantifier
-          ExtendedPatternCharacter
+          [lookahead &lt;! InvalidBracedQuantifier] ExtendedPatternCharacter
 
         InvalidBracedQuantifier ::
           `{` DecimalDigits[~Sep] `}`
@@ -46994,11 +46994,15 @@ <h2>Syntax</h2>
 
         AtomEscape[UnicodeMode, N] ::
           [+UnicodeMode] DecimalEscape
-          [~UnicodeMode] DecimalEscape [> but only if the CapturingGroupNumber of |DecimalEscape| is &le; CountLeftCapturingParensWithin(the |Pattern| containing |DecimalEscape|)]
+          [~UnicodeMode] ConstrainedDecimalEscape
           CharacterClassEscape[?UnicodeMode]
-          CharacterEscape[?UnicodeMode, ?N]
+          [+UnicodeMode] CharacterEscape[?UnicodeMode, ?N]
+          [~UnicodeMode] [lookahead &lt;! ConstrainedDecimalEscape] CharacterEscape[?UnicodeMode, ?N]
           [+N] `k` GroupName[?UnicodeMode]
 
+        ConstrainedDecimalEscape ::
+          DecimalEscape [> but only if the CapturingGroupNumber of |DecimalEscape| is &le; CountLeftCapturingParensWithin(the |Pattern| containing |DecimalEscape|)]
+
         CharacterEscape[UnicodeMode, N] ::
           ControlEscape
           `c` ControlLetter
@@ -47006,7 +47010,7 @@ <h2>Syntax</h2>
           HexEscapeSequence
           RegExpUnicodeEscapeSequence[?UnicodeMode]
           [~UnicodeMode] LegacyOctalEscapeSequence
-          IdentityEscape[?UnicodeMode, ?N]
+          [lookahead &lt;! HexEscapeSequence] [lookahead &lt;! RegExpUnicodeEscapeSequence] IdentityEscape[?UnicodeMode, ?N]
 
         IdentityEscape[UnicodeMode, N] ::
           [+UnicodeMode] SyntaxCharacter
@@ -47014,20 +47018,23 @@ <h2>Syntax</h2>
           [~UnicodeMode] SourceCharacterIdentityEscape[?N]
 
         SourceCharacterIdentityEscape[N] ::
-          [~N] SourceCharacter but not `c`
-          [+N] SourceCharacter but not one of `c` or `k`
+          [~N] SourceCharacter but not one of `0` `1` `2` `3` `4` `5` `6` `7` `c` `f` `n` `r` `t` `v` `d` `s` `w` `D` `S` `W`
+          [+N] SourceCharacter but not one of `0` `1` `2` `3` `4` `5` `6` `7` `c` `f` `n` `r` `t` `v` `d` `s` `w` `D` `S` `W` `k`
+          `or`
+          [~N] SourceCharacter but not one of OctalDigit or ControlEscape or CharacterClassEscape or `c`
+          [+N] SourceCharacter but not one of OctalDigit or ControlEscape or CharacterClassEscape or `c` or `k`
 
         ClassAtomNoDash[UnicodeMode, N] ::
           SourceCharacter but not one of `\` or `]` or `-`
           `\` ClassEscape[?UnicodeMode, ?N]
-          `\` [lookahead == `c`]
+          `\` [lookahead == `c`] [lookahead &lt;! `c` ClassControlLetter] [lookahead &lt;! `c` ControlLetter]
 
         ClassEscape[UnicodeMode, N] ::
           `b`
           [+UnicodeMode] `-`
           [~UnicodeMode] `c` ClassControlLetter
           CharacterClassEscape[?UnicodeMode]
-          CharacterEscape[?UnicodeMode, ?N]
+          [lookahead != `b`] CharacterEscape[?UnicodeMode, ?N]
 
         ClassControlLetter ::
           DecimalDigit