Editorial: Eliminate order-disambiguation from Annex B Pattern-grammar
jmdyck committed Apr 21, 2022
1 parent 918e0e9 commit 05c15a5
Showing 1 changed file with 19 additions and 12 deletions.
31 changes: 19 additions & 12 deletions spec.html
@@ -766,7 +766,7 @@ <h1>Grammar Notation</h1>
`9`
</emu-grammar>
<p>If the phrase &ldquo;[empty]&rdquo; appears as the right-hand side of a production, it indicates that the production's right-hand side contains no terminals or nonterminals.</p>
- <p>If the phrase &ldquo;[lookahead = _seq_]&rdquo; appears in the right-hand side of a production, it indicates that the production may only be used if the token sequence _seq_ is a prefix of the immediately following input token sequence. Similarly, &ldquo;[lookahead &isin; _set_]&rdquo;, where _set_ is a finite nonempty set of token sequences, indicates that the production may only be used if some element of _set_ is a prefix of the immediately following token sequence. For convenience, the set can also be written as a nonterminal, in which case it represents the set of all token sequences to which that nonterminal could expand. It is considered an editorial error if the nonterminal could expand to infinitely many distinct token sequences.</p>
+ <p>If the phrase &ldquo;[lookahead = _seq_]&rdquo; appears in the right-hand side of a production, it indicates that the production may only be used if the token sequence _seq_ is a prefix of the immediately following input token sequence. Similarly, &ldquo;[lookahead &isin; _set_]&rdquo;, where _set_ is a finite nonempty set of token sequences, indicates that the production may only be used if some element of _set_ is a prefix of the immediately following token sequence. For convenience, the set can also be written as a nonterminal, in which case it represents the set of all token sequences to which that nonterminal could expand. It is considered an editorial error if the nonterminal could expand to infinitely many distinct token sequences. [Need to loosen these restrictions for some of the productions in <emu-xref href="#sec-regular-expressions-patterns"></emu-xref>: maybe allow specification of _set_ via sentential form (not just a nonterminal), and allow infinite-but-regular sets.]</p>
<p>These conditions may be negated. &ldquo;[lookahead &ne; _seq_]&rdquo; indicates that the containing production may only be used if _seq_ is <em>not</em> a prefix of the immediately following input token sequence, and &ldquo;[lookahead &notin; _set_]&rdquo; indicates that the production may only be used if <em>no</em> element of _set_ is a prefix of the immediately following token sequence.</p>
<p>As an example, given the definitions:</p>
<emu-grammar type="definition" example>
@@ -46946,7 +46946,7 @@ <h2>Syntax</h2>

<emu-annex id="sec-regular-expressions-patterns">
<h1>Regular Expressions Patterns</h1>
- <p>The syntax of <emu-xref href="#sec-patterns"></emu-xref> is modified and extended as follows. These changes introduce ambiguities that are broken by the ordering of grammar productions and by contextual information. When parsing using the following grammar, each alternative is considered only if previous production alternatives do not match.</p>
+ <p>The syntax of <emu-xref href="#sec-patterns"></emu-xref> is modified and extended as follows.</p>
<p>This alternative pattern grammar and semantics only changes the syntax and semantics of BMP patterns. The following grammar extensions include productions parameterized with the [UnicodeMode] parameter. However, none of these extensions change the syntax of Unicode patterns recognized when parsing with the [UnicodeMode] parameter present on the goal symbol.</p>
<h2>Syntax</h2>
<emu-grammar type="definition">
@@ -46976,13 +46976,13 @@ <h2>Syntax</h2>

ExtendedAtom[N] ::
`.`
- `\` AtomEscape[~UnicodeMode, ?N]
- `\` [lookahead == `c`]
+ `\` [lookahead &lt;! {`b`, `B`}] AtomEscape[~UnicodeMode, ?N]
+ `\` [lookahead == `c`] [lookahead &lt;! `c` ControlLetter]
CharacterClass[~UnicodeMode]
`(` GroupSpecifier[~UnicodeMode] Disjunction[~UnicodeMode, ?N] `)`
`(` `?` `:` Disjunction[~UnicodeMode, ?N] `)`
InvalidBracedQuantifier
- ExtendedPatternCharacter
+ [lookahead &lt;! InvalidBracedQuantifier] ExtendedPatternCharacter

InvalidBracedQuantifier ::
`{` DecimalDigits[~Sep] `}`
@@ -46994,40 +46994,47 @@ <h2>Syntax</h2>

AtomEscape[UnicodeMode, N] ::
[+UnicodeMode] DecimalEscape
- [~UnicodeMode] DecimalEscape [> but only if the CapturingGroupNumber of |DecimalEscape| is &le; CountLeftCapturingParensWithin(the |Pattern| containing |DecimalEscape|)]
+ [~UnicodeMode] ConstrainedDecimalEscape
CharacterClassEscape[?UnicodeMode]
- CharacterEscape[?UnicodeMode, ?N]
+ [+UnicodeMode] CharacterEscape[?UnicodeMode, ?N]
+ [~UnicodeMode] [lookahead &lt;! ConstrainedDecimalEscape] CharacterEscape[?UnicodeMode, ?N]
[+N] `k` GroupName[?UnicodeMode]

+ ConstrainedDecimalEscape ::
+ DecimalEscape [> but only if the CapturingGroupNumber of |DecimalEscape| is &le; CountLeftCapturingParensWithin(the |Pattern| containing |DecimalEscape|)]
+
CharacterEscape[UnicodeMode, N] ::
ControlEscape
`c` ControlLetter
`0` [lookahead &notin; DecimalDigit]
HexEscapeSequence
RegExpUnicodeEscapeSequence[?UnicodeMode]
[~UnicodeMode] LegacyOctalEscapeSequence
- IdentityEscape[?UnicodeMode, ?N]
+ [lookahead &lt;! HexEscapeSequence] [lookahead &lt;! RegExpUnicodeEscapeSequence] IdentityEscape[?UnicodeMode, ?N]

IdentityEscape[UnicodeMode, N] ::
[+UnicodeMode] SyntaxCharacter
[+UnicodeMode] `/`
[~UnicodeMode] SourceCharacterIdentityEscape[?N]

SourceCharacterIdentityEscape[N] ::
- [~N] SourceCharacter but not `c`
- [+N] SourceCharacter but not one of `c` or `k`
+ [~N] SourceCharacter but not one of `0` `1` `2` `3` `4` `5` `6` `7` `c` `f` `n` `r` `t` `v` `d` `s` `w` `D` `S` `W`
+ [+N] SourceCharacter but not one of `0` `1` `2` `3` `4` `5` `6` `7` `c` `f` `n` `r` `t` `v` `d` `s` `w` `D` `S` `W` `k`
+ `or`
+ [~N] SourceCharacter but not one of OctalDigit or ControlEscape or CharacterClassEscape or `c`
+ [+N] SourceCharacter but not one of OctalDigit or ControlEscape or CharacterClassEscape or `c` or `k`

ClassAtomNoDash[UnicodeMode, N] ::
SourceCharacter but not one of `\` or `]` or `-`
`\` ClassEscape[?UnicodeMode, ?N]
- `\` [lookahead == `c`]
+ `\` [lookahead == `c`] [lookahead &lt;! `c` ClassControlLetter] [lookahead &lt;! `c` ControlLetter]

ClassEscape[UnicodeMode, N] ::
`b`
[+UnicodeMode] `-`
[~UnicodeMode] `c` ClassControlLetter
CharacterClassEscape[?UnicodeMode]
- CharacterEscape[?UnicodeMode, ?N]
+ [lookahead != `b`] CharacterEscape[?UnicodeMode, ?N]

ClassControlLetter ::
DecimalDigit
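
The recurring move in this diff is to replace "try alternatives in order" with an explicit negative lookahead guard on the later alternative. The Python sketch below illustrates that idea for the last two `ExtendedAtom` alternatives. It is an illustration only and not part of the commit: the function names are hypothetical, `InvalidBracedQuantifier` is reduced to the single `{` DecimalDigits `}` shape visible in the diff, and `ExtendedPatternCharacter` is approximated as any single character.

```python
import re
from typing import Optional

# Simplified stand-in for InvalidBracedQuantifier (an assumption): only the
# `{` DecimalDigits `}` alternative visible in the diff is modelled.
_BRACED = re.compile(r"\{[0-9]+\}")


def starts_with_invalid_braced_quantifier(rest: str) -> bool:
    """True if some expansion of the simplified nonterminal is a prefix of rest."""
    return _BRACED.match(rest) is not None


def ordered_choice(rest: str) -> Optional[str]:
    """Old style: a later alternative is considered only if earlier ones fail."""
    if starts_with_invalid_braced_quantifier(rest):
        return "InvalidBracedQuantifier"
    if rest:  # approximation: any single character
        return "ExtendedPatternCharacter"
    return None


def guarded_choice(rest: str) -> Optional[str]:
    """New style: each alternative carries its own guard, so order is irrelevant."""
    matches = []
    # [lookahead <! InvalidBracedQuantifier] ExtendedPatternCharacter
    if rest and not starts_with_invalid_braced_quantifier(rest):
        matches.append("ExtendedPatternCharacter")
    # InvalidBracedQuantifier
    if starts_with_invalid_braced_quantifier(rest):
        matches.append("InvalidBracedQuantifier")
    # The guard is what makes the two alternatives mutually exclusive.
    assert len(matches) <= 1, "guards should make the alternatives disjoint"
    return matches[0] if matches else None
```

Under these simplifications, both functions select the same alternative for inputs such as `{12}`, `{x`, `a`, and the empty string, which is the property the added lookahead restrictions are meant to preserve.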