From 05c15a5c17fa57ad58ee1197d8bc9e24240490de Mon Sep 17 00:00:00 2001
From: Michael Dyck
Date: Sun, 27 Jun 2021 10:54:12 -0400
Subject: [PATCH] Editorial: Eliminate order-disambiguation from Annex B Pattern-grammar

---
 spec.html | 31 +++++++++++++++++++------------
 1 file changed, 19 insertions(+), 12 deletions(-)

diff --git a/spec.html b/spec.html
index 16f6376797..8788ceafb1 100644
--- a/spec.html
+++ b/spec.html
@@ -766,7 +766,7 @@

Grammar Notation

`9`

If the phrase “[empty]” appears as the right-hand side of a production, it indicates that the production's right-hand side contains no terminals or nonterminals.

-If the phrase “[lookahead = _seq_]” appears in the right-hand side of a production, it indicates that the production may only be used if the token sequence _seq_ is a prefix of the immediately following input token sequence. Similarly, “[lookahead ∈ _set_]”, where _set_ is a finite nonempty set of token sequences, indicates that the production may only be used if some element of _set_ is a prefix of the immediately following token sequence. For convenience, the set can also be written as a nonterminal, in which case it represents the set of all token sequences to which that nonterminal could expand. It is considered an editorial error if the nonterminal could expand to infinitely many distinct token sequences.

+If the phrase “[lookahead = _seq_]” appears in the right-hand side of a production, it indicates that the production may only be used if the token sequence _seq_ is a prefix of the immediately following input token sequence. Similarly, “[lookahead ∈ _set_]”, where _set_ is a finite nonempty set of token sequences, indicates that the production may only be used if some element of _set_ is a prefix of the immediately following token sequence. For convenience, the set can also be written as a nonterminal, in which case it represents the set of all token sequences to which that nonterminal could expand. It is considered an editorial error if the nonterminal could expand to infinitely many distinct token sequences. [Need to loosen these restrictions for some of the productions in : maybe allow specification of _set_ via sentential form (not just a nonterminal), and allow infinite-but-regular sets.]

These conditions may be negated. “[lookahead ≠ _seq_]” indicates that the containing production may only be used if _seq_ is not a prefix of the immediately following input token sequence, and “[lookahead ∉ _set_]” indicates that the production may only be used if no element of _set_ is a prefix of the immediately following token sequence.

As an example, given the definitions:

@@ -46946,7 +46946,7 @@

Syntax

Regular Expressions Patterns

-The syntax of is modified and extended as follows. These changes introduce ambiguities that are broken by the ordering of grammar productions and by contextual information. When parsing using the following grammar, each alternative is considered only if previous production alternatives do not match.

+The syntax of is modified and extended as follows.

This alternative pattern grammar and semantics only changes the syntax and semantics of BMP patterns. The following grammar extensions include productions parameterized with the [UnicodeMode] parameter. However, none of these extensions change the syntax of Unicode patterns recognized when parsing with the [UnicodeMode] parameter present on the goal symbol.

Syntax

@@ -46976,13 +46976,13 @@

Syntax

 ExtendedAtom[N] ::
   `.`
-  `\` AtomEscape[~UnicodeMode, ?N]
-  `\` [lookahead == `c`]
+  `\` [lookahead <! {`b`, `B`}] AtomEscape[~UnicodeMode, ?N]
+  `\` [lookahead == `c`] [lookahead <! `c` ControlLetter]
   CharacterClass[~UnicodeMode]
   `(` GroupSpecifier[~UnicodeMode] Disjunction[~UnicodeMode, ?N] `)`
   `(` `?` `:` Disjunction[~UnicodeMode, ?N] `)`
   InvalidBracedQuantifier
-  ExtendedPatternCharacter
+  [lookahead <! InvalidBracedQuantifier] ExtendedPatternCharacter

 InvalidBracedQuantifier ::
   `{` DecimalDigits[~Sep] `}`
@@ -46994,11 +46994,15 @@

Syntax

 AtomEscape[UnicodeMode, N] ::
   [+UnicodeMode] DecimalEscape
-  [~UnicodeMode] DecimalEscape [> but only if the CapturingGroupNumber of |DecimalEscape| is ≤ CountLeftCapturingParensWithin(the |Pattern| containing |DecimalEscape|)]
+  [~UnicodeMode] ConstrainedDecimalEscape
   CharacterClassEscape[?UnicodeMode]
-  CharacterEscape[?UnicodeMode, ?N]
+  [+UnicodeMode] CharacterEscape[?UnicodeMode, ?N]
+  [~UnicodeMode] [lookahead <! ConstrainedDecimalEscape] CharacterEscape[?UnicodeMode, ?N]
   [+N] `k` GroupName[?UnicodeMode]

+ConstrainedDecimalEscape ::
+  DecimalEscape [> but only if the CapturingGroupNumber of |DecimalEscape| is ≤ CountLeftCapturingParensWithin(the |Pattern| containing |DecimalEscape|)]
+
 CharacterEscape[UnicodeMode, N] ::
   ControlEscape
   `c` ControlLetter
@@ -47006,7 +47010,7 @@

Syntax

   HexEscapeSequence
   RegExpUnicodeEscapeSequence[?UnicodeMode]
   [~UnicodeMode] LegacyOctalEscapeSequence
-  IdentityEscape[?UnicodeMode, ?N]
+  [lookahead <! HexEscapeSequence] [lookahead <! RegExpUnicodeEscapeSequence] IdentityEscape[?UnicodeMode, ?N]

 IdentityEscape[UnicodeMode, N] ::
   [+UnicodeMode] SyntaxCharacter
@@ -47014,20 +47018,23 @@

Syntax

   [~UnicodeMode] SourceCharacterIdentityEscape[?N]

 SourceCharacterIdentityEscape[N] ::
-  [~N] SourceCharacter but not `c`
-  [+N] SourceCharacter but not one of `c` or `k`
+  [~N] SourceCharacter but not one of `0` `1` `2` `3` `4` `5` `6` `7` `c` `f` `n` `r` `t` `v` `d` `s` `w` `D` `S` `W`
+  [+N] SourceCharacter but not one of `0` `1` `2` `3` `4` `5` `6` `7` `c` `f` `n` `r` `t` `v` `d` `s` `w` `D` `S` `W` `k`
+  `or`
+  [~N] SourceCharacter but not one of OctalDigit or ControlEscape or CharacterClassEscape or `c`
+  [+N] SourceCharacter but not one of OctalDigit or ControlEscape or CharacterClassEscape or `c` or `k`

 ClassAtomNoDash[UnicodeMode, N] ::
   SourceCharacter but not one of `\` or `]` or `-`
   `\` ClassEscape[?UnicodeMode, ?N]
-  `\` [lookahead == `c`]
+  `\` [lookahead == `c`] [lookahead <! `c` ClassControlLetter] [lookahead <! `c` ControlLetter]

 ClassEscape[UnicodeMode, N] ::
   `b`
   [+UnicodeMode] `-`
   [~UnicodeMode] `c` ClassControlLetter
   CharacterClassEscape[?UnicodeMode]
-  CharacterEscape[?UnicodeMode, ?N]
+  [lookahead != `b`] CharacterEscape[?UnicodeMode, ?N]

 ClassControlLetter ::
   DecimalDigit
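
The last hunk offers two spellings of SourceCharacterIdentityEscape, joined by `or`, that are presumably meant to exclude the same characters. The Python sketch below checks this and models a `[lookahead <! set]` restriction as a plain prefix test. It assumes the single-character expansions of OctalDigit, ControlEscape, and CharacterClassEscape (without [UnicodeMode]) from the main grammar. Every Python name in it is hypothetical and appears nowhere in the spec.

```python
from typing import Iterable

# Assumed expansions from the main Pattern grammar (not shown in this patch).
OCTAL_DIGIT = set("01234567")
CONTROL_ESCAPE = set("fnrtv")
CHARACTER_CLASS_ESCAPE = set("dDsSwW")  # alternatives available without UnicodeMode

# First spelling in the patch: the explicit terminal list.
ENUMERATED_NO_N = set("01234567") | {"c"} | set("fnrtv") | set("dswDSW")
ENUMERATED_N = ENUMERATED_NO_N | {"k"}

# Second spelling in the patch: the same exclusions via named nonterminals.
NAMED_NO_N = OCTAL_DIGIT | CONTROL_ESCAPE | CHARACTER_CLASS_ESCAPE | {"c"}
NAMED_N = NAMED_NO_N | {"k"}


def lookahead_not_in(text: str, pos: int, sequences: Iterable[str]) -> bool:
    """Model of [lookahead <! set]: true if no sequence is a prefix of text[pos:]."""
    return not any(text.startswith(seq, pos) for seq in sequences)


def is_source_character_identity_escape(ch: str, named_groups: bool) -> bool:
    """Sketch of SourceCharacterIdentityEscape[N] for a single character."""
    excluded = NAMED_N if named_groups else NAMED_NO_N
    return len(ch) == 1 and ch not in excluded


if __name__ == "__main__":
    # Both spellings exclude exactly the same characters.
    print(ENUMERATED_NO_N == NAMED_NO_N, ENUMERATED_N == NAMED_N)
    # `\` [lookahead <! {`b`, `B`}] AtomEscape: blocked for \b, allowed for \d.
    print(lookahead_not_in(r"\b", 1, {"b", "B"}), lookahead_not_in(r"\d", 1, {"b", "B"}))
```

Under these assumptions the two spellings agree, so choosing between them is a matter of readability and does not change the language recognized.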