From 920022c01395f77fa9a9f579ecbb4f6301aadb76 Mon Sep 17 00:00:00 2001
From: Mathias Bynens
Date: Thu, 25 Jan 2018 21:37:01 -0800
Subject: [PATCH] [Normative] Add RegExp lookbehind assertions

Proposal repo: https://github.com/tc39/proposal-regexp-lookbehind
---
 spec.html | 111 +++++++++++++++++++++++++++++++++++++++---------------
 1 file changed, 80 insertions(+), 31 deletions(-)

diff --git a/spec.html b/spec.html
index 66822a3229..c1419053ee 100644
--- a/spec.html
+++ b/spec.html
@@ -29113,6 +29113,8 @@

Syntax

`\` `B` `(` `?` `=` Disjunction[?U] `)` `(` `?` `!` Disjunction[?U] `)` + `(` `?` `<=` Disjunction[?U] `)` + `(` `?` `<!` Disjunction[?U] `)` Quantifier :: QuantifierPrefix @@ -29682,7 +29684,7 @@

Notation

Pattern

The production Pattern :: Disjunction evaluates as follows:

- 1. Evaluate |Disjunction| to obtain a Matcher _m_. + 1. Evaluate |Disjunction| with +1 as its _direction_ argument to obtain a Matcher _m_. 1. Return an internal closure that takes two arguments, a String _str_ and an integer _index_, and performs the following steps: 1. Assert: _index_ ≤ the length of _str_. 1. If _Unicode_ is *true*, let _Input_ be a List consisting of the sequence of code points of _str_ interpreted as a UTF-16 encoded () Unicode string. Otherwise, let _Input_ be a List consisting of the sequence of code units that are the elements of _str_. _Input_ will be used throughout the algorithms in . Each element of _Input_ is considered to be a character. @@ -29701,6 +29703,7 @@

Pattern

Disjunction

+

With argument _direction_.

The production Disjunction :: Alternative evaluates as follows:

1. Evaluate |Alternative| to obtain a Matcher _m_. @@ -29708,8 +29711,8 @@

Disjunction

The production Disjunction :: Alternative `|` Disjunction evaluates as follows:

- 1. Evaluate |Alternative| to obtain a Matcher _m1_. - 1. Evaluate |Disjunction| to obtain a Matcher _m2_. + 1. Evaluate |Alternative| with argument _direction_ to obtain a Matcher _m1_. + 1. Evaluate |Disjunction| with argument _direction_ to obtain a Matcher _m2_. 1. Return an internal Matcher closure that takes two arguments, a State _x_ and a Continuation _c_, and performs the following steps when evaluated: 1. Call _m1_(_x_, _c_) and let _r_ be its result. 1. If _r_ is not ~failure~, return _r_. @@ -29724,32 +29727,41 @@

Disjunction

["abc", "a", "a", undefined, "bc", undefined, "bc"]

and not

["abc", "ab", undefined, "ab", "c", "c", undefined]
+

The order in which the two alternatives are tried is independent of the value of _direction_.
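The note above can be observed from script. The following is a small illustrative sketch, not part of the patch; it assumes an engine that already implements lookbehind assertions, and the helper name `firstCapture` is hypothetical.

```javascript
// Hypothetical helper: returns capture group 1 of the first match, or null.
function firstCapture(re, str) {
  const m = re.exec(str);
  return m === null ? null : m[1];
}

// Both alternatives can match the text immediately before "c" in "bac".
// The left alternative is still tried first, even though the lookbehind
// matches backward, so the capture depends only on the written order.
const leftFirst = firstCapture(/(?<=(a|ba))c/, "bac");
const swapped = firstCapture(/(?<=(ba|a))c/, "bac");

console.log(leftFirst); // "a"
console.log(swapped);   // "ba"
```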

Alternative

+

With argument _direction_.

The production Alternative :: [empty] evaluates as follows:

1. Return a Matcher that takes two arguments, a State _x_ and a Continuation _c_, and returns the result of calling _c_(_x_).

The production Alternative :: Alternative Term evaluates as follows:

- 1. Evaluate |Alternative| to obtain a Matcher _m1_. - 1. Evaluate |Term| to obtain a Matcher _m2_. - 1. Return an internal Matcher closure that takes two arguments, a State _x_ and a Continuation _c_, and performs the following steps when evaluated: - 1. Let _d_ be a Continuation that takes a State argument _y_ and returns the result of calling _m2_(_y_, _c_). - 1. Call _m1_(_x_, _d_) and return its result. + 1. Evaluate |Alternative| with argument _direction_ to obtain a Matcher _m1_. + 1. Evaluate |Term| with argument _direction_ to obtain a Matcher _m2_. + 1. If _direction_ is equal to +1, then + 1. Return an internal Matcher closure that takes two arguments, a State _x_ and a Continuation _c_, and performs the following steps when evaluated: + 1. Let _d_ be a Continuation that takes a State argument _y_ and returns the result of calling _m2_(_y_, _c_). + 1. Call _m1_(_x_, _d_) and return its result. + 1. Else, + 1. Assert: _direction_ is equal to -1. + 1. Return an internal Matcher closure that takes two arguments, a State _x_ and a Continuation _c_, and performs the following steps when evaluated: + 1. Let _d_ be a Continuation that takes a State argument _y_ and returns the result of calling _m1_(_y_, _c_). + 1. Call _m2_(_x_, _d_) and return its result. -

Consecutive |Term|s try to simultaneously match consecutive portions of _Input_. If the left |Alternative|, the right |Term|, and the sequel of the regular expression all have choice points, all choices in the sequel are tried before moving on to the next choice in the right |Term|, and all choices in the right |Term| are tried before moving on to the next choice in the left |Alternative|.

+

Consecutive |Term|s try to simultaneously match consecutive portions of _Input_. When _direction_ is equal to +1, if the left |Alternative|, the right |Term|, and the sequel of the regular expression all have choice points, all choices in the sequel are tried before moving on to the next choice in the right |Term|, and all choices in the right |Term| are tried before moving on to the next choice in the left |Alternative|. When _direction_ is equal to -1, the evaluation order of |Alternative| and |Term| is reversed.
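The effect of the reversed evaluation order is easiest to see with two greedy groups. This sketch is illustrative only and assumes an engine with lookbehind support; the variable names are not taken from the patch.

```javascript
// Forward (+1): the left group is matched first, so it is the greedy one.
const forwardMatch = /^(\d+)(\d+)$/.exec("1053");

// Backward (-1): inside the lookbehind the right group is matched first,
// so it takes as many digits as it can and leaves one for the left group.
const backwardMatch = /(?<=(\d+)(\d+))$/.exec("1053");

console.log(forwardMatch.slice(1));  // [ "105", "3" ]
console.log(backwardMatch.slice(1)); // [ "1", "053" ]
```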

Term

+

With argument _direction_.

The production Term :: Assertion evaluates as follows:

1. Return an internal Matcher closure that takes two arguments, a State _x_ and a Continuation _c_, and performs the following steps when evaluated: @@ -29758,13 +29770,16 @@

Term

1. If _r_ is *false*, return ~failure~. 1. Call _c_(_x_) and return its result.
+ +

The AssertionTester is independent of _direction_.

+

The production Term :: Atom evaluates as follows:

- 1. Return the Matcher that is the result of evaluating |Atom|. + 1. Return the Matcher that is the result of evaluating |Atom| with argument _direction_.

The production Term :: Atom Quantifier evaluates as follows:

- 1. Evaluate |Atom| to obtain a Matcher _m_. + 1. Evaluate |Atom| with argument _direction_ to obtain a Matcher _m_. 1. Evaluate |Quantifier| to obtain the three results: an integer _min_, an integer (or ∞) _max_, and Boolean _greedy_. 1. Assert: If _max_ is finite, then _max_ is not less than _min_. 1. Let _parenIndex_ be the number of left-capturing parentheses in the entire regular expression that occur to the left of this |Term|. This is the total number of Atom :: `(` Disjunction `)` Parse Nodes prior to or enclosing this |Term|. @@ -29886,7 +29901,7 @@

Assertion

The production Assertion :: `(` `?` `=` Disjunction `)` evaluates as follows:

- 1. Evaluate |Disjunction| to obtain a Matcher _m_. + 1. Evaluate |Disjunction| with +1 as its _direction_ argument to obtain a Matcher _m_. 1. Return an internal Matcher closure that takes two arguments, a State _x_ and a Continuation _c_, and performs the following steps: 1. Let _d_ be a Continuation that always returns its State argument as a successful MatchResult. 1. Call _m_(_x_, _d_) and let _r_ be its result. @@ -29899,7 +29914,29 @@

Assertion

The production Assertion :: `(` `?` `!` Disjunction `)` evaluates as follows:

- 1. Evaluate |Disjunction| to obtain a Matcher _m_. + 1. Evaluate |Disjunction| with +1 as its _direction_ argument to obtain a Matcher _m_. + 1. Return an internal Matcher closure that takes two arguments, a State _x_ and a Continuation _c_, and performs the following steps: + 1. Let _d_ be a Continuation that always returns its State argument as a successful MatchResult. + 1. Call _m_(_x_, _d_) and let _r_ be its result. + 1. If _r_ is not ~failure~, return ~failure~. + 1. Call _c_(_x_) and return its result. + +

The production Assertion :: `(` `?` `<=` Disjunction `)` evaluates as follows:

+ + 1. Evaluate |Disjunction| with -1 as its _direction_ argument to obtain a Matcher _m_. + 1. Return an internal Matcher closure that takes two arguments, a State _x_ and a Continuation _c_, and performs the following steps: + 1. Let _d_ be a Continuation that always returns its State argument as a successful MatchResult. + 1. Call _m_(_x_, _d_) and let _r_ be its result. + 1. If _r_ is ~failure~, return ~failure~. + 1. Let _y_ be _r_'s State. + 1. Let _cap_ be _y_'s _captures_ List. + 1. Let _xe_ be _x_'s _endIndex_. + 1. Let _z_ be the State (_xe_, _cap_). + 1. Call _c_(_z_) and return its result. + +
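The steps above build the resulting State from _x_'s _endIndex_ and the _captures_ of the inner match, so a positive lookbehind consumes no input but keeps what its groups captured. A minimal sketch of that observable behaviour, assuming an engine with lookbehind support (the sample string and variable name are illustrative):

```javascript
// The lookbehind requires a "$" immediately before the digits and captures it.
const priceMatch = /(?<=(\$))\d+/.exec("cost: $42");

// The overall match starts at the digits: the assertion itself is zero-width.
console.log(priceMatch[0]);    // "42"
console.log(priceMatch.index); // 7
// The capture made inside the lookbehind survives into the result.
console.log(priceMatch[1]);    // "$"
```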

The production Assertion :: `(` `?` `<!` Disjunction `)` evaluates as follows:

+ + 1. Evaluate |Disjunction| with -1 as its _direction_ argument to obtain a Matcher _m_. 1. Return an internal Matcher closure that takes two arguments, a State _x_ and a Continuation _c_, and performs the following steps: 1. Let _d_ be a Continuation that always returns its State argument as a successful MatchResult. 1. Call _m_(_x_, _d_) and let _r_ be its result. @@ -30210,38 +30247,45 @@

Quantifier

Atom

+

With argument _direction_.

The production Atom :: PatternCharacter evaluates as follows:

1. Let _ch_ be the character matched by |PatternCharacter|. 1. Let _A_ be a one-element CharSet containing the character _ch_. - 1. Call CharacterSetMatcher(_A_, *false*) and return its Matcher result. + 1. Call CharacterSetMatcher(_A_, *false*, _direction_) and return its Matcher result.

The production Atom :: `.` evaluates as follows:

1. If _DotAll_ is *true*, then 1. Let _A_ be the set of all characters. 1. Otherwise, let _A_ be the set of all characters except |LineTerminator|. - 1. Call CharacterSetMatcher(_A_, *false*) and return its Matcher result. + 1. Call CharacterSetMatcher(_A_, *false*, _direction_) and return its Matcher result.

The production Atom :: `\` AtomEscape evaluates as follows:

- 1. Return the Matcher that is the result of evaluating |AtomEscape|. + 1. Return the Matcher that is the result of evaluating |AtomEscape| with argument _direction_.

The production Atom :: CharacterClass evaluates as follows:

1. Evaluate |CharacterClass| to obtain a CharSet _A_ and a Boolean _invert_. - 1. Call CharacterSetMatcher(_A_, _invert_) and return its Matcher result. + 1. Call CharacterSetMatcher(_A_, _invert_, _direction_) and return its Matcher result.

The production Atom :: `(` Disjunction `)` evaluates as follows:

- 1. Evaluate |Disjunction| to obtain a Matcher _m_. + 1. Evaluate |Disjunction| with argument _direction_ to obtain a Matcher _m_. 1. Let _parenIndex_ be the number of left-capturing parentheses in the entire regular expression that occur to the left of this |Atom|. This is the total number of Atom :: `(` Disjunction `)` Parse Nodes prior to or enclosing this |Atom|. 1. Return an internal Matcher closure that takes two arguments, a State _x_ and a Continuation _c_, and performs the following steps: 1. Let _d_ be an internal Continuation closure that takes one State argument _y_ and performs the following steps: 1. Let _cap_ be a copy of _y_'s _captures_ List. 1. Let _xe_ be _x_'s _endIndex_. 1. Let _ye_ be _y_'s _endIndex_. - 1. Let _s_ be a new List whose characters are the characters of _Input_ at indices _xe_ (inclusive) through _ye_ (exclusive). + 1. If _direction_ is equal to +1, then + 1. Assert: _xe_ ≤ _ye_. + 1. Let _s_ be a fresh List whose characters are the characters of _Input_ at indices _xe_ (inclusive) through _ye_ (exclusive). + 1. Else, + 1. Assert: _direction_ is equal to -1. + 1. Assert: _ye_ ≤ _xe_. + 1. Let _s_ be a fresh List whose characters are the characters of _Input_ at indices _ye_ (inclusive) through _xe_ (exclusive). 1. Set _cap_[_parenIndex_+1] to _s_. 1. Let _z_ be the State (_ye_, _cap_). 1. Call _c_(_z_) and return its result. @@ -30249,25 +30293,28 @@

Atom

The production Atom :: `(` `?` `:` Disjunction `)` evaluates as follows:

- 1. Return the Matcher that is the result of evaluating |Disjunction|. + 1. Return the Matcher that is the result of evaluating |Disjunction| with argument _direction_. -

Runtime Semantics: CharacterSetMatcher ( _A_, _invert_ )

-

The abstract operation CharacterSetMatcher takes two arguments, a CharSet _A_ and a Boolean flag _invert_, and performs the following steps:

+

Runtime Semantics: CharacterSetMatcher ( _A_, _invert_, _direction_ )

+

The abstract operation CharacterSetMatcher takes three arguments, a CharSet _A_, a Boolean flag _invert_, and an integer _direction_, and performs the following steps:

1. Return an internal Matcher closure that takes two arguments, a State _x_ and a Continuation _c_, and performs the following steps when evaluated: 1. Let _e_ be _x_'s _endIndex_. - 1. If _e_ is _InputLength_, return ~failure~. - 1. Let _ch_ be the character _Input_[_e_]. + 1. Let _f_ be _e_ + _direction_. + 1. If _f_ < 0 or _f_ > _InputLength_, return ~failure~. + 1. Let _index_ be min(_e_, _f_). + 1. Let _ch_ be the character _Input_[_index_]. 1. Let _cc_ be Canonicalize(_ch_). 1. If _invert_ is *false*, then 1. If there does not exist a member _a_ of set _A_ such that Canonicalize(_a_) is _cc_, return ~failure~. - 1. Else _invert_ is *true*, + 1. Else, + 1. Assert: _invert_ is *true*. 1. If there exists a member _a_ of set _A_ such that Canonicalize(_a_) is _cc_, return ~failure~. 1. Let _cap_ be _x_'s _captures_ List. - 1. Let _y_ be the State (_e_+1, _cap_). + 1. Let _y_ be the State (_f_, _cap_). 1. Call _c_(_y_) and return its result.
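The three-argument CharacterSetMatcher can be modelled in a few lines of JavaScript. This is a rough sketch under stated assumptions, not the spec's or any engine's implementation: `input` is an array of characters, a State is a plain object `{ endIndex, captures }`, `FAILURE` stands in for ~failure~, and Canonicalize is treated as the identity (no `ignoreCase`). All names are hypothetical.

```javascript
// Stand-in for the spec's ~failure~ value (assumption for this sketch).
const FAILURE = Symbol("failure");

// Sketch of CharacterSetMatcher(A, invert, direction); Canonicalize is omitted.
function characterSetMatcher(input, set, invert, direction) {
  return (x, c) => {
    const e = x.endIndex;
    const f = e + direction;                    // position after consuming one character
    if (f < 0 || f > input.length) return FAILURE;
    const index = Math.min(e, f);               // character read: input[e] forward, input[e - 1] backward
    const found = set.has(input[index]);
    if (found === invert) return FAILURE;       // miss when not inverted, hit when inverted
    return c({ endIndex: f, captures: x.captures });
  };
}

const input = Array.from("ab");
const identity = (state) => state;
const backwardA = characterSetMatcher(input, new Set(["a"]), false, -1);
const forwardA = characterSetMatcher(input, new Set(["a"]), false, +1);

const backwardHit = backwardA({ endIndex: 1, captures: [] }, identity);  // reads input[0]
const backwardEdge = backwardA({ endIndex: 0, captures: [] }, identity); // f would be -1
const forwardHit = forwardA({ endIndex: 0, captures: [] }, identity);    // reads input[0]
const forwardEdge = forwardA({ endIndex: 2, captures: [] }, identity);   // f would be 3

console.log(backwardHit.endIndex, forwardHit.endIndex);           // 0 1
console.log(backwardEdge === FAILURE, forwardEdge === FAILURE);   // true true
```

Using min(_e_, _f_) as the read index lets one set of steps serve both directions without a separate branch.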
@@ -30365,6 +30412,7 @@

Runtime Semantics: UnicodeMatchPropertyValue ( _p_, _v_ )

AtomEscape

+

With argument _direction_.

The production AtomEscape :: DecimalEscape evaluates as follows:

1. Evaluate |DecimalEscape| to obtain an integer _n_. @@ -30375,12 +30423,12 @@

AtomEscape

1. Evaluate |CharacterEscape| to obtain a character _ch_. 1. Let _A_ be a one-element CharSet containing the character _ch_. - 1. Call CharacterSetMatcher(_A_, *false*) and return its Matcher result. + 1. Call CharacterSetMatcher(_A_, *false*, _direction_) and return its Matcher result.

The production AtomEscape :: CharacterClassEscape evaluates as follows:

1. Evaluate |CharacterClassEscape| to obtain a CharSet _A_. - 1. Call CharacterSetMatcher(_A_, *false*) and return its Matcher result. + 1. Call CharacterSetMatcher(_A_, *false*, _direction_) and return its Matcher result.

An escape sequence of the form `\\` followed by a nonzero decimal number _n_ matches the result of the _n_th set of capturing parentheses (). It is an error if the regular expression has fewer than _n_ capturing parentheses. If the regular expression has _n_ or more capturing parentheses but the _n_th one is *undefined* because it has not captured anything, then the backreference always succeeds.

@@ -31291,9 +31339,10 @@

Runtime Semantics: BackreferenceMatcher Abstract Operation

1. If _s_ is *undefined*, return _c_(_x_). 1. Let _e_ be _x_'s _endIndex_. 1. Let _len_ be the number of elements in _s_. - 1. Let _f_ be _e_ + _len_. - 1. If _f_ > _InputLength_, return ~failure~. - 1. If there exists an integer _i_ between 0 (inclusive) and _len_ (exclusive) such that Canonicalize(_s_[_i_]) is not the same character value as Canonicalize(_Input_[_e_ + _i_]), return ~failure~. + 1. Let _f_ be _e_ + _direction_ × _len_. + 1. If _f_ < 0 or _f_ > _InputLength_, return ~failure~. + 1. Let _g_ be min(_e_, _f_). + 1. If there exists an integer _i_ between 0 (inclusive) and _len_ (exclusive) such that Canonicalize(_s_[_i_]) is not the same character value as Canonicalize(_Input_[_g_ + _i_]), return ~failure~. 1. Let _y_ be the State (_f_, _cap_). 1. Call _c_(_y_) and return its result.
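Because a lookbehind evaluates its terms right to left, a backreference inside one only sees a capture that appears to its right in the pattern. The following sketch illustrates this with the built-in RegExp engine, assuming lookbehind support; the sample string is illustrative.

```javascript
// Group 1 is to the right of \1, so it is matched first when running backward.
// \1 then compares "o" against the text ending where "d" began.
const capturedFirst = /(?<=\1d(o))r/.exec("hodor");

// Here \1 is evaluated before group 1 has captured anything, so it matches
// the empty sequence; "d" must then match directly before "r", which fails.
const capturedLater = /(?<=(o)d\1)r/.exec("hodor");

console.log(capturedFirst && capturedFirst[1]); // "o"
console.log(capturedLater);                     // null
```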