Editorial: Re-cast Pattern "evaluation" rules as 8 conventional SDOs #2531

jmdyck · 2021-09-23T04:32:25Z

phase 1 (4 commits): Simple refactorings that make phase 3 easier.
phase 2 (3 commits): Replace multi-value returns with Records. (This is easier to do before phase 3 than during.)
phase 3 (9 commits + 2 fixups): Take the (idiosyncratic) rules for "evaluating" Patterns and reformulate them into 8 SDOs that are defined and invoked more conventionally.
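The following is a rough TypeScript analogy (not spec text) of what phases 2 and 3 do to the |QuantifierPrefix| and |Quantifier| rules. Each set of rules becomes one function (an SDO) with one case per production, and each function returns a record instead of "the two results" or "the three results". The AST shapes and the names `compileQuantifierPrefix` and `compileQuantifier` are hypothetical. The min/max/greedy values follow the usual meaning of each quantifier form.

```typescript
// Hypothetical AST for QuantifierPrefix; one variant per production.
type QuantifierPrefix =
  | { kind: "star" }                          // *
  | { kind: "plus" }                          // +
  | { kind: "question" }                      // ?
  | { kind: "exact"; n: number }              // { DecimalDigits }
  | { kind: "atLeast"; n: number }            // { DecimalDigits , }
  | { kind: "between"; n: number; m: number }; // { DecimalDigits , DecimalDigits }

// Quantifier :: QuantifierPrefix    (lazy = false)
// Quantifier :: QuantifierPrefix ?  (lazy = true)
interface Quantifier {
  prefix: QuantifierPrefix;
  lazy: boolean;
}

// Records in place of multi-value returns, mirroring
// { [[Min]], [[Max]] } and { [[Min]], [[Max]], [[Greedy]] }.
interface MinMax {
  min: number;
  max: number; // Infinity stands in for +∞
}
interface QuantifierRecord extends MinMax {
  greedy: boolean;
}

// One function per SDO, one case per production.
function compileQuantifierPrefix(p: QuantifierPrefix): MinMax {
  switch (p.kind) {
    case "star":
      return { min: 0, max: Infinity };
    case "plus":
      return { min: 1, max: Infinity };
    case "question":
      return { min: 0, max: 1 };
    case "exact":
      return { min: p.n, max: p.n };
    case "atLeast":
      return { min: p.n, max: Infinity };
    case "between":
      return { min: p.n, max: p.m };
  }
}

function compileQuantifier(q: Quantifier): QuantifierRecord {
  // The caller reads named fields instead of "the two results".
  const { min, max } = compileQuantifierPrefix(q.prefix);
  return { min, max, greedy: !q.lazy };
}
```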

This PR's change to step 14 of RegExpInitialize assumes that you're okay with the changes proposed in PR #2391, so it would probably make sense to decide that one first.

bakkot · 2021-09-23T06:59:25Z

spec.html

+          <p>This section is amended in <emu-xref href="#sec-compilesubpattern-annexb"></emu-xref>.</p>
+        </emu-note>
+
+        <!-- Disjunction -->


Did you mean to leave these comments in?

jmdyck

Yup. In the status quo, each nonterminal gets its own clause, but in this PR, three of the new SDOs (CompileSubpattern, CompileAtom, and CompileToCharSet) include rules for multiple nonterminals. I wasn't sure if the loss of a per-nonterminal heading would be 'disorienting', so I left these comments in case the editors wanted something there.

Looking at the rendered spec, I don't really feel the lack of per-nonterminal headings, but I can imagine that some people might like to have them added. Looking at the source, I think the comments help navigation slightly, but I wouldn't be crushed if the editors wanted them removed.

bakkot

This is great. I had a couple of small wordsmithing suggestions, but otherwise LGTM.

jmdyck · 2021-09-29T03:36:28Z

Force-pushed to add two fixup commits, incorporating @bakkot's suggestions.

michaelficarra

LGTM, but can we switch out the 1 or -1 direction thing with a spec enum like ~forward~ and ~backward~?

bakkot · 2021-10-21T00:53:52Z

I'm in favor of that change, but it seems unrelated to this PR. We can land it in a followup.
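As a rough illustration of the suggestion, here is a hypothetical TypeScript rendering of the one-character step that a character matcher takes, with the numeric direction (1 or -1) replaced by a two-valued type standing in for ~forward~ and ~backward~. The function name `consumedIndex` and its parameters are made up for this sketch. It assumes, as in CharacterSetMatcher at the time, that `e` is the current end index, that the step moves to `f = e ± 1`, and that the character consumed is at `min(e, f)`.

```typescript
// Stand-in for the proposed spec enum values ~forward~ and ~backward~.
type Direction = "forward" | "backward";

// Hypothetical analogue of the one-character step in CharacterSetMatcher.
// `e` is the current end index; returns the index of the character that
// would be consumed, or undefined if the step would leave the input.
function consumedIndex(inputLength: number, e: number, direction: Direction): number | undefined {
  // With a numeric direction this would be written as e + direction.
  const f = direction === "forward" ? e + 1 : e - 1;
  if (f < 0 || f > inputLength) return undefined;
  // Forward reads Input[e]; backward (lookbehind) reads Input[e - 1].
  return Math.min(e, f);
}
```

The enum makes each branch say which way it is going, instead of relying on arithmetic with a sign.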

ljharb · 2021-10-21T06:30:14Z

@jmdyck do you want to rebase this and condense it into a smaller commit list, before I merge it?

jmdyck · 2021-10-21T21:34:43Z

I've force-pushed to rebase to master and merge in the two fixup commits. (But it's weird: GitHub has the update in my Pattern_ops branch, but it's not showing up in this PR. Maybe just a delay?)

I'm fine with the resulting set of commits, but editors, let me know if you'd like a smaller set.

michaelficarra · 2021-10-22T01:06:43Z

It's a bit granular, but that's okay. I rarely care to have anything finer than a whole PR in the git history, personally.

... at its only use, in the evaluation rule for `AtomEscape :: CharacterEscape`. This is easy because the |CharacterEscape| evaluation algorithm doesn't reference any substructure of the |CharacterEscape|. Annex B's `CharacterEscape :: LegacyOctalEscapeSequence` production has the same evaluation rule, so it can also be discarded. (Although |CharacterEscape| also occurs in `ClassEscape :: CharacterEscape`, the latter's evaluation rule doesn't evaluate |CharacterEscape|, so it is unaffected by the removal of the evaluation rule for |CharacterEscape|.)

... at its only use, in the evaluation rule for `AtomEscape :: DecimalEscape`. This is easy because the |DecimalEscape| evaluation algorithm doesn't reference any substructure of the |DecimalEscape|. Also, the <emu-note> in the AtomEscape clause paraphrases the evaluation rule for the `AtomEscape :: DecimalEscape` production, so move it up to immediately after that rule. That <emu-note> is roughly a superset of the <emu-note> that accompanied the evaluation rule for |DecimalEscape|, so nothing is lost by discarding the latter.

Take all the regexp evaluation rules for:
- ClassRanges
- NonemptyClassRanges
- NonemptyClassRangesNoDash
- ClassAtom
- ClassAtomNoDash
- ClassEscape
- CharacterClassEscape
- UnicodePropertyValueExpression

(that is, all the rules that return a CharSet) and re-formulate them as a more conventional SDO.
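Here is a minimal TypeScript sketch of the idea behind gathering these rules into one CharSet-returning SDO (CompileToCharSet). Every case yields a CharSet, and a sequence of class ranges compiles to the union of its parts. The sketch models a CharSet as a sorted array of code points. The flattened `ClassItem` type and the names `characterRange`, `union`, and `compileToCharSet` are hypothetical. Real class atoms (escapes, `\d`, property escapes, case folding) are omitted.

```typescript
// Hypothetical representation: a CharSet as a sorted, duplicate-free array of code points.
type CharSet = number[];

// Simplified class items: a single character, or a range "a-b".
type ClassItem =
  | { kind: "single"; cp: number }
  | { kind: "range"; from: number; to: number };

// Analogue of CharacterRange: all code points from a to b inclusive.
// (An out-of-order range such as /[b-a]/ is a SyntaxError in JavaScript;
// this sketch simply throws.)
function characterRange(a: number, b: number): CharSet {
  if (a > b) throw new SyntaxError("Range out of order in character class");
  const out: CharSet = [];
  for (let cp = a; cp <= b; cp++) out.push(cp);
  return out;
}

function union(a: CharSet, b: CharSet): CharSet {
  const out = a.slice();
  for (const cp of b) {
    if (out.indexOf(cp) < 0) out.push(cp);
  }
  return out.sort((x, y) => x - y);
}

// Every case yields a CharSet; a sequence of items compiles to the union of its parts.
function compileToCharSet(items: ClassItem[]): CharSet {
  let result: CharSet = [];
  for (const item of items) {
    const part = item.kind === "single" ? [item.cp] : characterRange(item.from, item.to);
    result = union(result, part);
  }
  return result;
}
```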

jmdyck · 2021-10-22T02:07:11Z

Force-pushed just to add the PR number to commit messages. This time, GitHub propagated the change here.

jmdyck added the "editorial change" label Sep 23, 2021

bakkot reviewed Sep 23, 2021

bakkot approved these changes Sep 28, 2021

jmdyck force-pushed the Pattern_ops branch from 56079e2 to 0439757 on September 29, 2021 03:35

michaelficarra approved these changes Oct 21, 2021

bakkot added the "ready to merge" label (Editors believe this PR needs no further reviews, and is ready to land.) Oct 21, 2021

jmdyck added 16 commits October 21, 2021 22:01

Editorial: Move the evaluation rules for |CharacterClassEscape| (tc39#2531)

c8cb943

... to after those for |ClassEscape|. This is a reasonable spot, because of the production `ClassEscape :: CharacterClassEscape`, but it will be of more benefit to a future refactoring.

Editorial: Move some abstract operations in the "Pattern Semantics" clause (tc39#2531)

5a64ba1

Specifically:
- CharacterSetMatcher
- Canonicalize
- UnicodeMatchProperty
- UnicodeMatchPropertyValue
- CharacterRange

(They need to be "out of the way" for the next series of refactorings.)

Editorial: When evaluating a |Quantifier|, return a Record (tc39#2531)

3f83f0b

Instead of returning "the three results _min_, _max_, and _greedy_", return a Record with [[Min]], [[Max]], and [[Greedy]] fields.

Editorial: When evaluating a |QuantifierPrefix|, return a Record (tc39#2531)

aa99b03

Instead of returning "the two results _min_ and _max_", return a Record with [[Min]] and [[Max]] fields.

Editorial: When evaluating a |CharacterClass|, return a Record (tc39#2531)

6901687

Instead of returning "a CharSet _A_ and a Boolean _invert_", return a Record with [[CharSet]] and [[Invert]] fields.

Editorial: Introduce CompileCharacterClass SDO (tc39#2531)

451459a

Take all the regexp evaluation rules for CharacterClass and re-formulate them as a more conventional SDO.

Editorial: Introduce CompileAtom SDO (tc39#2531)

2566415

Take all the regexp evaluation rules for Atom and AtomEscape and re-formulate them as a more conventional SDO.

Editorial: Introduce CompileQuantifierPrefix SDO (tc39#2531)

50d2877

Take all the regexp evaluation rules for QuantifierPrefix and re-formulate them as a more conventional SDO.

Editorial: Introduce CompileQuantifier SDO (tc39#2531)

9bba3b2

Take all the regexp evaluation rules for Quantifier and re-formulate them as a more conventional SDO.

Editorial: Introduce CompileAssertion SDO (tc39#2531)

0c6d098

Take all the regexp evaluation rules for Assertion and re-formulate them as a more conventional SDO.

Editorial: Introduce CompileSubpattern SDO (tc39#2531)

b5e20b0

Take all the regexp evaluation rules for Disjunction, Alternative, and Term, and re-formulate them as a more conventional SDO.

Editorial: Introduce CompilePattern SDO (tc39#2531)

05ed23c

Take the regexp evaluation rule for Pattern, and re-formulate it as a more conventional SDO.

Editorial: Drop Compile rules that are handled by the chain rule (tc39#2531)

f534fd6

(in CompileSubpattern, CompileAtom, CompileToCharSet, CompileAssertion)
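The "chain rule" refers to the ECMA-262 convention that a chain production (exactly one nonterminal on its right-hand side, plus any terminals) has an implicit definition for every operation: the operation is reapplied to that sole nonterminal. Explicit rules that do nothing more than that can therefore be dropped. Below is a hypothetical TypeScript sketch of an SDO with that fallback. The `ParseNode` shape, `makeSdo`, and the production strings are illustrative only, and SDO parameters are omitted for brevity.

```typescript
// Hypothetical parse-node shape: `children` holds only the nonterminal
// children of the node, in order.
interface ParseNode {
  production: string; // e.g. "ClassEscape :: CharacterClassEscape"
  children: ParseNode[];
}

// Build an SDO from its explicit rules, falling back on the chain rule.
function makeSdo<R>(
  name: string,
  rules: { [production: string]: (node: ParseNode) => R },
): (node: ParseNode) => R {
  const run = (node: ParseNode): R => {
    if (Object.prototype.hasOwnProperty.call(rules, node.production)) {
      return rules[node.production](node);
    }
    // Chain rule: a production with exactly one nonterminal on its right-hand
    // side implicitly reapplies the same operation to that nonterminal.
    if (node.children.length === 1) return run(node.children[0]);
    throw new Error(name + " has no rule for " + node.production);
  };
  return run;
}
```

A real SDO that takes parameters (for example, a direction) would pass them through unchanged when it falls back on the chain rule.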

jmdyck force-pushed the Pattern_ops branch from e9dc6aa to f534fd6 on October 22, 2021 02:04

ljharb merged commit f534fd6 into tc39:master Oct 22, 2021

bakkot mentioned this pull request Oct 22, 2021

Editorial: Use spec enums for direction instead of 1 and -1 #2553

Merged

jmdyck deleted the Pattern_ops branch October 25, 2021 21:11

jmdyck mentioned this pull request Oct 26, 2021

Editorial: Simplify RegExpInitialize #2391

Merged

jmdyck added a commit to jmdyck/ecma262 that referenced this pull request Oct 26, 2021

Editorial: Simplify RegExpInitialize (tc39#2391)

c57bbe0

After the merge of tc39#2531, `_patternCharacters_` is no longer used/referenced, so remove the steps that define it. Also, add a NOTE after step 13.

bakkot mentioned this pull request Dec 9, 2021

Normative: Add RegExp v flag with set notation and properties of strings #2418

Merged
