Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Editorial: Re-cast Pattern "evaluation" rules as 8 conventional SDOs #2531

Merged
merged 16 commits into from
Oct 22, 2021

Conversation

jmdyck
Copy link
Collaborator

@jmdyck jmdyck commented Sep 23, 2021

  • phase 1 (4 commits): Simple refactorings that make phase 3 easier.
  • phase 2 (3 commits): Replace multi-value returns with Records. (This is easier to do before phase 3 than during.)
  • phase 3 (9 commits + 2 fixups): Take the (idiosyncratic) rules for "evaluating" Patterns and reformulate them into 8 SDOs that are defined and invoked more conventionally.

This PR's change to step 14 of RegExpInitialize pretty much assumes that you're okay with the points of PR #2391, so it would probably make sense to decide that one first.

spec.html Outdated
<p>This section is amended in <emu-xref href="#sec-compilesubpattern-annexb"></emu-xref>.</p>
</emu-note>

<!-- Disjunction -->
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you mean to leave these comments in?

Copy link
Collaborator Author

@jmdyck jmdyck Sep 23, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup. In the status quo, each nonterminal gets its own clause, but in this PR, three of the new SDOs (CompileSubpattern, CompileAtom, and CompileToCharSet) include rules for multiple nonterminals, and I wasn't sure if the loss of a per-nonterminal heading would be 'disorienting', so I left these comments in case something was wanted.

Looking at the rendered spec, I don't really feel the lack of per-nonterminal headings, but I can imagine that some people might like to have them added. Looking at the source, I think the comments help navigation slightly, but I wouldn't be crushed if the editors wanted them removed.

Copy link
Contributor

@bakkot bakkot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great. I had a couple small wordsmithing things, but otherwise LGTM.

spec.html Outdated Show resolved Hide resolved
spec.html Outdated Show resolved Hide resolved
spec.html Outdated Show resolved Hide resolved
@jmdyck
Copy link
Collaborator Author

jmdyck commented Sep 29, 2021

Force-pushed to add two fixup commits, incorporating @bakkot's suggestions.

Copy link
Member

@michaelficarra michaelficarra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, but can we switch out the 1 or -1 direction thing with a spec enum like ~forward~ and ~backward~?

@bakkot bakkot added the ready to merge Editors believe this PR needs no further reviews, and is ready to land. label Oct 21, 2021
@bakkot
Copy link
Contributor

bakkot commented Oct 21, 2021

I'm in favor of that change, but it seems unrelated to this PR. We can land it in a followup.

@ljharb
Copy link
Member

ljharb commented Oct 21, 2021

@jmdyck do you want to rebase this and condense it into a smaller commit list, before i merge it?

@jmdyck
Copy link
Collaborator Author

jmdyck commented Oct 21, 2021

I've force-pushed to rebase to master and merge in the two fixup commits. (But it's weird: GitHub has the update in my Pattern_ops branch, but it's not showing up in this PR. Maybe just a delay?)

I'm fine with the resulting set of commits, but editors, let me know if you'd like a smaller set.

@michaelficarra
Copy link
Member

It's a bit granular, but that's okay. I rarely care to have anything finer than a whole PR in the git history, personally.

jmdyck added 16 commits October 21, 2021 22:01
... at its only use, in the evaluation rule for `AtomEscape :: CharacterEscape`.

This is easy because the |CharacterEscape| evaluation algorithm
doesn't reference any substructure of the |CharacterEscape|.

Annex B's `CharacterEscape :: LegacyOctalEscapeSequence` production
has the same evaluation rule, and so can be discarded also.

(Although |CharacterEscape| also occurs in `ClassEscape :: CharacterEscape`,
the latter's evaluation rule doesn't evaluate |CharacterEscape|,
so is unaffected by the removal of the evaluation rule for |CharacterEscape|.)
... at its only use, in the evaluation rule for `AtomEscape :: DecimalEscape`.

This is easy because the |DecimalEscape| evaluation algorithm
doesn't reference any substructure of the |DecimalEscape|.

Also, the <emu-note> in the AtomEscape clause is paraphrasing
the evaluation rule for `AtomEscape :: DecimalEscape` production,
so move it up to be immediately after that rule.

That <emu-note> is roughly a superset of the <emu-note> that accompanied
the evaluation rule for |DecimalEscape|, so we're not losing anything
by discarding the latter <emu-note>.
…#2531)

... to after those for |ClassEscape|.

This is a reasonable spot, because of the production
`ClassEscape :: CharacterClassEscape`, but will be
of more benefit to a future refactoring.
…lause (tc39#2531)

Specifically:
- CharacterSetMatcher
- Canonicalize
- UnicodeMatchProperty
- UnicodeMatchPropertyValue
- CharacterRange

(They need to be "out of the way" for the next series of refactorings.)
Instead of returning "the three results _min_, _max_, and _greedy_",
return a Record with [[Min]], [[Max]], and [[Greedy]] fields.
…9#2531)

Instead of returning "the two results _min_ and _max_",
return a Record with [[Min]] and [[Max]] fields.
…2531)

Instead of returning "a CharSet _A_ and a Boolean _invert_",
return a Record with [[CharSet]] and [[Invert]] fields.
Take all the regexp evaluation rules for:
- ClassRanges
- NonemptyClassRanges
- NonemptyClassRangesNoDash
- ClassAtom
- ClassAtomNoDash
- ClassEscape
- CharacterClassEscape
- UnicodePropertyValueExpression

(that is, all the rules that return a CharSet)
and re-formulate them as a more conventional SDO.
Take all the regexp evaluation rules for CharacterClass
and re-formulate them as a more conventional SDO.
Take all the regexp evaluation rules for Atom and AtomEscape
and re-formulate them as a more conventional SDO.
Take all the regexp evaluation rules for QuantifierPrefix
and re-formulate them as a more conventional SDO.
Take all the regexp evaluation rules for Quantifier
and re-formulate them as a more conventional SDO.
Take all the regexp evaluation rules for Assertion
and re-formulate them as a more conventional SDO.
Take all the regexp evaluation rules for Disjunction, Alternative, and Term,
and re-formulate them as a more conventional SDO.
Take the regexp evaluation rule for Pattern,
and re-formulate it as a more conventional SDO.
…9#2531)

(in CompileSubpattern, CompileAtom, CompileToCharSet, CompileAssertion)
@jmdyck
Copy link
Collaborator Author

jmdyck commented Oct 22, 2021

Force-pushed just to add the PR number to commit messages. This time, GitHub propagated the change here.

@ljharb ljharb merged commit f534fd6 into tc39:master Oct 22, 2021
@jmdyck jmdyck deleted the Pattern_ops branch October 25, 2021 21:11
jmdyck added a commit to jmdyck/ecma262 that referenced this pull request Oct 26, 2021
After the merge of tc39#2531, `_patternCharacters_` is no longer
used/referenced, so remove the steps that define it.

Also, add a NOTE after step 13.
ljharb pushed a commit to jmdyck/ecma262 that referenced this pull request Jan 13, 2022
After the merge of tc39#2531, `_patternCharacters_` is no longer
used/referenced, so remove the steps that define it.

Also, add a NOTE after step 13.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
editorial change ready to merge Editors believe this PR needs no further reviews, and is ready to land.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants