-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Editorial: Re-cast Pattern "evaluation" rules as 8 conventional SDOs #2531
Conversation
spec.html
Outdated
<p>This section is amended in <emu-xref href="#sec-compilesubpattern-annexb"></emu-xref>.</p> | ||
</emu-note> | ||
|
||
<!-- Disjunction --> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you mean to leave these comments in?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup. In the status quo, each nonterminal gets its own clause, but in this PR, three of the new SDOs (CompileSubpattern, CompileAtom, and CompileToCharSet) include rules for multiple nonterminals, and I wasn't sure if the loss of a per-nonterminal heading would be 'disorienting', so I left these comments in case something was wanted.
Looking at the rendered spec, I don't really feel the lack of per-nonterminal headings, but I can imagine that some people might like to have them added. Looking at the source, I think the comments help navigation slightly, but I wouldn't be crushed if the editors wanted them removed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is great. I had a couple small wordsmithing things, but otherwise LGTM.
Force-pushed to add two fixup commits, incorporating @bakkot's suggestions. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, but can we switch out the 1
or -1
direction thing with a spec enum like ~forward~
and ~backward~
?
I'm in favor of that change, but it seems unrelated to this PR. We can land it in a followup. |
@jmdyck do you want to rebase this and condense it into a smaller commit list, before i merge it? |
I've force-pushed to rebase to master and merge in the two fixup commits. (But it's weird: GitHub has the update in my Pattern_ops branch, but it's not showing up in this PR. Maybe just a delay?) I'm fine with the resulting set of commits, but editors, let me know if you'd like a smaller set. |
It's a bit granular, but that's okay. I rarely care to have anything finer than a whole PR in the git history, personally. |
... at its only use, in the evaluation rule for `AtomEscape :: CharacterEscape`. This is easy because the |CharacterEscape| evaluation algorithm doesn't reference any substructure of the |CharacterEscape|. Annex B's `CharacterEscape :: LegacyOctalEscapeSequence` production has the same evaluation rule, and so can be discarded also. (Although |CharacterEscape| also occurs in `ClassEscape :: CharacterEscape`, the latter's evaluation rule doesn't evaluate |CharacterEscape|, so is unaffected by the removal of the evaluation rule for |CharacterEscape|.)
... at its only use, in the evaluation rule for `AtomEscape :: DecimalEscape`. This is easy because the |DecimalEscape| evaluation algorithm doesn't reference any substructure of the |DecimalEscape|. Also, the <emu-note> in the AtomEscape clause is paraphrasing the evaluation rule for `AtomEscape :: DecimalEscape` production, so move it up to be immediately after that rule. That <emu-note> is roughly a superset of the <emu-note> that accompanied the evaluation rule for |DecimalEscape|, so we're not losing anything by discarding the latter <emu-note>.
…#2531) ... to after those for |ClassEscape|. This is a reasonable spot, because of the production `ClassEscape :: CharacterClassEscape`, but will be of more benefit to a future refactoring.
…lause (tc39#2531) Specifically: - CharacterSetMatcher - Canonicalize - UnicodeMatchProperty - UnicodeMatchPropertyValue - CharacterRange (They need to be "out of the way" for the next series of refactorings.)
Instead of returning "the three results _min_, _max_, and _greedy_", return a Record with [[Min]], [[Max]], and [[Greedy]] fields.
…9#2531) Instead of returning "the two results _min_ and _max_", return a Record with [[Min]] and [[Max]] fields.
…2531) Instead of returning "a CharSet _A_ and a Boolean _invert_", return a Record with [[CharSet]] and [[Invert]] fields.
Take all the regexp evaluation rules for: - ClassRanges - NonemptyClassRanges - NonemptyClassRangesNoDash - ClassAtom - ClassAtomNoDash - ClassEscape - CharacterClassEscape - UnicodePropertyValueExpression (that is, all the rules that return a CharSet) and re-formulate them as a more conventional SDO.
Take all the regexp evaluation rules for CharacterClass and re-formulate them as a more conventional SDO.
Take all the regexp evaluation rules for Atom and AtomEscape and re-formulate them as a more conventional SDO.
Take all the regexp evaluation rules for QuantifierPrefix and re-formulate them as a more conventional SDO.
Take all the regexp evaluation rules for Quantifier and re-formulate them as a more conventional SDO.
Take all the regexp evaluation rules for Assertion and re-formulate them as a more conventional SDO.
Take all the regexp evaluation rules for Disjunction, Alternative, and Term, and re-formulate them as a more conventional SDO.
Take the regexp evaluation rule for Pattern, and re-formulate it as a more conventional SDO.
…9#2531) (in CompileSubpattern, CompileAtom, CompileToCharSet, CompileAssertion)
Force-pushed just to add the PR number to commit messages. This time, GitHub propagated the change here. |
After the merge of tc39#2531, `_patternCharacters_` is no longer used/referenced, so remove the steps that define it. Also, add a NOTE after step 13.
After the merge of tc39#2531, `_patternCharacters_` is no longer used/referenced, so remove the steps that define it. Also, add a NOTE after step 13.
This PR's change to step 14 of RegExpInitialize pretty much assumes that you're okay with the points of PR #2391, so it would probably make sense to decide that one first.