TAG review of the proposal to use the RegExp v flag instead of u for the HTML pattern attribute #832

Closed
mathiasbynens opened this issue Apr 5, 2023 · 4 comments
Labels: Progress: propose closing (we think it should be closed but are waiting on some feedback or consensus), Topic: HTML, Venue: WHATWG

mathiasbynens commented Apr 5, 2023

Hello, TAG!

I'm requesting a TAG review of the proposal to use the RegExp v flag instead of u for the HTML pattern attribute.

Summary

Using the new RegExp v flag instead of u would enable the use of set notation, string literal syntax, and Unicode properties of strings within pattern attribute values.

Explainer

This proposal makes the pattern attribute more powerful, enabling the use of set notation, string literal syntax, and Unicode properties of strings.

Differences with the previous u flag-based behavior:

  • [FEATURE] Previously invalid patterns now become valid, e.g.

    pattern="[\p{ASCII_Hex_Digit}--[Ff]]"
    pattern="\p{RGI_Emoji}"
    pattern="[_\q{a|bc|def}]"
    

    For more useful examples of the RegExp v flag, see the relevant feature explainer.

  • [BREAKING CHANGE] Some previously valid patterns are now errors, specifically those with a character class including either an unescaped special character ( ) [ ] { } / - \ | or a double punctuator:

    pattern="[(]"
    pattern="[)]"
    pattern="[[]"
    pattern="[{]"
    pattern="[}]"
    pattern="[/]"
    pattern="[-]"
    pattern="[|]"
    pattern="[&&]"
    pattern="[!!]"
    pattern="[##]"
    pattern="[$$]"
    pattern="[%%]"
    pattern="[**]"
    pattern="[++]"
    pattern="[,,]"
    pattern="[..]"
    pattern="[::]"
    pattern="[;;]"
    pattern="[<<]"
    pattern="[==]"
    pattern="[>>]"
    pattern="[??]"
    pattern="[@@]"
    pattern="[``]"
    pattern="[~~]"
    pattern="[_^^]"
    

    Throwing patterns result in inputElement.validity.valid === true for any input value, so the only compatibility risk is that some value/pattern combinations that would previously result in inputElement.validity.valid === false now result in inputElement.validity.valid === true.

  • Other previously valid patterns still behave the same. (Other than the abovementioned features, the v flag only differs in behavior from the u flag w.r.t. case-insensitive matching, but the pattern attribute uses case-sensitive matching.)
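To make the newly valid patterns from the first bullet concrete, here is a minimal sketch that tries them in plain JavaScript. It assumes an engine that supports the v flag; `supportsVFlag` and `fullMatch` are hypothetical helper names, and the `^(?:…)$` wrapper only approximates the fact that a pattern must match the entire value.

```javascript
// Feature-detect the v flag; engines without it throw on the unknown flag.
function supportsVFlag() {
  try {
    new RegExp("", "v");
    return true;
  } catch {
    return false;
  }
}

// Hypothetical helper: anchor the pattern so it must match the whole value.
function fullMatch(pattern, value) {
  return new RegExp(`^(?:${pattern})$`, "v").test(value);
}

if (supportsVFlag()) {
  // Set notation: ASCII hex digits minus "F" and "f".
  console.log(fullMatch("[\\p{ASCII_Hex_Digit}--[Ff]]", "A"));
  console.log(fullMatch("[\\p{ASCII_Hex_Digit}--[Ff]]", "F"));

  // Unicode property of strings: a multi-code-point emoji sequence.
  console.log(fullMatch("\\p{RGI_Emoji}", "\u{1F44D}\u{1F3FD}"));

  // String literal syntax inside a class: "_", "a", "bc", or "def".
  console.log(fullMatch("[_\\q{a|bc|def}]", "bc"));
  console.log(fullMatch("[_\\q{a|bc|def}]", "b"));
}
```

With the u flag, each of these three patterns throws a SyntaxError instead of compiling.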

Note that the breaking changes mostly apply to somewhat esoteric edge cases that can easily be avoided. In the worst case, this could cause previously invalid input to now be considered valid (since throwing patterns result in inputElement.validity.valid === true for any input value, as if the pattern attribute wasn’t there). In other words, the only Web Compat risk is that a website without server-side validation would suddenly allow submission of values that would previously be prevented by the client-side pattern. All currently allowed inputs would still be accepted, just as they were previously.
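The compatibility argument can be sketched as a small model of pattern validation. This is an approximation written for illustration, not the HTML Standard's algorithm text or any browser's code; `compilePatternModel` and `isValueValid` are hypothetical names, and the rule that empty values skip the pattern check is included as an assumption about how the constraint is applied.

```javascript
// Model of compiling a pattern attribute value with a given flag ("u" or "v").
// A pattern that fails to compile is ignored, as if the attribute were absent.
function compilePatternModel(pattern, flag) {
  try {
    new RegExp(pattern, flag); // check the raw pattern first
  } catch {
    return null; // throwing pattern: no constraint at all
  }
  // The pattern must match the entire value, so anchor it.
  return new RegExp(`^(?:${pattern})$`, flag);
}

// Model of inputElement.validity.valid, considering only the pattern.
function isValueValid(pattern, value, flag) {
  const compiled = compilePatternModel(pattern, flag);
  if (compiled === null || value === "") return true;
  return compiled.test(value);
}

// "[(]" compiles under u and only accepts "(", so "x" is rejected.
console.log(isValueValid("[(]", "x", "u"));
// Under v the same pattern throws, so every value is accepted.
console.log(isValueValid("[(]", "x", "v"));
```

The model only ever moves a result from rejected to accepted when the pattern stops compiling, which matches the claim that all currently allowed inputs stay allowed.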

IMHO making the change is worth it given the powerful new functionality it brings, and the relatively small compatibility risk. This is reminiscent of the discussion in whatwg/html#439 (but in a different direction).

For context, here are a few pointers w.r.t. when we decided to implicitly enable the u flag for the pattern attribute in the first place:

Checklist

Further details

  • I have reviewed the TAG's Web Platform Design Principles
  • Relevant time constraints or deadlines: N/A
  • The group where the work on this specification is currently being done: Spec-wise this is a minor change to the existing HTML pattern attribute. The work is in the form of a PR to the WHATWG HTML Standard.
  • Major unresolved issues with or opposition to this specification: So far, all stakeholders seem in favor of the proposal. The one open question is whether this change is Web Compatible. We’ve implemented a Chrome UseCounter giving us the upper bound of potential compat issues. None of the UseCounter hits so far (see analysis starting with comment #11 on the crbug) constitute an issue in practice.
  • This work is being funded by: Google GmbH
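For authors who want to audit their own pattern values, a rough check in the spirit of the upper-bound measurement mentioned above is to compile each pattern under both flags and compare. This sketch is not the Chrome UseCounter; `compilesUnder` and `classifyPattern` are hypothetical names, and the result is only meaningful in an engine that supports the v flag.

```javascript
// Returns true if the pattern compiles under the given flag.
function compilesUnder(pattern, flag) {
  try {
    new RegExp(pattern, flag);
    return true;
  } catch {
    return false;
  }
}

// Classify a pattern attribute value by how the u -> v switch affects it.
function classifyPattern(pattern) {
  const validU = compilesUnder(pattern, "u");
  const validV = compilesUnder(pattern, "v");
  if (validU && validV) return "unchanged";
  if (validU && !validV) return "breaks-under-v";
  if (!validU && validV) return "newly-valid";
  return "invalid-in-both";
}

// Example audit over a few pattern strings from this issue.
for (const pattern of ["[-]", "[\\-]", "[&&]", "\\p{RGI_Emoji}", "["]) {
  console.log(pattern, classifyPattern(pattern));
}
```

A "breaks-under-v" result is an upper bound on real breakage: it only matters if some value the old pattern rejected would actually be entered and the server does not validate it.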

We'd prefer the TAG provide feedback as (please delete all but the desired option):

💬 leave review feedback as a comment in this issue and @-notify mathiasbynens

Member

LeaVerou commented Apr 5, 2023

Hey, could you folks please add an explainer that follows the guidelines in https://tag.w3.org/explainers/ ?
If anything, I found this more informative than the explainer actually linked, so please make sure to include this information.

Member

LeaVerou commented Apr 5, 2023

From the same link:

  • [BREAKING CHANGE] Some previously valid patterns are now errors, specifically those with a character class including either an unescaped special character ( ) [ ] { } / - \ | or a double punctuator:

Do you have any sense of how common these patterns are in the wild?

Author

mathiasbynens commented Apr 5, 2023

> Hey, could you folks please add an explainer that follows the guidelines in https://tag.w3.org/explainers/ ? If anything, I found this more informative than the explainer actually linked, so please make sure to include this information.

Updated. PTAL

> From the same link:
>
>   • [BREAKING CHANGE] Some previously valid patterns are now errors, specifically those with a character class including either an unescaped special character ( ) [ ] { } / - \ | or a double punctuator:
>
> Do you have any sense of how common these patterns are in the wild?

So far the only examples we’ve seen are cases where an unescaped -, {, |, } occurs within a character class. - is the most common occurrence (although our sample size is small). Interestingly, all the cases are the “username” field for a login form, where the username is really an email address, yet instead of using type=email the authors chose to use a pattern. See the link under “Major unresolved issues with or opposition to this specification” for some more details on each case.
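For the affected cases described here, the fix on the author's side is to escape the character inside the class. A minimal sketch, assuming an illustrative username-style pattern (the `before` and `after` strings are made up for this example, not taken from the sites in the crbug analysis):

```javascript
// Returns true if the pattern compiles under the given flag.
function compilesWith(pattern, flag) {
  try {
    new RegExp(pattern, flag);
    return true;
  } catch {
    return false;
  }
}

// Hypothetical pattern with an unescaped "-" at the end of a character class.
const before = "[\\w.-]+";
// Same pattern with the hyphen escaped; valid under both u and v.
const after = "[\\w.\\-]+";

console.log(compilesWith(before, "u"), compilesWith(before, "v"));
console.log(compilesWith(after, "u"), compilesWith(after, "v"));

// The escaped form still accepts the same values as before.
console.log(new RegExp(`^(?:${after})$`, "u").test("jane-doe.x"));
```

The same approach applies to `{`, `|`, and `}` inside a class: `\{`, `\|`, and `\}` are accepted under both flags.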

@LeaVerou LeaVerou added Progress: propose closing we think it should be closed but are waiting on some feedback or consensus Venue: WHATWG Topic: HTML and removed Progress: untriaged labels Apr 19, 2023
@LeaVerou
Member

Hi @mathiasbynens,

@ylafon and I looked at this during a breakout today. While we were slightly concerned initially about the backwards compat implications, we do think the benefits v brings far outweigh these concerns, and we are happy to see this go forwards.

Thank you for flying TAG!
