Reference for stage 3 regex-escaping #36928

Josh-Cena · 2024-11-22T19:05:44Z

Part of #36915

github-actions · 2024-11-22T19:09:16Z

Preview URLs

Flaws (1)

Note! 4 documents with no flaws that don't need to be listed. 🎉

URL: /en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp/escape
Title: RegExp.escape()
Flaw count: 1

bad_bcd_queries:
- No BCD data for query: javascript.builtins.RegExp.escape

External URLs (1)

URL: /en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp/escape
Title: RegExp.escape()

https://github.com/zloirock/core-js (1 time) (Note! This may be a new URL 👀)

(comment last updated: 2024-11-27 06:19:49)

files/en-us/web/javascript/reference/global_objects/regexp/escape/index.md

…ape/index.md

files/en-us/web/javascript/reference/global_objects/regexp/escape/index.md

hamishwillee · 2024-11-25T01:37:25Z

@Josh-Cena Looks great. I don't fully understand this gist linked from the proposal. But I take it to mean there might be risky places to insert the literal, even escaped. I assume this is to an issue because you would have mentioned it.

As an aside, you link to the escape sequence https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Regular_expressions#escape_sequences
The proposal escapes space in its example - I don't see space mentioned in the list. Might not be important - just noting in case it is an omission.

hamishwillee · 2024-11-25T01:42:34Z

files/en-us/web/javascript/reference/global_objects/string/replaceall/index.md

@@ -41,7 +41,7 @@ A new string, with all matches of a pattern replaced by a replacement.

 This method does not mutate the string value it's called on. It returns a new string.

-Unlike [`replace()`](/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/replace), this method would replace all occurrences of a string, not just the first one. This is especially useful if the string is not statically known, as calling the [`RegExp()`](/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp/RegExp) constructor without escaping special characters may unintentionally change its semantics.
+Unlike [`replace()`](/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/replace), this method would replace all occurrences of a string, not just the first one. This is especially useful if the string is not statically known, as calling the [`RegExp()`](/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp/RegExp) constructor without escaping special characters may unintentionally change its semantics. (You can also use {{jsxref("RegExp.escape()")}} to make the replacement string a literal pattern, but that is more indirection than just calling `replaceAll()`.)


I'm not sure what you're trying to say with ", but that is more indirection than just calling replaceAll().)"

It reads as though you are not recommending using RegExp.escape() for this case of the string for the regexp constructor.

Aye, I'm not recommending RegExp.escape. I'm just preempting someone saying, "Buuuut you can actually use replace() + RegExp(), you just need to escape it!!"

Perhaps it is actually the "This is especially useful if" part that is confusing me - why is being able to replace all occurrences of a string especially useful if the string is not statically known?

Or to put it another way, don't I have to escape the pattern I use in this method too?

why is being able to replace all occurrences of a string especially useful if the string is not statically known?

I mean, that doesn't sound like an obscure use case at all, is it? Maybe the replacement string comes from user configuration, maybe it's retrieved from a database...

don't I have to escape the pattern I use in this method too?

In replaceAll? No, since the idea is that you can just do replaceAll(oldString, newString). The problem with replace(oldString, newString) is that it only does the replacement once, so if you want to replace all occurrences of oldString, you have to first convert it to a global regex, but the conversion is non-trivial.

I understand why you wouldn't use replace() and I understand why you would use replaceAll for a static string.

But replaceAll takes a pattern that can be a regexp,

I assume by "static string" you just mean "string"? Since it doesn't matter where the string comes from. For regexp there's a difference between "static regexp" and "dynamically assembled regexp" though since the latter is subject to regex syntax injection. String replacement is safe. Regex replacement should preferably only be used with static regexes. It happens that replace() and replaceAll(), when taking a regex, have exactly the same behavior (and replaceAll() throws if the regex is non-global), so yeah you could say that using replaceAll with a regex is useless since it's always equivalent to a replace anyway.

The point is, there are three ways to globally replace a string, in the order when they became available:

replace(new RegExp(oldS, "g"), newS); // Unsafe replaceAll(oldS, newS); // Safe replace(new RegExp(RegExp.escape(oldS), "g"), newS); // Safe, too long

Co-authored-by: Hamish Willee <[email protected]>

files/en-us/web/javascript/reference/global_objects/regexp/escape/index.md

…ape/index.md

Josh-Cena · 2024-11-26T23:05:52Z

Oh I forgot to ask but @ljharb @bakkot your reviews will be appreciated too. Thanks!

files/en-us/web/javascript/reference/global_objects/regexp/escape/index.md

ljharb · 2024-11-26T23:50:41Z

files/en-us/web/javascript/reference/global_objects/regexp/escape/index.md

+
+## See also
+
+- [Polyfill of `RegExp.escape` in `core-js`](https://github.com/zloirock/core-js#regexp-escaping)


are you still only linking to core-js polyfills? if not, it'd be great to include https://www.npmjs.com/package/regexp.escape

I personally would not object, but I would need to see what others say. In the meantime let's hold the same policy that other maintainers have found acceptable, which is to only consistently include core-js polyfills. Hope you would understand.

Sure. Where can I go to revisit this discussion? Last time I tried it never went anywhere, and another polyfill maintainer tried recently and it seemed to get overly heated.

I'll discuss it internally.

…ape/index.md Co-authored-by: Kevin Gibbons <[email protected]>

…ape/index.md

bakkot · 2024-11-27T04:13:39Z

files/en-us/web/javascript/reference/global_objects/regexp/escape/index.md

+
+A new string that can be safely used as a literal pattern for the {{jsxref("RegExp/RegExp", "RegExp()")}} constructor. Namely, the following things in the input string are replaced:
+
+- The first character of the string, if it's either a decimal digit (0–9) or ASCII letter (a–z, A–Z), is escaped using the `\x` [character escape](/en-US/docs/Web/JavaScript/Reference/Regular_expressions/Character_escape) syntax. For example, `RegExp.escape("foo")` returns `"\\x66oo"` (here and after, the two backslashes in a string literal denote a single backslash character). This step ensures that if this escaped string is embedded into a bigger pattern where it's immediately preceded by `\0`, `\c`, `\x0`, `\u000`, etc., the leading character doesn't get interpreted as part of the escape sequence.


Suggested change

- The first character of the string, if it's either a decimal digit (0–9) or ASCII letter (a–z, A–Z), is escaped using the `\x` [character escape](/en-US/docs/Web/JavaScript/Reference/Regular_expressions/Character_escape) syntax. For example, `RegExp.escape("foo")` returns `"\\x66oo"` (here and after, the two backslashes in a string literal denote a single backslash character). This step ensures that if this escaped string is embedded into a bigger pattern where it's immediately preceded by `\0`, `\c`, `\x0`, `\u000`, etc., the leading character doesn't get interpreted as part of the escape sequence.

- The first character of the string, if it's either a decimal digit (0–9) or ASCII letter (a–z, A–Z), is escaped using the `\x` [character escape](/en-US/docs/Web/JavaScript/Reference/Regular_expressions/Character_escape) syntax. For example, `RegExp.escape("foo")` returns `"\\x66oo"` (here and after, the two backslashes in a string literal denote a single backslash character). This step ensures that if this escaped string is embedded into a bigger pattern where it's immediately preceded by `\1`, `\x0`, `\u000`, etc., the leading character doesn't get interpreted as part of the escape sequence.

It's true that the escapes are also necessary for the contexts after \0 and \c, but understanding why requires knowledge of the arcana of the regexp grammar which no one should have. Whereas \1 is something readers actually should be familiar with.

In other words, I don't want people to read this sentence and discover that \cA or \06 are things they could put in their regexes, even though it is technically accurate that these contexts are among those we considered when designing this feature.

Wait, is \cA deprecated? It's still mentioned in https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Regular_expressions/Character_escape. Also for \0, my intention was that "this \0 was actually supposed to be a null character by itself, but now it's inadvertently joined with something else". OTOH, \1 is mentioned in https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Regular_expressions#escape_sequences as already-deprecated:

\ followed by any other digit character becomes a legacy octal escape sequence, which is forbidden in Unicode-aware mode.

So my hope is that \1 would never appear as a valid regex sequence anyway.

Wait, is \cA deprecated?

Not in the sense of being in annex B, but it's still not something people should use (because no one will know what it means).

now it's inadvertently joined with something else

Yeah but the only way it can be inadvertently joined with something else is to form a legacy octal escape sequence, which people usually shouldn't have to know about.

So my hope is that \1 would never appear as a valid regex sequence anyway.

That wording could use updating. \1 is a perfectly normal thing to appear in a regex as a reference to the first capturing group: /(.)\1/ is a regex which matches any character appearing twice in a row. Similarly \12 in a regex with 12 capturing groups. So you don't need to know about legacy octal escape sequences to understand why you need to escape leading digits.

It only falls back to being a legacy octal escape in the case that there are fewer capturing groups than the number it would represent.

It only falls back to being a legacy octal escape in the case that there are fewer capturing groups than the number it would represent.

Oh. I get what you mean. Forgot about backreferences. Thanks!

Yeah but the only way it can be inadvertently joined with something else is to form a legacy octal escape sequence, which people usually shouldn't have to know about.

No, I meant this:

const pattern = new RegExp(`\\0${userString}`, "gu"); // Intention: search for null character followed by some other user-provided text

If userString starts with a digit, this will be a syntax error (because u mode), or be something else. There's no "intention" to cause bad things to happen, it just doesn't do what you expect it to do.

Reference for stage 3 regex-escaping

557085a

Josh-Cena requested a review from a team as a code owner November 22, 2024 19:05

Josh-Cena requested review from chrisdavidmills and hamishwillee and removed request for a team and chrisdavidmills November 22, 2024 19:05

github-actions bot added Content:JS JavaScript docs size/m [PR only] 51-500 LoC changed labels Nov 22, 2024

zloirock reviewed Nov 23, 2024

View reviewed changes

files/en-us/web/javascript/reference/global_objects/regexp/escape/index.md Show resolved Hide resolved

Update files/en-us/web/javascript/reference/global_objects/regexp/esc…

a1724ca

…ape/index.md

hamishwillee reviewed Nov 25, 2024

View reviewed changes

files/en-us/web/javascript/reference/global_objects/regexp/escape/index.md Outdated Show resolved Hide resolved

hamishwillee reviewed Nov 25, 2024

View reviewed changes

files/en-us/web/javascript/reference/global_objects/regexp/escape/index.md Show resolved Hide resolved

hamishwillee reviewed Nov 25, 2024

View reviewed changes

files/en-us/web/javascript/reference/global_objects/regexp/escape/index.md Outdated Show resolved Hide resolved

hamishwillee reviewed Nov 25, 2024

View reviewed changes

hamishwillee mentioned this pull request Nov 25, 2024

[JS] Ship RegExp.escape proposal #36915

Open

8 tasks

Apply suggestions from code review

f657890

Co-authored-by: Hamish Willee <[email protected]>

Josh-Cena commented Nov 26, 2024

View reviewed changes

files/en-us/web/javascript/reference/global_objects/regexp/escape/index.md Outdated Show resolved Hide resolved

Update files/en-us/web/javascript/reference/global_objects/regexp/esc…

e0e5e03

…ape/index.md

bakkot reviewed Nov 26, 2024

View reviewed changes

files/en-us/web/javascript/reference/global_objects/regexp/escape/index.md Outdated Show resolved Hide resolved

ljharb reviewed Nov 26, 2024

View reviewed changes

Josh-Cena and others added 3 commits November 26, 2024 19:54

Update files/en-us/web/javascript/reference/global_objects/regexp/esc…

b416287

…ape/index.md Co-authored-by: Kevin Gibbons <[email protected]>

Update index.md

39e2cfa

Update files/en-us/web/javascript/reference/global_objects/regexp/esc…

09f6a73

…ape/index.md

bakkot reviewed Nov 27, 2024

View reviewed changes

Fix wording

dcd50d5

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Reference for stage 3 regex-escaping #36928

Reference for stage 3 regex-escaping #36928

Josh-Cena commented Nov 22, 2024

github-actions bot commented Nov 22, 2024 •

edited

Loading

hamishwillee commented Nov 25, 2024 •

edited

Loading

hamishwillee Nov 25, 2024

Josh-Cena Nov 25, 2024

hamishwillee Nov 25, 2024

Josh-Cena Nov 26, 2024

hamishwillee Nov 26, 2024 •

edited

Loading

Josh-Cena Nov 26, 2024 •

edited

Loading

Josh-Cena commented Nov 26, 2024

ljharb Nov 26, 2024

Josh-Cena Nov 27, 2024

ljharb Nov 27, 2024

Josh-Cena Nov 27, 2024

bakkot Nov 27, 2024

Josh-Cena Nov 27, 2024

bakkot Nov 27, 2024

Josh-Cena Nov 27, 2024 •

edited

Loading


		## See also

		- [Polyfill of `RegExp.escape` in `core-js`](https://github.com/zloirock/core-js#regexp-escaping)


		A new string that can be safely used as a literal pattern for the {{jsxref("RegExp/RegExp", "RegExp()")}} constructor. Namely, the following things in the input string are replaced:

		- The first character of the string, if it's either a decimal digit (0–9) or ASCII letter (a–z, A–Z), is escaped using the `\x` [character escape](/en-US/docs/Web/JavaScript/Reference/Regular_expressions/Character_escape) syntax. For example, `RegExp.escape("foo")` returns `"\\x66oo"` (here and after, the two backslashes in a string literal denote a single backslash character). This step ensures that if this escaped string is embedded into a bigger pattern where it's immediately preceded by `\0`, `\c`, `\x0`, `\u000`, etc., the leading character doesn't get interpreted as part of the escape sequence.

Reference for stage 3 regex-escaping #36928

Are you sure you want to change the base?

Reference for stage 3 regex-escaping #36928

Conversation

Josh-Cena commented Nov 22, 2024

github-actions bot commented Nov 22, 2024 • edited Loading

hamishwillee commented Nov 25, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

hamishwillee Nov 26, 2024 • edited Loading

Choose a reason for hiding this comment

Josh-Cena Nov 26, 2024 • edited Loading

Choose a reason for hiding this comment

Josh-Cena commented Nov 26, 2024

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Josh-Cena Nov 27, 2024 • edited Loading

Choose a reason for hiding this comment

github-actions bot commented Nov 22, 2024 •

edited

Loading

hamishwillee commented Nov 25, 2024 •

edited

Loading

hamishwillee Nov 26, 2024 •

edited

Loading

Josh-Cena Nov 26, 2024 •

edited

Loading

Josh-Cena Nov 27, 2024 •

edited

Loading