Correct regular expression flags scanning for non-BMP characters #58612

graphemecluster · 2024-05-22T03:53:41Z

Per the IsValidRegularExpressionLiteral static semantics in the ECMA-262 specification, FlagText should be interpreted as a sequence of code points, not code units.
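To illustrate the difference, here is a minimal sketch that splits a flag string once by UTF-16 code units and once by code points. The helper names `splitByCodeUnit` and `splitByCodePoint` are hypothetical and are not taken from the scanner; the sample flag text is an assumption chosen to contain one non-BMP character.

```typescript
// Hypothetical helpers for illustration only; not the scanner's code.

// Walk the flag text one UTF-16 code unit at a time.
function splitByCodeUnit(flagText: string): string[] {
    const parts: string[] = [];
    for (let i = 0; i < flagText.length; i++) {
        parts.push(flagText.charAt(i));
    }
    return parts;
}

// Walk the flag text one code point at a time, so a surrogate pair
// is treated as a single character.
function splitByCodePoint(flagText: string): string[] {
    const parts: string[] = [];
    let i = 0;
    while (i < flagText.length) {
        const codePoint = flagText.codePointAt(i)!;
        const ch = String.fromCodePoint(codePoint);
        parts.push(ch);
        i += ch.length; // 1 for BMP characters, 2 for non-BMP characters
    }
    return parts;
}

// "g" followed by U+1D4CA, which needs two code units in UTF-16.
const sampleFlags = "g" + String.fromCodePoint(0x1D4CA);

const byUnit = splitByCodeUnit(sampleFlags);   // 3 entries: "g" plus two lone surrogates
const byPoint = splitByCodePoint(sampleFlags); // 2 entries: "g" plus one whole character
```

With code-unit iteration, an error about an unknown flag would be reported twice, once per surrogate half; with code-point iteration it is reported once for the whole character.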

Cherry picked from #58289@e67692a.

typescript-bot · 2024-05-22T03:53:44Z

This PR doesn't have any linked issues. Please open an issue that references this PR. From there we can discuss and prioritise.

(cherry picked from commit e67692a)

tests/cases/compiler/regularExpressionWithNonBMPFlags.ts

rbuckton · 2024-05-31T17:05:19Z

Overall, this looks good. It looks like there are some failing tests that still need to be resolved. Also, since this isn't a bug fix, per se, I expect we will want to hold off from putting this in the 5.5 RC and will wait to merge after main is open for TypeScript 5.6 work.

rbuckton

Switching from "Approve" to "Request Changes" until the test issues are resolved.

graphemecluster · 2024-05-31T18:16:57Z

IMO it is a bug fix especially since I forgot to revert the change on line 2561 in #58289. And it’s important to align the scanner behavior to the spec.
#58615 is a bug fix, too, that is even more critical since the bug breaks real-world codebases.
#58613 is not the case, but (as an excuse) it was supposed to be included in #55600.

rbuckton · 2024-05-31T19:08:53Z

> IMO it is a bug fix especially since I forgot to revert the change on line 2561 in #58289. And it’s important to align the scanner behavior to the spec.

What change do you mean? If it's not related to code points then I could see taking a small fix for that. Properly handling non-BMP characters in flags doesn't come up in practice and delaying this specific functionality will only result in a slightly less informative error since none of the valid flag/modifier characters will ever be equivalent to the first byte of a multi-byte code point.
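The claim that no valid flag can collide with part of a non-BMP character can be checked with a short sketch. In UTF-16 terms, the "first byte" above corresponds to the first code unit of a surrogate pair. The flag list below is the set defined by ECMAScript as of ES2024 (`d`, `g`, `i`, `m`, `s`, `u`, `v`, `y`); the helper names are hypothetical.

```typescript
// Flag characters defined by ECMAScript as of ES2024.
const knownFlags = "dgimsuvy";

// True if the code unit is either half of a surrogate pair.
function isSurrogateCodeUnit(unit: number): boolean {
    return unit >= 0xD800 && unit <= 0xDFFF;
}

// Hypothetical helper: does any code unit in the text fall in the surrogate range?
function containsSurrogate(text: string): boolean {
    for (let i = 0; i < text.length; i++) {
        if (isSurrogateCodeUnit(text.charCodeAt(i))) {
            return true;
        }
    }
    return false;
}

// Every known flag is ASCII, so none of them is a surrogate code unit.
const flagsCollide = containsSurrogate(knownFlags);

// A non-BMP character always contributes surrogate code units.
const nonBmpCollides = containsSurrogate(String.fromCodePoint(0x1D4CA));
```

Because the two ranges never overlap, scanning by code units cannot accept an invalid flag or reject a valid one; only the reported error differs.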

> #58615 is a bug fix, too, that is even more critical since the bug breaks real-world codebases. #58613 is not the case, but (as an excuse) it was supposed to be included in #55600.

#58613 can wait till 5.6, but I'll review #58615 shortly.

graphemecluster · 2024-05-31T19:39:25Z

> What change do you mean? If it's not related to code points then I could see taking a small fix for that.

This line, which causes the String.fromCharCode below to convert incorrectly (as the method only considers the 16 least significant bits).
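A minimal sketch of the truncation, assuming a code point above U+FFFF is passed to each method. The sample value U+1D4CA is an arbitrary choice for illustration.

```typescript
// A code point outside the Basic Multilingual Plane.
const nonBmpCodePoint = 0x1D4CA;

// String.fromCharCode converts its argument to a 16-bit unsigned integer,
// so 0x1D4CA is truncated to 0xD4CA and a different character comes out.
const viaCharCode = String.fromCharCode(nonBmpCodePoint);

// String.fromCodePoint keeps the full value and emits a surrogate pair.
const viaCodePoint = String.fromCodePoint(nonBmpCodePoint);

const truncatedUnit = viaCharCode.charCodeAt(0);    // 0xD4CA
const roundTripped = viaCodePoint.codePointAt(0);   // 0x1D4CA
```

The result of `String.fromCharCode` is a single code unit for an unrelated BMP character, which is why an error message built from it would name the wrong character.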

rbuckton · 2024-05-31T19:51:02Z

> > What change do you mean? If it's not related to code points then I could see taking a small fix for that.
>
> This line, which causes the String.fromCharCode below to convert incorrectly (as the method only considers the 16 least significant bits).

I've just put up a minimal fix for that line in #58727.

Use `codePointChecked` instead of `charCodeChecked` in `reScanSlashToken`

typescript-bot added the "For Uncommitted Bug" label (PR for untriaged, rejected, closed or missing bug) May 22, 2024

Correct flags scanning for non-BMP characters

0725a3a

(cherry picked from commit e67692a)

graphemecluster force-pushed the regex-non-bmp-flags branch from ade740e to 0725a3a May 22, 2024 04:16

graphemecluster mentioned this pull request May 23, 2024

Improve Recovery of Unterminated Regular Expressions #58289

Merged

DanielRosenwasser requested a review from rbuckton May 24, 2024 21:09

graphemecluster added 2 commits May 31, 2024 00:57

Merge branch 'main' into regex-non-bmp-flags

f91f83f

Optimization: Lookup by CharacterCodes

406468c

rbuckton requested changes May 31, 2024

tests/cases/compiler/regularExpressionWithNonBMPFlags.ts (resolved)

rbuckton approved these changes May 31, 2024

rbuckton requested changes May 31, 2024

Add remarks to test case file

723b1d6

graphemecluster force-pushed the regex-non-bmp-flags branch from fc3dd58 to 723b1d6 May 31, 2024 18:06

Merge branch 'main' into regex-non-bmp-flags

15783d6

rbuckton added this to the TypeScript 5.6.0 milestone Jun 4, 2024

rbuckton approved these changes Jun 4, 2024

rbuckton merged commit dc1ffb1 into microsoft:main Jun 4, 2024
28 checks passed
