patch legacy katakana middle dot
closes #2

Unicode 4.1 through Unicode 15 omitted these two characters from ID_Continue
by accident. However, this accident was corrected in Unicode 15.1. Any JS VM
that supports ES6+ but that uses a version of Unicode earlier than 15.1 will
consider these to be a syntax error, so we deliberately omit these characters
from the set of identifiers that are valid in both ES5 and ES6+. For more info
see 2.2 in https://www.unicode.org/L2/L2023/23160-utc176-properties-recs.pdf
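
The patch described above amounts to removing two code points from a set after it has been parsed. Below is a minimal, self-contained sketch of that idea. The helper name `patch_id_continue`, the constant `LEGACY_MIDDLE_DOTS`, and the use of `BTreeSet<u32>` are assumptions made for illustration and are not taken from the crate. The set built in `main` is toy data, not real ID_Continue data.

```rust
use std::collections::BTreeSet;

/// The two code points that Unicode versions before 15.1 left out of ID_Continue.
const LEGACY_MIDDLE_DOTS: [u32; 2] = [
    0x30FB, // KATAKANA MIDDLE DOT
    0xFF65, // HALFWIDTH KATAKANA MIDDLE DOT
];

/// Hypothetical helper: removes the two middle dots from a code point set.
fn patch_id_continue(id_continue: &mut BTreeSet<u32>) {
    for cp in LEGACY_MIDDLE_DOTS.iter() {
        id_continue.remove(cp);
    }
}

fn main() {
    // Toy stand-in for a parsed set (assumption: not real UCD data).
    let mut id_continue: BTreeSet<u32> = (0x30A1..=0x30FF).collect();
    id_continue.insert(0xFF65);

    patch_id_continue(&mut id_continue);

    // Both middle dots are gone; neighbouring code points are untouched.
    assert!(!id_continue.contains(&0x30FB));
    assert!(!id_continue.contains(&0xFF65));
    assert!(id_continue.contains(&0x30FC));
}
```

Applying the removal after parsing, rather than editing the input data, means the exception is recorded in one place and is reapplied each time the set is rebuilt.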
Boshen committed Jul 8, 2024
1 parent bb62652 commit 2191e35
Showing 4 changed files with 27 additions and 2 deletions.
10 changes: 10 additions & 0 deletions generate/src/parse.rs
@@ -50,6 +50,16 @@ pub fn parse_id_properties(ucd_dir: &Path) -> Properties {
        set.extend(lo..=hi);
    }

    // <https://github.com/evanw/esbuild/pull/3424>
    // Unicode 4.1 through Unicode 15 omitted these two characters from ID_Continue
    // by accident. However, this accident was corrected in Unicode 15.1. Any JS VM
    // that supports ES6+ but that uses a version of Unicode earlier than 15.1 will
    // consider these to be a syntax error, so we deliberately omit these characters
    // from the set of identifiers that are valid in both ES5 and ES6+. For more info
    // see 2.2 in https://www.unicode.org/L2/L2023/23160-utc176-properties-recs.pdf
    properties.id_continue.remove(&0x30FB);
    properties.id_continue.remove(&0xFF65);

    properties
}

4 changes: 2 additions & 2 deletions src/tables.rs

Generated file; diff not rendered.

4 changes: 4 additions & 0 deletions tests/compare.rs
@@ -14,6 +14,10 @@ fn compare_all_implementations() {
    let id_continue_roaring = roaring::id_continue_bitmap();

    for ch in '\0'..=char::MAX {
        // See test legacy_katakana_middle_dot_patch in tests/patch.rs
        if matches!(ch, '\u{30FB}' | '\u{FF65}') {
            continue;
        }
        let thought_to_be_start = unicode_id_start::is_id_start(ch);
        let thought_to_be_continue = unicode_id_start::is_id_continue(ch);

11 changes: 11 additions & 0 deletions tests/patch.rs
@@ -0,0 +1,11 @@
use unicode_id_start::is_id_continue_unicode;

#[test]
fn legacy_katakana_middle_dot_patch() {
    // U+30FB KATAKANA MIDDLE DOT
    // https://util.unicode.org/UnicodeJsps/character.jsp?a=30FB
    assert!(!is_id_continue_unicode('\u{30FB}'));
    // U+FF65 HALFWIDTH KATAKANA MIDDLE DOT
    // https://util.unicode.org/UnicodeJsps/character.jsp?a=FF65
    assert!(!is_id_continue_unicode('\u{FF65}'));
}
