bugfix: char class casefold for certain chars #20

haozhun · 2015-03-20T19:13:51Z

When a character's code point fits in a single byte (is less than or
equal to 0xff) yet takes more than 1 byte in the current encoding, the
case folding code incorrectly puts it in the bitset instead of the code
range. As a result, under the utf8 encoding, casefold works incorrectly
on characters in the range \u0080 to \u00ff (Latin-1 Supplement).

Before fix:

* `"\u00c2"` `[\u00e0-\u00e5]` returns false
* `"\u00c2"` `[\u00e2]` returns false
* `"\u00c2"` `\u00e2` returns true
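
The failure described above can be illustrated with a minimal, self-contained sketch. It is not the library's code: `CaseFoldSketch`, `addBuggy`, `addFixed`, and `matches` are hypothetical names, and the matcher is assumed to consult the bitset only for single-byte characters and the code range list for multibyte ones, which is what makes a misplaced entry unreachable.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Hypothetical model of a character class: a bitset for single-byte
// characters plus a list of inclusive code point ranges for the rest.
class CaseFoldSketch {
    final BitSet bitSet = new BitSet(256);
    final List<int[]> codeRanges = new ArrayList<>();

    // Number of bytes a code point occupies in UTF-8.
    static int utf8Length(int codePoint) {
        if (codePoint < 0x80) return 1;
        if (codePoint < 0x800) return 2;
        if (codePoint < 0x10000) return 3;
        return 4;
    }

    // Buggy placement: decides by numeric value alone.
    void addBuggy(int codePoint) {
        if (codePoint <= 0xff) {
            bitSet.set(codePoint);
        } else {
            codeRanges.add(new int[] {codePoint, codePoint});
        }
    }

    // Fixed placement: the bitset is used only when the character is
    // also a single byte in the encoding (UTF-8 in this sketch).
    void addFixed(int codePoint) {
        if (codePoint <= 0xff && utf8Length(codePoint) == 1) {
            bitSet.set(codePoint);
        } else {
            codeRanges.add(new int[] {codePoint, codePoint});
        }
    }

    // Assumed matcher behavior: single-byte input is looked up in the
    // bitset, multibyte input in the code ranges.
    boolean matches(int codePoint) {
        if (utf8Length(codePoint) == 1) {
            return bitSet.get(codePoint);
        }
        for (int[] range : codeRanges) {
            if (range[0] <= codePoint && codePoint <= range[1]) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Case folding [\u00e2] adds its uppercase form \u00c2 to the class.
        CaseFoldSketch buggy = new CaseFoldSketch();
        buggy.addBuggy(0xC2);
        System.out.println("buggy: " + buggy.matches(0xC2)); // prints "buggy: false"

        CaseFoldSketch fixed = new CaseFoldSketch();
        fixed.addFixed(0xC2);
        System.out.println("fixed: " + fixed.matches(0xC2)); // prints "fixed: true"
    }
}
```

In this model, `\u00c2` is two bytes in UTF-8, so the matcher never looks at the bitset entry the buggy path created. ASCII characters are unaffected because their value and their encoded length both qualify for the bitset.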


headius · 2025-01-15T23:44:35Z

Closing in favor of the rebased #85.

haozhun force-pushed the ic branch from 13fe106 to ad6a090 Compare April 9, 2015 03:26

haozhun force-pushed the ic branch from ad6a090 to 5c804e4 Compare April 20, 2015 20:21

haozhun force-pushed the ic branch from 5c804e4 to c703f2a Compare April 20, 2015 20:23

sebthom mentioned this pull request Jan 10, 2025

fix: char class casefold for certain chars #85

Open

headius closed this Jan 15, 2025

headius added this to the Invalid or Duplicate milestone Jan 15, 2025
