Upgrade to Unicode 14 (to support Vithkuqi) #877
Comments
If this was really added in Unicode 14, then this makes sense, since I don't believe the regex crate has had its Unicode tables updated to 14 yet. They're still on 13. No real reason for it. It just hasn't been done yet.
Also, FWIW, the regex crate does not expose any direct way to access Unicode blocks. But if Unicode's definition of word changed and new casing rules were added, then those will get automatically pulled in by updating to 14.
I see, thanks for your quick reply. Unicode 14.0 was already released in September 2021, so I think it would be a good idea to update the regex crate accordingly.
Vithkuqi support was added to Unicode 14. Fixes #877
This has been added in
Awesome @BurntSushi. Thank you very much!
What version of regex are you using?
1.5.6
Describe the bug at a high level.
The letters of the Vithkuqi script, a script for writing the Albanian language, were added to Unicode version 14.0. The respective Unicode block is from `U+10570` to `U+105BF`. I discovered that the regex `\w+` does not match the letters of this block. Additionally, case-insensitive regexes starting with `(?i)` do not match both Vithkuqi uppercase and lowercase letters.
What are the steps to reproduce the behavior?
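The reporter's original reproduction code is not preserved in this copy of the issue. As a rough stand-in, and explicitly not the reporter's code, here is a minimal standard-library-only sketch of the block range quoted above. The helper name `in_vithkuqi_block` is hypothetical, and the `is_alphabetic` and `to_lowercase` calls reflect the Unicode tables of the Rust toolchain rather than those of the regex crate, so they only indicate what a Unicode 14 aware table is expected to know about.

```rust
// Hypothetical helper: true if `c` lies in the Vithkuqi block
// (U+10570..=U+105BF, the range quoted in the report).
// Unassigned code points inside the block also return true.
fn in_vithkuqi_block(c: char) -> bool {
    ('\u{10570}'..='\u{105BF}').contains(&c)
}

fn main() {
    // U+10570 is the first code point of the block, used here as a sample.
    let sample = '\u{10570}';
    println!("in block: {}", in_vithkuqi_block(sample));

    // These results depend on the Unicode version of the Rust toolchain,
    // not on the regex crate, so no expected output is given.
    println!("is_alphabetic: {}", sample.is_alphabetic());
    let lower: Vec<String> = sample
        .to_lowercase()
        .map(|l| format!("U+{:04X}", l as u32))
        .collect();
    println!("to_lowercase: {:?}", lower);
}
```

A range check like this is only a workaround for identifying the block. It says nothing about whether `\w` or `(?i)` in the regex crate handle these letters, which depends on the Unicode version of the crate's own tables.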
What is the actual behavior?
The actual output is:
What is the expected behavior?
The expected output is: