utf8proc does not correctly handle the 66 Unicode "noncharacters" #34
Comments
According to the Unicode FAQ they should still be category
Yes, precisely!
Quickly looking through the code, it seems like the only place this comes up is in Oh, and also in
I think that's all I'd seen also... will fix as soon as I find that bloody "round tuit"!
OK, I made the changes and tested them by rebuilding Julia and running all the unit tests... but then it turned out that deps/utf8proc/utf8proc.c is not the same as in the JuliaLang/utf8proc repository... lots of simple differences, like UTF8PROC_DLLEXPORT vs. DLLEXPORT.
Julia doesn't track master of external dependencies, like utf8proc, except when absolutely necessary. I think you can modify your local build to use master (you can probably figure it out from
I would clone utf8proc into a separate repository before making changes. Editing git submodules is a recipe for trouble if you aren't a git guru.
@stevengj Thanks, luckily, I'd already done that last night, after going crazy trying to figure out why deps/utf8proc didn't match my ScottPJones/utf8proc fork...
…rformance and surrogate handling
Fix #34 handle 66 Unicode non-characters and surrogates correctly
utf8proc considers the 66 Unicode "noncharacters" to not be valid; however, the Unicode standard specifically says that they are valid code points and need to be handled correctly in conforming software.
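
For reference, here is a minimal sketch of which code points the issue is talking about. The 66 noncharacters are U+FDD0..U+FDEF (32 code points) plus the last two code points of each of the 17 planes (U+nFFFE and U+nFFFF, 34 code points). The function names below are hypothetical helpers written for illustration, not part of utf8proc's API, and this is not the change that was actually committed. The surrogate check is included only to show the contrast: surrogates (U+D800..U+DFFF) are not Unicode scalar values, whereas noncharacters are.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper (not a utf8proc function): true for the 66 noncharacters. */
bool is_noncharacter(int32_t c) {
    if (c < 0 || c > 0x10FFFF) return false;      /* outside the code space entirely */
    if (c >= 0xFDD0 && c <= 0xFDEF) return true;  /* contiguous block of 32 in the BMP */
    return (c & 0xFFFE) == 0xFFFE;                /* U+nFFFE and U+nFFFF in each plane */
}

/* Hypothetical helper: surrogate code points, which are NOT scalar values. */
bool is_surrogate(int32_t c) {
    return c >= 0xD800 && c <= 0xDFFF;
}

/* Assumed validity rule for illustration: in range and not a surrogate.
   Noncharacters pass this check, which is the behavior the issue asks for. */
bool is_valid_scalar(int32_t c) {
    return c >= 0 && c <= 0x10FFFF && !is_surrogate(c);
}

/* Counts noncharacters by brute force over the whole code space. */
int count_noncharacters(void) {
    int n = 0;
    for (int32_t c = 0; c <= 0x10FFFF; c++) {
        if (is_noncharacter(c)) n++;
    }
    return n;
}
```

Under this sketch a validity check would accept U+FFFE or U+FDD0 and reject only surrogates and out-of-range values; whether a caller then wants to filter noncharacters for its own purposes is a separate decision.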