is_valid_utf8 (and string literals) don't detect badly encoded surrogate pairs. #11141

Closed
ScottPJones opened this issue May 5, 2015 · 0 comments
Labels
unicode Related to unicode characters and encodings

Comments

@ScottPJones
Contributor

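Surrogate code points (U+D800 through U+DFFF) encoded as separate 3-byte UTF-8 sequences are not rejected,
either when building string literals or when calling the is_valid_utf8 function.
Such sequences are not valid UTF-8, so every is_valid_utf8 call below should return false.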

julia> q = utf8("\udbff\udfff")
"\udbff\udfff"

julia> is_valid_utf8(q)
true

julia> is_valid_utf8("\udbff\udfff")
true

julia> is_valid_utf8("\udbffa")
true

julia> is_valid_utf8("\udbffz")
true
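
For reference, a minimal sketch of the kind of check that is missing, written for current Julia and not taken from the actual fix. The name has_encoded_surrogate is hypothetical (it is not part of Base). It relies on the fact that any code point in U+D800..U+DFFF encodes as the lead byte 0xED followed by a second byte in 0xA0..0xBF; for example, \udbff becomes the bytes ED AF BF and \udfff becomes ED BF BF.

```julia
# Hypothetical helper, not part of Base: returns true if `bytes` contains a
# surrogate code point (U+D800..U+DFFF) encoded as a 3-byte UTF-8 sequence.
function has_encoded_surrogate(bytes::AbstractVector{UInt8})
    for i in firstindex(bytes):lastindex(bytes)-2
        # 0xED is the lead byte for U+D000..U+DFFF; a second byte of
        # 0xA0..0xBF narrows that range to the surrogates U+D800..U+DFFF.
        if bytes[i] == 0xED &&
           0xA0 <= bytes[i+1] <= 0xBF &&
           (bytes[i+2] & 0xC0) == 0x80   # third byte must be a continuation byte
            return true
        end
    end
    return false
end

# Bytes that "\udbff\udfff" and "\udbffa" encode to in the report above.
pair      = UInt8[0xED, 0xAF, 0xBF, 0xED, 0xBF, 0xBF]
lone_high = UInt8[0xED, 0xAF, 0xBF, 0x61]

println(has_encoded_surrogate(pair))
println(has_encoded_surrogate(lone_high))
println(has_encoded_surrogate(codeunits("plain ascii")))
```

A full validator would also have to reject overlong encodings, truncated sequences, and code points above U+10FFFF; this sketch covers only the surrogate case described in this issue.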
mbauman added the unicode label May 5, 2015
ScottPJones added a commit to ScottPJones/julia that referenced this issue May 8, 2015
ScottPJones added a commit to ScottPJones/julia that referenced this issue May 8, 2015
ScottPJones added a commit to ScottPJones/julia that referenced this issue May 14, 2015
stevengj added a commit that referenced this issue May 15, 2015
Fix #11141/#10973 and improve performance of is_valid_utf8/is_valid_ascii
mbauman pushed a commit to mbauman/julia that referenced this issue Jun 6, 2015
tkelman pushed a commit to tkelman/julia that referenced this issue Jun 6, 2015
2 participants