-
Notifications
You must be signed in to change notification settings - Fork 102
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
compatibility for JuliaLang/julia#19449 #180
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After JuliaLang/julia#19449, I suppose strings will be fast enough that we could consider using them directly instead of working with vectors of bytes? Perhaps that should wait until a few versions in the future.
If you call the |
(I don't have commit access, so someone else will have to click merge.) |
@@ -58,7 +58,7 @@ function predict_string(ps::MemoryParserState) | |||
if ps.utf8data[s] == LATIN_U # Unicode escape | |||
t = ps.s | |||
ps.s = s + 1 | |||
len += length(string(read_unicode_escape!(ps)).data) | |||
len += sizeof(string(read_unicode_escape!(ps))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could certainly be made faster, since it is a bit silly to allocate a string just to compute its size. e.g.
utf8size(c::Char) = utf8size(UInt32(c))
utf8size(c::UInt32) = c < 0x80 ? 1 : c < 0x800 ? 2 : c < 0x10000 ? 3 : c < 0x110000 ? 4 : 3
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The performance of Unicode escapes is a bit of a marginal use case, but you're right. It would be nice to have a package with functions like these for working with UTF8. (this also applies to the Char-encoding logic)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I filed #181 to keep track of this issue.
@@ -97,7 +97,7 @@ function parse_string(ps::MemoryParserState, b) | |||
c = ps.utf8data[s] | |||
if c == LATIN_U # Unicode escape | |||
ps.s = s + 1 | |||
for c in string(read_unicode_escape!(ps)).data | |||
for c in Vector{UInt8}(string(read_unicode_escape!(ps))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could use the new codeunit
function here to avoid allocating a Vector{UInt8}
object. Or we could re-implement the Char-encoding logic in order to avoid allocating a String
at all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We'll have to wait for codeunit
to make its way into Compat first.
In addition to removing some
string.data
constructs that will break after JuliaLang/julia#19449, I also removed a little pre-0.4 code (now obsolete since we REQUIRE 0.4).