compatibility for JuliaLang/julia#19449 #180

stevengj · 2017-01-04T02:39:53Z

In addition to removing some string.data constructs that will break after JuliaLang/julia#19449, I also removed a little pre-0.4 code (now obsolete since we REQUIRE 0.4).

TotalVerb

After JuliaLang/julia#19449, I suppose strings will be fast enough that we could consider using them directly instead of working with vectors of bytes? Perhaps that should wait until a few versions in the future.

stevengj · 2017-01-04T03:39:00Z

If you call the codeunit function, it should be as fast as accessing bytes, but less convenient. Iterating over Unicode characters will always be slower than iterating over bytes, I expect.
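To make the comparison concrete, here is a minimal sketch of the two iteration styles. It assumes a recent Julia 1.x, where `ncodeunits` and `codeunit` are available in Base; the helper names `bytesum` and `charcount` are hypothetical and do not come from this package.

```julia
# Byte-level access: index code units directly. No Vector{UInt8} is allocated
# and no UTF-8 decoding takes place.
function bytesum(s::String)
    total = 0
    for i in 1:ncodeunits(s)
        total += codeunit(s, i)   # a UInt8 for String
    end
    return total
end

# Character-level access: each iteration step has to decode one UTF-8 sequence.
function charcount(s::String)
    n = 0
    for _ in s
        n += 1
    end
    return n
end

s = "naïve ∀x"
println(ncodeunits(s), " code units, ", charcount(s), " characters")
```

The byte loop needs explicit indices, which is the "less convenient" part; the character loop reads more naturally but does more work per step.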

stevengj · 2017-01-04T03:39:37Z

(I don't have commit access, so someone else will have to click merge.)

stevengj · 2017-01-04T03:43:38Z

src/specialized.jl

@@ -58,7 +58,7 @@ function predict_string(ps::MemoryParserState)
            if ps.utf8data[s] == LATIN_U  # Unicode escape
                t = ps.s
                ps.s = s + 1
-                len += length(string(read_unicode_escape!(ps)).data)
+                len += sizeof(string(read_unicode_escape!(ps)))


This could certainly be made faster, since it is a bit silly to allocate a string just to compute its size. e.g.

    utf8size(c::Char) = utf8size(UInt32(c))
    utf8size(c::UInt32) = c < 0x80 ? 1 : c < 0x800 ? 2 : c < 0x10000 ? 3 : c < 0x110000 ? 4 : 3

The performance of Unicode escapes is a bit of a marginal use case, but you're right. It would be nice to have a package with functions like these for working with UTF-8. (This also applies to the Char-encoding logic.)

I filed #181 to keep track of this issue.
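As a sanity check on the suggestion above, the following sketch compares the proposed `utf8size` against the allocating `sizeof(string(c))` form for one character of each encoded length. It assumes a recent Julia 1.x; the sample characters are arbitrary choices, not taken from the package's tests.

```julia
# The helper proposed in the review comment: the number of UTF-8 bytes needed
# for a code point, computed without allocating a String. The final branch
# returns 3 for values at or above 0x110000, which are not valid code points.
utf8size(c::UInt32) = c < 0x80 ? 1 : c < 0x800 ? 2 : c < 0x10000 ? 3 : c < 0x110000 ? 4 : 3
utf8size(c::Char) = utf8size(UInt32(c))

# One sample character for each encoded length from 1 to 4 bytes.
samples = ('a', 'é', '€', '😀')

for c in samples
    # The allocating form currently used in predict_string.
    allocated = sizeof(string(c))
    println(repr(c), ": utf8size = ", utf8size(c), ", sizeof(string(c)) = ", allocated)
end
```

If `read_unicode_escape!` returns a `Char` (the thread does not say so explicitly), the line in `predict_string` could then add `utf8size(...)` to `len` directly instead of building a string first.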

stevengj · 2017-01-04T03:46:14Z

src/specialized.jl

@@ -97,7 +97,7 @@ function parse_string(ps::MemoryParserState, b)
            c = ps.utf8data[s]
            if c == LATIN_U  # Unicode escape
                ps.s = s + 1
-                for c in string(read_unicode_escape!(ps)).data
+                for c in Vector{UInt8}(string(read_unicode_escape!(ps)))


We could use the new codeunit function here to avoid allocating a Vector{UInt8} object. Or we could re-implement the Char-encoding logic in order to avoid allocating a String at all.

We'll have to wait for codeunit to make its way into Compat first.
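The second option, re-implementing the Char-encoding logic, could look roughly like the sketch below. It assumes a recent Julia 1.x and a valid Unicode scalar value as input; `push_utf8!` is a hypothetical name, not a function in this package or in Base.

```julia
# Append the UTF-8 encoding of code point `c` to `buf` without creating a String.
# Assumption: `c` is a valid Unicode scalar value (surrogates are not checked).
function push_utf8!(buf::Vector{UInt8}, c::UInt32)
    if c < 0x80                       # 1 byte: 0xxxxxxx
        push!(buf, c % UInt8)
    elseif c < 0x800                  # 2 bytes: 110xxxxx 10xxxxxx
        push!(buf, (0xc0 | (c >> 6)) % UInt8)
        push!(buf, (0x80 | (c & 0x3f)) % UInt8)
    elseif c < 0x10000                # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        push!(buf, (0xe0 | (c >> 12)) % UInt8)
        push!(buf, (0x80 | ((c >> 6) & 0x3f)) % UInt8)
        push!(buf, (0x80 | (c & 0x3f)) % UInt8)
    elseif c < 0x110000               # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        push!(buf, (0xf0 | (c >> 18)) % UInt8)
        push!(buf, (0x80 | ((c >> 12) & 0x3f)) % UInt8)
        push!(buf, (0x80 | ((c >> 6) & 0x3f)) % UInt8)
        push!(buf, (0x80 | (c & 0x3f)) % UInt8)
    else
        throw(ArgumentError("code point out of range"))
    end
    return buf
end

# Compare against the bytes of the allocating form for a 3-byte character.
bytes = push_utf8!(UInt8[], UInt32('€'))
println(bytes == codeunits(string('€')))
```

Writing bytes straight into the output buffer removes both the temporary String and the temporary Vector{UInt8} from the loop in `parse_string`, at the cost of maintaining the encoding logic by hand.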

compatibility for JuliaLang/julia#19449, and removed some obsolete pr…

ed49433

…e-0.4 code

TotalVerb approved these changes Jan 4, 2017

TotalVerb merged commit 416cfbf into JuliaIO:master Jan 4, 2017

stevengj deleted the newstring branch January 4, 2017 03:46

TotalVerb mentioned this pull request Jan 4, 2017

Allocating new strings for UTF8 size and bytes is silly #181

Closed

stevengj mentioned this pull request Jan 8, 2017

ERROR: type String has no field data #183

Closed
