-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
faster String
allocation
#19449
faster String
allocation
#19449
Conversation
9b881ff
to
72ffe2d
Compare
d = s.data | ||
i = length(d) | ||
@inbounds while i > 0 && is_valid_continuation(d[i]) | ||
p = pointer(s) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need a gcroot
intrinsic with a semantic similar to ccall
to make this usage gc safe.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, that would be good.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any idea what would be a good api? I'm mainly not sure about how to express the scope a gcroot
applies to.....
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For now we should make the scope the whole enclosing function.
data::Array{UInt8,1} | ||
# required to make String("foo") work (#15120): | ||
String(d::Array{UInt8,1}) = new(d) | ||
type String <: AbstractString |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not immutable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume so that the compiler would pass it correctly. (It'll be a isbits
type otherwise)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct; the presence of the string data is not reflected in the layout of the type, so it would not be passed correctly. Of course that's only the beginning of the problems caused by a hack like this, requiring special cases e.g. in deepcopy
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume you addressed the aforementioned "problems" including the special deepcopy
implementation?
@@ -57,8 +57,8 @@ function unsafe_string(p::Union{Ptr{UInt8},Ptr{Int8}}) | |||
ccall(:jl_cstr_to_string, Ref{String}, (Ptr{UInt8},), p) | |||
end | |||
|
|||
convert(::Type{Vector{UInt8}}, s::AbstractString) = String(s).data | |||
convert(::Type{Array{UInt8}}, s::AbstractString) = String(s).data | |||
#convert(::Type{Vector{UInt8}}, s::AbstractString) = String(s).data |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will there be some mechanism to get a vector of bytes equivalent to a copy of the string after this is done?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that functionality is still pretty useful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we can definitely still have that. It might even be possible for an array to share string data in-place.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this also needs a special case in the gc push_root
before the pointerfree
case. so that gc_setmark
will see the right size.
Some interesting issues so far:
|
Next issue: this representation does not support |
133d41d
to
d72ffe2
Compare
String
allocationString
allocation
d72ffe2
to
8ae0e85
Compare
This has passed CI and is ready for review. The last commit implements the following scheme for copy-avoidance when converting between strings and vectors:
I'm seeing >=30% speedups for #19141, and the spellcheck benchmark. We need to decide what to do about |
Is there a deprecation mechanism for |
@@ -437,7 +437,7 @@ function splice_buffer!{T<:Integer}(buf::IOBuffer, r::UnitRange{T}, ins::Abstrac | |||
elseif pos > last(r) | |||
seek(buf, pos - length(r)) | |||
end | |||
splice!(buf.data, r + 1, ins.data) # position(), etc, are 0-indexed | |||
splice!(buf.data, r + 1, convert(Vector{UInt8},ins)) # position(), etc, are 0-indexed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would be more concise if we could just do Vector(ins)
.
@@ -368,8 +372,8 @@ unsafe_write(io::IO, x::Ptr{UInt8}, nb::Int) = | |||
write(io::IO, x::UInt8) = | |||
(ccall(:jl_uv_putb, Void, (Ptr{Void}, UInt8), io_pointer(io), x); 1) | |||
function write(io::IO, x::String) | |||
nb = sizeof(x.data) | |||
unsafe_write(io, ccall(:jl_array_ptr, Ptr{UInt8}, (Any,), x.data), nb) | |||
nb = x.len |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Presumably sizeof(x)
is defined and would be cleaner than accessing the .len
field directly?
all_ascii = (D <: UInt8) || (isascii(eol) && | ||
isascii(dlm) && | ||
(!allow_quote || isascii(qchar)) && | ||
(!allow_comments || isascii(cchar))) | ||
if T === String && all_ascii | ||
return dlm_parse(dbuff.data, eol % UInt8, dlm % UInt8, qchar % UInt8, cchar % UInt8, | ||
if all_ascii && !(D <: UInt8) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
D != UInt8
? Why this additional condition?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know. The whole all_ascii
business at the top of this function can probably be removed.
@@ -446,7 +448,7 @@ end | |||
# check for pure ASCII-ness | |||
|
|||
function ascii(s::String) | |||
for (i, b) in enumerate(s.data) | |||
for (i, b) in enumerate(bytes(s)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is bytes
? I thought we were now using convert(Vector{UInt8}, s)
or maybe Vector(s)
or Vector{UInt8}(s)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's an iterator over bytes that I was trying out. I guess it's not really necessary.
@@ -576,4 +576,4 @@ a starting `crc` integer to be mixed in with the checksum. | |||
""" | |||
function crc32c end | |||
crc32c(a::Array{UInt8}, crc::UInt32=0x00000000) = ccall(:jl_crc32c, UInt32, (UInt32, Ptr{UInt8}, Csize_t), crc, a, sizeof(a)) | |||
crc32c(s::String, crc::UInt32=0x00000000) = crc32c(s.data, crc) | |||
crc32c(s::String, crc::UInt32=0x00000000) = ccall(:jl_crc32c, UInt32, (UInt32, Ptr{UInt8}, Csize_t), crc, s, sizeof(s)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe just use Union{Array{UInt8},String}
for the first argument in a single method?
Needs NEWS and probably doc updates. |
tsz = JL_ARRAY_ALIGN(tsz, JL_CACHE_BYTE_ALIGNMENT); // align whole object | ||
a = (jl_array_t*)jl_gc_alloc(ptls, tsz, atype); | ||
JL_GC_PUSH1(&a); | ||
data = jl_gc_alloc_buf(ptls, tot); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe the how
needs to be initialized to 0
first before calling into the GC again so that the marking code won't try to trace a dangling pointer.
#define jl_buff_tag ((uintptr_t)0x4eade800) | ||
STATIC_INLINE void *jl_gc_alloc_buf(jl_ptls_t ptls, size_t sz) | ||
{ | ||
return jl_gc_alloc(ptls, sz, (void*)jl_buff_tag); | ||
void **buf = (void**)jl_gc_alloc(ptls, sz + sizeof(void*), (void*)jl_buff_tag); | ||
return (void*)(buf + 1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
isn't this mis-aligning the buf
pointer?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I suppose so. I guess we should only do this for elsz==1.
Should accessing the array be a method of convert, reinterpret, or transcode? There was some resistance to using convert between strings and arrays-of-integers during the last string refactoring. |
We've had a conversion from |
Ah, I see; using |
We could define a generic function |
81e4251
to
8ea9828
Compare
Does anybody understand the OSX failure here? |
8ea9828
to
30a7437
Compare
segfaulting with |
That's weird. I can't reproduce that on linux. |
30a7437
to
a72588a
Compare
Looks like it was a stack overflow loading the system image. I'll try a different approach to serializing the new String type. |
a72588a
to
c01a2cf
Compare
OK, that didn't help. |
Should be good to merge once appveyor's happy, but since I think it's idle, let's see if anything recent made much difference: @nanosoldier |
Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @jrevels |
Looking good. Away we go! |
arraylist_push(&backref_list, v); | ||
jl_deserialize_struct(s, v, 1); | ||
((jl_typemap_entry_t*)v)->next = (jl_typemap_entry_t*)jl_nothing; | ||
*pn = v; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
missing gc_wb
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought the gc is off here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You don't need to create a gc stack but it is still necessary to use write barrier since there can still be old and young objects.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't a collection have to happen for an object to become old?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, so if you are sure that the parent is always allocated in the same "deserialization session" it should be enough with a comment about it. Looks like it should be fine here since *pn
is pointing to either the stack or a v
allocated in a previous iteration?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes.
This replaces the
Array
insideString
with a length field, plus directly asking the GC to allocate space for the string data in-line.This is very incomplete so far, and is only meant to demonstrate the basic idea.The following gist has a hacked-up working version that can be used within an existing build: https://gist.github.com/JeffBezanson/6b7f1785bb7f2509cbd5d4ff1380556dThis makes string allocation ~2x faster, use 1 object instead of 2, and use around half the space for short strings. It's not pretty, but I think we should go with something like this to fix the immediate performance problem. Also removing the
.data
field as soon as possible will make it easier to change the representation in the future if we come up with something nicer.