-
-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimize String#to_json #5456
Optimize String#to_json #5456
Conversation
src/json/builder.cr
Outdated
pos = 0 | ||
reader = Char::Reader.new(string) | ||
|
||
while reader.has_next? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could use while char = reader.next_char
(obviously modified to include the first char), eliminating an additional method call.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no. next_char raises on end
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can easily be handled with when Char::ZERO; break
and the reader will never get to a point where it raises. This solution should be faster than calling has_next?
and current_char
on each iteration.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
string can have zero byte in it. so condition not always work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True, that's a valid character in a JS string.
Still, you could check has_next?
only if char == Char::ZERO
instead of every iteration.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
search while reader.has_next?
in std. 7 occurrences. nothing wrong with this code. plus i dont think its slower than your approach. this is idiomatic code.
src/json/builder.cr
Outdated
when '\\' | ||
io << "\\\\" | ||
escape = "\\\\" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
escape
could just be a Char
(the one being escaped).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and then?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
io << '\\' << escaped_char
. It would be more efficient than creating and inserting a new string
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Strings are allocated on the heap, chars not. Here it is very easy to avoid heap allocations and use chars instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Constant strings are not allocated on the heap. They're pre-built in read only ELF data, so doing foo = "bar"
is simply the cost of assigning an address to a variable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True, forgot about that.
Still, using a Char should be slightly preferrable. It's just less data to store: no size field + backslash is only needed once for all escape sequences.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
escape cant be char. for \b
want "\b", not '\' and '\b'. escape must be string. or it must be 'b' in case of '\b', but then two writes, slower than one write. but also, nanosecond time. not relevant.
please send a different pr where performance is beat, then i can accept changes. otherwise this pr is good.
src/json/builder.cr
Outdated
string.each_char do |char| | ||
case char | ||
|
||
pos = 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please use a more descriptive name like last_start_pos
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
too long... maybe start
or start_pos
... theres only one pos to track, why last? theres no first. theres no middle. algorithm is simple.
src/json/builder.cr
Outdated
end | ||
|
||
if escape |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd suggest to put this behaviour directly in the when
branches (maybe extract in a private method). This adds some code duplication but eliminates an additional condition check.
Or just skip to the next char with next
from the .ascii_control?
and else
branches. Then you can keep it here but don't need the condition.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
put directly in when
: longer code.
i understand your concern, but these little details not improve performance, and make code longer, same to understand. so i prefer not change.
src/json/builder.cr
Outdated
io << "\\u" | ||
ord = char.ord | ||
io << '0' if ord < 0x1000 | ||
io << '0' if ord < 0x100 | ||
io << '0' if ord < 0x10 | ||
ord.to_s(16, io) | ||
else | ||
io << char | ||
set_pos = false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could just insert next
here, eliminating the need for a variable set_pos
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe, ends up being same longer
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code might be a bit longer, but I expect execution time to be lower.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i changed code, same time. so stay the same for me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would like to remove set_pos
personally... it's not a big deal for me but the logic is slightly harder to follow. I want to see what the rest of the core team thinks.
@straight-shoota sorry, but you not core team member, so why do i need apply suggestions? better close this. you want to implement this, you do it. |
@larubujo all reviews are just suggestions, if you disagree with them feel free to not implement them, but I don't see why you'd close this PR. |
Yeah, you don't have to follow my suggestions. I was just giving you hints how I think you could improve your code. If you don't want to apply them, then don't. But I think they're reasonable to improve efficiency even further. |
i reopen. i can rename |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is fine to merge if the rest of the core team agrees.
We might need to change the scope of the escaping in the future to escape more characters but that's outside the scope of this PR.
c60de57
to
577d099
Compare
refactored. let me know what you think. |
577d099
to
5eb00cf
Compare
I think I prefer the old version, it has less duplication and less to go wrong. Replacing the conditionals with |
boring... |
@larubujo I meant your old version that you originally PRed, not the version before this PR! The version before you said "refactored. let me know what you think." Sorry for the misunderstanding. |
problem is, i lost that code. cant remember now. plus review is "do this, no, do that, revert this now". really boring and frustrating. better someone continue this, i got tired. |
reopen second time. please accept it. also note this is faster than original code, so better:
slightly faster. but maybe moon closer to earth these days, who knows. |
I think it is good now. However, it could improve the handling of special characters besides performance improvements. But this could be done in a subsequent PR. Namely, it misses named escapes for |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the optimization, but since the flow is more complicated than linear iteration of chars I would prefer to have some specs to ensure corner cases of the code won't fail in the future due to further changes. Like: if the string starts with sequences to escape, if it has more than one sequences to escape together, if there is trailing sequences to escape. Those kind of cases might work right now, but some slightly changes could end in chars been lost.
5eb00cf
to
22a66cc
Compare
@bcardiff added more specs |
spec/std/json/serialization_spec.cr
Outdated
@@ -215,6 +215,19 @@ describe "JSON serialization" do | |||
"\u{19}".to_json.should eq("\"\\u0019\"") | |||
end | |||
|
|||
it "does for String with control codes in a few places" do | |||
"\fab".to_json.should eq("\"\\fab\"") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's probably much more readable to use %q("\fab")
here instead of "\"\\fab\""
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
22a66cc
to
60bcc37
Compare
saw this: https://stackoverflow.com/questions/47952457/crystal-slow-json-serialization-of-structs-containing-large-strings
this pr still slower then go (crystal memory allocator slower, probably gc slower), but faster then before
old code: go char by char and append it
new code: go char by char, if escape then append all before that point, then escaped char
best case: string has no escape, append all in one go
worse case: all chars escapes, similar to old code
bench:
old:
new: