Refactor JSON::PullParser#consume_number to use stdlib number parsing #10447
The custom parsing algorithm was insufficient and had several issues around range boundaries. Replacing it with the established conversion methods ensures correctness.
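As a rough illustration of what "using the established conversion methods" can look like, here is a minimal sketch. It is not the code from this PR: `parse_number` is a hypothetical helper, and it assumes the number literal has already been collected into a `String`. It relies only on `String#to_i64?` and `String#to_f64?`, which return `nil` instead of a wrong value when the input is invalid or out of range.

```crystal
# Hypothetical helper for illustration only, not the PR's implementation.
# Assumes `literal` is the already-scanned text of a JSON number.
def parse_number(literal : String) : Int64 | Float64
  if literal.includes?('.') || literal.includes?('e') || literal.includes?('E')
    # A fraction or exponent marks a float literal.
    literal.to_f64?(whitespace: false) ||
      raise ArgumentError.new("Invalid float: #{literal}")
  else
    # to_i64? returns nil on overflow, so range boundaries are
    # rejected explicitly rather than silently wrapping.
    literal.to_i64?(whitespace: false) ||
      raise ArgumentError.new("Invalid or out-of-range integer: #{literal}")
  end
end

parse_number("9223372036854775807")  # Int64::MAX
parse_number("-9223372036854775808") # Int64::MIN
parse_number("1.5")                  # 1.5
```

A value just past the boundary, such as `"9223372036854775808"`, raises here instead of producing an incorrect integer.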
How is performance affected?
The focus is correctness, not performance. So I don't have an extensive performance analysis. It doesn't matter if we have a fast algorithm when its results are plain wrong. I'm sure this can be optimized further, but for now I'd just like to fix the bugs in number parsing. These are the results for a simple test, parsing an array of 100 random ints/floats.
For this bench: https://github.com/kostya/benchmarks#json
@straight-shoota I believe correctness can be achieved without sacrificing performance. There's no need to go from a string to an intermediate string to an int or float. It can be parsed directly. Also, numbers in JSON are simpler than what we cover in the stdlib (no whitespace, no underscores, etc.).
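To make the "parse it directly" idea concrete, here is a minimal sketch for the integer case only. It is an assumption-laden illustration, not a proposal from this thread: `parse_int_direct` is a hypothetical name, and it ignores fractions, exponents, and JSON's rule against leading zeros. It accumulates digits straight from the bytes and relies on Crystal's checked arithmetic, which raises `OverflowError` instead of wrapping.

```crystal
# Hypothetical sketch: parse an optional '-' followed by ASCII digits
# without building an intermediate string. Returns nil on invalid input
# or when the value does not fit into Int64.
def parse_int_direct(input : String) : Int64?
  bytes = input.to_slice
  i = 0
  negative = false
  if i < bytes.size && bytes[i] == '-'.ord
    negative = true
    i += 1
  end
  return nil if i == bytes.size # no digits at all

  # Accumulate as a negative number so that Int64::MIN is representable.
  value = 0_i64
  while i < bytes.size
    byte = bytes[i]
    return nil unless '0'.ord <= byte <= '9'.ord
    digit = byte.to_i64 - '0'.ord
    value = value * 10 - digit # raises OverflowError past Int64::MIN
    i += 1
  end
  negative ? value : -value # negating Int64::MIN raises OverflowError
rescue OverflowError
  nil
end
```

Accumulating on the negative side is one way to get both boundaries right: `"-9223372036854775808"` parses to `Int64::MIN`, while `"9223372036854775808"` yields `nil` because the final negation overflows.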
Sure, but this is not trivial. Until somebody implements that correctly, we should fall back to a less performant but correct implementation.
What if we adapt this parser, https://github.com/lemire/fast_double_parser, which is used by simdjson?
For fun I tried this with this shard: https://github.com/kostya/fast_to_f, using this dirty patch:
class JSON::Lexer::StringBased
  private def consume_number
    # Pointer to the current read position in the underlying string
    buf = @reader.string.to_unsafe + @reader.pos
    v, end_s = FastToF.parse_internal(buf)
    if end_s.null?
      unexpected_char
    else
      # Values without a fractional part are reported as ints
      if v.round == v
        @token.kind = :int
        @token.int_value = v.to_i64
      else
        @token.kind = :float
        @token.float_value = v
      end
      # Advance the reader past the consumed characters
      chars_count = end_s - buf
      @reader.pos = @reader.pos + chars_count
    end
  end
end
My benchmark:
I'm on the same page with @straight-shoota here. I think the program should behave correctly first, then we can always think about performance.
We'll postpone this to work on the implementation.
After reviewing this again, I think we should merge this as is. This patch fixes a number of serious bugs in JSON number parsing. The performance degradation is insignificant compared to the importance of ensuring correctness.
Given that there's no better proposal yet, I would like this one to get merged. Like @straight-shoota said, better a correct algorithm than a broken one. When we get a new working and fast algorithm, we'll replace this one.
The custom parsing algorithm was insufficient and had several issues around range boundaries.
Replacing it with the established conversion methods ensures correctness.
Resolves #10419
Resolves #10920