Refactor `JSON::PullParser#consume_number` to use stdlib number parsing #10447

straight-shoota · 2021-02-26T16:29:43Z

The custom parsing algorithm was insufficient and had several issues around range boundaries.
Replacing it with the established conversion methods ensures correctness.

Resolves #10419
Resolves #10920
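
To illustrate the idea of delegating to the stdlib, here is a minimal sketch, not the code from this PR. The helper name `convert_json_number` is hypothetical, and it assumes the lexer has already scanned a syntactically valid JSON number into a string. The conversion itself relies on `String#to_i64?` and `String#to_f64?`, which return `nil` instead of silently wrapping around or producing a wrong value.

```crystal
# Hypothetical helper (not part of the stdlib or of this PR): converts an
# already-scanned JSON number lexeme using the stdlib conversion methods.
def convert_json_number(lexeme : String) : Int64 | Float64
  if lexeme.includes?('.') || lexeme.includes?('e') || lexeme.includes?('E')
    # A fraction or exponent marks the lexeme as a float.
    lexeme.to_f64? || raise ArgumentError.new("Invalid float: #{lexeme}")
  else
    # `to_i64?` returns nil when the value does not fit into Int64.
    lexeme.to_i64? || raise ArgumentError.new("Integer out of range: #{lexeme}")
  end
end

p convert_json_number("9223372036854775807")
p convert_json_number("1.5e3")
```

The trade-off discussed below follows from this shape: the lexeme has to be materialized as an intermediate string before it is converted.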

asterite · 2021-02-26T16:32:17Z

How is performance affected?

straight-shoota · 2021-02-26T18:10:34Z

The focus is correctness, not performance. So I don't have an extensive performance analysis. It doesn't matter if we have a fast algorithm when its results are plain wrong. I'm sure this can be optimized further, but for now I'd just like to fix the bugs in number parsing.

These are the results for a simple test, parsing an array of 100 random ints/floats.

```
$ ./bm-json-number-old
  ints  73.67k ( 13.57µs) (± 3.63%)   5.5kB/op        fastest
floats  45.46k ( 22.00µs) (± 2.83%)  7.11kB/op   1.62× slower
$ ./bm-json-number-new
  ints  58.82k ( 17.00µs) (± 2.79%)  5.47kB/op        fastest
floats  31.69k ( 31.56µs) (± 2.92%)  7.09kB/op   1.86× slower
```

kostya · 2021-02-26T18:46:30Z

For this benchmark: https://github.com/kostya/benchmarks#json
Before: 0.661 s
This PR: 0.839 s

asterite · 2021-02-26T19:09:23Z

@straight-shoota I believe correctness can be achieved without sacrificing performance. There's no need to go from a string to an intermediate string to an int or float. It can be parsed directly. Also numbers in JSON are simpler than what we cover in the std (no whitespace, no underscore, etc.)
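
As a rough illustration of direct parsing, here is a minimal sketch for the integer case only. The method name `parse_json_int` is hypothetical, and the sketch assumes the input slice contains exactly one candidate integer. It relies on Crystal's checked integer arithmetic, which raises `OverflowError` rather than wrapping around, to catch values outside the `Int64` range. Floats are the harder part, since correct rounding is not covered here.

```crystal
# Hypothetical sketch: parses a JSON integer directly from bytes without
# building an intermediate string. Returns nil on malformed input or overflow.
def parse_json_int(bytes : Bytes) : Int64?
  return nil if bytes.empty?
  negative = bytes[0] == '-'.ord
  pos = negative ? 1 : 0
  return nil if pos >= bytes.size
  # JSON forbids leading zeros such as "012".
  return nil if bytes[pos] == '0'.ord && pos + 1 < bytes.size

  value = 0_i64
  bytes[pos..].each do |byte|
    return nil unless '0'.ord <= byte <= '9'.ord
    # Accumulate as a negative number so that Int64::MIN is representable.
    value = value * 10 - (byte - '0'.ord)
  end
  # Negating Int64::MIN overflows, which rejects 9223372036854775808.
  negative ? value : -value
rescue OverflowError
  nil
end

p parse_json_int("-9223372036854775808".to_slice)
p parse_json_int("9223372036854775808".to_slice)
```

Accumulating in the negative range is a design choice: it lets the lower boundary parse without a special case, while the upper boundary is rejected by the final negation.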

straight-shoota · 2021-02-26T19:22:43Z

Sure, but this is not trivial. Until somebody implements that correctly, we should fall back to a less performant but correct implementation.

kostya · 2021-02-26T19:41:41Z

What if we adapt this parser, https://github.com/lemire/fast_double_parser, which is used by simdjson?

kostya · 2021-03-04T11:50:03Z

For fun, I tried this with this shard: https://github.com/kostya/fast_to_f

using this dirty patch:

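```crystal
class JSON::Lexer::StringBased
  private def consume_number
    # Pointer to the current read position in the underlying string.
    buf = @reader.string.to_unsafe + @reader.pos
    v, end_s = FastToF.parse_internal(buf)

    if end_s.null?
      unexpected_char
    else
      # Dirty shortcut: any whole-valued result (including `1.0`) becomes an int token.
      if v.round == v
        @token.kind = :int
        @token.int_value = v.to_i64
      else
        @token.kind = :float
        @token.float_value = v
      end

      # Advance the reader past the characters consumed by the parser.
      chars_count = end_s - buf
      @reader.pos = @reader.pos + chars_count
    end
  end
end
```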

My benchmark:
Before: 0.655 s
With this patch: 0.550 s

sdogruyol

I'm on the same page as @straight-shoota here. I think the program should behave correctly first, then we can always think about performance.

straight-shoota · 2021-06-22T18:43:14Z

We'll postpone this to work on the implementation.

straight-shoota · 2021-08-25T08:54:02Z

After reviewing this again, I think we should merge this as is. This patch fixes a number of serious bugs in JSON number parsing. The performance degradation is insignificant compared to the gain in correctness.
Sure, this can be implemented more efficiently and we should aim to do that. But that's gonna take some more effort. We must not postpone the low hanging fruit of fixing the parser while waiting for a more performant implementation.

beta-ziliani

Given that there's no better proposal yet, I would like this one to get merged. Like @straight-shoota said, better a correct algorithm than a broken one. When we get a new working and fast algorithm, we'll replace this one.

Refactor JSON::PullParser#consume_number to use stdlib number parsing

c653d95

straight-shoota added the kind:bug and topic:stdlib:serialization labels Feb 26, 2021

Sija reviewed Feb 26, 2021

sdogruyol approved these changes Mar 7, 2021

straight-shoota added this to the 1.1.0 milestone Apr 13, 2021

drhuffman12 reviewed Apr 19, 2021

straight-shoota modified the milestones: 1.1.0, 1.2.0 Jun 22, 2021

straight-shoota changed the title ~~Refactor JSON::PullParser#consume_number to use stdlib number parsing~~ Refactor JSON::PullParser#consume_number to use stdlib number parsing Jun 29, 2021

HertzDevil mentioned this pull request Jul 11, 2021

JSON parse gives wrong Float64 result #10920

Closed

straight-shoota added this to the 1.2.0 milestone Aug 25, 2021

beta-ziliani approved these changes Sep 8, 2021

straight-shoota merged commit c548273 into crystal-lang:master Sep 9, 2021

straight-shoota deleted the fix/json-consume_number branch September 9, 2021 20:43

franciscoadasme mentioned this pull request Oct 16, 2021

Use FastToFloat to improve float parsing performance franciscoadasme/chem.cr#134

Open

asterite mentioned this pull request Mar 16, 2024

Optimize JSON parsing a bit #14366

Merged
