Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

parser_json: use JSON as fallback parser instead of Yajl for performance #4813

Merged
merged 1 commit into from
Feb 3, 2025

Conversation

Watson1978
Copy link
Contributor

@Watson1978 Watson1978 commented Jan 31, 2025

Which issue(s) this PR fixes:
Fixes #

What this PR does / why we need it:
Recently, Ruby's json has incredible performance improvements.
It might be faster than oj gem.
So, I think json is a suitable as fallback.

This is similar with #4759

Here is easily benchmark. (I used same log file in #4759)

  • Before

    • It spent 90.50963662 sec to handle 10 GB file
  • After

    • It spent 74.624230077 sec to handle 10 GB file
  • config

<source>
  @type tail
  path "#{File.expand_path '~/tmp/access*.log'}"
  pos_file "#{File.expand_path '~/tmp/fluentd/access.log.pos'}"
  tag log
  read_from_head true
  <parse>
    @type json
  </parse>
</source>

<match **>
  @type file
  path "#{File.expand_path '~/tmp/fluentd/log'}"
</match>

FYI)

Docs Changes:
fluent/fluentd-docs-gitbook#560

Release Note:

@kenhys
Copy link
Contributor

kenhys commented Jan 31, 2025

we are happy if we could observe micro benchmark result.

@Watson1978
Copy link
Contributor Author

Micro benchmark

benchmark code

require 'bundler/inline'
gemfile do
  source 'https://rubygems.org'
  gem 'benchmark-ips'
  gem 'oj'
  gem 'json'
  gem 'yajl-ruby', require: 'yajl'
end

Benchmark.ips do |x|
  json = '{"a":"Alpha","b":true,"c":12345,"d":[true,[false,[-123456789,null],3.9676,["Something else.",false],null]],"e":{"zero":null,"one":1,"two":2,"three":[3],"four":[0,1,2,3,4]},"f":null,"h":{"a":{"b":{"c":{"d":{"e":{"f":{"g":null}}}}}}},"i":[[[[[[[null]]]]]]]}'

  x.report('Oj.load') { Oj.load(json) }
  x.report('JSON.load') { JSON.load(json) }
  x.report('Yajl.load') { Yajl.load(json) }

  x.compare!
end

result

ruby 3.4.1 (2024-12-25 revision 48d4efcb85) +PRISM [x86_64-linux]
Warming up --------------------------------------
             Oj.load    31.383k i/100ms
           JSON.load    35.513k i/100ms
           Yajl.load    13.839k i/100ms
Calculating -------------------------------------
             Oj.load    319.187k (± 1.5%) i/s    (3.13 μs/i) -      1.601M in   5.015522s
           JSON.load    364.950k (± 0.5%) i/s    (2.74 μs/i) -      1.847M in   5.060214s
           Yajl.load    137.015k (± 2.5%) i/s    (7.30 μs/i) -    691.950k in   5.053679s

Comparison:
           JSON.load:   364950.5 i/s
             Oj.load:   319187.1 i/s - 1.14x  slower
           Yajl.load:   137015.2 i/s - 2.66x  slower

@kenhys
Copy link
Contributor

kenhys commented Jan 31, 2025

By the way, which json gem version did you tested with?
Do we need to use specific version of json gem?

@Watson1978
Copy link
Contributor Author

Watson1978 commented Jan 31, 2025

By the way, which json gem version did you tested with? Do we need to use specific version of json gem?

I tested with json v2.9.1 (latest version), it is default version in Ruby 3.4.1
https://github.com/ruby/ruby/blob/master/doc/NEWS/NEWS-3.4.0.md

If we will use Ruby 3.4 or later in the next fluent-package LTS, hmmm, I think we don't need to specify the version...

@Watson1978 Watson1978 force-pushed the parser_json branch 2 times, most recently from 9968af9 to bfc027e Compare February 3, 2025 00:57
@kenhys
Copy link
Contributor

kenhys commented Feb 3, 2025

NOTE: will not be backported to v1.16

Copy link
Contributor

@daipom daipom left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks!!
Waiting CI.

@daipom daipom added this to the v1.19.0 milestone Feb 3, 2025
@daipom daipom merged commit 1d5b0de into fluent:master Feb 3, 2025
10 checks passed
@Watson1978 Watson1978 deleted the parser_json branch February 3, 2025 08:25
@Watson1978
Copy link
Contributor Author

I created separated PR in #4817

daipom pushed a commit that referenced this pull request Feb 4, 2025
**Which issue(s) this PR fixes**: 
Fixes #

**What this PR does / why we need it**: 
Ref. #4813 (comment)
Fix wrong usage of `JSON.load` method.
We should use `JSON.parse` instead.

JSON.load performs unnecessary deserialisation.

```
irb(main):001> JSON.load('{ "json_class": "String", "raw": [72, 101, 108, 108, 111] }')
=> "Hello"
irb(main):002> JSON.parse('{ "json_class": "String", "raw": [72, 101, 108, 108, 111] }')
=> {"json_class" => "String", "raw" => [72, 101, 108, 108, 111]}
```


**Docs Changes**:

**Release Note**:

Signed-off-by: Shizuo Fujita <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants