perf: use -O2 to compile libgumbo and the nokogiri extension
My benchmarks show this generates code that is:

- 22% faster at HTML5 serialization
- 93% faster at HTML5 parsing

Note that `-O3` generates slightly slower HTML5 serialization
code (see 8220dc7).

Note also that this doesn't change the compiler options used for
libxml2 and libxslt (which already include `-O2`).
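
For context, a rough sketch of why appending `-O2` can be enough even when the default flags already carry `-O3`: GCC and Clang apply the last `-O` option on the command line. The helper name and the sample flag string are hypothetical and not part of this change.

```ruby
require "rbconfig"
require "shellwords"

# Return the optimization flag that takes effect when several are given,
# assuming a GCC/Clang-style compiler where the last -O option wins.
# Returns nil when no -O option is present.
def effective_optimization(cflags)
  Shellwords.split(cflags).reverse.find do |flag|
    flag.match?(/\A-O(\d|s|g|z|fast)?\z/)
  end
end

default_cflags = "-O3 -fno-fast-math -Wall"  # hypothetical defaults
patched_cflags = "#{default_cflags} -O2 -g"  # same flags with "-O2" and "-g" appended

puts effective_optimization(default_cflags)  # -O3
puts effective_optimization(patched_cflags)  # -O2

# What this Ruby was built with (system dependent, may be nil):
puts effective_optimization(RbConfig::CONFIG["CFLAGS"].to_s).inspect
```

The same reasoning applies to any `-O` level that arrives through `CFLAGS` ahead of the appended flag.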

Using the following benchmark script:

    #! /usr/bin/env ruby
    # coding: utf-8

    require "bundler/inline"

    gemfile do
      source "https://rubygems.org"
      gem "nokogiri", path: "."
      gem "benchmark-ips"
    end

    require "nokogiri"
    require "benchmark/ips"

    input = File.read("test/files/tlm.html")
    puts "input #{input.length} bytes"

    html4_doc = Nokogiri::HTML4::Document.parse(input)
    html5_doc = Nokogiri::HTML5::Document.parse(input)

    puts RUBY_DESCRIPTION

    Benchmark.ips do |x|
      x.time = 10
      x.report("html5 parse") do
        Nokogiri::HTML5::Document.parse(input)
      end
      x.report("html4 parse") do
        Nokogiri::HTML4::Document.parse(input)
      end
      x.compare!
    end

    Benchmark.ips do |x|
      x.time = 10
      x.report("html5 serialize") do
        html5_doc.to_html
      end
      x.report("html4 serialize") do
        html4_doc.to_html
      end
      x.compare!
    end

with default settings on my dev system
(which are `-O3` for extension and unspecified for libgumbo):

> input 70095 bytes
> ruby 3.1.2p20 (2022-04-12 revision 4491bb740a) [x86_64-linux]
> Warming up --------------------------------------
>          html5 parse    12.000  i/100ms
>          html4 parse    31.000  i/100ms
> Calculating -------------------------------------
>          html5 parse    129.637  (±16.2%) i/s -      1.260k in  10.051475s
>          html4 parse    355.723  (±21.4%) i/s -      3.441k in  10.104502s
>
> Comparison:
>          html4 parse:      355.7 i/s
>          html5 parse:      129.6 i/s - 2.74x  (± 0.00) slower
>
> Warming up --------------------------------------
>      html5 serialize    85.000  i/100ms
>      html4 serialize   131.000  i/100ms
> Calculating -------------------------------------
>      html5 serialize    843.993  (± 2.4%) i/s -      8.500k in  10.076902s
>      html4 serialize      1.319k (± 2.9%) i/s -     13.231k in  10.039827s
>
> Comparison:
>      html4 serialize:     1319.0 i/s
>      html5 serialize:      844.0 i/s - 1.56x  (± 0.00) slower

after enabling `-O2` on both gumbo and nokogiri source files:

> input 70095 bytes
> ruby 3.1.2p20 (2022-04-12 revision 4491bb740a) [x86_64-linux]
> Warming up --------------------------------------
>          html5 parse    21.000  i/100ms
>          html4 parse    36.000  i/100ms
> Calculating -------------------------------------
>          html5 parse    250.245  (±20.8%) i/s -      2.394k in  10.066381s
>          html4 parse    371.905  (±20.2%) i/s -      3.600k in  10.025980s
>
> Comparison:
>          html4 parse:      371.9 i/s
>          html5 parse:      250.2 i/s - same-ish: difference falls within error
>
> Warming up --------------------------------------
>      html5 serialize   101.000  i/100ms
>      html4 serialize   128.000  i/100ms
> Calculating -------------------------------------
>      html5 serialize      1.037k (± 3.3%) i/s -     10.403k in  10.042146s
>      html4 serialize      1.301k (± 4.2%) i/s -     13.056k in  10.055585s
>
> Comparison:
>      html4 serialize:     1300.8 i/s
>      html5 serialize:     1037.2 i/s - 1.25x  (± 0.00) slower
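
The headline percentages can be re-derived from the i/s figures in the two runs above. A quick sketch (the constant and method names are only for illustration, and 1037.2 is taken from the comparison line of the second run):

```ruby
# Iterations per second copied from the benchmark output above.
BEFORE = { "html5 parse" => 129.637, "html5 serialize" => 843.993 }.freeze
AFTER  = { "html5 parse" => 250.245, "html5 serialize" => 1037.2 }.freeze

# Percentage by which `after` is faster than `before`.
def speedup_percent(before, after)
  ((after / before) - 1.0) * 100.0
end

BEFORE.each_key do |name|
  percent = speedup_percent(BEFORE[name], AFTER[name])
  printf("%-16s %5.1f%% faster\n", name, percent)
end
```

Given the ±16-21% error reported for the parse runs, these figures are best read as approximate.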
flavorjones committed Aug 28, 2022
1 parent dbb228a commit ece08bb
Showing 2 changed files with 5 additions and 1 deletion.
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -51,6 +51,7 @@ This version of Nokogiri uses [`jar-dependencies`](https://github.com/mkristian/
### Improved

* Serialization of HTML5 documents and fragments has been re-implemented and is ~10x faster than previous versions. [[#2596](https://github.com/sparklemotion/nokogiri/issues/2596), [#2569](https://github.com/sparklemotion/nokogiri/issues/2569)]
+* Parsing of HTML5 documents is ~90% faster thanks to additional compiler optimizations being applied. [[#2639](https://github.com/sparklemotion/nokogiri/issues/2639)]
* `Document#canonicalize` now raises an exception if `inclusive_namespaces` is non-nil and the mode is inclusive, i.e. XML_C14N_1_0 or XML_C14N_1_1. `inclusive_namespaces` can only be passed with exclusive modes, and previously this silently failed.
* Compare `Encoding` objects rather than compare their names. This is a slight performance improvement and is future-proof. [[#2454](https://github.com/sparklemotion/nokogiri/issues/2454)] (Thanks, [@casperisfine](https://github.com/casperisfine)!)
* Avoid compile-time conflict with system-installed `gumbo.h` on OpenBSD. [[#2464](https://github.com/sparklemotion/nokogiri/issues/2464)]
5 changes: 4 additions & 1 deletion ext/nokogiri/extconf.rb
@@ -615,6 +615,9 @@ def do_clean
# errors/warnings. see #2302
append_cflags(["-std=c99", "-Wno-declaration-after-statement"])

+# gumbo html5 serialization is slower with O3, let's make sure we use O2
+append_cflags("-O2")
+
# always include debugging information
append_cflags("-g")

@@ -956,7 +959,7 @@ def install
end

def compile
-cflags = concat_flags(ENV["CFLAGS"], "-fPIC", "-g")
+cflags = concat_flags(ENV["CFLAGS"], "-fPIC", "-O2", "-g")

env = { "CC" => gcc_cmd, "CFLAGS" => cflags }
if config_cross_build?