From ece08bba45ae593744fe3a85d8d56e6cd7ac6c7c Mon Sep 17 00:00:00 2001
From: Mike Dalessio
Date: Sun, 28 Aug 2022 13:43:45 -0400
Subject: [PATCH] perf: use `-O2` to compile libgumbo and the nokogiri extension
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

My benchmarks show this generates code that is:

- 22% faster at HTML5 serialization
- 93% faster at HTML5 parsing

Note that `-O3` generates slightly slower HTML5 serialization code
(see 8220dc7).

Note also that this doesn't change the compiler options used for
libxml2 and libxslt (which includes `-O2` already).

Using the following benchmark script:

    #! /usr/bin/env ruby
    # coding: utf-8

    require "bundler/inline"

    gemfile do
      source "https://rubygems.org"
      gem "nokogiri", path: "."
      gem "benchmark-ips"
    end

    require "nokogiri"
    require "benchmark/ips"

    input = File.read("test/files/tlm.html")
    puts "input #{input.length} bytes"

    html4_doc = Nokogiri::HTML4::Document.parse(input)
    html5_doc = Nokogiri::HTML5::Document.parse(input)

    puts RUBY_DESCRIPTION

    Benchmark.ips do |x|
      x.time = 10
      x.report("html5 parse") do
        Nokogiri::HTML5::Document.parse(input)
      end
      x.report("html4 parse") do
        Nokogiri::HTML4::Document.parse(input)
      end
      x.compare!
    end

    Benchmark.ips do |x|
      x.time = 10
      x.report("html5 serialize") do
        html5_doc.to_html
      end
      x.report("html4 serialize") do
        html4_doc.to_html
      end
      x.compare!
    end

with default settings on my dev system (which are `-O3` for extension
and unspecified for libgumbo):

> input 70095 bytes
> ruby 3.1.2p20 (2022-04-12 revision 4491bb740a) [x86_64-linux]
> Warming up --------------------------------------
>   html5 parse 12.000 i/100ms
>   html4 parse 31.000 i/100ms
> Calculating -------------------------------------
>   html5 parse 129.637 (±16.2%) i/s - 1.260k in 10.051475s
>   html4 parse 355.723 (±21.4%) i/s - 3.441k in 10.104502s
>
> Comparison:
>   html4 parse: 355.7 i/s
>   html5 parse: 129.6 i/s - 2.74x (± 0.00) slower
>
> Warming up --------------------------------------
>   html5 serialize 85.000 i/100ms
>   html4 serialize 131.000 i/100ms
> Calculating -------------------------------------
>   html5 serialize 843.993 (± 2.4%) i/s - 8.500k in 10.076902s
>   html4 serialize 1.319k (± 2.9%) i/s - 13.231k in 10.039827s
>
> Comparison:
>   html4 serialize: 1319.0 i/s
>   html5 serialize: 844.0 i/s - 1.56x (± 0.00) slower

after enabling `-O2` on both gumbo and nokogiri source files:

> input 70095 bytes
> ruby 3.1.2p20 (2022-04-12 revision 4491bb740a) [x86_64-linux]
> Warming up --------------------------------------
>   html5 parse 21.000 i/100ms
>   html4 parse 36.000 i/100ms
> Calculating -------------------------------------
>   html5 parse 250.245 (±20.8%) i/s - 2.394k in 10.066381s
>   html4 parse 371.905 (±20.2%) i/s - 3.600k in 10.025980s
>
> Comparison:
>   html4 parse: 371.9 i/s
>   html5 parse: 250.2 i/s - same-ish: difference falls within error
>
> Warming up --------------------------------------
>   html5 serialize 101.000 i/100ms
>   html4 serialize 128.000 i/100ms
> Calculating -------------------------------------
>   html5 serialize 1.037k (± 3.3%) i/s - 10.403k in 10.042146s
>   html4 serialize 1.301k (± 4.2%) i/s - 13.056k in 10.055585s
>
> Comparison:
>   html4 serialize: 1300.8 i/s
>   html5 serialize: 1037.2 i/s - 1.25x (± 0.00) slower
---
 CHANGELOG.md            | 1 +
 ext/nokogiri/extconf.rb | 5 ++++-
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index f1a2e1cc429..802903d7942 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -51,6 +51,7 @@ This version of Nokogiri uses [`jar-dependencies`](https://github.com/mkristian/
 ### Improved
 
 * Serialization of HTML5 documents and fragments has been re-implemented and is ~10x faster than previous versions. [[#2596](https://github.com/sparklemotion/nokogiri/issues/2596), [#2569](https://github.com/sparklemotion/nokogiri/issues/2569)]
+* Parsing of HTML5 documents is ~90% faster thanks to additional compiler optimizations being applied. [[#2639](https://github.com/sparklemotion/nokogiri/issues/2639)]
 * `Document#canonicalize` now raises an exception if `inclusive_namespaces` is non-nil and the mode is inclusive, i.e. XML_C14N_1_0 or XML_C14N_1_1. `inclusive_namespaces` can only be passed with exclusive modes, and previously this silently failed.
 * Compare `Encoding` objects rather than compare their names. This is a slight performance improvement and is future-proof. [[#2454](https://github.com/sparklemotion/nokogiri/issues/2454)] (Thanks, [@casperisfine](https://github.com/casperisfine)!)
 * Avoid compile-time conflict with system-installed `gumbo.h` on OpenBSD. [[#2464](https://github.com/sparklemotion/nokogiri/issues/2464)]
diff --git a/ext/nokogiri/extconf.rb b/ext/nokogiri/extconf.rb
index ee7c53297a1..3a548a90841 100644
--- a/ext/nokogiri/extconf.rb
+++ b/ext/nokogiri/extconf.rb
@@ -615,6 +615,9 @@ def do_clean
 # errors/warnings. see #2302
 append_cflags(["-std=c99", "-Wno-declaration-after-statement"])
 
+# gumbo html5 serialization is slower with O3, let's make sure we use O2
+append_cflags("-O2")
+
 # always include debugging information
 append_cflags("-g")
 
@@ -956,7 +959,7 @@ def install
       end
 
       def compile
-        cflags = concat_flags(ENV["CFLAGS"], "-fPIC", "-g")
+        cflags = concat_flags(ENV["CFLAGS"], "-fPIC", "-O2", "-g")
 
         env = { "CC" => gcc_cmd, "CFLAGS" => cflags }
         if config_cross_build?