From bb0bedd25dbb69b247b0894a6c357f8903a2b9a2 Mon Sep 17 00:00:00 2001
From: NAITOH Jun
Date: Thu, 19 Dec 2024 11:18:52 +0900
Subject: [PATCH] Optimize `IOSource#read_until` method by using `StringScanner#check_until(string)` (#226)

## Why?

`StringScanner#check_until(string)` is faster than `StringScanner#check_until(regex)`.

See:
- https://github.com/ruby/strscan/pull/106
- https://github.com/ruby/strscan/pull/111

## Benchmark

```
RUBYLIB= BUNDLER_ORIG_RUBYLIB= /Users/naitoh/.rbenv/versions/3.3.4/bin/ruby -v -S benchmark-driver /Users/naitoh/ghq/github.com/naitoh/rexml/benchmark/parse.yaml
ruby 3.3.4 (2024-07-09 revision be1089c8ec) [arm64-darwin22]
Calculating -------------------------------------
                         before       after  before(YJIT)  after(YJIT)
                 dom     19.459      19.840        35.035       35.786 i/s -     100.000 times in 5.139034s 5.040369s 2.854304s 2.794367s
                 sax     30.057      30.026        52.986       53.716 i/s -     100.000 times in 3.326998s 3.330499s 1.887303s 1.861652s
                pull     33.777      34.415        62.294       64.020 i/s -     100.000 times in 2.960622s 2.905668s 1.605284s 1.562002s
              stream     33.789      34.003        60.174       60.411 i/s -     100.000 times in 2.959521s 2.940916s 1.661845s 1.655334s

Comparison:
                              dom
         after(YJIT):        35.8 i/s
        before(YJIT):        35.0 i/s - 1.02x  slower
               after:        19.8 i/s - 1.80x  slower
              before:        19.5 i/s - 1.84x  slower

                              sax
         after(YJIT):        53.7 i/s
        before(YJIT):        53.0 i/s - 1.01x  slower
              before:        30.1 i/s - 1.79x  slower
               after:        30.0 i/s - 1.79x  slower

                             pull
         after(YJIT):        64.0 i/s
        before(YJIT):        62.3 i/s - 1.03x  slower
               after:        34.4 i/s - 1.86x  slower
              before:        33.8 i/s - 1.90x  slower

                           stream
         after(YJIT):        60.4 i/s
        before(YJIT):        60.2 i/s - 1.00x  slower
               after:        34.0 i/s - 1.78x  slower
              before:        33.8 i/s - 1.79x  slower
```

- YJIT=ON : 1.00x - 1.03x faster
- YJIT=OFF : 1.00x - 1.02x faster
---
 lib/rexml/source.rb | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/lib/rexml/source.rb b/lib/rexml/source.rb
index b0b89b71..2409f76e 100644
--- a/lib/rexml/source.rb
+++ b/lib/rexml/source.rb
@@ -68,8 +68,14 @@ module Private
       SCANNER_RESET_SIZE = 100000
       PRE_DEFINED_TERM_PATTERNS = {}
       pre_defined_terms = ["'", '"', "<"]
-      pre_defined_terms.each do |term|
-        PRE_DEFINED_TERM_PATTERNS[term] = /#{Regexp.escape(term)}/
+      if StringScanner::Version < "3.1.1"
+        pre_defined_terms.each do |term|
+          PRE_DEFINED_TERM_PATTERNS[term] = /#{Regexp.escape(term)}/
+        end
+      else
+        pre_defined_terms.each do |term|
+          PRE_DEFINED_TERM_PATTERNS[term] = term
+        end
       end
     end
     private_constant :Private
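
The version gate in the hunk above can be sketched outside REXML as follows. This is a minimal illustration, not the code in `lib/rexml/source.rb`: the helper names `term_pattern` and `read_until_term` are hypothetical, and the sketch assumes, as the patch does, that strscan 3.1.1 or later accepts a `String` in `check_until`. It compares versions with `Gem::Version` rather than the plain `String` comparison the patch uses; that is a swapped-in choice for illustration only.

```ruby
require "rubygems"
require "strscan"

# Hypothetical helper (not part of REXML): return the terminator itself when
# the installed strscan is assumed to accept a String pattern (>= 3.1.1, the
# same threshold the patch uses), otherwise fall back to an escaped Regexp.
def term_pattern(term)
  if Gem::Version.new(StringScanner::Version) >= Gem::Version.new("3.1.1")
    term
  else
    /#{Regexp.escape(term)}/
  end
end

# Hypothetical helper: return the text from the current scan position through
# the first occurrence of `term`, without advancing the scanner.
def read_until_term(scanner, term)
  scanner.check_until(term_pattern(term))
end

scanner = StringScanner.new("name='value' rest")
matched = read_until_term(scanner, "'")
puts matched.inspect
puts scanner.pos
```

`check_until` returns the substring up to and including the match and leaves the scan position unchanged, so either branch yields the same result and only the matching cost differs. The `String` comparison in the patch is lexicographic, which works for the versions involved here; `Gem::Version` compares segments numerically, so it would also order a version such as `3.10.0` after `3.9.0`.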