From 5011f6e9f1da44ffd923d612e75e70411d63a0ea Mon Sep 17 00:00:00 2001
From: Andrew Gallant
Date: Mon, 9 Oct 2023 19:51:44 -0400
Subject: [PATCH] changelog: add perf bug fix for \b
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Like the previous CHANGELOG entry, this marks a bug that was fixed
likely with the introduction of regex 1.9:

    $ hyperfine "rg-13.0.0 -ic '\bfoo\b \bbar\b' git-3a06386e.txt" "rg -ic '\bfoo\b \bbar\b' git-3a06386e.txt"
    Benchmark 1: rg-13.0.0 -ic '\bfoo\b \bbar\b' git-3a06386e.txt
      Time (mean ± σ):      1.034 s ±  0.011 s    [User: 1.030 s, System: 0.004 s]
      Range (min … max):    1.021 s …  1.053 s    10 runs

    Benchmark 2: rg -ic '\bfoo\b \bbar\b' git-3a06386e.txt
      Time (mean ± σ):       6.3 ms ±   0.3 ms    [User: 4.6 ms, System: 1.6 ms]
      Range (min … max):     5.6 ms …   7.3 ms    343 runs

    Summary
      'rg -ic '\bfoo\b \bbar\b' git-3a06386e.txt' ran
      164.95 ± 7.70 times faster than 'rg-13.0.0 -ic '\bfoo\b \bbar\b' git-3a06386e.txt'

This was not fixed by making \b itself faster, but rather, by improving
inner literal extraction. In particular, if the regex doesn't have any
literals extracted, then search time can still be quite slow:

    $ time rg-13.0.0 -ic '\b[a-z]{3}\b\s\b[a-z]{3}\b' git-3a06386e.txt
    57538

    real    0.427
    user    0.423
    sys     0.003
    maxmem  46 MB
    faults  0

    $ time rg -ic '\b[a-z]{3}\b\s\b[a-z]{3}\b' git-3a06386e.txt
    57538

    real    0.337
    user    0.333
    sys     0.003
    maxmem  46 MB
    faults  0

But then again, so is grep, because grep doesn't benefit from any
literal optimizations either:

    $ time grep -E -ic '\b[a-z]{3}\b\s\b[a-z]{3}\b' git-3a06386e.txt
    62396

    real    1.316
    user    1.292
    sys     0.007
    maxmem  13 MB
    faults  7

The count mismatch should probably be investigated.

Fixes #1760
---
 CHANGELOG.md | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index 3c160b9f4..f9180118f 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -10,6 +10,8 @@ Unreleased changes. Release notes have not yet been written.
 Performance improvements:
 
+* [PERF #1760](https://github.com/BurntSushi/ripgrep/issues/1760):
+  Make most searches with `\b` look-arounds (among others) much faster.
 * [PERF #2591](https://github.com/BurntSushi/ripgrep/pull/2591):
   Parallel directory traversal now uses work stealing for faster searches.