-
Notifications
You must be signed in to change notification settings - Fork 444
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Issue 348 #410
Issue 348 #410
Conversation
4fab6c1 added the current bench runner script as `benches/run`, and removed the old `run-bench` script. It was later renamed to `bench/run` when `benches` was renamed to `bench` in b217bfe. This patch fixes a few references to the old benchmark runner in the hacking guide as well as a few references to the old directory structure. The cargo plugin syntax in the example is also updated.
The DFA can't produce captures, but is still faster than the Pike VM NFA, so the normal approach to finding capture groups is to look for the entire match with the DFA and then run the NFA on the substring of the input that matched. In cases where the regex in anchored, the match always starts at the beginning of the input, so there is never any point to trying the DFA first. The DFA can still be useful for rejecting inputs which are not in the language of the regular expression, but anchored regex with capture groups are most commonly used in a parsing context, so it seems like a fair trade-off. For a more in depth discussion see github issue rust-lang#348.
526bc6b
to
bc69b03
Compare
@ethanpailes Thanks so much for putting in the work for this optimization! It might take me a bit to review this, but I just wanted to chime in. :-) |
Unfortunatly, I did end up in a place where you have more review work than I would have hoped because of those weird benchmark results, so it makes sense that it might take a little while to review. |
☔ The latest upstream changes (presumably #414) made this pull request unmergeable. Please resolve the merge conflicts. |
I've rolled this PR into #436. Thanks! |
@ethanpailes There should definitely be a way to observe a performance difference on unanchored regexes, but the anchored optimization is definitely a step in the right direction. I don't quite have time to do the benchmark analysis for unanchored regexes though. From briefly looking at your regexes, my gut says that you need to make the match bigger. |
remove regex plugin + rollup + chores This PR: * Removes the regex compiler plugin. It's been broken for quite some time and nobody has seemed to notice. It's time for it to go. See commit cc7b00c for details. * Setup a Cargo workspace for this repo. * Update deps in various places. This includes updating simd to `0.2.1`, which fixes a build failure on Rust nightly. * Name the frequency analysis based memchr search "freqy packed." * Rolls up the other open PRs #401, #410 and #433.
remove regex plugin + rollup + chores This PR: * Removes the regex compiler plugin. It's been broken for quite some time and nobody has seemed to notice. It's time for it to go. See commit cc7b00c for details. * Setup a Cargo workspace for this repo. * Update deps in various places. This includes updating simd to `0.2.1`, which fixes a build failure on Rust nightly. * Name the frequency analysis based memchr search "freqy packed." * Rolls up the other open PRs #401, #410 and #433.
I took a crack at implimenting #348, but ended up running into some squirrly results with the benchmark that I wrote. During some runs the optimization behaved as expected, but for most there was no real corrilation between input size and performance. I suspect I'm missing some benchmarking trick, so an insight would be appreciated. The steps I used to test the benchmark are outlined in a comment above the new benchmarks I added (in
bench/src/misc.rs
).For now I've just done the anchor optimization.
As I was learning about the code base I also noticed a few documentation references that had drifted out of date, so I fixed those.