Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issue 348 #410

Closed
wants to merge 3 commits into from
Closed

Issue 348 #410

wants to merge 3 commits into from

Conversation

ethanpailes
Copy link
Contributor

I took a crack at implimenting #348, but ended up running into some squirrly results with the benchmark that I wrote. During some runs the optimization behaved as expected, but for most there was no real corrilation between input size and performance. I suspect I'm missing some benchmarking trick, so an insight would be appreciated. The steps I used to test the benchmark are outlined in a comment above the new benchmarks I added (in bench/src/misc.rs).

For now I've just done the anchor optimization.

As I was learning about the code base I also noticed a few documentation references that had drifted out of date, so I fixed those.

Ethan Pailes added 2 commits October 27, 2017 10:58
4fab6c1 added the current bench
runner script as `benches/run`, and removed the old `run-bench`
script. It was later renamed to `bench/run` when `benches` was
renamed to `bench` in b217bfe.
This patch fixes a few references to the old benchmark runner
in the hacking guide as well as a few references to the old
directory structure. The cargo plugin syntax in the example
is also updated.
The DFA can't produce captures, but is still faster
than the Pike VM NFA, so the normal approach to finding
capture groups is to look for the entire match with the
DFA and then run the NFA on the substring of the input
that matched. In cases where the regex in anchored, the
match always starts at the beginning of the input, so
there is never any point to trying the DFA first.

The DFA can still be useful for rejecting inputs which
are not in the language of the regular expression, but
anchored regex with capture groups are most commonly
used in a parsing context, so it seems like a fair trade-off.

For a more in depth discussion see github issue rust-lang#348.
@ethanpailes ethanpailes force-pushed the issue-348 branch 2 times, most recently from 526bc6b to bc69b03 Compare October 28, 2017 04:57
@BurntSushi
Copy link
Member

@ethanpailes Thanks so much for putting in the work for this optimization! It might take me a bit to review this, but I just wanted to chime in. :-)

@ethanpailes
Copy link
Contributor Author

Unfortunatly, I did end up in a place where you have more review work than I would have hoped because of those weird benchmark results, so it makes sense that it might take a little while to review.

@bors
Copy link
Contributor

bors commented Nov 22, 2017

☔ The latest upstream changes (presumably #414) made this pull request unmergeable. Please resolve the merge conflicts.

@BurntSushi
Copy link
Member

I've rolled this PR into #436. Thanks!

@BurntSushi BurntSushi closed this Dec 30, 2017
@BurntSushi
Copy link
Member

@ethanpailes There should definitely be a way to observe a performance difference on unanchored regexes, but the anchored optimization is definitely a step in the right direction. I don't quite have time to do the benchmark analysis for unanchored regexes though. From briefly looking at your regexes, my gut says that you need to make the match bigger.

bors added a commit that referenced this pull request Dec 30, 2017
remove regex plugin + rollup + chores

This PR:

* Removes the regex compiler plugin. It's been broken for quite some time and nobody has seemed to notice. It's time for it to go. See commit cc7b00c for details.
* Setup a Cargo workspace for this repo.
* Update deps in various places. This includes updating simd to `0.2.1`, which fixes a build failure on Rust nightly.
* Name the frequency analysis based memchr search "freqy packed."
* Rolls up the other open PRs #401, #410 and #433.
bors added a commit that referenced this pull request Dec 30, 2017
remove regex plugin + rollup + chores

This PR:

* Removes the regex compiler plugin. It's been broken for quite some time and nobody has seemed to notice. It's time for it to go. See commit cc7b00c for details.
* Setup a Cargo workspace for this repo.
* Update deps in various places. This includes updating simd to `0.2.1`, which fixes a build failure on Rust nightly.
* Name the frequency analysis based memchr search "freqy packed."
* Rolls up the other open PRs #401, #410 and #433.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants