remove regex plugin + rollup + chores #436
These are dev dependencies, so we don't need to worry about the minimum Rust version supported.
The latest update to rand requires a newer version of Rust. Since it's a dev dependency, we shouldn't need to do a semver bump when updating rand. However, CI needs to be told not to run tests. Instead, we merely check that we can build the crate and produce documentation.
The regex_macros crate hasn't been maintained in quite some time and is broken. Nobody has complained. Since there are no immediate plans to improve the situation, and since it is slower than the runtime engine, we simply remove it.
The 0.2.1 release of simd includes a fix so that it can compile on the latest nightly. We needn't worry about semver here because simd is a nightly-only dependency.
fbba4e4 to 2291ebf
There are a few sub-crates in this repository, so sharing a target directory makes sense.
This updates dependencies and makes sure everything compiles and runs. This also simplifies the build script.
Principally, this updates docopt to 0.8, which replaces rustc-serialize with serde.
This commit tweaks the heuristic employed to determine whether to use TBM or not. For the most part, the heuristic was tweaked by combining the actual benchmark results with a bit of hand waving. In particular, the primary change here is that the frequency rank cutoff is no longer a constant, but rather a function of the pattern length. That is, we guess that TBM will do well with longer patterns, even if they contain somewhat infrequent bytes. We do put a constant cap on this heuristic: regardless of the length of the pattern, if a "very rare" byte is found in the pattern, then we won't use TBM.
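The shape of the heuristic described above can be sketched roughly as follows. This is a minimal illustration, not the code from the commit: the function names, every constant, and the toy rank table are assumptions made up for the example. The only properties taken from the description are that the rank cutoff loosens as the pattern gets longer and that a fixed floor always rejects patterns containing a "very rare" byte.

```rust
use std::cmp;

/// Toy stand-in for a byte frequency table (hypothetical values).
/// A lower rank means the byte is assumed to be rarer.
fn toy_rank(b: u8) -> usize {
    match b {
        b'a'..=b'z' | b' ' => 200,        // assumed common
        b'A'..=b'Z' | b'0'..=b'9' => 120, // assumed somewhat infrequent
        _ => 10,                          // assumed very rare
    }
}

/// Decide whether to use TBM for `pattern`.
/// All constants are illustrative assumptions, not the crate's values.
fn should_use_tbm(pattern: &[u8]) -> bool {
    const MIN_LEN: usize = 4;         // patterns shorter than this never use TBM
    const BASE_CUTOFF: usize = 220;   // starting point before length relaxation
    const FLOOR_CUTOFF: usize = 50;   // constant cap: ranks below this always disqualify
    const RELAX_PER_BYTE: usize = 10; // how much each pattern byte lowers the cutoff

    if pattern.len() < MIN_LEN {
        return false;
    }
    // The cutoff is a function of the pattern length, bounded below by the floor.
    let relaxed = BASE_CUTOFF.saturating_sub(pattern.len() * RELAX_PER_BYTE);
    let cutoff = cmp::max(FLOOR_CUTOFF, relaxed);
    // Every byte in the pattern must be at least as frequent as the cutoff.
    pattern.iter().all(|&b| toy_rank(b) >= cutoff)
}
```

With these made-up numbers, `should_use_tbm(b"abCd")` is rejected because the short pattern demands common bytes, while the longer `b"abcdefghiJ"` is accepted despite containing the same class of byte. A pattern containing a NUL byte is rejected at any length because its rank falls below the floor.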
As far as I can tell, nobody has actually described a substring search algorithm that used both frequency analysis and vector instructions. So I'm naming it.
4fab6c added the current bench runner script as `benches/run`, and removed the old `run-bench` script. It was later renamed to `bench/run` when `benches` was renamed to `bench` in b217bf. This patch fixes a few references to the old benchmark runner in the hacking guide as well as a few references to the old directory structure. The cargo plugin syntax in the example is also updated.
The DFA can't produce captures, but is still faster than the Pike VM NFA, so the normal approach to finding capture groups is to look for the entire match with the DFA and then run the NFA on the substring of the input that matched. In cases where the regex is anchored, the match always starts at the beginning of the input, so there is never any point in trying the DFA first. The DFA can still be useful for rejecting inputs which are not in the language of the regular expression, but anchored regexes with capture groups are most commonly used in a parsing context, so it seems like a fair trade-off. Fixes #348
2291ebf to 4152e18
cc @ethanpailes Note that in commit 392b3d6 I tweaked the TBM heuristic a little bit.
@bors r+
📌 Commit 4152e18 has been approved by
remove regex plugin + rollup + chores

This PR:

* Removes the regex compiler plugin. It's been broken for quite some time and nobody has seemed to notice. It's time for it to go. See commit cc7b00c for details.
* Sets up a Cargo workspace for this repo.
* Updates deps in various places. This includes updating simd to `0.2.1`, which fixes a build failure on Rust nightly.
* Names the frequency analysis based memchr search "freqy packed."
* Rolls up the other open PRs #401, #410 and #433.
4152e18 to 5ea594e
@bors r+
📌 Commit 5ea594e has been approved by
@bors r-
💔 Test failed - status-travis
@BurntSushi, I know I'm a bit late to the party, but that new heuristic looks great. Thanks for doing the