Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

filters: use faster regexp package #5315

Merged
merged 3 commits into from
Feb 7, 2022
Merged

filters: use faster regexp package #5315

merged 3 commits into from
Feb 7, 2022

Conversation

bboreham
Copy link
Contributor

@bboreham bboreham commented Feb 3, 2022

This PR applies some optimisations I have submitted to the Go project:

I have a FOSDEM talk on Saturday 6th Feb about this.

I collected the optimisations into a new repo https://github.com/grafana/regexp; it exists purely to make PRs like this possible and can be retired once the changes land upstream.

I also added some new benchmark cases, because all the existing ones don't actually reach the regexp package after 'simplification'.
The benchmark is now running each case once for each iteration instead of a million times; this makes it easier to control the time and to read the averages.

name                                                           old time/op  new time/op   delta
_LineFilter/simplified_true_foo.*-4                            28.5ns ± 4%   27.8ns ± 4%     ~     (p=0.222 n=5+5)
_LineFilter/simplified_false_foo.*-4                           38.8ns ± 3%   39.0ns ± 5%     ~     (p=1.000 n=5+5)
_LineFilter/simplified_true_(?i)foo-4                           203ns ± 4%    200ns ± 3%     ~     (p=0.881 n=5+5)
_LineFilter/simplified_false_(?i)foo-4                          209ns ± 3%    211ns ± 5%     ~     (p=0.690 n=5+5)
_LineFilter/simplified_true_.*foo.*-4                          27.9ns ± 1%   28.1ns ± 2%     ~     (p=0.746 n=5+5)
_LineFilter/simplified_false_.*foo.*-4                         38.8ns ± 2%   39.0ns ± 3%     ~     (p=1.000 n=5+5)
_LineFilter/simplified_true_.*foo-4                            28.0ns ± 3%   27.9ns ± 2%     ~     (p=1.000 n=5+5)
_LineFilter/simplified_false_.*foo-4                           38.1ns ± 3%   39.0ns ± 2%     ~     (p=0.151 n=5+5)
_LineFilter/simplified_true_foo|bar-4                          78.5ns ± 1%   76.9ns ± 4%     ~     (p=0.151 n=5+5)
_LineFilter/simplified_false_foo|bar-4                         97.8ns ± 1%  100.1ns ± 6%     ~     (p=0.548 n=5+5)
_LineFilter/simplified_true_foo|bar|buzz-4                     92.5ns ± 2%   92.0ns ± 2%     ~     (p=0.548 n=5+5)
_LineFilter/simplified_false_foo|bar|buzz-4                     115ns ± 1%    114ns ± 2%     ~     (p=0.310 n=5+5)
_LineFilter/simplified_true_foo|(bar|buzz)-4                   93.7ns ± 4%   92.2ns ± 4%     ~     (p=0.222 n=5+5)
_LineFilter/simplified_false_foo|(bar|buzz)-4                   114ns ± 2%    112ns ± 1%     ~     (p=0.056 n=5+5)
_LineFilter/simplified_true_foo|bar.*|buzz-4                   93.5ns ± 4%   91.5ns ± 2%     ~     (p=0.222 n=5+5)
_LineFilter/simplified_false_foo|bar.*|buzz-4                   115ns ± 4%    114ns ± 2%     ~     (p=0.714 n=5+5)
_LineFilter/simplified_true_.*foo.*|bar|uzz-4                  92.3ns ± 3%   91.3ns ± 2%     ~     (p=0.087 n=5+5)
_LineFilter/simplified_false_.*foo.*|bar|uzz-4                  116ns ± 3%    115ns ± 3%     ~     (p=0.460 n=5+5)
_LineFilter/simplified_true_((f.*)|foobar.*)|.*buzz-4           145ns ± 2%    146ns ± 1%     ~     (p=0.222 n=5+5)
_LineFilter/simplified_false_((f.*)|foobar.*)|.*buzz-4          180ns ± 2%    179ns ± 1%     ~     (p=0.421 n=5+5)
_LineFilter/simplified_true_(?P<foo>.*foo.*|bar)-4             79.0ns ± 2%   77.3ns ± 3%     ~     (p=0.151 n=5+5)
_LineFilter/simplified_false_(?P<foo>.*foo.*|bar)-4             104ns ± 6%     99ns ± 4%     ~     (p=0.056 n=5+5)
_LineFilter/simplified_true_(\s|")+(?i)bar-4                   7.10µs ± 5%   6.87µs ± 2%   -3.24%  (p=0.032 n=5+5)
_LineFilter/simplified_false_(\s|")+(?i)bar-4                  7.09µs ± 2%   6.90µs ± 2%   -2.72%  (p=0.016 n=5+5)
_LineFilter/simplified_true_(node:24)_buzz*-4                   110ns ± 3%     45ns ± 2%  -58.49%  (p=0.008 n=5+5)
_LineFilter/simplified_false_(node:24)_buzz*-4                  120ns ± 2%     56ns ± 4%  -53.25%  (p=0.008 n=5+5)
_LineFilter/simplified_true_(HTTP/.*\"|HEAD|GET)_(2..|5..)-4   8.11µs ± 2%   7.84µs ± 3%   -3.38%  (p=0.016 n=5+5)
_LineFilter/simplified_false_(HTTP/.*\"|HEAD|GET)_(2..|5..)-4  8.12µs ± 2%   7.85µs ± 1%   -3.34%  (p=0.008 n=5+5)
_LineFilter/simplified_true_"@l":"(Warning|Error|Fatal)"-4      110ns ± 2%     46ns ± 4%  -58.33%  (p=0.008 n=5+5)
_LineFilter/simplified_false_"@l":"(Warning|Error|Fatal)"-4     122ns ± 2%     57ns ± 3%  -52.85%  (p=0.008 n=5+5)

Checklist

  • NA Documentation added
  • Tests updated
  • Add an entry in the CHANGELOG.md about the changes.

@bboreham bboreham requested a review from a team as a code owner February 3, 2022 14:21
Copy link
Contributor

@cyriltovena cyriltovena left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Love it does it work the same for capture and such as the line_format regexp can match and return values as opposed to line filters which can only filters.

Add a few more cases based on real-world usage.

Also simplify the loop to just run the number of times requested.
Go benchmarks run 'N' times to reach a duration, default 1 second.
Previously the benchmark was running N*1000000 times, which
took 7 seconds minimum for some cases.
See https://github.com/grafana/regexp/tree/speedup#readme

Includes the following changes proposed upstream:
* [regexp: allow patterns with no alternates to be one-pass](https://go-review.googlesource.com/c/go/+/353711)
* [regexp: speed up onepass prefix check](https://go-review.googlesource.com/c/go/+/354909)
* [regexp: handle prefix string with fold-case](https://go-review.googlesource.com/c/go/+/358756)
* [regexp: avoid copying each instruction executed](https://go-review.googlesource.com/c/go/+/355789)
* [regexp: allow prefix string anchored at beginning](https://go-review.googlesource.com/c/go/+/377294)
@bboreham
Copy link
Contributor Author

bboreham commented Feb 4, 2022

does it work the same for capture and such

Yes, the same regexp engine is used for Match, Find and Replace, so they should see some benefit.
I guess we should have a benchmark for those.

@cyriltovena
Copy link
Contributor

Let's sync to test this in dev and check the wins.

@cyriltovena cyriltovena merged commit a50cac7 into main Feb 7, 2022
@cyriltovena cyriltovena deleted the use-faster-regexp branch February 7, 2022 07:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants