-
Notifications
You must be signed in to change notification settings - Fork 4.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
contrib: add Hyperscan regex engine #21973
Conversation
Signed-off-by: Xie Zhihao <[email protected]>
Signed-off-by: Xie Zhihao <[email protected]>
Signed-off-by: Xie Zhihao <[email protected]>
Signed-off-by: Xie Zhihao <[email protected]>
Signed-off-by: Xie Zhihao <[email protected]>
Signed-off-by: Xie Zhihao <[email protected]>
Signed-off-by: Xie Zhihao <[email protected]>
Signed-off-by: Xie Zhihao <[email protected]>
Signed-off-by: Xie Zhihao <[email protected]>
Signed-off-by: Xie Zhihao <[email protected]>
Signed-off-by: Xie Zhihao <[email protected]>
Signed-off-by: Xie Zhihao <[email protected]>
CC @envoyproxy/dependency-shepherds: Your approval is needed for changes made to |
Signed-off-by: Xie Zhihao <[email protected]>
Signed-off-by: Xie Zhihao <[email protected]>
/assign-from envoyproxy/envoy-maintainers |
envoyproxy/envoy-maintainers assignee is @jmarantz |
Ping @moderation for deps review? |
Metadata only change for the dependency /lgtm deps |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
one small nit but otherwise lgtm from docs pov
Signed-off-by: Xie Zhihao <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Flushing some comments. Lots to dig into here.
api/contrib/envoy/extensions/matching/input_matchers/hyperscan/v3alpha/hyperscan.proto
Outdated
Show resolved
Hide resolved
api/contrib/envoy/extensions/matching/input_matchers/hyperscan/v3alpha/hyperscan.proto
Outdated
Show resolved
Hide resolved
api/contrib/envoy/extensions/regex_engines/hyperscan/v3alpha/hyperscan.proto
Outdated
Show resolved
Hide resolved
api/contrib/envoy/extensions/regex_engines/hyperscan/v3alpha/hyperscan.proto
Outdated
Show resolved
Hide resolved
api/contrib/envoy/extensions/regex_engines/hyperscan/v3alpha/hyperscan.proto
Outdated
Show resolved
Hide resolved
api/contrib/envoy/extensions/regex_engines/hyperscan/v3alpha/hyperscan.proto
Outdated
Show resolved
Hide resolved
Signed-off-by: Xie Zhihao <[email protected]>
Signed-off-by: Xie Zhihao <[email protected]>
Signed-off-by: Xie Zhihao <[email protected]>
Signed-off-by: Xie Zhihao <[email protected]>
Replied offline about end users consideration. Scalable path evaluation is not conflict with regex engine update. In fact, a regex engine can be used everywhere in Envoy, especially in security region (including WAF-like extensions like RBAC). |
/retest flaky |
Retrying Azure Pipelines: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good -- just a few more small questions.
// hybrid automata techniques to allow simultaneous matching of large numbers of regular | ||
// expressions across streams of data. | ||
// | ||
// The matcher follows PCRE pattern syntax, and the regex string must adhere to the documented |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why did we drop this paragraph?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The paragraph has been moved to api/contrib/envoy/extensions/regex_engines/hyperscan/v3alpha/hyperscan.proto
since it is the introduction of the engine itself.
if (regex.allow_empty()) { | ||
flag |= HS_FLAG_ALLOWEMPTY; | ||
} | ||
if (regex.utf8()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not 100% sure I understand -- are you saying that you'll flip the defaults in the .proto file file to try to match re2 behavior?
if (som_database) { | ||
err = hs_alloc_scratch(som_database, &scratch_); | ||
if (err != HS_SUCCESS) { | ||
throw EnvoyException( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I think what I really want to express here is that the line of code can't be hit.
There's certainly a concern with lines of code that can't be tested, and whether anyone would catch this assert successfully and what they would do about that if there was no memory left.
We used to have a macro to indicate specifically that a line of code can't be hit, and previously the toolchain running the coverage analysis would not count those lines, but that stopped working at some point when the toolchain evolved.
WDYT of just using RELEASE_ASSERT here rather than throwing an exception?
I think @zhxie did a great job with the code in this PR -- thanks for all the iteration. I still have some higher level concerns
I am worried about propagating an alternate regex syntax (even as contrib/...) and bifurcating the community. I also wish Hyperscan didn't bring in Boost. I thought also about whether some of the Hyperscan techniques should just be ported to re2 but that seems unlikely. There was some enlightening discussion on google/re2#383 . Notably -- there are microbenchmarks showing Hyperscan's raw speed but no obvious benchmarks exposed in the context of Envoy operations. In light of the above I'd like to get senior-maintainer opinion on this. /assign-from @envoyproxy/senior-maintainers |
@envoyproxy/senior-maintainers assignee is @alyssawilk |
Signed-off-by: Xie Zhihao <[email protected]>
I had largely thought that with contrib we didn't have to monitor things like dependency bloat from boost or extra regex engines. I'm going to throw senior maintianer pass over to Matt as he's back tomorrow and he was driving the recent changes to contrib. |
I'm OK with this one coming into contrib as I've heard of multiple requests for using HyperScan to speed up matching (for example in WAF scenarios). IMO it's OK to let this into contrib and see how it gets used? Happy to defer to other opinions if they are strong however. |
SGTM. @moderation can we get one more deps LGTM so we can merge this in? |
OK with me then. @zhxie were you going to change defaults in this PR to make it as easy as possible for a configuration that works with re2 to also work with hyperscan? |
@jmarantz of course. As I have mentioned, I have removed all options in Hyperscan regex engine proto to align the behavior with RE2. envoy/api/contrib/envoy/extensions/regex_engines/hyperscan/v3alpha/hyperscan.proto Lines 24 to 25 in e09647f
|
Sorry still not totally getting it. I was thinking when reviewing code earlier that some of the out-of-the-box options for Hyperscan around embedded newlines and unicode were different than what I think RE2 behavior is, and that we'd need explicit defaults to make them as compatible as practical. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Per slack discussion the defaults are set to maximize compatibility with RE2. Thanks for pushing this all the way through!
/lgtm deps This is a metadata only dependency change. The initial Hyperscan contrib extension that brings in unwanted dependencies like Boost was added prior to the current contrib extension policy - https://github.com/envoyproxy/envoy/blob/main/EXTENSION_POLICY.md#contrib-extensions. The Hyperscan input matcher and regex engine would likely no longer be approved without an end user sponsor. These extensions appear to be R&D activities for a vendor without end users. |
I see that hyperscan is compiled without "FAT_RUNTIME": "on" |
Signed-off-by: Xie Zhihao [email protected]
Commit Message: contrib: add Hyperscan regex engine
Additional Description: Hyperscan has been introduced as an input matcher earlier this year. Since the regex engine interface has been completed, the patch extent the usage of Hyperscan into a contrib regex engine.
Risk Level: Low
Testing: Unit
Docs Changes: API
Release Notes: N/A
Platform Specific Features: Requires processor with SSSE3 support (nearly any modern x86 processor)