Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

contrib: add Hyperscan regex engine #21973

Merged
merged 42 commits into from
Aug 3, 2022
Merged

Conversation

zhxie
Copy link
Contributor

@zhxie zhxie commented Jul 1, 2022

Signed-off-by: Xie Zhihao [email protected]

Commit Message: contrib: add Hyperscan regex engine
Additional Description: Hyperscan has been introduced as an input matcher earlier this year. Since the regex engine interface has been completed, the patch extent the usage of Hyperscan into a contrib regex engine.
Risk Level: Low
Testing: Unit
Docs Changes: API
Release Notes: N/A
Platform Specific Features: Requires processor with SSSE3 support (nearly any modern x86 processor)

@repokitteh-read-only repokitteh-read-only bot added the deps Approval required for changes to Envoy's external dependencies label Jul 1, 2022
@repokitteh-read-only
Copy link

CC @envoyproxy/dependency-shepherds: Your approval is needed for changes made to (bazel/.*repos.*\.bzl)|(bazel/dependency_imports\.bzl)|(api/bazel/.*\.bzl)|(.*/requirements\.txt)|(.*\.patch).
envoyproxy/dependency-shepherds assignee is @moderation

🐱

Caused by: #21973 was opened by zhxie.

see: more, trace.

@wrowe
Copy link
Contributor

wrowe commented Jul 6, 2022

/assign-from envoyproxy/envoy-maintainers

@repokitteh-read-only
Copy link

envoyproxy/envoy-maintainers assignee is @jmarantz

🐱

Caused by: a #21973 (comment) was created by @wrowe.

see: more, trace.

@wrowe
Copy link
Contributor

wrowe commented Jul 6, 2022

Ping @moderation for deps review?

@moderation
Copy link
Contributor

Metadata only change for the dependency

/lgtm deps

@repokitteh-read-only repokitteh-read-only bot removed the deps Approval required for changes to Envoy's external dependencies label Jul 6, 2022
Copy link
Member

@phlax phlax left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

one small nit but otherwise lgtm from docs pov

Signed-off-by: Xie Zhihao <[email protected]>
Copy link
Contributor

@jmarantz jmarantz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Flushing some comments. Lots to dig into here.

zhxie added 2 commits July 15, 2022 13:50
Signed-off-by: Xie Zhihao <[email protected]>
zhxie added 2 commits July 29, 2022 10:00
Signed-off-by: Xie Zhihao <[email protected]>
Signed-off-by: Xie Zhihao <[email protected]>
@zhxie
Copy link
Contributor Author

zhxie commented Jul 29, 2022

Replied offline about end users consideration.

Scalable path evaluation is not conflict with regex engine update. In fact, a regex engine can be used everywhere in Envoy, especially in security region (including WAF-like extensions like RBAC).

@zhxie
Copy link
Contributor Author

zhxie commented Jul 29, 2022

/retest flaky

@repokitteh-read-only
Copy link

Retrying Azure Pipelines:
Retried failed jobs in: envoy-presubmit

🐱

Caused by: a #21973 (comment) was created by @zhxie.

see: more, trace.

Copy link
Contributor

@jmarantz jmarantz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good -- just a few more small questions.

// hybrid automata techniques to allow simultaneous matching of large numbers of regular
// expressions across streams of data.
//
// The matcher follows PCRE pattern syntax, and the regex string must adhere to the documented
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why did we drop this paragraph?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The paragraph has been moved to api/contrib/envoy/extensions/regex_engines/hyperscan/v3alpha/hyperscan.proto since it is the introduction of the engine itself.

if (regex.allow_empty()) {
flag |= HS_FLAG_ALLOWEMPTY;
}
if (regex.utf8()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not 100% sure I understand -- are you saying that you'll flip the defaults in the .proto file file to try to match re2 behavior?

if (som_database) {
err = hs_alloc_scratch(som_database, &scratch_);
if (err != HS_SUCCESS) {
throw EnvoyException(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I think what I really want to express here is that the line of code can't be hit.

There's certainly a concern with lines of code that can't be tested, and whether anyone would catch this assert successfully and what they would do about that if there was no memory left.

We used to have a macro to indicate specifically that a line of code can't be hit, and previously the toolchain running the coverage analysis would not count those lines, but that stopped working at some point when the toolchain evolved.

WDYT of just using RELEASE_ASSERT here rather than throwing an exception?

@jmarantz
Copy link
Contributor

I think @zhxie did a great job with the code in this PR -- thanks for all the iteration.

I still have some higher level concerns

I am worried about propagating an alternate regex syntax (even as contrib/...) and bifurcating the community. I also wish Hyperscan didn't bring in Boost.

I thought also about whether some of the Hyperscan techniques should just be ported to re2 but that seems unlikely. There was some enlightening discussion on google/re2#383 . Notably -- there are microbenchmarks showing Hyperscan's raw speed but no obvious benchmarks exposed in the context of Envoy operations.

@mattklein123

In light of the above I'd like to get senior-maintainer opinion on this.

/assign-from @envoyproxy/senior-maintainers

@repokitteh-read-only
Copy link

@envoyproxy/senior-maintainers assignee is @alyssawilk

🐱

Caused by: a #21973 (comment) was created by @jmarantz.

see: more, trace.

@alyssawilk
Copy link
Contributor

I had largely thought that with contrib we didn't have to monitor things like dependency bloat from boost or extra regex engines. I'm going to throw senior maintianer pass over to Matt as he's back tomorrow and he was driving the recent changes to contrib.

@alyssawilk alyssawilk assigned mattklein123 and unassigned alyssawilk Aug 1, 2022
@mattklein123
Copy link
Member

In light of the above I'd like to get senior-maintainer opinion on this.

I'm OK with this one coming into contrib as I've heard of multiple requests for using HyperScan to speed up matching (for example in WAF scenarios). IMO it's OK to let this into contrib and see how it gets used? Happy to defer to other opinions if they are strong however.

@alyssawilk
Copy link
Contributor

SGTM. @moderation can we get one more deps LGTM so we can merge this in?

@jmarantz
Copy link
Contributor

jmarantz commented Aug 2, 2022

OK with me then. @zhxie were you going to change defaults in this PR to make it as easy as possible for a configuration that works with re2 to also work with hyperscan?

@zhxie
Copy link
Contributor Author

zhxie commented Aug 3, 2022

@jmarantz of course. As I have mentioned, I have removed all options in Hyperscan regex engine proto to align the behavior with RE2.

@jmarantz
Copy link
Contributor

jmarantz commented Aug 3, 2022

Sorry still not totally getting it.

I was thinking when reviewing code earlier that some of the out-of-the-box options for Hyperscan around embedded newlines and unicode were different than what I think RE2 behavior is, and that we'd need explicit defaults to make them as compatible as practical.

Copy link
Contributor

@jmarantz jmarantz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Per slack discussion the defaults are set to maximize compatibility with RE2. Thanks for pushing this all the way through!

@mattklein123 mattklein123 removed their assignment Aug 3, 2022
@moderation
Copy link
Contributor

/lgtm deps

This is a metadata only dependency change. The initial Hyperscan contrib extension that brings in unwanted dependencies like Boost was added prior to the current contrib extension policy - https://github.com/envoyproxy/envoy/blob/main/EXTENSION_POLICY.md#contrib-extensions. The Hyperscan input matcher and regex engine would likely no longer be approved without an end user sponsor. These extensions appear to be R&D activities for a vendor without end users.

@repokitteh-read-only repokitteh-read-only bot removed the deps Approval required for changes to Envoy's external dependencies label Aug 3, 2022
@jmarantz jmarantz merged commit 725d0b4 into envoyproxy:main Aug 3, 2022
@zhxie zhxie deleted the hscan-engine branch September 7, 2022 08:31
@sambercovici
Copy link
Contributor

I see that hyperscan is compiled without "FAT_RUNTIME": "on"
Was there any reason fore this?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants