consider switching to FilteredRE2 #38

Open
junyer opened this issue Mar 15, 2023 · 1 comment
Comments

junyer commented Mar 15, 2023

RE2 offers a couple of lesser-known features for matching multiple regular expressions. Given that internal/README.md describes a "snippet index", which sounds remarkably like FilteredRE2, you might want to consider switching to FilteredRE2 and deleting the "snippet index" code.
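
For readers unfamiliar with the idea, here is a minimal sketch of atom-based prefiltering, the general technique behind this kind of index. It is an illustration only, written in Python with the standard `re` module; it is not RE2's API and not the uap-cpp implementation. The class `AtomPrefilter`, its methods, and the hand-written atom lists are all hypothetical (a real prefilter would derive the required literals from the patterns automatically).

```python
import re


class AtomPrefilter:
    """Toy illustration of atom prefiltering; hypothetical, NOT the RE2 API."""

    def __init__(self):
        # Each entry: (compiled regex, literal atoms that any match must contain).
        self._entries = []

    def add(self, pattern, atoms):
        """Register a pattern with hand-picked required literals (an assumption:
        a real implementation would compute these from the pattern)."""
        self._entries.append((re.compile(pattern), tuple(a.lower() for a in atoms)))
        return len(self._entries) - 1

    def candidates(self, text):
        """Cheap pass: keep only patterns whose atoms all occur in the text."""
        lowered = text.lower()
        return [
            i
            for i, (_, atoms) in enumerate(self._entries)
            if all(atom in lowered for atom in atoms)
        ]

    def first_match(self, text):
        """Expensive pass: run only the surviving regexes, in insertion order."""
        for i in self.candidates(text):
            match = self._entries[i][0].search(text)
            if match:
                return i, match
        return None


pf = AtomPrefilter()
firefox = pf.add(r"Firefox/(\d+)\.(\d+)", ["firefox/"])
chrome = pf.add(r"Chrome/(\d+)\.(\d+)", ["chrome/"])

ua = "Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/115.0"
print(pf.candidates(ua))  # only the Firefox pattern survives the prefilter
index, match = pf.first_match(ua)
print(index, match.groups())
```

The point of the split is that the substring checks are cheap, so most patterns are discarded before any regex runs, and capture extraction happens only for the pattern that actually matches.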

@masklinn

I would also recommend taking a gander at this suggestion. While I don't have a direct comparison, the port of FilteredRE2 in ua-parser/uap-rust is about on par with FilteredRE2 (with just an re2::RE2::Set prefilter), and uap-cpp is about 2.5x slower than uap-rust from what I've seen so far.

While I believe re2::RE2 can be quite slow at extraction [1], I would assume most of the difference is in actually finding the correct regex, since Rust's own regex is also way slower when capturing than when just matching (by a factor of 2-3x, iirc).

Footnotes

  1. I actually experimented with extracting using re2 in ua-parser/uap-python, and it turned out to be slower than using the built-in re package.
