cache regex compilation #2328
Can you reformat the code with clang-format or apply the changes from here:
Force-pushed from d205579 to 7e4a101.
Sorry about that. I was using the clang-format on my host, which is version 14, but it looks like GitHub CI uses version 11.
Looks like there may be an issue with noomp. I fixed it and will push another update shortly, after the tests finish.
Force-pushed from 7e4a101 to 906e31c.
Codecov Report
Additional details and impacted files

@@            Coverage Diff             @@
##           master    #2328      +/-   ##
==========================================
+ Coverage   77.43%   77.48%   +0.04%
==========================================
  Files         468      469       +1
  Lines       29178    29183       +5
==========================================
+ Hits        22594    22611      +17
+ Misses       6584     6572      -12
An alternative design that would avoid introducing a dedicated data structure would be:
EDIT: Of course, that is only possible if the regexes are known in advance.
…ertOnlyHashMap. The function weakFind returns the value associated with a key, which makes it more useful than weakContains alone; weakContains is now expressed in terms of weakFind. The weakFind function also gives us a fast path for implementing a cache, because it looks up values without any memory allocation, which the get() function requires.
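A minimal sketch of the weakFind idea described in this commit message. It is not the Souffle implementation: `InsertOnlyMap` is a hypothetical stand-in built on `std::unordered_map` and `std::shared_mutex`, and only the names weakFind, weakContains, and get are taken from the text above.

```cpp
#include <cassert>
#include <mutex>
#include <shared_mutex>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical stand-in for an insert-only concurrent map (assumption:
// entries are never erased, so pointers to stored values stay valid).
template <typename K, typename V>
class InsertOnlyMap {
public:
    // Read-only lookup: returns a pointer to the stored value, or nullptr
    // if the key is absent. Nothing is inserted on this path.
    const V* weakFind(const K& key) const {
        std::shared_lock lock(mutex_);
        auto it = map_.find(key);
        return it == map_.end() ? nullptr : &it->second;
    }

    // weakContains expressed in terms of weakFind.
    bool weakContains(const K& key) const {
        return weakFind(key) != nullptr;
    }

    // Inserts the value if the key is absent and returns the stored value.
    // An existing entry is left untouched.
    const V& get(const K& key, V value) {
        std::unique_lock lock(mutex_);
        return map_.try_emplace(key, std::move(value)).first->second;
    }

private:
    mutable std::shared_mutex mutex_;
    std::unordered_map<K, V> map_;
};
```

A cache built on such a map can call weakFind first and fall back to get only on a miss, so the common hit path never inserts anything.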
The ConcurrentCache uses the ConcurrentInsertOnlyHashMap to implement a generic cache. The cache will be used to implement a cache for regular expressions.
Using the ConcurrentCache speeds up regex matches tremendously, because we avoid the very expensive operation of compiling the same regex over and over again. This commit introduces the cache in the interpreter only.
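To illustrate why caching helps, here is a hedged sketch of a regex cache keyed by the pattern string. `RegexCache` and `cachedMatch` are hypothetical names for illustration; the ConcurrentCache in this PR wraps the existing concurrent hashmap and may differ in every detail.

```cpp
#include <cassert>
#include <mutex>
#include <regex>
#include <shared_mutex>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical cache: compiles each distinct pattern at most a few times
// (once, unless two threads race on the same miss) and reuses the result.
class RegexCache {
public:
    const std::regex& get(const std::string& pattern) {
        {
            // Fast path: shared lock, no compilation.
            std::shared_lock lock(mutex_);
            auto it = cache_.find(pattern);
            if (it != cache_.end()) return it->second;
        }
        // Slow path: compile outside the lock; throws std::regex_error
        // if the pattern is invalid.
        std::regex compiled(pattern);
        std::unique_lock lock(mutex_);
        // try_emplace keeps an entry that another thread added meanwhile.
        return cache_.try_emplace(pattern, std::move(compiled)).first->second;
    }

private:
    std::shared_mutex mutex_;
    std::unordered_map<std::string, std::regex> cache_;
};

// Matches the whole text against the (cached) compiled pattern.
inline bool cachedMatch(RegexCache& cache, const std::string& pattern,
                        const std::string& text) {
    return std::regex_match(text, cache.get(pattern));
}
```

Entries are never removed in this sketch, so the returned reference stays valid for the lifetime of the cache.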
Update the synthesizer to use a cache for regexes. One drawback is that we now need to #include <regex>, but the structure of the synthesizer makes it impossible to generate the cache only when needed.
Force-pushed from 906e31c to 9b9db6c.
I added code along the lines you had suggested, but I left the cache alone. So, in aggregate, this update provides fast access to constant regexes (probably the majority) and a cache for fast evaluation of dynamic regexes. I created a RegexConstant, which replaces a StringConstant when processing a MATCH/NOT_MATCH constraint. When interpreting MATCH/NOT_MATCH, I check whether the pattern is a RegexConstant; otherwise the existing code is executed. I could not think of a better way to do this right now.
For constraints like match(".*", X), where the pattern is a string constant, we can avoid using the regex cache. Avoiding the cache is going to perform better in all cases. I introduced the new node RegexConstant, which only appears immediately below a MATCH or NOT_MATCH node. The behavior with regard to invalid regexes has not changed, and bad regexes will not lead to program errors.
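The constant-pattern case can be sketched as compiling the regex once, when the node is built, and keeping the compiled object in the node. `ConstantMatcher` and `tryCompile` are hypothetical names, and treating an invalid pattern as "never matches" is an assumption made for this sketch, not necessarily what the PR does.

```cpp
#include <cassert>
#include <optional>
#include <regex>
#include <string>

// Hypothetical helper: compile a pattern, returning an empty optional
// instead of propagating std::regex_error for an invalid pattern.
inline std::optional<std::regex> tryCompile(const std::string& pattern) {
    try {
        return std::regex(pattern);
    } catch (const std::regex_error&) {
        return std::nullopt;
    }
}

// Hypothetical node holding a pattern known at construction time.
// The regex is compiled once here, so no cache lookup is needed per match.
struct ConstantMatcher {
    explicit ConstantMatcher(const std::string& pattern)
        : regex(tryCompile(pattern)) {}

    // Assumption for this sketch: an invalid pattern matches nothing.
    bool matches(const std::string& text) const {
        return regex.has_value() && std::regex_match(text, *regex);
    }

    std::optional<std::regex> regex;
};
```

Dynamic patterns, whose text is only known at evaluation time, cannot be precompiled this way and would still go through a cache.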
Force-pushed from 25c7abd to 08a5fad.
Thanks for the contribution! Great job!
Thanks. This helps us a lot, too.
We have run into some issues with regex constraints. Our program needs to make use of the regex constraint, but we quickly noticed that it led to a significant slowdown. The runtime of our program was around 100 seconds, and I tracked it down to the fact that regexes are compiled each time a match is performed. We decided to cache the compiled regexes, of which there are very few, and this reduced the runtime to less than one second.
This patch introduces a new data structure called ConcurrentCache, which is a thin wrapper around the existing concurrent hashmap. Presently, the cache is always generated by the compiler, but I think that is something we can change later, possibly making it optional.