-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
topdown: Adding cap to caches for regex and glob built-in functions #6846
topdown: Adding cap to caches for regex and glob built-in functions #6846
Conversation
Signed-off-by: Johan Fylling <[email protected]>
Signed-off-by: Johan Fylling <[email protected]>
Signed-off-by: Johan Fylling <[email protected]>
Signed-off-by: Johan Fylling <[email protected]>
✅ Deploy Preview for openpolicyagent ready!
To edit notification comments on pull requests, go to your Netlify site configuration. |
@@ -101,6 +103,16 @@ opa_value *opa_glob_match(opa_value *pattern, opa_value *delimiters, opa_value * | |||
return NULL; | |||
} | |||
|
|||
if (c->size() >= MAX_CACHE_SIZE) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still need to add tests for this and the regex built-in.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Resolved, isn't it?
Signed-off-by: Johan Fylling <[email protected]>
topdown/regex_bench_test.go
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
goos: darwin
goarch: arm64
pkg: github.com/open-policy-agent/opa/topdown
│ regex_main.txt │ regex_pr.txt │
│ sec/op │ sec/op vs base │
BuiltinRegexMatch/reuse-pattern=true,_pattern-count=10-10 2.216µ ± 1% 2.213µ ± 2% ~ (p=0.838 n=10)
BuiltinRegexMatch/reuse-pattern=true,_pattern-count=100-10 12.86µ ± 1% 12.97µ ± 3% ~ (p=0.218 n=10)
BuiltinRegexMatch/reuse-pattern=true,_pattern-count=1000-10 117.7µ ± 0% 117.8µ ± 1% ~ (p=0.579 n=10)
BuiltinRegexMatch/reuse-pattern=false,_pattern-count=10-10 12.26µ ± 1% 12.32µ ± 5% ~ (p=0.123 n=10)
BuiltinRegexMatch/reuse-pattern=false,_pattern-count=100-10 148.3µ ± 0% 148.3µ ± 0% ~ (p=0.353 n=10)
BuiltinRegexMatch/reuse-pattern=false,_pattern-count=1000-10 1.712m ± 0% 1.642m ± 0% -4.08% (p=0.000 n=10)
BuiltinRegexMatchAsync/reuse-pattern=true,_clients=100,_pattern-count=10-10 215.1µ ± 2% 217.1µ ± 3% ~ (p=0.684 n=10)
BuiltinRegexMatchAsync/reuse-pattern=true,_clients=100,_pattern-count=100-10 1.787m ± 1% 1.776m ± 1% -0.60% (p=0.043 n=10)
BuiltinRegexMatchAsync/reuse-pattern=true,_clients=100,_pattern-count=1000-10 17.35m ± 0% 17.37m ± 0% ~ (p=0.579 n=10)
BuiltinRegexMatchAsync/reuse-pattern=true,_clients=200,_pattern-count=10-10 441.7µ ± 1% 439.9µ ± 1% ~ (p=0.353 n=10)
BuiltinRegexMatchAsync/reuse-pattern=true,_clients=200,_pattern-count=100-10 3.596m ± 1% 3.575m ± 1% -0.59% (p=0.009 n=10)
BuiltinRegexMatchAsync/reuse-pattern=true,_clients=200,_pattern-count=1000-10 35.07m ± 0% 34.81m ± 1% ~ (p=0.063 n=10)
BuiltinRegexMatchAsync/reuse-pattern=false,_clients=100,_pattern-count=10-10 2.450m ± 1% 2.199m ± 0% -10.24% (p=0.000 n=10)
BuiltinRegexMatchAsync/reuse-pattern=false,_clients=100,_pattern-count=100-10 26.02m ± 1% 22.58m ± 1% -13.24% (p=0.000 n=10)
BuiltinRegexMatchAsync/reuse-pattern=false,_clients=100,_pattern-count=1000-10 266.0m ± 1% 256.9m ± 1% -3.43% (p=0.000 n=10)
BuiltinRegexMatchAsync/reuse-pattern=false,_clients=200,_pattern-count=10-10 5.665m ± 1% 4.679m ± 1% -17.41% (p=0.000 n=10)
BuiltinRegexMatchAsync/reuse-pattern=false,_clients=200,_pattern-count=100-10 54.16m ± 2% 49.35m ± 1% -8.87% (p=0.000 n=10)
BuiltinRegexMatchAsync/reuse-pattern=false,_clients=200,_pattern-count=1000-10 538.1m ± 1% 544.5m ± 1% ~ (p=0.105 n=10)
geomean 1.676m 1.621m -3.28%
@@ -64,6 +66,13 @@ func globCompileAndMatch(id, pattern, match string, delimiters []rune) (bool, er | |||
if p, err = glob.Compile(pattern, delimiters...); err != nil { | |||
return false, err | |||
} | |||
if len(globCache) >= globCacheMaxSize { | |||
// Delete a (semi-)random key to make room for the new one. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why semi? I think it's pretty random 😅
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not explicitly random, as the randomness is an implementation detail of the map.
I also think that it'd be deterministic over multiple runs with the same map entries.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So we're checking len(globCache) >= globCacheMaxSize
but we only remove one. This is the only code path where cached patterns are inserted, right? So it cannot happen that through some other code path, the cache would still continue to grow without bounds because we're only removing one element here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct, discounting tests, this is the only place we touch the cache.
Signed-off-by: Johan Fylling <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. I'm glad this is getting fixed 🥳
@@ -64,6 +66,13 @@ func globCompileAndMatch(id, pattern, match string, delimiters []rune) (bool, er | |||
if p, err = glob.Compile(pattern, delimiters...); err != nil { | |||
return false, err | |||
} | |||
if len(globCache) >= globCacheMaxSize { | |||
// Delete a (semi-)random key to make room for the new one. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So we're checking len(globCache) >= globCacheMaxSize
but we only remove one. This is the only code path where cached patterns are inserted, right? So it cannot happen that through some other code path, the cache would still continue to grow without bounds because we're only removing one element here.
@@ -101,6 +103,16 @@ opa_value *opa_glob_match(opa_value *pattern, opa_value *delimiters, opa_value * | |||
return NULL; | |||
} | |||
|
|||
if (c->size() >= MAX_CACHE_SIZE) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Resolved, isn't it?
Fixing possible memory leak where caches grow uncontrollably when large amounts of regexes or globs are generated or originate from the
input
document.Fixes: #6828