-
Notifications
You must be signed in to change notification settings - Fork 8.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[KQL] Use cache and other performance improvements #93319
Conversation
Pinging @elastic/kibana-app-services (Team:AppServices) |
@@ -28,15 +28,15 @@ start | |||
OrQuery | |||
= &{ return errorOnLuceneSyntax; } LuceneQuery | |||
/ left:AndQuery Or right:OrQuery { | |||
const cursor = [left, right].find(node => node.type === 'cursor'); | |||
const cursor = parseCursor && [left, right].find(node => node.type === 'cursor'); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
parseCursor
is an option passed to the parser which essentially specifies whether we're parsing for autocomplete suggestions or not. If false
, then we can short-circuit any autocomplete logic (stuff where node.type === 'cursor'
).
@@ -209,7 +209,7 @@ Literal "literal" | |||
= QuotedString / UnquotedLiteral | |||
|
|||
QuotedString | |||
= '"' prefix:QuotedCharacter* cursor:Cursor suffix:QuotedCharacter* '"' { | |||
= &{ return parseCursor; } '"' prefix:QuotedCharacter* cursor:Cursor suffix:QuotedCharacter* '"' { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See https://pegjs.org/documentation#grammar-syntax-and-semantics. This is a "predicate" which essentially does the same check as above before trying this grammar rule.
These are super impressive improvements for the amount of effort. 🙇♂️💝 |
'whitespace but "<" found.\ndashboard.attributes.title:foo' + | ||
'<invalid\n------------------------------^: Bad Request', | ||
'KQLSyntaxError: Expected AND, OR, end of input but "<" found.\ndashboard.' + | ||
'attributes.title:foo<invalid\n------------------------------^: Bad Request', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we're now shortcutting the whitespace
rule when parseCursor
is false
, it's no longer included in the error messages as one of the acceptable alternatives (which should have been the case to begin with).
Note: We should make sure that changing this option doesn't considerably increase heap usage for valid use cases (see pegjs/pegjs#590). |
The cache is re-initialized for each expression. The heap at the end of the benchmarks is equivalent before and after, but the size of the cache itself after running the complicated expression is ~3.1 MB. This seems like an acceptable tradeoff for CPU time. |
@elasticmachine merge upstream |
💚 Build SucceededMetrics [docs]Page load bundle
History
To update your PR or re-run it, just comment with: cc @lukasolson |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
code LGTM
* [KQL] Use cache and other performance improvements * Fix test * Fix jest tests Co-authored-by: Kibana Machine <[email protected]>
* [KQL] Use cache and other performance improvements * Fix test * Fix jest tests Co-authored-by: Kibana Machine <[email protected]>
* [KQL] Use cache and other performance improvements * Fix test * Fix jest tests Co-authored-by: Kibana Machine <[email protected]> Co-authored-by: Kibana Machine <[email protected]>
* [KQL] Use cache and other performance improvements * Fix test * Fix jest tests Co-authored-by: Kibana Machine <[email protected]> Co-authored-by: Kibana Machine <[email protected]>
@lukasolson it's pretty hard to tell how this cache works, is it storing results for every filter value ever submitted in memory? Is the cache ever rotated or cleared? I'm not seeing anything like that in the generated code and this issue suggest that the cache will just grow and grow the more queries are parsed. |
## Summary Resolves #143335. Some history: A similar issue was reported a few years back (#76811). The solution (#93319) was to use the `--cache` PEG.js [parameter](https://pegjs.org/documentation#generating-a-parser) when generating the parser. Back when this was added, we were still manually building the parser on demand when it was changed. Eventually we added support for dynamically building the parser during the build process (#145615). I'm not sure where along the process the `cache` parameter got lost but it didn't appear to be used when we switched. This PR re-adds this parameter which increases performance considerably (metrics shown in ops/sec): ``` Before using cache: ● kuery AST API › fromKueryExpression › performance › with simple expression Received: 7110.68990544415 ● kuery AST API › fromKueryExpression › performance › with complex expression Received: 40.51361746242248 ● kuery AST API › fromKueryExpression › performance › with many subqueries Received: 17.071767133068473 After using cache: ● kuery AST API › fromKueryExpression › performance › with simple expression Received: 8275.49109867502 ● kuery AST API › fromKueryExpression › performance › with complex expression Received: 447.0459218892934 ● kuery AST API › fromKueryExpression › performance › with many subqueries Received: 115852.43643466769 ``` ### Checklist - [x] [Unit or functional tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html) were updated or added to match the most common scenarios
Summary
Resolves #76811.
This PR improves KQL parsing performance in the following ways:
--cache
PEG.js parameter when generating the parserBenchmarks from the above linked issue prior to this PR:
And after this PR: