[KQL] Use cache and other performance improvements #93319

Merged: 6 commits, merged Mar 8, 2021

lukasolson
Member

Summary

Resolves #76811.

This PR improves KQL parsing performance in the following ways:

  • Uses the --cache PEG.js parameter when generating the parser
  • Optimizes performance when autocomplete is unnecessary
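
The `--cache` bullet refers to PEG.js result caching, which PEG.js documents as making the generated parser cache results to avoid exponential parsing time in pathological cases. The sketch below illustrates the underlying idea (packrat memoization keyed by rule and input position) on a hypothetical toy grammar, not the KQL grammar; `ToyParser`, `memo`, and the grammar itself are made up for illustration and are not PEG.js's generated code.

```typescript
// Hypothetical toy grammar (not the KQL grammar):
//   Expr = Atom "*" Expr / Atom "+" Expr / Atom
//   Atom = "(" Expr ")" / [0-9]
type Result = { ok: true; end: number } | { ok: false };

class ToyParser {
  private readonly input: string;
  private readonly useCache: boolean;
  // Memo table keyed by "rule@position"; lives only as long as this parser.
  private readonly cache = new Map<string, Result>();
  atomEvaluations = 0; // counts uncached evaluations of the Atom rule

  constructor(input: string, useCache: boolean) {
    this.input = input;
    this.useCache = useCache;
  }

  // Packrat memoization: evaluate a rule at a given position at most once.
  private memo(rule: string, pos: number, body: () => Result): Result {
    if (!this.useCache) return body();
    const key = `${rule}@${pos}`;
    const cached = this.cache.get(key);
    if (cached !== undefined) return cached;
    const result = body();
    this.cache.set(key, result);
    return result;
  }

  expr(pos: number): Result {
    return this.memo("Expr", pos, () => {
      // Ordered choice: every failing alternative re-parses Atom at `pos`.
      for (const op of ["*", "+"]) {
        const left = this.atom(pos);
        if (left.ok && this.input[left.end] === op) {
          const rest = this.expr(left.end + 1);
          if (rest.ok) return rest;
        }
      }
      return this.atom(pos);
    });
  }

  atom(pos: number): Result {
    return this.memo("Atom", pos, () => {
      this.atomEvaluations++;
      if (this.input[pos] === "(") {
        const inner = this.expr(pos + 1);
        if (inner.ok && this.input[inner.end] === ")") {
          return { ok: true, end: inner.end + 1 };
        }
        return { ok: false };
      }
      const isDigit = /[0-9]/.test(this.input.charAt(pos));
      return isDigit ? { ok: true, end: pos + 1 } : { ok: false };
    });
  }
}
```

On the input `((((((1))))))`, the uncached parser evaluates `Atom` 3,279 times while the cached one evaluates it 7 times (once per position); without the cache the count roughly triples with each extra level of nesting, which is the kind of blow-up the cache is meant to prevent.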

Benchmarks from the issue linked above, before this PR:

```
parse simple KQL x 2,431 ops/sec ±2.19% (86 runs sampled)
parse complex KQL x 9.64 ops/sec ±3.28% (29 runs sampled)
```

And after this PR:

```
parse simple KQL x 14,703 ops/sec ±15.90% (79 runs sampled)
parse complex KQL x 163 ops/sec ±6.21% (54 runs sampled)
```
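
The before/after figures can be turned into speedup ratios and per-parse times with a little arithmetic. This is a minimal sketch; the numbers are copied from the benchmark output in this description, and `speedup` and `msPerOp` are illustrative helpers, not part of Kibana.

```typescript
// Figures copied from the benchmarks quoted in this description (ops/sec).
const before = { simple: 2431, complex: 9.64 };
const after = { simple: 14703, complex: 163 };

// Throughput ratio: how many times more parses per second after the change.
function speedup(beforeOps: number, afterOps: number): number {
  return afterOps / beforeOps;
}

// Mean time per parse in milliseconds, derived from ops/sec.
function msPerOp(opsPerSec: number): number {
  return 1000 / opsPerSec;
}

for (const name of ["simple", "complex"] as const) {
  const ratio = speedup(before[name], after[name]).toFixed(1);
  const was = msPerOp(before[name]).toFixed(2);
  const now = msPerOp(after[name]).toFixed(2);
  console.log(`${name}: ${ratio}x faster (${was} ms -> ${now} ms per parse)`);
}
```

This works out to about 6.0x for the simple query and 16.9x for the complex one (roughly 103.73 ms down to 6.13 ms per complex parse). The simple-query figure carries a ±15.90% margin in the "after" run, so its ratio is the less precise of the two.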

@lukasolson self-assigned this Mar 2, 2021
@lukasolson requested a review from a team as a code owner March 2, 2021 21:36
@elasticmachine
Contributor

Pinging @elastic/kibana-app-services (Team:AppServices)

```diff
@@ -28,15 +28,15 @@ start
 OrQuery
 = &{ return errorOnLuceneSyntax; } LuceneQuery
 / left:AndQuery Or right:OrQuery {
-const cursor = [left, right].find(node => node.type === 'cursor');
+const cursor = parseCursor && [left, right].find(node => node.type === 'cursor');
```
Member Author

`parseCursor` is an option passed to the parser that specifies whether we're parsing for autocomplete suggestions. If it is false, we can short-circuit all of the autocomplete logic (anything that looks for nodes where `node.type === 'cursor'`).
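
A minimal sketch of the short-circuit in the diff above, assuming a simplified node shape (`KqlNode`, `isCursor`, and `findCursor` are hypothetical names, not Kibana code). With `parseCursor` false, `&&` returns `false` without evaluating the right-hand side, so `find()` never runs.

```typescript
// Hypothetical, simplified node shape; real KQL AST nodes carry more fields.
interface KqlNode {
  type: string;
}

let cursorChecks = 0; // instrumentation only: how often the search actually runs

function isCursor(node: KqlNode): boolean {
  cursorChecks++;
  return node.type === "cursor";
}

// Mirrors the grammar action `parseCursor && [left, right].find(...)`:
// when parseCursor is false, `&&` short-circuits and find() never runs.
function findCursor(
  parseCursor: boolean,
  left: KqlNode,
  right: KqlNode,
): KqlNode | undefined | false {
  return parseCursor && [left, right].find(isCursor);
}
```

Note that the disabled case yields `false` rather than `undefined`; both are falsy, so any code that only tests `cursor` for truthiness behaves the same either way.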

```diff
@@ -209,7 +209,7 @@ Literal "literal"
 = QuotedString / UnquotedLiteral
 
 QuotedString
-= '"' prefix:QuotedCharacter* cursor:Cursor suffix:QuotedCharacter* '"' {
+= &{ return parseCursor; } '"' prefix:QuotedCharacter* cursor:Cursor suffix:QuotedCharacter* '"' {
```
Member Author

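See https://pegjs.org/documentation#grammar-syntax-and-semantics. `&{ ... }` is a semantic predicate: it performs the same `parseCursor` check as above before this grammar rule is attempted. If the predicate returns a falsy value, this alternative fails immediately without consuming any input, and the parser moves on to the rule's other alternatives.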
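
To make the predicate's effect concrete, here is a minimal model of PEG ordered choice with an `&{ ... }` guard. It is a sketch under stated assumptions: `guard`, `choice`, and `quotedString` are hypothetical helpers (not PEG.js's generated code), the `|` cursor marker is made up, and both alternatives simply consume the rest of the input for brevity.

```typescript
// Minimal model of PEG ordered choice plus an `&{ ... }` guard.
type Match = { ok: true; end: number; via: string } | { ok: false };
type Alt = (input: string, pos: number) => Match;

// `&{ return cond(); }`: consumes no input; if cond() is falsy the whole
// sequence fails before `alt` is ever attempted.
function guard(cond: () => boolean, alt: Alt): Alt {
  return (input, pos) => (cond() ? alt(input, pos) : { ok: false });
}

// PEG ordered choice: try alternatives in order, return the first success.
function choice(...alts: Alt[]): Alt {
  return (input, pos) => {
    for (const alt of alts) {
      const m = alt(input, pos);
      if (m.ok) return m;
    }
    return { ok: false };
  };
}

// Toy QuotedString rule; `attempts` records which alternatives actually ran.
function quotedString(parseCursor: boolean, attempts: string[]): Alt {
  const withCursor: Alt = (input, pos) => {
    attempts.push("withCursor");
    return input[pos] === '"' && input.includes("|", pos)
      ? { ok: true, end: input.length, via: "withCursor" }
      : { ok: false };
  };
  const plain: Alt = (input, pos) => {
    attempts.push("plain");
    return input[pos] === '"' ? { ok: true, end: input.length, via: "plain" } : { ok: false };
  };
  return choice(guard(() => parseCursor, withCursor), plain);
}
```

With `parseCursor` false, parsing `"fo|o"` never attempts the cursor alternative and goes straight to the plain one; with it true, the cursor alternative is tried first and matches.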

@kobelb
Contributor

kobelb commented Mar 2, 2021

These are super impressive improvements for the amount of effort. 🙇‍♂️💝

```diff
-'whitespace but "<" found.\ndashboard.attributes.title:foo' +
-'<invalid\n------------------------------^: Bad Request',
+'KQLSyntaxError: Expected AND, OR, end of input but "<" found.\ndashboard.' +
+'attributes.title:foo<invalid\n------------------------------^: Bad Request',
```
Member Author

Since we're now short-circuiting the whitespace rule when `parseCursor` is false, whitespace is no longer listed in the error messages as one of the acceptable alternatives (which is how it should have behaved to begin with).
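
A simplified model of why the message changes. This is an assumption for illustration, not PEG.js internals: each failed terminal records what it expected, only expectations at the furthest failure position are reported, and a rule gated by a false predicate never reaches its terminal, so it records nothing. `Expectations` and `errorAfterValue` are hypothetical names; the sketch reproduces only the "Expected ... found." part of the messages in the test diff.

```typescript
// Simplified model of PEG-style "Expected ..." error reporting.
class Expectations {
  private maxPos = -1;
  private readonly expected = new Set<string>();

  fail(pos: number, description: string): void {
    if (pos > this.maxPos) {
      this.maxPos = pos; // a further failure supersedes earlier expectations
      this.expected.clear();
    }
    if (pos === this.maxPos) this.expected.add(description);
  }

  message(input: string): string {
    const found = this.maxPos < input.length ? `"${input[this.maxPos]}"` : "end of input";
    return `Expected ${Array.from(this.expected).sort().join(", ")} but ${found} found.`;
  }
}

// Hypothetical: what the parser tries right after a complete field:value pair.
function errorAfterValue(input: string, pos: number, parseCursor: boolean): string {
  const exp = new Expectations();
  // Gated by a `parseCursor` predicate: when false, the whitespace terminal
  // is never attempted, so it never records an expectation.
  if (parseCursor && input[pos] !== " ") exp.fail(pos, "whitespace");
  if (!input.startsWith("AND", pos)) exp.fail(pos, "AND");
  if (!input.startsWith("OR", pos)) exp.fail(pos, "OR");
  if (pos !== input.length) exp.fail(pos, "end of input");
  return exp.message(input);
}
```

For `dashboard.attributes.title:foo<invalid`, the model yields `Expected AND, OR, end of input but "<" found.` with `parseCursor` false, and adds `, whitespace` when it is true.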

@lukasolson
Member Author

Note: We should make sure that changing this option doesn't considerably increase heap usage for valid use cases (see pegjs/pegjs#590).

@lukasolson
Member Author

The cache is re-initialized for each expression. Heap usage at the end of the benchmarks is the same before and after this change, but the cache itself grows to ~3.1 MB after parsing the complex expression. That seems like an acceptable trade-off for the CPU time saved.
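
A small sketch of why a position-keyed memo table should be scoped to a single parse, using a hypothetical word-start rule rather than anything from the KQL grammar (`startsWord`, `countWords`, and `countWordsShared` are made up for illustration). A per-call cache becomes unreachable when the call returns; a cache shared across calls keeps growing and, because its keys carry only positions, returns stale results for a new input.

```typescript
// Hypothetical memoized rule: "does a word start at `pos`?" Keys are
// (rule, position) only, the way a packrat memo table is keyed.
type Cache = Map<string, boolean>;

function startsWord(input: string, pos: number, cache: Cache): boolean {
  const key = `Word@${pos}`;
  let result = cache.get(key);
  if (result === undefined) {
    result = input[pos] !== " " && (pos === 0 || input[pos - 1] === " ");
    cache.set(key, result);
  }
  return result;
}

function countWordsWith(input: string, cache: Cache): number {
  let words = 0;
  for (let pos = 0; pos < input.length; pos++) {
    if (startsWord(input, pos, cache)) words++;
  }
  return words;
}

// Per-parse cache: allocated on entry and unreachable once the call returns.
function countWords(input: string): { words: number; cacheEntries: number } {
  const cache: Cache = new Map();
  const words = countWordsWith(input, cache);
  return { words, cacheEntries: cache.size };
}

// Anti-pattern for contrast: one cache shared by every call.
const sharedCache: Cache = new Map();
function countWordsShared(input: string): number {
  return countWordsWith(input, sharedCache);
}
```

Calling `countWordsShared("a b")` and then `countWordsShared("ab cd")` reports 3 words for the second input instead of 2, and the shared map never shrinks; the per-call version answers 2 for both and holds at most one entry per character while it runs.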

@lukasolson
Member Author

@elasticmachine merge upstream

@kibanamachine
Contributor

💚 Build Succeeded

Metrics

Page load bundle

Size of the bundles that are downloaded on every page load. Target size is below 100kb.

| id | before | after | diff |
|----|--------|-------|------|
| data | 815.5KB | 825.1KB | +9.6KB |

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

cc @lukasolson

@ppisljar (Member) left a comment:

code LGTM

@lukasolson merged commit 2b3bac9 into elastic:master Mar 8, 2021
lukasolson added a commit to lukasolson/kibana that referenced this pull request Mar 8, 2021
* [KQL] Use cache and other performance improvements

* Fix test

* Fix jest tests

Co-authored-by: Kibana Machine <[email protected]>
lukasolson added a commit that referenced this pull request Mar 8, 2021
* [KQL] Use cache and other performance improvements

* Fix test

* Fix jest tests

Co-authored-by: Kibana Machine <[email protected]>

Co-authored-by: Kibana Machine <[email protected]>
@spalger
Contributor

spalger commented Apr 6, 2021

@lukasolson it's pretty hard to tell how this cache works. Is it storing results for every filter value ever submitted in memory? Is the cache ever rotated or cleared? I'm not seeing anything like that in the generated code, and this issue suggests that the cache will just keep growing the more queries are parsed.

lukasolson added a commit that referenced this pull request Apr 27, 2024
## Summary

Resolves #143335.

Some history: A similar issue was reported a few years back
(#76811). The solution
(#93319) was to use the `--cache`
PEG.js [parameter](https://pegjs.org/documentation#generating-a-parser)
when generating the parser. Back when this was added, we were still
manually rebuilding the parser whenever it changed. Eventually we
added support for dynamically building the parser during the build
process (#145615). I'm not sure at what point the `cache` parameter
got lost, but it didn't appear to be used after we switched.

This PR re-adds this parameter, which increases performance considerably
(metrics shown in ops/sec):

```
Before using cache:

  ● kuery AST API › fromKueryExpression › performance › with simple expression
    Received:   7110.68990544415

  ● kuery AST API › fromKueryExpression › performance › with complex expression
    Received:   40.51361746242248

  ● kuery AST API › fromKueryExpression › performance › with many subqueries
    Received:   17.071767133068473

After using cache:

  ● kuery AST API › fromKueryExpression › performance › with simple expression
    Received:   8275.49109867502

  ● kuery AST API › fromKueryExpression › performance › with complex expression
    Received:   447.0459218892934

  ● kuery AST API › fromKueryExpression › performance › with many subqueries
    Received:   115852.43643466769
```

### Checklist

- [x] [Unit or functional
tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html)
were updated or added to match the most common scenarios
Linked issue: KQL expression parsing is slow
6 participants