[Lens] optimize duplicate formula functions #140859

drewdaemon · 2022-09-15T19:41:53Z

Summary

Optimizes semantically duplicate functions of the following types:

average
count
last_value
max
median
min
standard_deviation
sum
unique_count

Left to do

percentile rank
filtered percentiles

This PR also improves how we update order-by references in terms aggregations for percentile operations. Instead of anticipating the optimization in the toEsAggsFn method on the terms operation class, we now update the order-bys as part of the AST transformation itself, in the optimizeEsAggs method on the percentile operation class.
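A minimal TypeScript sketch of that idea, not the PR's code: the `Agg` shape and the `optimizePercentiles` name are hypothetical, grouping is by `field` only (time shifts and filters are ignored), and the `<aggId>.<percent>` order-by reference format is an assumption. Percentile aggs on the same field are merged into one multi-percent agg, and terms order-by references are rewritten in the same pass instead of being anticipated elsewhere.

```typescript
// Hypothetical, simplified agg shape; not Kibana's real esaggs AST types.
interface Agg {
  id: string;
  type: string;
  params: Record<string, unknown>;
}

function optimizePercentiles(aggs: Agg[]): Agg[] {
  const keptByField = new Map<string, Agg>();
  // Maps each original percentile agg id to its new order-by reference.
  const newRefs = new Map<string, string>();
  const result: Agg[] = [];

  for (const agg of aggs) {
    if (agg.type !== 'percentiles') {
      result.push(agg);
      continue;
    }
    const field = String(agg.params.field);
    const percents = agg.params.percents as number[];
    let kept = keptByField.get(field);
    if (!kept) {
      // First percentile agg on this field survives, with a fresh percents list.
      kept = { ...agg, params: { ...agg.params, percents: [] as number[] } };
      keptByField.set(field, kept);
      result.push(kept);
    }
    const merged = kept.params.percents as number[];
    for (const p of percents) {
      if (!merged.includes(p)) merged.push(p);
    }
    // Assumed reference format "<aggId>.<percent>" for single-percent inputs.
    newRefs.set(agg.id, `${kept.id}.${percents[0]}`);
  }

  // Update terms order-by references as part of the same transformation.
  return result.map((agg) => {
    const orderBy = agg.params.orderBy;
    if (agg.type === 'terms' && typeof orderBy === 'string' && newRefs.has(orderBy)) {
      return { ...agg, params: { ...agg.params, orderBy: newRefs.get(orderBy) } };
    }
    return agg;
  });
}
```

Because the rewrite of `orderBy` happens inside the same function that removes aggs, the terms operation no longer needs to know which percentile aggs will survive.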

Testing

Create a visualization and add a dimension with this formula:

median(bytes) + median(bytes) +
sum(bytes) + sum(bytes) +
max(hour_of_day) + max(hour_of_day) +
average(bytes) + average(bytes) +
standard_deviation(bytes) + standard_deviation(bytes) +
min(machine.ram, shift='1h') + min(machine.ram, shift='1h') +
min(machine.ram, shift='2h') + min(machine.ram, shift='2h')

Check the Elasticsearch request in the Inspector. It should contain only 7 aggregations (one per distinct function call), not 14.

Then try this formula:

sum(bytes) + sum(bytes) +
sum(bytes, kql='geo.dest: "GA" ') + sum(bytes, kql='geo.dest: "GA" ') +
sum(bytes, kql='geo.dest: "AL" ') + sum(bytes, kql='geo.dest: "AL" ') +
sum(bytes, lucene='geo.dest: "AL" ') + sum(bytes, lucene='geo.dest: "AL" ') +
sum(bytes, kql='geo.dest: "AL" ', reducedTimeRange='1m') + sum(bytes, kql='geo.dest: "AL" ', reducedTimeRange='1m')

Checklist

Unit or functional tests were updated or added to match the most common scenarios

add automated test

drewdaemon · 2022-09-19T20:03:01Z

@elasticmachine merge upstream

drewdaemon · 2022-09-21T14:53:26Z

@flash1293 Thinking more about how much flexibility to give the operation classes for optimizing...

All the simple “dedupe” optimizations are turning out to follow the same set of steps:

group duplicates
remove all but one agg from each group
update the idMap to map the single agg to all the original columns
update terms agg order-by references
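The four steps above can be sketched as one shared TypeScript function. This is a minimal sketch, not the PR's implementation: `Agg`, `IdMap`, and `dedupeAggs` are hypothetical names, and a terms agg's `params.orderBy` is assumed to hold the id of the agg it sorts by.

```typescript
// Hypothetical, simplified agg shape; terms aggs keep an agg id in params.orderBy.
interface Agg {
  id: string;
  type: string;
  params: Record<string, unknown>;
}

// Hypothetical: maps each agg id to the Lens column ids it produces.
type IdMap = Record<string, string[]>;

function dedupeAggs(
  aggs: Agg[],
  idMap: IdMap,
  getGroupByKey: (agg: Agg) => string | undefined
): { aggs: Agg[]; idMap: IdMap } {
  // 1. Group duplicates; aggs without a key are never merged.
  const groups = new Map<string, Agg[]>();
  for (const agg of aggs) {
    const key = getGroupByKey(agg);
    if (key === undefined) continue;
    const group = groups.get(key);
    if (group) group.push(agg);
    else groups.set(key, [agg]);
  }

  // Remember which surviving agg replaces each removed duplicate.
  const replacedBy = new Map<string, string>();
  groups.forEach((group) => {
    const [kept, ...dupes] = group;
    for (const dupe of dupes) replacedBy.set(dupe.id, kept.id);
  });

  // 2. Remove all but one agg from each group.
  const remaining = aggs.filter((agg) => !replacedBy.has(agg.id));

  // 3. Map the surviving agg to all of the original columns.
  const newIdMap: IdMap = {};
  for (const [aggId, columns] of Object.entries(idMap)) {
    const target = replacedBy.get(aggId) ?? aggId;
    newIdMap[target] = [...(newIdMap[target] ?? []), ...columns];
  }

  // 4. Point terms order-by references at the surviving agg.
  const result = remaining.map((agg) => {
    const orderBy = agg.params.orderBy;
    if (agg.type === 'terms' && typeof orderBy === 'string' && replacedBy.has(orderBy)) {
      return { ...agg, params: { ...agg.params, orderBy: replacedBy.get(orderBy) } };
    }
    return agg;
  });

  return { aggs: result, idMap: newIdMap };
}
```

Only `getGroupByKey` varies per operation; the other three steps are identical for every simple dedupe.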

The only step here that is really specific to an operation type is deciding which aggs are duplicate.

So, we could extend the operation class with a method called something like getGroupByKey(agg) and have the datasource's to_expression take care of the rest. We could leave the optimizeEsAggs method in place for more complicated optimization scenarios such as we do with the percentiles. That way, it’s a lot cheaper to take advantage of the most common optimization, but there’s still flexibility.
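A minimal sketch of the proposed split between the cheap common path and the flexible one. Only the method names `getGroupByKey` and `optimizeEsAggs` come from the discussion; the `OperationDefinition` shape and `optimizeAggs` are hypothetical, and the idMap and order-by bookkeeping is omitted for brevity.

```typescript
// Hypothetical, simplified agg shape.
interface Agg {
  id: string;
  type: string;
  params: Record<string, unknown>;
}

// Hypothetical operation-definition shape illustrating the proposal.
interface OperationDefinition {
  type: string;
  // Common case: aggs that return the same key are duplicates.
  getGroupByKey?: (agg: Agg) => string | undefined;
  // Escape hatch for more complex rewrites, such as merging percentiles.
  optimizeEsAggs?: (aggs: Agg[]) => Agg[];
}

function optimizeAggs(aggs: Agg[], operations: OperationDefinition[]): Agg[] {
  const byType = new Map(operations.map((op) => [op.type, op] as const));
  const seen = new Set<string>();

  // Generic pass: keep the first agg for each key, drop later duplicates.
  let result = aggs.filter((agg) => {
    const key = byType.get(agg.type)?.getGroupByKey?.(agg);
    if (key === undefined) return true; // operation did not opt in
    const scopedKey = `${agg.type}:${key}`; // avoid key clashes across operations
    if (seen.has(scopedKey)) return false;
    seen.add(scopedKey);
    return true;
  });

  // Custom pass: operations with special needs transform the list themselves.
  for (const op of operations) {
    if (op.optimizeEsAggs) result = op.optimizeEsAggs(result);
  }
  return result;
}
```

With this split, a simple operation only implements a one-line key function, while percentiles keep full control through `optimizeEsAggs`.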

Any thoughts?

flash1293 · 2022-09-21T14:55:02Z

@andrewctate This makes sense to me

drewdaemon · 2022-09-22T00:37:10Z

@elasticmachine merge upstream

elasticmachine · 2022-09-22T02:15:54Z

Pinging @elastic/kibana-vis-editors @elastic/kibana-vis-editors-external (Team:VisEditors)

flash1293

This works well, but the getGroupByKey implementations are very similar across operations. Could we unify this further, either with a helper function imported in every operation or by having each operation just state the arguments that need to be checked?
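One way to follow the second suggestion, sketched under assumptions: a hypothetical helper `groupByKeyFromParams` serializes only the params an operation declares as significant. The param list in the example is illustrative; the commit history mentions grouping by `emptyAsNull` and reading the time shift from the filter agg, but the real lists may differ.

```typescript
// Hypothetical, simplified agg shape.
interface Agg {
  id: string;
  type: string;
  params: Record<string, unknown>;
}

/**
 * Hypothetical shared helper: builds a group-by key from only the params an
 * operation declares as significant, so each operation just lists names.
 */
function groupByKeyFromParams(agg: Agg, paramNames: readonly string[]): string {
  // Missing params become null, so "absent" and "undefined" compare equal.
  const significant = paramNames.map((name) => [name, agg.params[name] ?? null]);
  // JSON keeps the key unambiguous even if values contain separator characters.
  // Object-valued params must be built with a consistent key order.
  return `${agg.type}:${JSON.stringify(significant)}`;
}

// Illustrative per-operation declaration (param names are assumptions).
const SUM_PARAMS = ['field', 'timeShift', 'emptyAsNull'] as const;
const getSumGroupByKey = (agg: Agg) => groupByKeyFromParams(agg, SUM_PARAMS);
```

In this sketch, params that are not listed (for example a label) do not affect the key, so two aggs that differ only there are treated as duplicates.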

drewdaemon · 2022-09-23T18:47:01Z

@elasticmachine merge upstream

flash1293

Looks almost good to me, found one edge case that could be handled better. I tested around and this is a really nice optimization.

I think it will become more relevant with the planned formula features like conditionals, but even right now it's already super nice:

Both of these just need to fetch data once now 🎉

x-pack/plugins/lens/public/indexpattern_datasource/operations/definitions/last_value.tsx

drewdaemon · 2022-09-26T13:01:58Z

@kibanamachine merge upstream

kibana-ci · 2022-09-26T15:32:29Z

💛 Build succeeded, but was flaky

Buildkite Build
Commit: a8e560d

Failed CI Steps

Rules, Alerts and Exceptions ResponseOps Cypress Tests on Security Solution

Test Failures

Rules, Alerts and Exceptions ResponseOps Cypress Tests on Security Solution / Alerts detection rules table auto-refresh should disable auto refresh when any rule selected and enable it after rules unselected

Metrics

Module Count

Fewer modules leads to a faster build time

id	before	after	diff
`lens`	906	908	+2

Public APIs missing comments

Total count of every public API that lacks a comment. Target amount is 0. Run node scripts/build_api_docs --plugin [yourplugin] --stats comments for more detailed information.

id	before	after	diff
`data`	2506	2508	+2

Async chunks

Total size of all lazy-loaded chunks that will be downloaded as the user navigates the app

id	before	after	diff
`lens`	1.2MB	1.2MB	+2.2KB

Unknown metric groups

API count

id	before	after	diff
`data`	3209	3211	+2

ESLint disabled line counts

id	before	after	diff
`lens`	25	26	+1

Total ESLint disabled count

id	before	after	diff
`lens`	28	29	+1

History

💔 Build #75534 failed 845822f
💛 Build #75407 was flaky 5337c53
💔 Build #75381 failed 5304126
💔 Build #75348 failed 68cc171
💚 Build #74678 succeeded 38e0ce3
💔 Build #74663 failed c010425

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

lukasolson

AppServices changes LGTM

drewdaemon added 6 commits September 14, 2022 14:29

collapse duplicate metric aggs

7d802fe

group by emptyAsNull setting

8988e38

add automated test

make sure it doesn't touch other aggs

e83ac24

test that aggConfigParams are preserved

e92f797

revert whitespace change

0a3e40b

optimize duplicate last-value functions

10b036d

drewdaemon added Team:Visualizations Visualization editors, elastic-charts and infrastructure Feature:Lens labels Sep 15, 2022

drewdaemon and others added 6 commits September 15, 2022 15:04

Merge branch 'main' into 135265/optimize-duplicate-last-value-functions

68375fa

get timeshift from filter agg

6be5895

handle filtered aggs

07e8bc6

simplify agg deduplication logic

62b306d

test collapsing duplicate filtered aggs

3a4e613

Merge branch 'main' into 135265/optimize-redundant-formula-functions

26485c3

kibanamachine and others added 3 commits September 19, 2022 14:03

Merge branch 'main' into 135265/optimize-duplicate-last-value-functions

af43cc9

Merge branch '135265/optimize-redundant-formula-functions' into 13526…

4417f9c

…5/optimize-duplicate-last-value-functions

dedupe last value aggs

6c8414c

drewdaemon force-pushed the 135265/optimize-duplicate-last-value-functions branch from 1819a6a to 6c8414c September 19, 2022 23:34

drewdaemon added 2 commits September 19, 2022 18:45

dedupe groupByKey function

08705ca

optimize unique values

af2b52f

drewdaemon changed the title ~~[Lens] optimize duplicate last-value functions~~ [Lens] optimize more duplicate quick functions Sep 20, 2022

make sure last values doesnt touch unrelated functions

16c083c

drewdaemon mentioned this pull request Sep 20, 2022

[Lens] optimize duplicate metric operations #140764

Closed

1 task

drewdaemon changed the title ~~[Lens] optimize more duplicate quick functions~~ [Lens] optimize duplicate formula functions Sep 20, 2022

perform terms order-by updates

10a0ab7

drewdaemon added 2 commits September 21, 2022 13:20

port cardinality deduplication to central location

d9174ab

use central groupby for metrics

2cfbf3b

kibanamachine and others added 3 commits September 21, 2022 18:37

Merge branch 'main' into 135265/optimize-duplicate-last-value-functions

f57224b

test for dedupe_aggs

906eb3e

Merge branch '135265/optimize-duplicate-last-value-functions' of gith…

38e0ce3

…ub.com:andrewctate/kibana into 135265/optimize-duplicate-last-value-functions

drewdaemon marked this pull request as ready for review September 22, 2022 02:15

drewdaemon requested review from a team as code owners September 22, 2022 02:15

drewdaemon added the release_note:skip Skip the PR/issue when compiling release notes label Sep 22, 2022

flash1293 reviewed Sep 22, 2022

dej611 mentioned this pull request Sep 22, 2022

[Lens] Improve performance for large formulas #141456

Merged

9 tasks

drewdaemon added 2 commits September 22, 2022 10:24

Merge branch 'main' of github.com:elastic/kibana into 135265/optimize…

7e104be

…-duplicate-last-value-functions

consolidate key generation logic

68cc171

drewdaemon requested a review from flash1293 September 23, 2022 15:56

remove circ dep

5304126

kibanamachine and others added 2 commits September 23, 2022 12:47

Merge branch 'main' into 135265/optimize-duplicate-last-value-functions

5337c53

Merge branch 'main' into 135265/optimize-duplicate-last-value-functions

845822f

flash1293 reviewed Sep 26, 2022

x-pack/plugins/lens/public/indexpattern_datasource/operations/definitions/last_value.tsx

Merge branch 'main' into 135265/optimize-duplicate-last-value-functions

a8e560d

flash1293 approved these changes Sep 26, 2022

lukasolson approved these changes Sep 26, 2022

drewdaemon merged commit d1498ac into elastic:main Sep 26, 2022

kibanamachine added v8.6.0 backport:skip This commit does not require backporting labels Sep 26, 2022

drewdaemon mentioned this pull request Nov 29, 2022

[Lens] Math operation with time shift can lead to error when changing the aggregation order #146200

Closed

dej611 mentioned this pull request Mar 24, 2023

[Lens] [EsAggs][Meta] Optimize esaggs requests when possible #153629

Open
