Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Lens] optimize duplicate formula functions #140859

Conversation

drewdaemon
Copy link
Contributor

@drewdaemon drewdaemon commented Sep 15, 2022

Summary

Part of #135265

Optimizes semantically-duplicate functions of the following types

  • average
  • count
  • last_value
  • max
  • median
  • min
  • standard_deviation
  • sum
  • unique_count

Left to do

  • percentile rank
  • filtered percentiles

This PR also improves how we update order-by references from terms aggregations for percentile operations. Instead of anticipating the optimization in the toEsAggsFn method on the terms operation class, we update the order-bys as part of the AST transformations themselves in the optimizeEsAggs method on the percentile operation class.

Testing

Create a visualization and add a dimension with this formula

median(bytes) + median(bytes) +

sum(bytes) + sum(bytes) +

max(hour_of_day) + max(hour_of_day) +

average(bytes) + average(bytes) +

standard_deviation(bytes) + standard_deviation(bytes) +

min(machine.ram, shift='1h') + min(machine.ram, shift='1h') +

min(machine.ram, shift='2h') + min(machine.ram, shift='2h')

Check the Elasticsearch request in the inspector. There should only be 7 aggregations, not 14.

Then try this formula

sum(bytes) + sum(bytes) +

sum(bytes, kql='geo.dest: "GA" ') + sum(bytes, kql='geo.dest: "GA" ') +

sum(bytes, kql='geo.dest: "AL" ') + sum(bytes, kql='geo.dest: "AL" ') 

+ sum(bytes, lucene='geo.dest: "AL" ') + 
sum(bytes, lucene='geo.dest: "AL" ') + 

sum(bytes, kql='geo.dest: "AL" ', reducedTimeRange='1m') + sum(bytes, kql='geo.dest: "AL" ', reducedTimeRange='1m')

Checklist

Delete any items that are not applicable to this PR.

@drewdaemon drewdaemon added Team:Visualizations Visualization editors, elastic-charts and infrastructure Feature:Lens labels Sep 15, 2022
@drewdaemon
Copy link
Contributor Author

@elasticmachine merge upstream

@drewdaemon drewdaemon force-pushed the 135265/optimize-duplicate-last-value-functions branch from 1819a6a to 6c8414c Compare September 19, 2022 23:34
@drewdaemon drewdaemon changed the title [Lens] optimize duplicate last-value functions [Lens] optimize more duplicate quick functions Sep 20, 2022
@drewdaemon drewdaemon changed the title [Lens] optimize more duplicate quick functions [Lens] optimize duplicate formula functions Sep 20, 2022
@drewdaemon
Copy link
Contributor Author

@flash1293 Thinking more about how much flexibility to give the operation classes for optimizing...

All the simple “dedupe” optimizations are turning out to follow the same set of steps

  • group duplicates
  • remove all but one agg from each group
  • update the idMap to map the single agg to all the original columns
  • update terms agg order-by references

The only step here that is really specific to an operation type is deciding which aggs are duplicate.

So, we could extend the operation class with a method called something like getGroupByKey(agg) and have the datasource's to_expression take care of the rest. We could leave the optimizeEsAggs method in place for more complicated optimization scenarios such as we do with the percentiles. That way, it’s a lot cheaper to take advantage of the most common optimization, but there’s still flexibility.

Any thoughts?

@flash1293
Copy link
Contributor

@andrewctate This makes sense to me

@drewdaemon
Copy link
Contributor Author

@elasticmachine merge upstream

@drewdaemon drewdaemon marked this pull request as ready for review September 22, 2022 02:15
@drewdaemon drewdaemon requested review from a team as code owners September 22, 2022 02:15
@elasticmachine
Copy link
Contributor

Pinging @elastic/kibana-vis-editors @elastic/kibana-vis-editors-external (Team:VisEditors)

@drewdaemon drewdaemon added the release_note:skip Skip the PR/issue when compiling release notes label Sep 22, 2022
Copy link
Contributor

@flash1293 flash1293 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This works well but the getGroupByKey implementations are very similar across operations. Could we unify this further (either as a helper function importing in every operation or by having the operation just state the arguments that need to be checked)

@drewdaemon
Copy link
Contributor Author

@elasticmachine merge upstream

Copy link
Contributor

@flash1293 flash1293 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks almost good to me, found one edge case that could be handled better. I tested around and this is a really nice optimization.

I think it will become more relevant with the planned formula features like conditionals, but even right now it's already super nice:
Screenshot 2022-09-26 at 11 30 01
Screenshot 2022-09-26 at 11 31 16

Both of these just need to fetch data once now 🎉

@drewdaemon
Copy link
Contributor Author

@kibanamachine merge upstream

@kibana-ci
Copy link
Collaborator

💛 Build succeeded, but was flaky

Failed CI Steps

Test Failures

  • [job] [logs] Rules, Alerts and Exceptions ResponseOps Cypress Tests on Security Solution / Alerts detection rules table auto-refresh should disable auto refresh when any rule selected and enable it after rules unselected

Metrics [docs]

Module Count

Fewer modules leads to a faster build time

id before after diff
lens 906 908 +2

Public APIs missing comments

Total count of every public API that lacks a comment. Target amount is 0. Run node scripts/build_api_docs --plugin [yourplugin] --stats comments for more detailed information.

id before after diff
data 2506 2508 +2

Async chunks

Total size of all lazy-loaded chunks that will be downloaded as the user navigates the app

id before after diff
lens 1.2MB 1.2MB +2.2KB
Unknown metric groups

API count

id before after diff
data 3209 3211 +2

ESLint disabled line counts

id before after diff
lens 25 26 +1

Total ESLint disabled count

id before after diff
lens 28 29 +1

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

Copy link
Member

@lukasolson lukasolson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AppServices changes LGTM

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport:skip This commit does not require backporting Feature:Lens release_note:skip Skip the PR/issue when compiling release notes Team:Visualizations Visualization editors, elastic-charts and infrastructure v8.6.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants