Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Lens] optimize duplicate metric operations #140764

Conversation

drewdaemon
Copy link
Contributor

@drewdaemon drewdaemon commented Sep 14, 2022

Summary

Part of #135265

Testing

Create a visualization and add a dimension with this formula

median(bytes) + median(bytes) +

sum(bytes) + sum(bytes) +

max(hour_of_day) + max(hour_of_day) +

average(bytes) + average(bytes) +

standard_deviation(bytes) + standard_deviation(bytes) +

min(machine.ram, shift='1h') + min(machine.ram, shift='1h') +

min(machine.ram, shift='2h') + min(machine.ram, shift='2h')

Check the Elasticsearch request in the inspector. There should only be 7 aggregations, not 14.

Then try this formula

sum(bytes) + sum(bytes) +

sum(bytes, kql='geo.dest: "GA" ') + sum(bytes, kql='geo.dest: "GA" ') +

sum(bytes, kql='geo.dest: "AL" ') + sum(bytes, kql='geo.dest: "AL" ') 

+ sum(bytes, lucene='geo.dest: "AL" ') + 
sum(bytes, lucene='geo.dest: "AL" ') + 

sum(bytes, kql='geo.dest: "AL" ', reducedTimeRange='1m') + sum(bytes, kql='geo.dest: "AL" ', reducedTimeRange='1m')

Check the Elasticsearch request in the inspector. There should only be 5 aggregations, not 10.

Checklist

Delete any items that are not applicable to this PR.

@drewdaemon drewdaemon changed the title [Lens] collapse duplicate formula aggs [Lens] collapse duplicate metric aggs Sep 14, 2022
@drewdaemon drewdaemon changed the title [Lens] collapse duplicate metric aggs [Lens] optimize duplicate metric operations Sep 14, 2022
add automated test
@drewdaemon drewdaemon force-pushed the 135265/optimize-redundant-formula-functions branch from 6d8f8c3 to 8988e38 Compare September 14, 2022 20:31
@drewdaemon drewdaemon marked this pull request as ready for review September 15, 2022 03:16
@drewdaemon drewdaemon requested a review from a team as a code owner September 15, 2022 03:16
@drewdaemon drewdaemon added the Team:Visualizations Visualization editors, elastic-charts and infrastructure label Sep 15, 2022
@elasticmachine
Copy link
Contributor

Pinging @elastic/kibana-vis-editors @elastic/kibana-vis-editors-external (Team:VisEditors)

@dej611
Copy link
Contributor

dej611 commented Sep 15, 2022

There's something strange happening somewhere here as I've noticed this weird behaviour:

  • using your formula I have an error about the standard_deviation operation missing:

Screenshot 2022-09-15 at 18 04 32

  • if I remove the standard_deviation then the formula works:

Screenshot 2022-09-15 at 18 04 41

  • but also using standard_deviation only it works:

Screenshot 2022-09-15 at 18 04 51

Cannot replicate the same behaviour in main, so there's probably some string manipulation here which breaks it.

@drewdaemon
Copy link
Contributor Author

@dej611 good find! 😬 I'll look into this.

@drewdaemon
Copy link
Contributor Author

@dej611 I'm having trouble reproducing that issue. I copied your formula, but it appears to work. Could you share any more details?

The error must originate here, but I haven't figured out how to get it to throw

return bucket[agg.id].std_deviation;

Copy link
Contributor

@flash1293 flash1293 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Works pretty well, left a comment about a further opportunity to optimize but that can be split out. I also don't know how Marco caused this error, but it would be nice to reproduce. @dej611 if you can still cause it can be share your config? Maybe it's about a certain combination with bucket aggs?

@drewdaemon drewdaemon requested a review from a team as a code owner September 19, 2022 18:50
@drewdaemon
Copy link
Contributor Author

@elasticmachine merge upstream

@kibana-ci
Copy link
Collaborator

💚 Build Succeeded

Metrics [docs]

Public APIs missing comments

Total count of every public API that lacks a comment. Target amount is 0. Run node scripts/build_api_docs --plugin [yourplugin] --stats comments for more detailed information.

id before after diff
data 2429 2431 +2

Async chunks

Total size of all lazy-loaded chunks that will be downloaded as the user navigates the app

id before after diff
lens 1.2MB 1.2MB +1.5KB
Unknown metric groups

API count

id before after diff
data 3132 3134 +2

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

Copy link
Contributor

@flash1293 flash1293 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This has the same problem the percentile optimization had - if a column is referenced for sorting by top values, it's failing because the column won't exist. In this case Filter 1 and Filter 2 are both a sum of bytes filtered the same way.

Ranking by Filter 2 works fine byt ranking by Filter 1 breaks because that column got optimized away.
Screenshot 2022-09-20 at 12 14 20

This is a constructed example, but the same thing can happen if there is a regular metric dimension used for ranking plus a formula dimension which is internally using the same metric on the same chart.

@drewdaemon
Copy link
Contributor Author

Closing in favor of a more holistic PR: #140859

@drewdaemon drewdaemon closed this Sep 20, 2022
@dej611
Copy link
Contributor

dej611 commented Sep 21, 2022

Works pretty well, left a comment about a further opportunity to optimize but that can be split out. I also don't know how Marco caused this error, but it would be nice to reproduce. @dej611 if you can still cause it can be share your config? Maybe it's about a certain combination with bucket aggs?

I managed to reproduce it also in main the same issue, so it's a general bug. Here's my configuration:
Screenshot 2022-09-21 at 10 37 02

If I remove the date_histogram then it works ok.

Edit: just found out that the time range picked is important to reproduce the bug: set the time picker to come custom date range with no data and the error will appear.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Feature:Lens release_note:enhancement Team:Visualizations Visualization editors, elastic-charts and infrastructure
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants