Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC admin: hierarchical stats viewer #18670

Closed
wants to merge 191 commits into from

Conversation

jmarantz
Copy link
Contributor

@jmarantz jmarantz commented Oct 19, 2021

Commit Message: Adds default hierarchical view for admin stats accessed by clicking 'stats' from the admin home page. Note that the existing /stats API is unchanged.

Screen Shot 2022-01-12 at 12 28 52 PM

This is not ready for approval/submit yet as the new endpoints lack unit tests, and the UI (if we keep it) is ugly (unstyled). But I wanted to get some comments on the concept, particularly as it relates to solving #18675 before I clean all that up.

Also needed before this PR can be merged:

  • proper escaping of special chars embedded in scope names, wherever arbitrary text is inserted via C++ or JS.
  • When navigating within the outline, record HTML history
  • Add JS tests of some sort
  • Review Scope lifetime and make sure we don't risk referencing scopes as they are being deleted by a config update
  • When "usedonly" mode is on, find some mechanism to filter out scopes with no used stats. Same for regex filtering.
  • Break up into about 3 PRs.
  • Add option declarations for other admin pages besides stats.
  • test coverage for stats_hander.cc and admin_html_generator.cc.

Additional Description:
Risk Level:
Testing:
Docs Changes:
Release Notes:
Platform Specific Features:
Fixes: #18675

…n't wind up inserting and then deleting most of the candidate elements.

Signed-off-by: Joshua Marantz <[email protected]>
Signed-off-by: Joshua Marantz <[email protected]>
…xcessive number of comparisons.

Signed-off-by: Joshua Marantz <[email protected]>
… stringified stat-name, which is faster to compare than the StatName. We only need to hold the ones that are in the top-K so it won't add memory pressure beyond what we need for the page.

Signed-off-by: Joshua Marantz <[email protected]>
Signed-off-by: Joshua Marantz <[email protected]>
Signed-off-by: Joshua Marantz <[email protected]>
@repokitteh-read-only
Copy link

As a reminder, PRs marked as draft will not be automatically assigned reviewers,
or be handled by maintainer-oncall triage.

Please mark your PR as ready when you want it to be reviewed!

🐱

Caused by: #18670 was opened by jmarantz.

see: more, trace.

@jmarantz
Copy link
Contributor Author

/retest

@repokitteh-read-only
Copy link

Retrying Azure Pipelines:
Retried failed jobs in: envoy-presubmit

🐱

Caused by: a #18670 (comment) was created by @jmarantz.

see: more, trace.

Signed-off-by: Joshua Marantz <[email protected]>
@jmarantz jmarantz changed the title stats: mechanism to access stats by scope RFC stats: mechanism to access stats by scope Oct 19, 2021
@ggreenway
Copy link
Contributor

I think for only solving browser memory, the existing filtering on the stats endpoint would be sufficient to reduce the results size in a flexible way.

I'm assuming you took this approach because on the server side, using a regex to filter all the stats is expensive. Is that correct?

One alternative idea to allow more flexibility in which stats are grouped together: Let the user define a list of groups in their bootstrap config, and use existing filtering methods to add stats to zero or more groups (selected at stat creation time), and add an api for querying by group name.

My concern with this approach is that we're tying an external admin API to an internal code API used for managing lifetimes.

@ggreenway
Copy link
Contributor

/wait-any

@jmarantz
Copy link
Contributor Author

You are right: browser memory is solved with filtering, but there's 2 issues:

  • you might not know what to filter until you've already hit /stats and gotten yourself in trouble
  • if you are hitting the endpoint by visiting the admin-port in the browser and just clicking, there's no convenient way to know you can add the ?filter= argument. I'm thinking a bit richer UI on the main admin page might help.

And still doesn't solve server-side memory bloat (with store.counters() aggregating all of them), or the potentially 0.5 seconds you lock up the stats-allocator mutex and the main thread running through all the data, with the potential impact on p99 latency.

@ggreenway
Copy link
Contributor

you might not know what to filter until you've already hit /stats and gotten yourself in trouble

Adding another way to get stats won't solve that problem. I think the only way to solve that is to by default have a limit on the number of stats returned, with a query parameter to override.

if you are hitting the endpoint by visiting the admin-port in the browser and just clicking, there's no convenient way to know you can add the ?filter= argument. I'm thinking a bit richer UI on the main admin page might help.

Yeah, I think improving that webpage would be an easy win in that case. Have the link by default get /stats?limit=1000 or something (and add limit to the /stats handler).

And still doesn't solve server-side memory bloat (with store.counters() aggregating all of them), or the potentially 0.5 seconds you lock up the stats-allocator mutex and the main thread running through all the data, with the potential impact on p99 latency.

Agreed. I think it makes a lot of sense to have a way to reduce server resources for fetching huge numbers of stats.

jmarantz added 20 commits July 25, 2022 14:28
Signed-off-by: Joshua Marantz <[email protected]>
Signed-off-by: Joshua Marantz <[email protected]>
Signed-off-by: Joshua Marantz <[email protected]>
Signed-off-by: Joshua Marantz <[email protected]>
Signed-off-by: Joshua Marantz <[email protected]>
Signed-off-by: Joshua Marantz <[email protected]>
Signed-off-by: Joshua Marantz <[email protected]>
Signed-off-by: Joshua Marantz <[email protected]>
Signed-off-by: Joshua Marantz <[email protected]>
oschaaf pushed a commit to maistra/envoy that referenced this pull request Oct 26, 2022
Commit Message: Make the HTML view for stats a compile-time option, defaulting to on. This PR doesn't change any code-paths, but it moves the HTML support into a separate file, compiled into //source/server/admin:admin_lib. In the future, and admin_html_lib will be added, built only when admin-html is not disabled. That will allow envoyproxy/envoy#19546 and ultimately envoyproxy/envoy#18670 to proceed.

To disable, compile with `--define=admin_html=disabled`, in which case the admin `"/"` endpoint will just print that the thml mode was disabled.

Additional Description: This PR is a step toward solving envoyproxy/envoy#18675
Risk Level: low -- this just moves code around and doesn't change it at all.
Testing: //test/..., both with and without `--define=admin_html=disabled`
Docs Changes: included
Release Notes: included
Platform Specific Features: n/a

Signed-off-by: Joshua Marantz <[email protected]>
@ggreenway ggreenway removed their assignment Nov 8, 2022
Signed-off-by: Joshua Marantz <[email protected]>
@alyssawilk alyssawilk closed this Jan 4, 2023
@jmarantz jmarantz reopened this Feb 15, 2023
Signed-off-by: Joshua Marantz <[email protected]>
@jmarantz
Copy link
Contributor Author

actually I'm going to close this again. Having worked hard on making filtering relatively fast, I think the flat view is probably OK. What's really needed more (based on recent experience) is to be able to sort by recency or frequency of changes, which can be done in the browser by polling json.

I'll start a new PR for that.

Fortunately the support for this will all be buried in the admin-html conditional compilation and will not affect envoy-mobile.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
no stalebot Disables stalebot from closing an issue
Projects
None yet
Development

Successfully merging this pull request may close these issues.

stats: reduce browser and server memory/cpu bursts when viewing stats from admin
4 participants