slow stats endpoint and blocking /ready endpoint when hitting /stats endpoint #16425
I've seen people recommend against hitting an admin endpoint as a readiness check for this reason (I think from @howardjohn?). The fact that it might be delayed due to xDS responses makes it a poor choice for periodic health checks.
It would be good to ensure that config updates can proceed even if admin requests are consuming significant CPU.
Yeah, for Istio we hit this a lot. We ended up using it just for startup (at which point /stats is probably not huge and any large xDS load is probably Envoy actually initializing). From then on, we just hit a normal (non-admin) listener, so the requests go through Envoy's worker threads, not the main thread.
See also #16139 (comment). I think if we streamed out large HTTP responses from admin, it would take less cpu/memory and, depending on how we managed events, might not block other admin requests from running during the streaming process.
Thanks for the feedback. One extra question about the json and prometheus formats: is it expected that asking for those formats makes the request much slower? Is the transformation step between the plain stats format and another format really so big that it makes the request 5x slower?
Right now each request is entirely blocking, which is fine for tiny requests like /ready. But if we change the stats handlers to stream data out, and if we don't buffer and sort, I think the admin port could service /ready in between chunks of a /stats response.
Yes, we're hitting the scrape problem as well when config is reloaded (so that's a 3rd problem 😞). Are there any workarounds/options here with the current admin thread design (especially for the blocked behavior during a config reload)?
I don't think there is a workaround. I think it is a fairly difficult problem to solve, since certain data structures are only safe to access from the main thread, hence the restrictions on config reload and admin handlers running on the main thread.
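The serialization described above can be sketched with a toy model (not Envoy code, just an illustration): when a single thread handles admin requests strictly one at a time, a cheap /ready queued behind an expensive /stats inherits the full /stats latency.

```shell
# Toy model of a single-threaded admin: requests are handled one at a
# time, so a cheap /ready queued behind an expensive /stats inherits
# the /stats latency. Costs here are stand-ins, not real measurements.
handle() {            # handle <endpoint> <cost_in_seconds>
  sleep "$2"
  echo "$1 served"
}

start=$(date +%s%N)
handle /stats 1       # expensive scrape occupies the only thread
handle /ready 0       # cheap check, but it had to wait its turn
end=$(date +%s%N)
latency_ms=$(( (end - start) / 1000000 ))
echo "/ready latency: ${latency_ms} ms"
```

The observed /ready latency is at least the full cost of the in-flight /stats request, which is exactly the behavior reported in this issue.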
I guess there are two approaches:
A lot of the work that's being done now may not be needed by all users, such as sorting the stats output. If we do sort, we don't need to buffer the serialized form (which is happening now); we can just buffer the vector of stat pointers, or at least shared_ptrs, so that we can survive a stat being dropped while streaming it out.
Doing stats operations that take 6 seconds on workers is not acceptable either. Ideally we'd do very little on the main thread, just the things that need to go through it due to ordering requirements. Admin requests should be served by a thread pool, and config processing should happen on its own set of threads.
Sorry if I wasn't clear; I was proposing the second option: break up the stats streaming work and do it asynchronously on the main thread. I think we'd do a little less total work by not buffering up the full serialized form (and maybe not sorting), but that might be too optimistic. Even if it's the same 6s of work, doing it in 100 chunks of 60ms rather than all at once would have much less latency impact on small requests like /ready.

I'm not opposed to having other threads dedicated to admin processing, but I'd be a little worried about the potential impact on request latency at (say) p99 or p95 of having more runnable threads than physical cores. We could also reserve a few cores for admin processing, depending on the setup, but that might not work well for everyone. So I thought async-chunks-on-the-main-thread might be a good way to go.
I think for stats it should be relatively plausible to push this to a distinct thread; given we already have stats sinks that gather this data, can't we have a pseudo-sink that dumps to some shared memory buffer for a consumer thread that can handle these requests? This doesn't solve the more general problem for the other admin endpoints, though.
Yeah, you can collect the stats in vectors of shared_ptrs on the main thread and then stream them out from another. But I think we should stream them out in chunks regardless, rather than buffering up all the text before sending a response. That will solve a memory burst problem per #16139. Once we are streaming, we can make it async and see whether we actually need to add a new thread.
@snowp @jmarantz By the way, this is part of what #15876 is addressing, to a small degree. One of the larger issues, for us at least, was that once there is a large number of VirtualHosts (>80K), even a single VirtualHost add/delete via VHDS was causing delays of /stats, /ready, etc. This seems, to a large degree, to stem from the way RouteMatcher and VirtualHosts are handled. Out of curiosity, @jfrabaute, are you using VHDS? Do you have many VirtualHosts?
No idea. We're using Ambassador, which takes care of managing the envoy config and lifecycle. I don't think Ambassador is using VHDS, but I'm not sure; I'll ask the Ambassador team. Regarding the problem, it looks like it is not limited to VHDS.
Answer: Ambassador does not use VHDS |
Okay, good to know it's not limited to VHDS. I'd be curious to see a flamegraph, but I'm just a fly on the wall here.
I am not aware of such a plan.
Could those two endpoints be separated?
All that can be done! It's just a matter of someone doing it. To simplify: is the goal just to make /ready always fast, and would you be willing to have that on a different port with its own (very cheap) thread? That project is nice because it doesn't have to touch the complexity of stats and config updates, but it would involve plumbing the declaration of a new port through the API. Other @envoyproxy/maintainers may have better ideas as well.
Another strategy might be to expose whatever relevant information you want out of /ready via an HTTP filter, allowing users to set up a filter chain on a distinct port that can handle this. That wouldn't require any core API changes. This somewhat begs the question of what information /ready uses to declare readiness: much of it, I expect, is just dependent on the server having started, so anything handling traffic on a worker thread would already be gated on this. Does this just boil down to a direct response from the HCM?
Yeah.... Totally see what you mean here.
That's the small goal for me, but that might just be a narrow view of a larger problem, hence I created the issue to get an idea of a possible larger picture.
Yes, that's the idea behind my proposal. I had a quick look at the code and, knowing close to nothing, the admin thread already seems like a complex piece of code, so isolating this change and not touching the admin thread seemed like an interesting option. As it is just for monitoring, it could make sense to have a first iteration in a reasonable amount of time.
That seems interesting. Similar to what I'm proposing (IIUC), but even simpler, as it's just getting the info and using the worker threads.
I need to check if we are experiencing delays on stats scrapes, but I suspect we are. We have a similar scenario but using Contour for ingress on Kubernetes as opposed to Ambassador. This is a multi-tenant cluster and can end up hosting a lot of routes, clusters, etc.

This has me thinking about some @mattklein123 tweets about push vs. pull: https://twitter.com/mattklein123/status/1328559009633239040, https://twitter.com/mattklein123/status/1266010765669961729

I plan on looking into changing the model from having Prometheus scrape the stats endpoint to pushing metrics instead. In theory the push approach wouldn't see the periodic load when the scrape occurs. Keen on advice or gotchas, and on whether a StatsD approach would help with the blocking/contention.
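For anyone exploring that push direction, Envoy ships a StatsD sink configured at the bootstrap level. A minimal sketch (the agent address and port are assumptions, not from this thread):

```yaml
# Bootstrap-level stats sink sketch: push metrics over UDP to a local
# statsd agent instead of having Prometheus scrape the admin endpoint.
stats_sinks:
- name: envoy.stat_sinks.statsd
  typed_config:
    "@type": type.googleapis.com/envoy.config.metrics.v3.StatsdSink
    address:
      socket_address:
        address: 127.0.0.1   # assumed local statsd/telegraf agent
        port_value: 8125
```

Sinks are flushed periodically from the main thread on the stats flush interval, so this spreads the cost over time rather than concentrating it in one large scrape response.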
One thing to add to this already long and involved discussion is that we directionally want to head towards the admin port not being special in any particular way vs. regular listeners. This has a ton of advantages around security and consistency. This might matter if we start adding new ports (which I don't think should be preferred; it's not great to fix an implementation issue with an API one).
One question: I'm looking at HTTP filters, and it looks like a health check HTTP filter already exists. Would that filter solve the problem if used on a specific listener for a monitoring system like k8s readiness/liveness probes? If so, that would be a first step.
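For reference, a minimal sketch of such a listener: the health check filter in non-pass-through mode answers /ready on a worker thread, so it stays responsive during config reloads. The port, names, and matcher here are assumptions for illustration, not from the issue:

```yaml
static_resources:
  listeners:
  - name: ready_listener          # dedicated, cheap readiness listener
    address:
      socket_address: { address: 0.0.0.0, port_value: 8002 }
    filter_chains:
    - filters:
      - name: envoy.filters.http.http_connection_manager
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
          stat_prefix: ready
          route_config:
            virtual_hosts:
            - name: ready
              domains: ["*"]
              routes:
              - match: { prefix: "/" }
                direct_response: { status: 404 }   # anything but /ready
          http_filters:
          - name: envoy.filters.http.health_check
            typed_config:
              "@type": type.googleapis.com/envoy.extensions.filters.http.health_check.v3.HealthCheck
              pass_through_mode: false   # answer locally, no upstream
              headers:
              - name: ":path"
                exact_match: "/ready"
          - name: envoy.filters.http.router
            typed_config:
              "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
```

Because this runs on the worker threads, it only reports that the server is up and serving traffic; it does not reflect admin-level state the way the admin /ready handler does.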
The /ready endpoint used by emissary is using the admin port (8001 by default). This generates a problem during config reloads with large configs: the admin thread is blocking, so the /ready endpoint can be very slow to answer (on the order of several seconds, or even more). The problem is described in this envoy issue: envoyproxy/envoy#16425

This change tries to fix the /ready endpoint problem. The /ready endpoint can be exposed in the worker pool by adding a listener + health check HTTP filter. This way, the /ready endpoint is fast and is not blocked by any config reload or blocking admin operation, as it depends on the worker pool. Future changes will allow diagd and the Go code to use this endpoint as well, so they also get a fast /ready endpoint and do not use the admin port.

This listener is disabled by default. The config "read_port" can be used to set the port and enable this new listener on envoy.

Signed-off-by: Fabrice Rabaute <[email protected]>
I should probably report 2 distinct problems:

1. /stats?format={json|prometheus} slowness
2. /ready endpoint blocked by the /stats endpoint when running at the same time.

Maybe I should create 2 issues, but I'm starting with this one to get feedback from the envoy team.
So, we have some envoy instances with a lot of mappings/big config (between 5000 and 10000).
In order to reproduce it, I created a sample envoy config with 7000 mappings, so it's easy to test.
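The attached generator script isn't shown inline; a hedged sketch of how such a config generator might loop (the names, route shape, and output file are assumptions for illustration, not the actual sample-long.sh):

```shell
# Hypothetical generator: emit N route entries to splice into a virtual
# host in an envoy config, approximating a large "mappings" setup.
N=7000
for i in $(seq 1 "$N"); do
  cat <<EOF
            - match: { prefix: "/mapping-$i/" }
              route: { cluster: cluster-$i }
EOF
done > routes-fragment.yaml
wc -l < routes-fragment.yaml   # 2 lines per mapping
```

Each mapping adds routes (and typically clusters and their stats), which is what drives both the /stats payload size and the config-reload cost discussed above.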
Problem 1:
With this number of mappings, the /stats endpoint takes a bit more than 1 second. The /stats?format={json|prometheus} endpoint (either json or prometheus format) takes more than 6 seconds! That is more than 5x between /stats and the two other formats. Is that expected?
Problem 2:
When /stats is called and another client makes a /ready request, the /ready request seems to be stuck until the /stats request is done. So, when /stats?format=prometheus is called and takes 7 seconds (for example), and a /ready request is made by a client at the beginning of that 7-second window, the /ready request is also going to take 7 seconds.

This generates a bunch of problems with monitoring, especially because we are using Ambassador, and the Ambassador /check_ready endpoint is a wrapper around the envoy /ready endpoint with a 2-second timeout (https://github.com/datawire/ambassador/blob/d1a8b1ca89d878b4c8722f51f2479028288b747e/pkg/acp/envoy.go#L61).
So, if prometheus is scraping at the same time, the readiness is failing.
Repro steps:
(all attachments have an extra .txt extension that should be removed)
I am attaching the "sample-long.yaml" config with the 7000 mappings.
I am also attaching the "sample-long.sh" basic bash script that generates the sample config (if needed).
Here is the command I'm using to run envoy locally:
I'm attaching the test.sh script to test the perf of the different endpoints. Here is the output of this test.sh script when running on my laptop. My laptop is doing nothing, and envoy is doing nothing (no traffic except the tests). You can see in the output that /ready takes more than 8 seconds when executed at the same time as /stats?format=prometheus. Is that expected?
Thank you for any feedback.
sample-long.sh.txt
test.sh.txt
sample-long.yaml.txt