Replace HTTP compression with a more scoped impl, only use on responses > 128KB #77449
Conversation
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: smarterclayton. The full list of commands accepted by this bot can be found here; the pull request process is described here.
Needs approval from an approver in each of these files. Approvers can indicate their approval by writing /approve in a comment.
This PR may require API review. If so, when the changes are ready, complete the pre-review checklist and request an API review. Status of requested reviews is tracked in the API Review project.
/retest
Same but namespace scoped
It looks like this brings down the 99th-percentile tail latency on large lists significantly, at the tradeoff of slightly higher latency on small lists. We could potentially tune this at a threshold higher than 16KB - for instance 32KB or 128KB - which would potentially reduce tail latency on lists. @kubernetes/sig-scalability-pr-reviews
/retest
/cc @wojtek-t
Force-pushed from 26d202d to 4ed2b98.
/retest
Nodes in particular is 30% below the p99 of perf-dash right now.
There is significant variance there, but I agree the results look very promising. I would also like to run a test at larger scale once we get past the last regressions we have: #79096
The variance before was all measured where we had the bucketing problem, I believe. I would trade some P99 on large requests in-cluster for a dramatically reduced P99 outside the cluster. We don't have an easy way to measure that unless we simulate constrained bandwidth for a client.
That sounds reasonable to me - I would just like to know how much we are trading (if it's really visible). I'm definitely NOT saying "we can't do this if it grows at all".
/test pull-kubernetes-e2e-gce-100-performance
Fortunately this is easy to test - we just gate it on or off. All clients automatically request compression. We can also tune the threshold up.
Are we ready to try the larger test run now that the other blocker is resolved?
I asked @krzysied to patch this into the experiments he is running offline; the plan was to run something overnight - I will get back to you when I know whether that happened.
We don't have full results because we were running some other experiments during that test, so it didn't finish. But what we've seen looked good enough that I'm fine with this PR from a scalability POV. @smarterclayton - do you want me to review the code too? (I can do that in the next 2 days.)
Yes please. Jordan signed off on the KEP, and seeing the variance over time is the biggest factor. This is ready to review.
Just a couple nits - other than that lgtm.
}
// make a best effort to write the object if a failure is detected
utilruntime.HandleError(fmt.Errorf("apiserver was unable to write a JSON response: %v", err))
How do we know that it was JSON?
I see it's moved from a different place, but if I'm not missing something it would make sense to fix this comment if you're touching this code.
If encode fails, the response is always JSON. The assumption being that encode will not fail (and thus exit early) above. If encode fails, we just spit out what we have.
ok - that makes sense now
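To make the fallback path discussed in this thread concrete, here is a minimal standalone sketch (standard library only, not the apiserver's actual writer; `writeObject` and its signature are invented for illustration): if the negotiated encoder fails, the object is re-encoded as JSON as a best effort, which is why the logged error can safely say "JSON response".

```go
package main

import (
	"encoding/json"
	"fmt"
	"log"
	"net/http"
	"net/http/httptest"
)

// writeObject tries the negotiated encoder first; if that fails it falls back
// to a best-effort JSON write, logging the error (the real code uses
// utilruntime.HandleError for this purpose).
func writeObject(w http.ResponseWriter, obj interface{}, encode func(interface{}) ([]byte, error)) {
	if data, err := encode(obj); err == nil {
		w.Write(data)
		return
	}
	// The negotiated encode failed; fall back to JSON as a best effort.
	data, err := json.Marshal(obj)
	if err != nil {
		log.Printf("apiserver was unable to write a JSON response: %v", err)
		w.WriteHeader(http.StatusInternalServerError)
		return
	}
	w.Header().Set("Content-Type", "application/json")
	w.Write(data)
}

func main() {
	rec := httptest.NewRecorder()
	// Simulate a negotiated encoder that fails, forcing the JSON fallback.
	writeObject(rec, map[string]string{"kind": "Status"}, func(interface{}) ([]byte, error) {
		return nil, fmt.Errorf("protobuf encode failed")
	})
	fmt.Println(rec.Header().Get("Content-Type"), rec.Body.String())
}
```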
if len(encoding) == 0 {
	return ""
}
if !utilfeature.DefaultFeatureGate.Enabled(features.APIResponseCompression) {
We probably want to check it as the first thing in this method (to avoid unnecessary work otherwise).
I had it this way because the feature gate check is probably more expensive than the map lookup. But it could be the other way if you think it's easier to read (I think it would perform slightly worse).
I don't have strong opinion - we can change that later too if needed.
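For readers following the ordering debate above, a tiny sketch of the version the PR keeps: the cheap empty-string check runs first and the feature gate is consulted only when a client actually sent an Accept-Encoding value. A plain boolean stands in for the APIResponseCompression gate here; the real code calls utilfeature.DefaultFeatureGate.Enabled.

```go
package main

import "fmt"

// compressionEnabled stands in for the APIResponseCompression feature gate
// for this sketch only.
var compressionEnabled = true

// acceptedEncoding keeps the cheap length check first and consults the
// (comparatively more expensive) feature gate second, which is the ordering
// defended in the comment above.
func acceptedEncoding(encoding string) string {
	if len(encoding) == 0 {
		return ""
	}
	if !compressionEnabled {
		return ""
	}
	if encoding == "gzip" {
		return encoding
	}
	return ""
}

func main() {
	fmt.Printf("%q %q\n", acceptedEncoding("gzip"), acceptedEncoding(""))
}
```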
@@ -0,0 +1,303 @@
/*
Copyright 2016 The Kubernetes Authors.
nit: 2019
req:         &http.Request{Header: http.Header{}},
wantCode:    http.StatusBadRequest,
wantHeaders: http.Header{"Content-Type": []string{"application/json"}},
wantBody:    smallPayload,
Why do we return an object for BadRequest?
I guess I'm missing something, because you do this in a couple of other tests below too.
In this case it's emulating a custom registry endpoint (aggregated api) that returns a valid object but with bad request, which is technically possible. I wanted to have a test that captured behavior that is infrequently used (error code with valid object), but which can show up in some cases (if you had a custom response type that implemented APIStatus, you could return an object + an error code today).
Thanks for explanation - that makes a lot of sense to me now.
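As a rough illustration of the scenario described above (the handler, paths, and payload here are made up, not apiserver code): a plain net/http handler can legitimately pair a non-2xx status code with a well-formed object body, which is what an aggregated API server returning an APIStatus-style error could produce.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

func main() {
	// A handler that returns a syntactically valid API object together with a
	// BadRequest status code - legal HTTP, and something an aggregated API
	// server could do today.
	h := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "application/json")
		w.WriteHeader(http.StatusBadRequest)                         // error code...
		fmt.Fprint(w, `{"kind":"Foo","apiVersion":"example.com/v1"}`) // ...with a valid object body
	})

	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, "/apis/example.com/v1/foos/x", nil))
	fmt.Println(rec.Code, rec.Body.String())
}
```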
statusCode:  http.StatusOK,
wantCode:    http.StatusNotFound,
wantHeaders: http.Header{"Content-Type": []string{"text/plain"}},
wantBody:    []byte("NotFound: \"test\" not found"),
Feel free to ignore, but it seems we have some additional whitespace somewhere in the stack...
Yeah, I was actually testing the exact output. The two spaces are where an empty resource string is inserted (same message as before this PR).
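A quick sketch of where that whitespace comes from, with an assumed format string (the real message is built by the apimachinery error helpers): when the qualified resource is empty, the formatting leaves a doubled space in front of the quoted name.

```go
package main

import "fmt"

func main() {
	resource := "" // empty qualified resource, as in the test case above
	// The format string here is an assumption for illustration only.
	msg := fmt.Sprintf("NotFound: %s %q not found", resource, "test")
	fmt.Printf("%q\n", msg) // prints "NotFound:  \"test\" not found" - note the double space
}
```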
},
{
	name: "errors are compressed",
nice test
There is only one nit (s/2014/2019) and I don't want to block this PR on it.
/lgtm
Scratch that - it looks like there are intersections with the replica set / deployment change from yesterday, and the runs are newer.
Even taking the RS/DS change into account, I'd say we're within bounds on p99 (maybe a bit more variability across runs). Will continue to monitor today and tomorrow, but it looks like a small win in some flows and no huge impact elsewhere.
New: func() interface{} {
	gw, err := gzip.NewWriterLevel(nil, defaultGzipContentEncodingLevel)
	if err != nil {
		panic(err)
Should an error be returned here? In deferredResponseWriter#Write, the error can be returned to the caller.
See #79943
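For context on that panic, a self-contained sketch of the writer-pool pattern in question (the level constant's value is assumed here, not copied from the PR): gzip.NewWriterLevel only fails when the compression level itself is out of range, so with a fixed, valid level the panic is effectively unreachable, and returning the error from the deferred writer's Write, as #79943 proposes, is the alternative being raised.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"fmt"
	"sync"
)

// The exact level used by the PR is not reproduced here; any valid constant
// behaves the same for the purposes of this sketch.
const defaultGzipContentEncodingLevel = gzip.BestSpeed

// gzipPool reuses gzip writers across responses. NewWriterLevel returns an
// error only for an invalid level, so with a fixed, valid constant the panic
// below cannot fire in practice.
var gzipPool = sync.Pool{
	New: func() interface{} {
		gw, err := gzip.NewWriterLevel(nil, defaultGzipContentEncodingLevel)
		if err != nil {
			panic(err)
		}
		return gw
	},
}

func main() {
	var buf bytes.Buffer
	gw := gzipPool.Get().(*gzip.Writer)
	gw.Reset(&buf) // point the pooled writer at this response's buffer
	gw.Write([]byte("hello gzip pool"))
	gw.Close()
	gzipPool.Put(gw)
	fmt.Println(buf.Len(), "compressed bytes")
}
```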
I think that at the scale of 100 nodes, the change is small enough that it's not easy to say what happened. When looking at 5k-node scale, things like node listing regressed visibly (because those calls are made from within the cluster, actually even via localhost, where network throughput and latency are not a problem). So I would say that it's a reasonable tradeoff to take.
I'm wondering whether there is an additional heuristic we could add on the client side to suppress gzip encoding when making requests to localhost. It's kind of a weak heuristic though.
We could also have a way for clients to bypass compression when run in certain modes (e.g. kcm and the scheduler disabling it).
yeah - that is exactly what I was thinking about - add an option to the client config to disable compression, default it to false (I mean, enable compression by default), and disable it only in the scheduler and kcm (which I believe would be enough).
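A standard-library sketch of the mechanism being discussed (not the eventual client-go flag): Go's http.Transport adds Accept-Encoding: gzip on its own, and its DisableCompression field turns that off, so components like the scheduler and kcm could construct their clients with compression disabled while everything else keeps the default.

```go
package main

import (
	"fmt"
	"net/http"
)

// newClient builds an HTTP client whose transport either requests gzip
// (Go's default behavior) or suppresses it entirely.
func newClient(disableCompression bool) *http.Client {
	return &http.Client{
		Transport: &http.Transport{
			// When true, the transport no longer sends "Accept-Encoding: gzip"
			// and no longer transparently decompresses responses.
			DisableCompression: disableCompression,
		},
	}
}

func main() {
	inCluster := newClient(true) // e.g. scheduler / kcm talking over localhost
	external := newClient(false) // remote clients keep compression on
	fmt.Println(inCluster != nil, external != nil)
}
```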
Also see kubernetes/website#30639 |
The previous HTTP compression implementation functioned as an HTTP filter, which required it to deal with a number of special cases that complicated the implementation and prevented it from ever being turned on by default.
Instead, when we write an API object to a response, handle only the one case of a valid Kube object being encoded to the output. This allows a more limited implementation that does not impact other code flows, is easier to reason about, and lets us promote this feature to beta.
Because Golang clients request gzipping by default, and gzip has a significant CPU cost on small requests, ignore requests to compress objects that are smaller than 128KB in size. The goal of this feature is to reduce bandwidth and latency requirements on large lists, even with chunking, and 128KB is smaller than a 500 pod page but larger than almost any single object request.
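A rough sketch of the size gate described above (the constant name, helper name, and header check are invented for illustration; the real implementation lives in the apiserver's response writer): compression only kicks in when the already-encoded body exceeds the threshold and the client asked for gzip.

```go
package main

import (
	"compress/gzip"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// defaultGzipThresholdBytes mirrors the 128KB cutoff described above.
const defaultGzipThresholdBytes = 128 * 1024

// writeMaybeCompressed shows the shape of the decision: the encoded object is
// already available as a single byte slice, so its length can be compared to
// the threshold before paying the CPU cost of gzip.
func writeMaybeCompressed(w http.ResponseWriter, req *http.Request, body []byte) {
	if len(body) > defaultGzipThresholdBytes &&
		strings.Contains(req.Header.Get("Accept-Encoding"), "gzip") {
		w.Header().Set("Content-Encoding", "gzip")
		gw := gzip.NewWriter(w)
		defer gw.Close()
		gw.Write(body)
		return
	}
	w.Write(body)
}

func main() {
	req := httptest.NewRequest(http.MethodGet, "/api/v1/pods", nil)
	req.Header.Set("Accept-Encoding", "gzip")

	small, large := httptest.NewRecorder(), httptest.NewRecorder()
	writeMaybeCompressed(small, req, make([]byte, 1024))     // below threshold: sent as-is
	writeMaybeCompressed(large, req, make([]byte, 256*1024)) // above threshold: gzipped
	fmt.Printf("%q %q\n", small.Header().Get("Content-Encoding"), large.Header().Get("Content-Encoding"))
}
```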
Also fixes a bug introduced in #50342: because httpResponseWriterWithInit.Write wasn't a pointer receiver, the init code was called repeatedly.
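To show why the receiver matters (with made-up type names, not the original httpResponseWriterWithInit), here is a minimal before/after: with a value receiver every call mutates a copy, so the initialization flag never persists and the init path runs on each Write.

```go
package main

import "fmt"

type valueWriter struct{ initialized bool }

// Value receiver: the flag is set on a copy and lost, so init runs every call.
func (w valueWriter) Write(p []byte) (int, error) {
	if !w.initialized {
		fmt.Println("init runs (again)")
		w.initialized = true // lost when the copy is discarded
	}
	return len(p), nil
}

type pointerWriter struct{ initialized bool }

// Pointer receiver: the flag persists, so init runs exactly once.
func (w *pointerWriter) Write(p []byte) (int, error) {
	if !w.initialized {
		fmt.Println("init runs once")
		w.initialized = true
	}
	return len(p), nil
}

func main() {
	v, p := valueWriter{}, &pointerWriter{}
	for i := 0; i < 2; i++ {
		v.Write(nil)
		p.Write(nil)
	}
}
```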
/kind bug
KEP kubernetes/enhancements#1115