cmd/contour: Envoy Shutdown Manager #2227
22b091e
to
fd372ff
Compare
Codecov Report
@@ Coverage Diff @@
## master #2227 +/- ##
==========================================
- Coverage 78.24% 77.35% -0.89%
==========================================
Files 57 58 +1
Lines 5070 5154 +84
==========================================
+ Hits 3967 3987 +20
- Misses 1017 1080 +63
- Partials 86 87 +1
I'm thinking of adding a flow diagram to the docs to explain the sequence of events and the options.
I think a flow diagram is an excellent idea.
c81adb9
to
38f157d
Compare
I'm reviewing now.
I particularly liked how you bundled the examples and documentation changes in this PR :)
LGTM overall with a couple of questions. I think the docs are great, nice work.
38f157d
to
10f21da
Compare
I think that’s technically v0.9.1 upstream, but for whatever reason go modules doesn’t want to use that version number and is reverting to the hash.
… On 19 Feb 2020, at 11:57 am, James Peach ***@***.***> wrote:
@jpeach commented on this pull request.
In go.mod:
> @@ -19,6 +19,7 @@ require (
github.com/konsorten/go-windows-terminal-sequences v1.0.2 // indirect
github.com/prometheus/client_golang v1.1.0
github.com/prometheus/client_model v0.0.0-20190812154241-14fe0d1b01d4
+ github.com/prometheus/common v0.6.0
Yeh, there doesn't seem to be any version numbering consistency across these packages 🤷♂
A few nits around log message formatting and so forth. I'd like a bit more clarity around how we should handle errors posting to the health check fail URL though.
cmd/contour/shutdownmanager.go
Outdated
envoyAdminURL := fmt.Sprintf("http://%s:%d/healthcheck/fail", s.envoyHost, s.envoyPort)

// Send shutdown signal to Envoy to start draining connections
err := shutdownEnvoy(envoyAdminURL)
@stevesloka Did we reach a resolution here? What is meant to happen to the shutdown process if we couldn't fail the healthcheck out?
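To make the question concrete, here is a minimal sketch of a single, non-retrying POST to the Envoy admin `/healthcheck/fail` endpoint. The helper name `failHealthcheck` and the `httptest` stand-in server are assumptions for illustration only, not the code under review; the point is that the caller sees an error both when the request cannot be sent and when Envoy answers with a non-200 status.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// failHealthcheck is a hypothetical stand-in for the shutdownEnvoy call in
// the diff. It POSTs once to the given admin URL and reports transport
// failures and non-200 responses as errors. It does not retry.
func failHealthcheck(adminURL string) error {
	resp, err := http.Post(adminURL, "", nil)
	if err != nil {
		return fmt.Errorf("posting to %s: %w", adminURL, err)
	}
	defer resp.Body.Close()

	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("posting to %s: unexpected status %d", adminURL, resp.StatusCode)
	}
	return nil
}

func main() {
	// Fake admin listener used only for this demo; a real deployment would
	// point at the Envoy admin address instead.
	envoy := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.Method == http.MethodPost && r.URL.Path == "/healthcheck/fail" {
			w.WriteHeader(http.StatusOK)
			return
		}
		w.WriteHeader(http.StatusNotFound)
	}))
	defer envoy.Close()

	if err := failHealthcheck(envoy.URL + "/healthcheck/fail"); err != nil {
		fmt.Println("could not fail the healthcheck:", err)
		return
	}
	fmt.Println("healthcheck failed; Envoy should begin draining")
}
```

With an explicit error return like this, the caller can decide whether to log and keep waiting, retry, or give up, which is the behavior that needs to be pinned down.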
internal/metrics/parser.go
Outdated
const prometheusStat = "envoy_http_downstream_cx_active"

func prometheusLabels() []string {
	return []string{"ingress_http", "ingress_https"}
Can you use ENVOY_HTTP_LISTENER and ENVOY_HTTPS_LISTENER here?
The ENV vars? No, these strings match labels in the prometheus metrics.
I meant these. But I suppose these strings are coded in so many places that one more doesn't hurt :)
Resolve this if you want to keep this as is.
oh I see. Hmm, possibly. As it's written now we'd get an import cycle by referencing those since the contour package already references the metrics one.
Maybe we can lift the const somewhere else. It feels bad to add more to the shutdown manager file.
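For readers following this thread, a rough sketch of how the stat name and listener labels are meant to be used: sum the `envoy_http_downstream_cx_active` samples whose label value is one of the two listener names. This version hand-parses the Prometheus text format with the standard library so that it is self-contained; the PR itself pulls in `github.com/prometheus/common`, so treat the parsing here as a swapped-in simplification. The function name `activeConnections` and the label name `envoy_http_conn_manager_prefix` in the sample are assumptions.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

const prometheusStat = "envoy_http_downstream_cx_active"

func prometheusLabels() []string {
	return []string{"ingress_http", "ingress_https"}
}

// activeConnections sums the prometheusStat samples whose label set contains
// one of the listener names. It assumes one sample per line with the value
// as the last field and no trailing timestamp.
func activeConnections(metrics string) (int, error) {
	total := 0
	sc := bufio.NewScanner(strings.NewReader(metrics))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		// Skip comments and every other metric family.
		if !strings.HasPrefix(line, prometheusStat+"{") {
			continue
		}
		matched := false
		for _, l := range prometheusLabels() {
			// Match the quoted label value exactly so that "ingress_http"
			// does not also match "ingress_https".
			if strings.Contains(line, `="`+l+`"`) {
				matched = true
				break
			}
		}
		if !matched {
			continue
		}
		fields := strings.Fields(line)
		v, err := strconv.ParseFloat(fields[len(fields)-1], 64)
		if err != nil {
			return 0, fmt.Errorf("parsing %q: %w", line, err)
		}
		total += int(v)
	}
	return total, sc.Err()
}

func main() {
	// Sample scrape; the label name is an assumption for illustration.
	sample := `# TYPE envoy_http_downstream_cx_active gauge
envoy_http_downstream_cx_active{envoy_http_conn_manager_prefix="admin"} 1
envoy_http_downstream_cx_active{envoy_http_conn_manager_prefix="ingress_http"} 2
envoy_http_downstream_cx_active{envoy_http_conn_manager_prefix="ingress_https"} 3
envoy_http_downstream_cx_total{envoy_http_conn_manager_prefix="ingress_http"} 40
`
	n, err := activeConnections(sample)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("open connections:", n)
}
```

The strings are matched against metric label values, not environment variables, which is why they have to stay in step with the listener names used elsewhere.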
@jpeach If we can't tell Envoy to start draining connections, then the readiness probe will fail and the pod will wait the full terminationGracePeriodSeconds before dropping all the connections that Envoy still has open.
Signed-off-by: Steve Sloka <[email protected]>
@stevesloka So in that case, are we worse off than before? That is, envoy isn't draining, but the shutdown is going to take the full termination period. If we don't retry here, we are guaranteed to consume the full grace period, right? Whereas if we retry, we might succeed and at least start the draining.
Worse off than before what? I'm not following. The current preStop hook does the exact same call and has the exact same behavior.
Yes, we'll use the entire grace period unless somehow the connections all drain by themselves.
I'd prefer to make this a new issue to follow up on. I honestly do not see this as a case that would be hit. I do think it is possible, but I'd prefer to not make this PR hold on this. Thoughts?
The current preStop calls the fail once and then pod shutdown continues. This preStop blocks pod shutdown until the connection count converges. Previously, the preStop would never hold up pod shutdown.
Sure, that seems fine.
10f21da
to
9b98457
Compare
Retry issue: #2262
Signed-off-by: Steve Sloka <[email protected]>
9b98457
to
572fdc2
Compare
args:
- envoy
- shutdown-manager
image: docker.io/projectcontour/contour:master
should be versioned or :latest
Fixes #145 by adding a new set of commands to Contour which will watch the Envoy Prometheus endpoint to block the pod from terminating while there are open connections.
Also adds a sample Grafana panel which shows the open connections by listener: