Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

proxy-lifecycle: add HTTP Server with endpoints for proxy lifecycle shutdown #115

Merged
merged 33 commits into from
Jun 6, 2023

Conversation

mikemorris
Copy link
Contributor

@mikemorris mikemorris commented May 15, 2023

Refs hashicorp/consul-k8s#536, hashicorp/consul-k8s#650

Adds a proxy lifecycle management server and starts it from the consul-dataplane main process. This server exposes an HTTP endpoint (configurable, defaulting to /graceful_shutdown on port 20300) to optionally start draining inbound (external) connections to the managed Envoy proxy, while allowing outbound requests from the application for which this proxy is acting as a sidecar to continue, up to a configurable grace period timeout, to facilitate application shutdown.

Refactors the envoy package proxy manager to introduce new states (stateDraining and stateExited) and new methods (Drain(), Quit() and renaming Stop() to Kill() to avoid confusion and better describe the actual implementation) and implement an interface to allow a mock implementation for testing the lifecycle management server in isolation.

Notes for reviewers

  • This behavior can currently only be triggered by calling the /graceful_shutdown endpoint explicitly (which will be necessary for handling job termination from a preStop hook) in the future.
  • The additional functionality added to the envoy package does not have proper test coverage yet - I'm hoping to add this as a followup by replacing the current fake-envoy implementation with a Go-based version to include an HTTP server mocking the /drain_listeners and /quitquitquit Envoy admin API endpoints.
  • Should bootstrap tests be added for the Envoy drain time and strategy passthrough configuration?
  • envoyExtraArgs refactor will likely conflict with fix: log level present in extra args using envoy-extra-args annotation : NET-2190 #133 and need to be resolved

Links

@mikemorris mikemorris force-pushed the proxy-lcm/shutdown branch from 5a3d46d to 5be431b Compare May 18, 2023 19:14
@mikemorris mikemorris force-pushed the proxy-lcm/shutdown-http-endpoints branch from ada8d83 to 47d0424 Compare May 18, 2023 19:15
@mikemorris mikemorris force-pushed the proxy-lcm/shutdown branch from 5be431b to 1ffb8ac Compare May 23, 2023 18:44
@mikemorris mikemorris force-pushed the proxy-lcm/shutdown-http-endpoints branch 2 times, most recently from a5bd187 to a6e6f0a Compare May 23, 2023 18:46
@mikemorris mikemorris force-pushed the proxy-lcm/shutdown-http-endpoints branch from e824175 to 52c214d Compare May 30, 2023 14:24
Comment on lines 228 to 247
case <-cdp.xdsServerExited():
if err := proxy.Stop(); err != nil {
// Initiate graceful shutdown of Envoy, kill if error
if err := proxy.Quit(); err != nil {
cdp.logger.Error("failed to stop proxy", "error", err)
if err := proxy.Kill(); err != nil {
cdp.logger.Error("failed to kill proxy", "error", err)
}
}
doneCh <- errors.New("xDS server exited unexpectedly")
case <-cdp.metricsConfig.metricsServerExited():
doneCh <- errors.New("metrics server exited unexpectedly")
case <-cdp.lifecycleConfig.lifecycleServerExited():
// Initiate graceful shutdown of Envoy, kill if error
if err := proxy.Quit(); err != nil {
cdp.logger.Error("failed to stop proxy", "error", err)
if err := proxy.Kill(); err != nil {
cdp.logger.Error("failed to kill proxy", "error", err)
}
}
doneCh <- errors.New("proxy lifecycle maangement server exited unexpectedly")
Copy link
Contributor Author

@mikemorris mikemorris May 30, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be the only change to existing logic aside from introducing the new proxy lifecycle management server exposing the /shutdown endpoint - attempting to cleanly exit the Envoy process if a subprocess exits unexpectedly instead of the prior hard kill.

@mikemorris mikemorris marked this pull request as ready for review May 30, 2023 16:24
@mikemorris mikemorris requested a review from a team as a code owner May 30, 2023 16:24
@mikemorris mikemorris requested review from a team, erichaberkorn, curtbushko and pglass May 30, 2023 16:25
url := fmt.Sprintf("http://127.0.0.1:%d%s", port, m.gracefulShutdownPath)
log.Printf("sending request to %s\n", url)

resp, err := http.Get(url)
Copy link
Contributor Author

@mikemorris mikemorris May 30, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could possibly add some way to check that the lifecycle management server blocks for c.shutdownGracePeriod seconds here before calling Proxy.Quit(), but not sure how to best implement that in a non-flaky way.

@mikemorris mikemorris changed the title pkg/consuldp: add HTTP Server with endpoints for proxy lifecycle shutdown proxy-lifecycle: add HTTP Server with endpoints for proxy lifecycle shutdown May 30, 2023
@@ -32,7 +32,7 @@ jobs:
- uses: actions/setup-go@4d34df0c2316fe8122ab82dc22947d607c0c91f9 # v4.0.0
with:
go-version: ${{ needs.get-go-version.outputs.go-version }}
- run: go test ./...
- run: go test ./... -p 1 # disable parallelism to avoid port conflicts from default metrics and lifecycle server configuration
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What kind of impact does this have on the runtime for these tests?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to clean this up, but skipped for now in the interest of expediency. It didn't feel substantial enough to warrant the effort at this time, as the full suite still completes in under a minute.

pkg/consuldp/config.go Outdated Show resolved Hide resolved
pkg/consuldp/consul_dataplane.go Outdated Show resolved Hide resolved
pkg/consuldp/consul_dataplane.go Outdated Show resolved Hide resolved
pkg/consuldp/consul_dataplane.go Outdated Show resolved Hide resolved
pkg/consuldp/consul_dataplane.go Outdated Show resolved Hide resolved
Comment on lines +304 to +301
AdminAddr: cdp.cfg.Envoy.AdminBindAddress,
AdminBindPort: cdp.cfg.Envoy.AdminBindPort,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Curious why these are showing up in this PR

Copy link
Contributor Author

@mikemorris mikemorris May 31, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It felt reasonable to pull these out of the consul-dataplane Envoy config at this point when creating the config to pass as the only argument into envoy.NewProxy (happy to change if they're already somewhere else I didn't notice) to make them accessible within pkg/envoy/proxy.go where they're needed for the HTTP calls to Envoy's admin API for the Drain() and Quit() methods.

This was not needed previously, as the Envoy process was just terminated with a process kill signal.

Copy link

@pglass pglass left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, and this is super useful! I added some nits and non-blocking questions.

pkg/consuldp/lifecycle.go Outdated Show resolved Hide resolved
pkg/consuldp/lifecycle.go Outdated Show resolved Hide resolved
pkg/consuldp/lifecycle.go Outdated Show resolved Hide resolved
pkg/consuldp/lifecycle.go Outdated Show resolved Hide resolved
pkg/consuldp/lifecycle.go Show resolved Hide resolved
pkg/consuldp/lifecycle.go Show resolved Hide resolved
pkg/consuldp/lifecycle.go Show resolved Hide resolved
Copy link
Contributor

@curtbushko curtbushko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking pretty good! I can tell that you put a lot of thought and effort into this!

I only have a few comments, but nothing that I will stop you on.

Note: There is going to be another release of consul-dataplane tomorrow (Thursday, June 1st) so I am going to block here so this doesn't get merged in by accident.

cmd/consul-dataplane/main.go Outdated Show resolved Hide resolved
pkg/consuldp/lifecycle.go Outdated Show resolved Hide resolved
pkg/consuldp/lifecycle.go Show resolved Hide resolved
@mikemorris mikemorris force-pushed the proxy-lcm/shutdown branch from 1ffb8ac to 04e73f2 Compare June 6, 2023 11:15
@mikemorris mikemorris force-pushed the proxy-lcm/shutdown-http-endpoints branch 2 times, most recently from d548340 to 8bc0b14 Compare June 6, 2023 12:37
@mikemorris
Copy link
Contributor Author

mikemorris commented Jun 6, 2023

Addressed feedback and pushed some naming changes up into #100, which I'd like to merge before this to get the chain started.

@mikemorris mikemorris force-pushed the proxy-lcm/shutdown-http-endpoints branch from 268e173 to bf8f0c8 Compare June 6, 2023 19:18
@mikemorris mikemorris merged commit 0047e65 into main Jun 6, 2023
@mikemorris mikemorris deleted the proxy-lcm/shutdown-http-endpoints branch June 6, 2023 19:50
DanStough added a commit that referenced this pull request Jun 14, 2023
DanStough added a commit that referenced this pull request Jun 14, 2023
* Revert "proxy-lifecycle: catch SIGTERM and initiate graceful shutdown (#130)"

This reverts commit 40c99dc.

* Revert "proxy-lifecycle: add HTTP Server with endpoints for proxy lifecycle shutdown (#115)"

This reverts commit 0047e65.

* Revert "cmd: add CLI flags for proxy shutdown lifecycle management (#100)"

This reverts commit 3d37b9f.
@curtbushko curtbushko self-assigned this Jun 27, 2023
@curtbushko curtbushko added backport/1.0 backport/1.1 Changes are backported to 1.1 labels Jun 27, 2023
@curtbushko curtbushko added backport/1.0 backport/1.1 Changes are backported to 1.1 and removed backport/1.0 backport/1.1 Changes are backported to 1.1 labels Jun 27, 2023
curtbushko pushed a commit that referenced this pull request Jun 27, 2023
…hutdown (#115)

* envoy: set drain time and strategy passthrough config with sensible defaults for sidecar proxy shutdown lifecycle

* pkg/consuldp/lifecycle: wire up lifecycle mgmt server into consul-dataplane main config

* pkg/envoy: add Drain and Quit methods, rename Stop to Kill

* pkg/consuldp/lifecycle: gracefully shutdown Envoy if xDS or lifecycle mgmt server exits unexpectedly

* ci: disable parallelism in unit tests to avoid port conflicts

* pkg/envoy: add http client to dial Envoy admin interface

* pkg/consuldp/lifecycle: replace http client with proxy manager interface and mock

* test/lifecycle: pick an available port if gracefulPort is unspecified

* pkg/consuldp/lifecycle: check errors and close errorExitCh if any problems gracefully shutting down Envoy

* add changelong

---------

Co-authored-by: Nathan Coleman <[email protected]>
Co-authored-by: Paul Glass <[email protected]>
curtbushko added a commit that referenced this pull request Jun 27, 2023
… lifecycle shutdown into release/1.0.x (#182)

* proxy-lifecycle: add HTTP Server with endpoints for proxy lifecycle shutdown (#115)

* envoy: set drain time and strategy passthrough config with sensible defaults for sidecar proxy shutdown lifecycle

* pkg/consuldp/lifecycle: wire up lifecycle mgmt server into consul-dataplane main config

* pkg/envoy: add Drain and Quit methods, rename Stop to Kill

* pkg/consuldp/lifecycle: gracefully shutdown Envoy if xDS or lifecycle mgmt server exits unexpectedly

* ci: disable parallelism in unit tests to avoid port conflicts

* pkg/envoy: add http client to dial Envoy admin interface

* pkg/consuldp/lifecycle: replace http client with proxy manager interface and mock

* test/lifecycle: pick an available port if gracefulPort is unspecified

* pkg/consuldp/lifecycle: check errors and close errorExitCh if any problems gracefully shutting down Envoy

* add changelong

---------

Co-authored-by: Nathan Coleman <[email protected]>
Co-authored-by: Paul Glass <[email protected]>

* fix automation

* fix unit tests

---------

Co-authored-by: Mike Morris <[email protected]>
Co-authored-by: Nathan Coleman <[email protected]>
Co-authored-by: Paul Glass <[email protected]>
wilkermichael pushed a commit that referenced this pull request Jun 28, 2023
… lifecycle shutdown into release/1.0.x (#182)

* proxy-lifecycle: add HTTP Server with endpoints for proxy lifecycle shutdown (#115)

* envoy: set drain time and strategy passthrough config with sensible defaults for sidecar proxy shutdown lifecycle

* pkg/consuldp/lifecycle: wire up lifecycle mgmt server into consul-dataplane main config

* pkg/envoy: add Drain and Quit methods, rename Stop to Kill

* pkg/consuldp/lifecycle: gracefully shutdown Envoy if xDS or lifecycle mgmt server exits unexpectedly

* ci: disable parallelism in unit tests to avoid port conflicts

* pkg/envoy: add http client to dial Envoy admin interface

* pkg/consuldp/lifecycle: replace http client with proxy manager interface and mock

* test/lifecycle: pick an available port if gracefulPort is unspecified

* pkg/consuldp/lifecycle: check errors and close errorExitCh if any problems gracefully shutting down Envoy

* add changelong

---------

Co-authored-by: Nathan Coleman <[email protected]>
Co-authored-by: Paul Glass <[email protected]>

* fix automation

* fix unit tests

---------

Co-authored-by: Mike Morris <[email protected]>
Co-authored-by: Nathan Coleman <[email protected]>
Co-authored-by: Paul Glass <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport/1.1 Changes are backported to 1.1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants