proxy-lifecycle: add HTTP Server with endpoints for proxy lifecycle shutdown #115

Merged
merged 33 commits into from
Jun 6, 2023
937d893
wip: sketching out proxy lifecycle HTTP server
mikemorris May 10, 2023
722f263
wip: minimal proxy lifecycle management server skeleton for handling …
mikemorris May 10, 2023
2af6219
wip: get lifecycle_test.go compiling and not panicking
mikemorris May 15, 2023
a1c21c9
wip: remove metrics server boilerplate
mikemorris May 18, 2023
68f206d
wip: lifecycle method cleanup, rename httpGetter to httpClient, add P…
mikemorris May 18, 2023
892392d
envoy: set drain time and strategy passthrough config with sensible …
mikemorris May 23, 2023
bb0f87a
wip: implement gracefulShutdown method for /graceful_shutdown endpoint
mikemorris May 23, 2023
cde897a
wip: clean up some log messages and fix lifecycle server port init
mikemorris May 23, 2023
471a087
wip: uncomment basic graceful shutdown unit test
mikemorris May 23, 2023
5b54f12
fixup mock client comment
mikemorris May 23, 2023
2852040
lifecycle: get blocking gracefulShutdown working
mikemorris May 30, 2023
bbb3785
fixup: set default graceful shutdown path on lifecycle manager config…
mikemorris May 30, 2023
c7e8f86
fixup comment describing where tests should exist?
mikemorris May 30, 2023
ae041fc
make linter happy
mikemorris May 30, 2023
52e5fd5
lifecycle: wire up lifecycle mgmt server into consul-dataplane main c…
mikemorris May 30, 2023
095aaf0
pkg/envoy: add Drain and Quit methods, rename Stop to Kill
mikemorris May 30, 2023
2b0f0ee
pkg/consuldp: gracefully shutdown Envoy if xDS or lifecycle mgmt serv…
mikemorris May 30, 2023
bf9acdb
fixup expected/actual inversion
mikemorris May 30, 2023
9833553
ci: disable parallelism in unit tests to avoid port conflicts
mikemorris May 30, 2023
f0dfd78
add TODOs
mikemorris May 30, 2023
7f9b0f0
pkg/envoy: add http client to dial Envoy admin interface
mikemorris May 30, 2023
8c8141c
lifecycle: replace http client with proxy manager interface and mock
mikemorris May 30, 2023
f98ce24
test/lifecycle: pick an available port if gracefulPort is unspecified
mikemorris May 30, 2023
91a5b81
lifecycle: check errors and close errorExitCh if any problems gracefu…
mikemorris May 30, 2023
bfea751
fixup graceful shutdown path to be /graceful_shutdown
mikemorris May 30, 2023
aadfeed
update log messages and comments from code review suggestions
mikemorris May 31, 2023
496d196
pkg/consuldp: break when consul-dataplane Envoy configuration is foun…
mikemorris May 31, 2023
4340c2f
cmd: change -envoy-drain-time flag to -envoy-drain-time-seconds
mikemorris May 31, 2023
52b4557
pkg/lifecycle: wrap graceful shutdown path config printf in logger
mikemorris Jun 6, 2023
21595f0
Apply suggestions from code review
mikemorris Jun 6, 2023
b5e3aea
rename -shutdown-grace-period to -shutdown-grace-period-seconds
mikemorris Jun 6, 2023
bf8f0c8
finish renaming to shutdownDrainListenersEnabled and ShutdownDrainLis…
mikemorris Jun 6, 2023
790881e
add changelog
mikemorris Jun 6, 2023
3 changes: 3 additions & 0 deletions .changelog/115.txt
@@ -0,0 +1,3 @@
```release-note:feature
Add HTTP server with configurable port and endpoint path for initiating graceful shutdown.
```
2 changes: 1 addition & 1 deletion .github/workflows/consul-dataplane-checks.yaml
@@ -32,7 +32,7 @@ jobs:
- uses: actions/setup-go@4d34df0c2316fe8122ab82dc22947d607c0c91f9 # v4.0.0
with:
go-version: ${{ needs.get-go-version.outputs.go-version }}
- run: go test ./...
- run: go test ./... -p 1 # disable parallelism to avoid port conflicts from default metrics and lifecycle server configuration
Member

What kind of impact does this have on the runtime for these tests?

Contributor Author

I'd like to clean this up, but skipped for now in the interest of expediency. It didn't feel substantial enough to warrant the effort at this time, as the full suite still completes in under a minute.

integration-tests:
name: integration-tests
needs:
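Running the suite with `-p 1` sidesteps collisions on the default metrics and lifecycle ports. Commit f98ce24 takes the complementary approach in the lifecycle test: pick an available port when `gracefulPort` is unspecified. A minimal sketch of that technique, where `freeLoopbackPort` is a hypothetical helper name and not taken from the PR:

```go
package main

import (
	"fmt"
	"net"
)

// freeLoopbackPort is a hypothetical helper that asks the kernel for an unused
// TCP port on the loopback interface by listening on port 0, then releases it.
// Note the small race: another process may bind the port before the caller does.
func freeLoopbackPort() (int, error) {
	l, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return 0, err
	}
	defer l.Close()
	return l.Addr().(*net.TCPAddr).Port, nil
}

func main() {
	port, err := freeLoopbackPort()
	if err != nil {
		panic(err)
	}
	fmt.Println("lifecycle server could bind to 127.0.0.1 port", port)
}
```

Binding the real server directly to `127.0.0.1:0` and reading back `Addr()` avoids the race entirely, at the cost of exposing the listener to the test.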
16 changes: 11 additions & 5 deletions cmd/consul-dataplane/main.go
@@ -62,11 +62,13 @@ var (
promScrapePath string
promMergePort int

adminBindAddr string
adminBindPort int
readyBindAddr string
readyBindPort int
envoyConcurrency int
adminBindAddr string
adminBindPort int
readyBindAddr string
readyBindPort int
envoyConcurrency int
envoyDrainTimeSeconds int
envoyDrainStrategy string

xdsBindAddr string
xdsBindPort int
@@ -131,6 +133,8 @@ func init() {
StringVar(&readyBindAddr, "envoy-ready-bind-address", "", "DP_ENVOY_READY_BIND_ADDRESS", "The address on which Envoy's readiness probe is available.")
IntVar(&readyBindPort, "envoy-ready-bind-port", 0, "DP_ENVOY_READY_BIND_PORT", "The port on which Envoy's readiness probe is available.")
IntVar(&envoyConcurrency, "envoy-concurrency", 2, "DP_ENVOY_CONCURRENCY", "The number of worker threads that Envoy uses.")
IntVar(&envoyDrainTimeSeconds, "envoy-drain-time-seconds", 30, "DP_ENVOY_DRAIN_TIME", "The time in seconds for which Envoy will drain connections.")
StringVar(&envoyDrainStrategy, "envoy-drain-strategy", "immediate", "DP_ENVOY_DRAIN_STRATEGY", "The behaviour of Envoy during the drain sequence. Determines whether all open connections should be encouraged to drain immediately or to increase the percentage gradually as the drain time elapses.")

StringVar(&xdsBindAddr, "xds-bind-addr", "127.0.0.1", "DP_XDS_BIND_ADDR", "The address on which the Envoy xDS server is available.")
IntVar(&xdsBindPort, "xds-bind-port", 0, "DP_XDS_BIND_PORT", "The port on which the Envoy xDS server is available.")
@@ -235,6 +239,8 @@ func main() {
ReadyBindAddress: readyBindAddr,
ReadyBindPort: readyBindPort,
EnvoyConcurrency: envoyConcurrency,
EnvoyDrainTimeSeconds: envoyDrainTimeSeconds,
EnvoyDrainStrategy: envoyDrainStrategy,
ShutdownDrainListenersEnabled: shutdownDrainListenersEnabled,
ShutdownGracePeriodSeconds: shutdownGracePeriodSeconds,
GracefulShutdownPath: gracefulShutdownPath,
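The `-envoy-drain-strategy` value is passed through to Envoy's `--drain-strategy` option, which accepts `gradual` (Envoy's own default) or `immediate`. The flag code above does not validate the string, so a typo would only surface when Envoy starts. A sketch of an up-front check, where `validateDrainStrategy` is a hypothetical helper and not part of this PR:

```go
package main

import (
	"fmt"
	"os"
)

// validDrainStrategies lists the values Envoy's --drain-strategy option accepts.
var validDrainStrategies = map[string]bool{
	"gradual":   true,
	"immediate": true,
}

// validateDrainStrategy (hypothetical) rejects unknown values before they are
// passed through to Envoy as a command-line argument.
func validateDrainStrategy(s string) error {
	if !validDrainStrategies[s] {
		return fmt.Errorf("invalid -envoy-drain-strategy %q: must be \"gradual\" or \"immediate\"", s)
	}
	return nil
}

func main() {
	for _, s := range []string{"immediate", "gradual", "fast"} {
		if err := validateDrainStrategy(s); err != nil {
			fmt.Fprintln(os.Stderr, err)
			continue
		}
		fmt.Println("ok:", s)
	}
}
```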
15 changes: 14 additions & 1 deletion pkg/consuldp/config.go
@@ -274,7 +274,20 @@ type EnvoyConfig struct {
ReadyBindPort int
// EnvoyConcurrency is the envoy concurrency https://www.envoyproxy.io/docs/envoy/latest/operations/cli#cmdoption-concurrency
EnvoyConcurrency int
// ShutdownDrainListenersEnabled configures whether to wait for all proxy listeners to drain before terminating the proxy container.
// EnvoyDrainTimeSeconds is the time in seconds for which Envoy will drain connections
// during a hot restart, when listeners are modified or removed via LDS, or when
// initiated manually via a request to the Envoy admin API.
// The Envoy HTTP connection manager filter will add “Connection: close” to HTTP1
// requests, send HTTP2 GOAWAY, and terminate connections on request completion
// (after the delayed close period).
// https://www.envoyproxy.io/docs/envoy/latest/operations/cli#cmdoption-drain-time-s
EnvoyDrainTimeSeconds int
// EnvoyDrainStrategy is the behaviour of Envoy during the drain sequence.
// Determines whether all open connections should be encouraged to drain
// immediately or to increase the percentage gradually as the drain time elapses.
// https://www.envoyproxy.io/docs/envoy/latest/operations/cli#cmdoption-drain-strategy
EnvoyDrainStrategy string
// ShutdownDrainListenersEnabled configures whether to start draining proxy listeners before terminating the proxy container. Drain time defaults to the value of ShutdownGracePeriodSeconds, but may be set explicitly with EnvoyDrainTimeSeconds.
ShutdownDrainListenersEnabled bool
// ShutdownGracePeriodSeconds is the amount of time to wait after receiving a SIGTERM before terminating the proxy container.
ShutdownGracePeriodSeconds int
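The `ShutdownDrainListenersEnabled` comment says the drain time defaults to `ShutdownGracePeriodSeconds` but may be set explicitly with `EnvoyDrainTimeSeconds`. One way to express that precedence, assuming zero means "not set" (note that the `-envoy-drain-time-seconds` flag in main.go defaults to 30, so the fallback would only apply if the flag is set to 0). `envoyConfig` and `effectiveDrainTimeSeconds` are hypothetical names:

```go
package main

import "fmt"

// envoyConfig mirrors only the two EnvoyConfig fields this sketch needs.
type envoyConfig struct {
	EnvoyDrainTimeSeconds      int
	ShutdownGracePeriodSeconds int
}

// effectiveDrainTimeSeconds (hypothetical): an explicit EnvoyDrainTimeSeconds
// wins; zero is treated as "unset" and falls back to the shutdown grace period.
func effectiveDrainTimeSeconds(c envoyConfig) int {
	if c.EnvoyDrainTimeSeconds > 0 {
		return c.EnvoyDrainTimeSeconds
	}
	return c.ShutdownGracePeriodSeconds
}

func main() {
	fmt.Println(effectiveDrainTimeSeconds(envoyConfig{ShutdownGracePeriodSeconds: 15}))                            // 15
	fmt.Println(effectiveDrainTimeSeconds(envoyConfig{EnvoyDrainTimeSeconds: 30, ShutdownGracePeriodSeconds: 15})) // 30
}
```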
59 changes: 47 additions & 12 deletions pkg/consuldp/consul_dataplane.go
@@ -7,6 +7,7 @@ import (
"context"
"errors"
"fmt"
"io"
"net"
"net/http"
"strings"
@@ -31,8 +32,9 @@ type xdsServer struct {
exitedCh chan struct{}
}

type httpGetter interface {
type httpClient interface {
Get(string) (*http.Response, error)
Post(string, string, io.Reader) (*http.Response, error)
}

// ConsulDataplane represents the consul-dataplane process
@@ -44,6 +46,7 @@ type ConsulDataplane struct {
xdsServer *xdsServer
aclToken string
metricsConfig *metricsConfig
lifecycleConfig *lifecycleConfig
}

// NewConsulDP creates a new instance of ConsulDataplane
@@ -209,6 +212,12 @@ func (cdp *ConsulDataplane) Run(ctx context.Context) error {
return err
}

cdp.lifecycleConfig = NewLifecycleConfig(cdp.cfg, proxy)
err = cdp.lifecycleConfig.startLifecycleManager(ctx)
if err != nil {
return err
}

doneCh := make(chan error)
go func() {
select {
@@ -217,12 +226,25 @@ func (cdp *ConsulDataplane) Run(ctx context.Context) error {
case <-proxy.Exited():
doneCh <- errors.New("envoy proxy exited unexpectedly")
case <-cdp.xdsServerExited():
if err := proxy.Stop(); err != nil {
cdp.logger.Error("failed to stop proxy", "error", err)
// Initiate graceful shutdown of Envoy, kill if error
if err := proxy.Quit(); err != nil {
cdp.logger.Error("failed to stop proxy, will attempt to kill", "error", err)
if err := proxy.Kill(); err != nil {
cdp.logger.Error("failed to kill proxy", "error", err)
}
}
doneCh <- errors.New("xDS server exited unexpectedly")
case <-cdp.metricsConfig.metricsServerExited():
doneCh <- errors.New("metrics server exited unexpectedly")
case <-cdp.lifecycleConfig.lifecycleServerExited():
// Initiate graceful shutdown of Envoy, kill if error
if err := proxy.Quit(); err != nil {
cdp.logger.Error("failed to stop proxy", "error", err)
if err := proxy.Kill(); err != nil {
cdp.logger.Error("failed to kill proxy", "error", err)
}
}
doneCh <- errors.New("proxy lifecycle management server exited unexpectedly")
}
}()
return <-doneCh
@@ -250,20 +272,33 @@ func (cdp *ConsulDataplane) startDNSProxy(ctx context.Context) error {
}

func (cdp *ConsulDataplane) envoyProxyConfig(cfg []byte) envoy.ProxyConfig {
setConcurrency := true
extraArgs := cdp.cfg.Envoy.ExtraArgs
// Users could set the concurrency as an extra args. Take that as priority for best ux
// experience.
for _, v := range extraArgs {
if v == "--concurrency" {
setConcurrency = false
}

envoyArgs := map[string]interface{}{
"--concurrency": cdp.cfg.Envoy.EnvoyConcurrency,
"--drain-time-s": cdp.cfg.Envoy.EnvoyDrainTimeSeconds,
"--drain-strategy": cdp.cfg.Envoy.EnvoyDrainStrategy,
}
if setConcurrency {
extraArgs = append(extraArgs, fmt.Sprintf("--concurrency %v", cdp.cfg.Envoy.EnvoyConcurrency))

// Users could set the Envoy concurrency, drain time, or drain strategy as
// extra args. Prioritize values set in that way over passthrough or defaults
// from consul-dataplane.
for envoyArg, cdpEnvoyValue := range envoyArgs {
for _, v := range extraArgs {
// If found in extraArgs, skip setting value from consul-dataplane Envoy
// config
if v == envoyArg {
break
}
}

// If not found, append value from consul-dataplane Envoy config to extraArgs
extraArgs = append(extraArgs, fmt.Sprintf("%s %v", envoyArg, cdpEnvoyValue))
}

return envoy.ProxyConfig{
AdminAddr: cdp.cfg.Envoy.AdminBindAddress,
AdminBindPort: cdp.cfg.Envoy.AdminBindPort,
Comment on lines +300 to +301
Member

Curious why these are showing up in this PR

Contributor Author

mikemorris, May 31, 2023

It felt reasonable to pull these out of the consul-dataplane Envoy config at this point when creating the config to pass as the only argument into envoy.NewProxy (happy to change if they're already somewhere else I didn't notice) to make them accessible within pkg/envoy/proxy.go where they're needed for the HTTP calls to Envoy's admin API for the Drain() and Quit() methods.

This was not needed previously, as the Envoy process was just terminated with a process kill signal.

Logger: cdp.logger,
LogJSON: cdp.cfg.Logging.LogJSON,
BootstrapConfig: cfg,
195 changes: 195 additions & 0 deletions pkg/consuldp/lifecycle.go
@@ -0,0 +1,195 @@
// Copyright (c) HashiCorp, Inc.
// SPDX-License-Identifier: MPL-2.0

package consuldp

import (
"context"
"fmt"
"net/http"
"strconv"
"sync"
"time"

"github.com/hashicorp/go-hclog"

"github.com/hashicorp/consul-dataplane/pkg/envoy"
)

const (
// defaultLifecycleBindPort is the port which will serve the proxy lifecycle HTTP
// endpoints on the loopback interface.
defaultLifecycleBindPort = "20300"
cdpLifecycleBindAddr = "127.0.0.1"
cdpLifecycleUrl = "http://" + cdpLifecycleBindAddr

defaultLifecycleShutdownPath = "/graceful_shutdown"
)

// lifecycleConfig handles all configuration related to managing the Envoy proxy
// lifecycle, including exposing management controls via an HTTP server.
type lifecycleConfig struct {
logger hclog.Logger

// consuldp proxy lifecycle management config
shutdownDrainListenersEnabled bool
shutdownGracePeriodSeconds int
gracefulPort int
gracefulShutdownPath string

// manager for controlling the Envoy proxy process
proxy envoy.ProxyManager

// consuldp proxy lifecycle management server
lifecycleServer *http.Server

// consuldp proxy lifecycle server control
errorExitCh chan struct{}
running bool
mu sync.Mutex
}

func NewLifecycleConfig(cfg *Config, proxy envoy.ProxyManager) *lifecycleConfig {
return &lifecycleConfig{
shutdownDrainListenersEnabled: cfg.Envoy.ShutdownDrainListenersEnabled,
shutdownGracePeriodSeconds: cfg.Envoy.ShutdownGracePeriodSeconds,
gracefulPort: cfg.Envoy.GracefulPort,
gracefulShutdownPath: cfg.Envoy.GracefulShutdownPath,

proxy: proxy,

errorExitCh: make(chan struct{}, 1),
mu: sync.Mutex{},
}
}

func (m *lifecycleConfig) startLifecycleManager(ctx context.Context) error {
m.mu.Lock()
defer m.mu.Unlock()
if m.running {
return nil
}

m.logger = hclog.FromContext(ctx).Named("lifecycle")
m.running = true
go func() {
<-ctx.Done()
m.stopLifecycleServer()
}()

// Start the server which will expose HTTP endpoints for proxy lifecycle
// management control
mux := http.NewServeMux()

// Determine what HTTP endpoint paths to configure for the proxy lifecycle
// management server. These can be set as flags.
cdpLifecycleShutdownPath := defaultLifecycleShutdownPath
if m.gracefulShutdownPath != "" {
cdpLifecycleShutdownPath = m.gracefulShutdownPath
}

// Set config to allow introspection of default path for testing
m.gracefulShutdownPath = cdpLifecycleShutdownPath

m.logger.Info(fmt.Sprintf("setting graceful shutdown path: %s\n", cdpLifecycleShutdownPath))
mux.HandleFunc(cdpLifecycleShutdownPath, m.gracefulShutdown)

// Determine what the proxy lifecycle management server bind port is. It can be
// set as a flag.
cdpLifecycleBindPort := defaultLifecycleBindPort
if m.gracefulPort != 0 {
cdpLifecycleBindPort = strconv.Itoa(m.gracefulPort)
}
m.lifecycleServer = &http.Server{
Addr: fmt.Sprintf("%s:%s", cdpLifecycleBindAddr, cdpLifecycleBindPort),
Handler: mux,
}

// Start the proxy lifecycle management server
go m.startLifecycleServer()

return nil
}

// startLifecycleServer starts the main proxy lifecycle management server that
// exposes HTTP endpoints for proxy lifecycle control.
func (m *lifecycleConfig) startLifecycleServer() {
m.logger.Info("starting proxy lifecycle management server", "address", m.lifecycleServer.Addr)
err := m.lifecycleServer.ListenAndServe()
if err != nil && err != http.ErrServerClosed {
m.logger.Error("failed to serve proxy lifecycle management requests", "error", err)
close(m.errorExitCh)
}
}

// stopLifecycleServer stops the consul dataplane proxy lifecycle server
func (m *lifecycleConfig) stopLifecycleServer() {
m.mu.Lock()
defer m.mu.Unlock()
m.running = false

if m.lifecycleServer != nil {
m.logger.Info("stopping the lifecycle management server")
err := m.lifecycleServer.Close()
if err != nil {
m.logger.Warn("error while closing lifecycle server", "error", err)
close(m.errorExitCh)
}
}
}

// lifecycleServerExited is used to signal that the lifecycle server
// received a signal to initiate shutdown.
func (m *lifecycleConfig) lifecycleServerExited() <-chan struct{} {
return m.errorExitCh
}

// gracefulShutdown blocks until shutdownGracePeriodSeconds seconds have elapsed, and, if
// configured, will drain inbound connections to Envoy listeners during that time.
func (m *lifecycleConfig) gracefulShutdown(rw http.ResponseWriter, _ *http.Request) {
m.logger.Info("initiating shutdown")

// Create a context that will signal a cancel at the specified duration.
// TODO: should this use lifecycleManager ctx instead of context.Background?
timeout := time.Duration(m.shutdownGracePeriodSeconds) * time.Second
ctx, cancel := context.WithTimeout(context.Background(), timeout)
defer cancel()

m.logger.Info(fmt.Sprintf("waiting %d seconds before terminating dataplane proxy", m.shutdownGracePeriodSeconds))

var wg sync.WaitGroup
wg.Add(1)

go func() {
defer wg.Done()

// If shutdownDrainListenersEnabled, initiate graceful shutdown of Envoy.
// We want to start draining connections from inbound listeners if
// configured, but still allow outbound traffic until gracefulShutdownPeriod
// has elapsed to facilitate a graceful application shutdown.
if m.shutdownDrainListenersEnabled {
err := m.proxy.Drain()
if err != nil {
m.logger.Warn("error while draining Envoy listeners", "error", err)
close(m.errorExitCh)
}
}

// Block until context timeout has elapsed
<-ctx.Done()

// Finish graceful shutdown, quit Envoy proxy
m.logger.Info("shutdown grace period timeout reached")
err := m.proxy.Quit()
if err != nil {
m.logger.Warn("error while shutting down Envoy", "error", err)
close(m.errorExitCh)
}
}()

// Wait for context timeout to elapse
wg.Wait()

// Return HTTP 200 Success
rw.WriteHeader(http.StatusOK)
}
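In lifecycle.go, `errorExitCh` is closed from several paths: `startLifecycleServer`, `stopLifecycleServer`, and both the Drain and the Quit error branches of `gracefulShutdown`. Closing an already-closed channel panics in Go, so if two of those paths fire (for example, both Drain and Quit fail during one shutdown), the second `close` panics. A common guard is `sync.Once`; a sketch with a hypothetical `exitSignal` type:

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// exitSignal is a hypothetical wrapper: Close may be called from any number of
// error paths or goroutines, but the channel is closed exactly once.
type exitSignal struct {
	ch   chan struct{}
	once sync.Once
}

func newExitSignal() *exitSignal {
	return &exitSignal{ch: make(chan struct{})}
}

// Close closes the channel on the first call; later calls are no-ops instead
// of panicking with "close of closed channel".
func (s *exitSignal) Close() {
	s.once.Do(func() { close(s.ch) })
}

// Done returns a channel that is closed once Close has been called.
func (s *exitSignal) Done() <-chan struct{} { return s.ch }

func main() {
	sig := newExitSignal()
	// Simulate a graceful shutdown in which both Drain and Quit fail.
	for _, err := range []error{errors.New("drain failed"), errors.New("quit failed")} {
		fmt.Println("error:", err)
		sig.Close() // the second call is a no-op rather than a panic
	}
	<-sig.Done()
	fmt.Println("exit signalled")
}
```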