add support for SRV discovery for permissions-api host

To better support failover to other regions without adding load balancer hops and latency,
the permissions client now supports SRV record discovery to find additional hosts that can serve requests.

SRV records are looked up for the host configured for the permissions client.
The SRV service looked up is `permissions-api` with protocol `tcp`.
An example SRV lookup request would be for `_permissions-api._tcp.iam.example.com`,
where `iam.example.com` is the host configured for `permissions.host`.
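
As a rough illustration of the naming scheme described above, the sketch below builds the SRV query name from a service, protocol, and host, then performs the same lookup with Go's standard `net.LookupSRV`. The helper `srvName` is hypothetical and is not part of this commit; `iam.example.com` is the example host from the message above.

```go
package main

import (
	"fmt"
	"net"
)

// srvName builds the DNS name queried for an SRV lookup.
// Hypothetical helper, for illustration only.
func srvName(service, proto, host string) string {
	return "_" + service + "._" + proto + "." + host
}

func main() {
	host := "iam.example.com" // stands in for the value of permissions.host

	fmt.Println(srvName("permissions-api", "tcp", host))

	// net.LookupSRV assembles the same query name from its three arguments.
	_, addrs, err := net.LookupSRV("permissions-api", "tcp", host)
	if err != nil {
		fmt.Println("no SRV records found:", err)
		return
	}

	for _, a := range addrs {
		fmt.Printf("%s:%d priority=%d weight=%d\n", a.Target, a.Port, a.Priority, a.Weight)
	}
}
```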

For best backwards compatibility, these SRV records are optional: if none are found, the client falls back to the value provided in `permissions.host`.
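
The fallback behaviour can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the code in this commit: `resolveHosts` and `lookupFunc` are hypothetical names, and the lookup is passed in as a function so the logic can be shown without DNS.

```go
package main

import (
	"errors"
	"fmt"
)

// lookupFunc abstracts the SRV lookup (hypothetical, for illustration).
type lookupFunc func(host string) ([]string, error)

// resolveHosts returns the discovered hosts, or the configured host when
// discovery yields nothing and SRV records are treated as optional.
func resolveHosts(host string, optional bool, lookup lookupFunc) ([]string, error) {
	found, err := lookup(host)
	if err == nil && len(found) > 0 {
		return found, nil
	}

	if optional {
		// Fall back to the configured host instead of failing.
		return []string{host}, nil
	}

	if err == nil {
		err = errors.New("no SRV records found")
	}

	return nil, fmt.Errorf("discovery for %s failed: %w", host, err)
}

func main() {
	noRecords := func(string) ([]string, error) { return nil, nil }

	hosts, err := resolveHosts("iam.example.com", true, noRecords)
	fmt.Println(hosts, err)
}
```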

Additionally, the permissions client now retries auth checks when the response is not successful.
This keeps requests flowing when a host fails between health checks.
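
The commit uses `github.com/hashicorp/go-retryablehttp` for retries, with a minimum wait of 100ms and a maximum of 2s. The sketch below is not that library's implementation; it is a standard-library illustration of the general idea (retry with a doubling, capped wait), and the names `backoff` and `retry`, as well as the choice to retry on statuses of 500 and above, are assumptions.

```go
package main

import (
	"fmt"
	"time"
)

// backoff returns the wait before retry number attempt (0-based),
// doubling from min and capped at max.
func backoff(min, max time.Duration, attempt int) time.Duration {
	wait := min
	for i := 0; i < attempt; i++ {
		wait *= 2
		if wait >= max {
			return max
		}
	}
	return wait
}

// retry calls do until it returns a status below 500 with no error,
// or until attempts are exhausted.
func retry(attempts int, min, max time.Duration, do func() (int, error)) (int, error) {
	if attempts < 1 {
		attempts = 1
	}

	var lastErr error
	for i := 0; i < attempts; i++ {
		status, err := do()
		if err == nil && status < 500 {
			return status, nil
		}
		if err == nil {
			err = fmt.Errorf("unexpected status %d", status)
		}
		lastErr = err

		if i < attempts-1 {
			time.Sleep(backoff(min, max, i))
		}
	}

	return 0, fmt.Errorf("giving up after %d attempts: %w", attempts, lastErr)
}

func main() {
	calls := 0
	status, err := retry(5, 100*time.Millisecond, 2*time.Second, func() (int, error) {
		calls++
		if calls < 3 {
			return 503, nil // simulate a host that is briefly unavailable
		}
		return 200, nil
	})
	fmt.Println(status, err, calls)
}
```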

Signed-off-by: Mike Mason <[email protected]>
mikemrm committed Dec 3, 2024
1 parent abb6932 commit 0ee2806
Showing 17 changed files with 2,521 additions and 7 deletions.
13 changes: 13 additions & 0 deletions chart/iam-runtime-infratographer/README.md
@@ -71,6 +71,19 @@ iam-runtime-infratographer:
| config.events.nats.url | string | `""` | url NATS server url to use. |
| config.jwt.issuer | string | `""` | issuer Issuer to use for JWT validation. |
| config.jwt.jwksURI | string | `""` | jwksURI JWKS URI to use for JWT validation. |
| config.permissions.discovery.check.concurrency | int | `5` | concurrency is the number of hosts to concurrently check. |
| config.permissions.discovery.check.count | int | `5` | count is the number of checks to run on each host to check for connection latency. |
| config.permissions.discovery.check.delay | string | `"200ms"` | delay is the delay between requests for a host. |
| config.permissions.discovery.check.interval | string | `"1m"` | interval is how frequently to check hosts for healthiness. |
| config.permissions.discovery.check.path | string | `"/readyz"` | path is the uri path to fetch to check if host is healthy. |
| config.permissions.discovery.check.scheme | string | `""` | scheme sets the uri scheme. Default is http unless the discovered port is 443, in which case https will be used. |
| config.permissions.discovery.check.timeout | string | `"2s"` | timeout sets the maximum amount of time a request can wait before canceling the request. |
| config.permissions.discovery.disable | bool | `false` | disable SRV discovery. |
| config.permissions.discovery.fallback | string | `""` | fallback sets the fallback address if no hosts are found or all hosts are unhealthy. |
| config.permissions.discovery.interval | string | `"15m"` | interval to check for new SRV records. |
| config.permissions.discovery.optional | bool | `true` | optional allows SRV records to be optional. |
| config.permissions.discovery.prefer | string | `""` | prefer sets the preferred SRV record. (skips priority, weight and duration ordering) |
| config.permissions.discovery.quick | bool | `false` | quick doesn't wait for discovery and health checks to complete before selecting a host. |
| config.permissions.host | string | `""` | host permissions-api host to use. |
| config.tracing.enabled | bool | `false` | enabled initializes otel tracing. |
| config.tracing.insecure | bool | `false` | insecure if TLS should be disabled. |
29 changes: 29 additions & 0 deletions chart/iam-runtime-infratographer/values.yaml
@@ -16,6 +16,35 @@ config:
permissions:
# -- host permissions-api host to use.
host: ""

discovery:
# -- disable SRV discovery.
disable: false
# -- interval to check for new SRV records.
interval: 15m
# -- quick doesn't wait for discovery and health checks to complete before selecting a host.
quick: false
# -- optional allows SRV records to be optional.
optional: true
# -- prefer sets the preferred SRV record. (skips priority, weight and duration ordering)
prefer: ""
# -- fallback sets the fallback address if no hosts are found or all hosts are unhealthy.
fallback: ""
check:
# -- scheme sets the uri scheme. Default is http unless the discovered port is 443, in which case https will be used.
scheme: ""
# -- path is the uri path to fetch to check if host is healthy.
path: /readyz
# -- count is the number of checks to run on each host to check for connection latency.
count: 5
# -- interval is how frequently to check hosts for healthiness.
interval: 1m
# -- delay is the delay between requests for a host.
delay: 200ms
# -- timeout sets the maximum amount of time a request can wait before canceling the request.
timeout: 2s
# -- concurrency is the number of hosts to concurrently check.
concurrency: 5
events:
# -- enabled enables NATS event-based functions.
enabled: false
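
The `check.scheme` and `check.path` defaults documented above can be illustrated with a small sketch. The function `checkURL` is hypothetical and only mirrors the documented rule (http by default, https when the discovered port is 443, `/readyz` as the default path); how the `selecthost` package actually builds its check URL is not shown in this excerpt.

```go
package main

import (
	"fmt"
	"net"
	"strconv"
)

// checkURL builds a health check URL for a discovered target.
// Hypothetical helper mirroring the documented defaults.
func checkURL(scheme, target string, port uint16, path string) string {
	if scheme == "" {
		scheme = "http"
		if port == 443 {
			scheme = "https"
		}
	}

	if path == "" {
		path = "/readyz"
	}

	return scheme + "://" + net.JoinHostPort(target, strconv.Itoa(int(port))) + path
}

func main() {
	fmt.Println(checkURL("", "a.iam.example.com", 443, ""))
	fmt.Println(checkURL("", "b.iam.example.com", 8080, "/healthz"))
}
```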
15 changes: 15 additions & 0 deletions config.example.yaml
@@ -2,6 +2,21 @@ server:
socketpath: /tmp/runtime.sock
permissions:
host: permissions-api.enterprise.dev
discovery:
disable: false
interval: 15m
quick: false
optional: true
prefer: ""
fallback: ""
check:
scheme: ""
path: /readyz
count: 5
interval: 1m
delay: 200ms
timeout: 2s
concurrency: 5
jwt:
jwksuri: https://identity-api.enterprise.dev/jwks.json
issuer: https://identity-api.enterprise.dev/
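
The interval, delay, and timeout values in this example (`15m`, `1m`, `200ms`, `2s`) use Go's duration syntax. As a minimal sketch, assuming a hypothetical helper named `durationOr` that is not part of this commit, such strings can be parsed with `time.ParseDuration`, with a default applied when the value is empty:

```go
package main

import (
	"fmt"
	"time"
)

// durationOr parses a Go duration string and returns def when value is empty.
// Hypothetical helper, for illustration only.
func durationOr(value string, def time.Duration) (time.Duration, error) {
	if value == "" {
		return def, nil
	}

	d, err := time.ParseDuration(value)
	if err != nil {
		return 0, fmt.Errorf("invalid duration %q: %w", value, err)
	}

	return d, nil
}

func main() {
	for _, v := range []string{"15m", "200ms", "", "soon"} {
		d, err := durationOr(v, time.Minute)
		fmt.Println(d, err)
	}
}
```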
2 changes: 2 additions & 0 deletions go.mod
@@ -9,6 +9,7 @@ require (
github.com/MicahParks/keyfunc/v3 v3.3.3
github.com/go-jose/go-jose/v4 v4.0.4
github.com/golang-jwt/jwt/v5 v5.2.1
github.com/hashicorp/go-retryablehttp v0.7.7
github.com/labstack/echo/v4 v4.12.0
github.com/metal-toolbox/iam-runtime v0.4.1
github.com/spf13/cobra v1.8.1
@@ -40,6 +41,7 @@ require (
github.com/golang-jwt/jwt v3.2.2+incompatible // indirect
github.com/google/uuid v1.6.0 // indirect
github.com/grpc-ecosystem/grpc-gateway/v2 v2.20.0 // indirect
github.com/hashicorp/go-cleanhttp v0.5.2 // indirect
github.com/hashicorp/hcl v1.0.0 // indirect
github.com/inconshreveable/mousetrap v1.1.0 // indirect
github.com/jaevor/go-nanoid v1.4.0 // indirect
8 changes: 8 additions & 0 deletions go.sum
@@ -11,6 +11,8 @@ github.com/davecgh/go-spew v1.1.0/go.mod h1:J7Y8YcW2NihsgmVo/mv3lAwl/skON4iLHjSs
github.com/davecgh/go-spew v1.1.1/go.mod h1:J7Y8YcW2NihsgmVo/mv3lAwl/skON4iLHjSsI+c5H38=
github.com/davecgh/go-spew v1.1.2-0.20180830191138-d8f796af33cc h1:U9qPSI2PIWSS1VwoXQT9A3Wy9MM3WgvqSxFWenqJduM=
github.com/davecgh/go-spew v1.1.2-0.20180830191138-d8f796af33cc/go.mod h1:J7Y8YcW2NihsgmVo/mv3lAwl/skON4iLHjSsI+c5H38=
github.com/fatih/color v1.16.0 h1:zmkK9Ngbjj+K0yRhTVONQh1p/HknKYSlNT+vZCzyokM=
github.com/fatih/color v1.16.0/go.mod h1:fL2Sau1YI5c0pdGEVCbKQbLXB6edEj1ZgiY4NijnWvE=
github.com/felixge/httpsnoop v1.0.4 h1:NFTV2Zj1bL4mc9sqWACXbQFVBBg2W3GPvqp8/ESS2Wg=
github.com/felixge/httpsnoop v1.0.4/go.mod h1:m8KPJKqk1gH5J9DgRY2ASl2lWCfGKXixSwevea8zH2U=
github.com/frankban/quicktest v1.14.6 h1:7Xjx+VpznH+oBnejlPUj8oUpdxnVs4f8XU8WnHkI4W8=
@@ -36,6 +38,12 @@ github.com/google/uuid v1.6.0 h1:NIvaJDMOsjHA8n1jAhLSgzrAzy1Hgr+hNrb57e+94F0=
github.com/google/uuid v1.6.0/go.mod h1:TIyPZe4MgqvfeYDBFedMoGGpEw/LqOeaOT+nhxU+yHo=
github.com/grpc-ecosystem/grpc-gateway/v2 v2.20.0 h1:bkypFPDjIYGfCYD5mRBvpqxfYX1YCS1PXdKYWi8FsN0=
github.com/grpc-ecosystem/grpc-gateway/v2 v2.20.0/go.mod h1:P+Lt/0by1T8bfcF3z737NnSbmxQAppXMRziHUxPOC8k=
github.com/hashicorp/go-cleanhttp v0.5.2 h1:035FKYIWjmULyFRBKPs8TBQoi0x6d9G4xc9neXJWAZQ=
github.com/hashicorp/go-cleanhttp v0.5.2/go.mod h1:kO/YDlP8L1346E6Sodw+PrpBSV4/SoxCXGY6BqNFT48=
github.com/hashicorp/go-hclog v1.6.3 h1:Qr2kF+eVWjTiYmU7Y31tYlP1h0q/X3Nl3tPGdaB11/k=
github.com/hashicorp/go-hclog v1.6.3/go.mod h1:W4Qnvbt70Wk/zYJryRzDRU/4r0kIg0PVHBcfoyhpF5M=
github.com/hashicorp/go-retryablehttp v0.7.7 h1:C8hUCYzor8PIfXHa4UrZkU4VvK8o9ISHxT2Q8+VepXU=
github.com/hashicorp/go-retryablehttp v0.7.7/go.mod h1:pkQpWZeYWskR+D1tR2O5OcBFOxfA7DoAO6xtkuQnHTk=
github.com/hashicorp/hcl v1.0.0 h1:0Anlzjpi4vEasTeNFn2mLJgTSwt0+6sfsiTG8qcWGx4=
github.com/hashicorp/hcl v1.0.0/go.mod h1:E5yfLk+7swimpb2L/Alb/PJmXilQ/rhwaUYs4T20WEQ=
github.com/inconshreveable/mousetrap v1.1.0 h1:wN+x4NVGpMsO7ErUn/mUI3vEoE6Jt13X2s0bqwp9tc8=
22 changes: 15 additions & 7 deletions internal/permissions/client.go
@@ -11,7 +11,8 @@ import (
 	"net/url"
 	"time"

-	"go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp"
+	"github.com/hashicorp/go-retryablehttp"
+	"go.infratographer.com/iam-runtime-infratographer/internal/selecthost"
 	"go.opentelemetry.io/otel"
 	"go.opentelemetry.io/otel/attribute"
 	"go.opentelemetry.io/otel/codes"
@@ -54,7 +55,7 @@ type Client interface {

 type client struct {
 	apiURL     string
-	httpClient *http.Client
+	httpClient *retryablehttp.Client
 	tracer     trace.Tracer
 	logger     *zap.SugaredLogger
 }
@@ -67,17 +68,24 @@ func NewClient(config Config, logger *zap.SugaredLogger) (Client, error) {
 		return nil, err
 	}

-	tracer := otel.GetTracerProvider().Tracer(tracerName)
+	transport, err := config.initTransport(http.DefaultTransport, selecthost.Logger(logger))
+	if err != nil {
+		return nil, err
+	}
+
+	httpClient := retryablehttp.NewClient()

-	httpClient := &http.Client{
+	httpClient.RetryWaitMin = 100 * time.Millisecond
+	httpClient.RetryWaitMax = 2 * time.Second
+	httpClient.HTTPClient = &http.Client{
 		Timeout:   clientTimeout,
-		Transport: otelhttp.NewTransport(http.DefaultTransport),
+		Transport: transport,
 	}

 	out := &client{
 		apiURL:     apiURLString,
 		httpClient: httpClient,
-		tracer:     tracer,
+		tracer:     otel.GetTracerProvider().Tracer(tracerName),
 		logger:     logger,
 	}

@@ -118,7 +126,7 @@ func (c *client) CheckAccess(ctx context.Context, subjToken string, actions []Re
 	}

 	// Build the request to send up to permissions-api.
-	req, err := http.NewRequestWithContext(ctx, http.MethodPost, c.apiURL, &reqBody)
+	req, err := retryablehttp.NewRequestWithContext(ctx, http.MethodPost, c.apiURL, &reqBody)
 	if err != nil {
 		span.SetStatus(codes.Error, err.Error())
 		c.logger.Errorw("failed to create permissions-api request", "error", err)
157 changes: 157 additions & 0 deletions internal/permissions/config.go
@@ -1,13 +1,170 @@
package permissions

import (
"net/http"
"time"

"github.com/spf13/pflag"
"go.infratographer.com/iam-runtime-infratographer/internal/selecthost"
"go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp"
)

// Config represents a permissions-api client configuration.
type Config struct {
// Host represents a permissions-api host to hit.
Host string

// Discovery defines the host discovery configuration.
Discovery DiscoveryConfig
}

func (c Config) initTransport(base http.RoundTripper, opts ...selecthost.Option) (http.RoundTripper, error) {
base = otelhttp.NewTransport(base)

if c.Discovery.Disable {
return base, nil
}

cOpts := []selecthost.Option{
selecthost.Fallback(c.Host),
}

discovery := c.Discovery

if discovery.Interval > 0 {
cOpts = append(cOpts, selecthost.DiscoveryInterval(discovery.Interval))
}

if discovery.Quick != nil && *discovery.Quick {
cOpts = append(cOpts, selecthost.Quick())
}

if discovery.Optional == nil || *discovery.Optional {
cOpts = append(cOpts, selecthost.Optional())
}

if discovery.Prefer != "" {
cOpts = append(cOpts, selecthost.Prefer(discovery.Prefer))
}

if discovery.Fallback != "" {
cOpts = append(cOpts, selecthost.Fallback(discovery.Fallback))
}

check := discovery.Check

if check.Scheme != "" {
cOpts = append(cOpts, selecthost.CheckScheme(check.Scheme))
}

if check.Path != "" {
cOpts = append(cOpts, selecthost.CheckPath(check.Path))
} else {
cOpts = append(cOpts, selecthost.CheckPath("/readyz"))
}

if check.Count > 0 {
cOpts = append(cOpts, selecthost.CheckCount(check.Count))
}

if check.Interval > 0 {
cOpts = append(cOpts, selecthost.CheckInterval(check.Interval))
}

if check.Delay > 0 {
cOpts = append(cOpts, selecthost.CheckDelay(check.Delay))
}

if check.Timeout > 0 {
cOpts = append(cOpts, selecthost.CheckTimeout(check.Timeout))
}

if check.Concurrency > 0 {
cOpts = append(cOpts, selecthost.CheckConcurrency(check.Concurrency))
}

selector, err := selecthost.NewSelector(c.Host, "permissions-api", "tcp", append(cOpts, opts...)...)
if err != nil {
return nil, err
}

selector.Start()

return selecthost.NewTransport(selector, base), nil
}

// DiscoveryConfig represents the host discovery configuration.
type DiscoveryConfig struct {
// Disable disables host discovery.
//
// Default: false
Disable bool

// Interval sets the frequency at which SRV records are rediscovered.
//
// Default: 15m
Interval time.Duration

// Quick ensures a quick startup, allowing for a more optimal host to be chosen after discovery has occurred.
// When Quick is enabled, the default fallback address or default host is immediately returned.
// Once the discovery process has completed, a discovered host will be selected.
//
// Default: false
Quick *bool

// Optional uses the fallback address or default host without throwing errors.
// The discovery process continues to run in the background, in the chance that SRV records are added at a later point.
//
// Default: true
Optional *bool

// Check customizes the target health checking process.
Check CheckConfig

// Prefer specifies a preferred host.
// If the host is not discovered or has an error, it will not be used.
Prefer string

// Fallback specifies a fallback host if no hosts are discovered or all hosts are currently failing.
//
// Default: [Config] Host
Fallback string
}

type CheckConfig struct {
// Scheme sets the check URI scheme.
// Default is http unless the discovered host port is 443, in which case the scheme is https.
Scheme string

// Path sets the request path for checks.
//
// Default: /readyz
Path string

// Count defines the number of checks to run on each endpoint.
//
// Default: 5
Count int

// Interval specifies how frequently to run checks.
//
// Default: 1m
Interval time.Duration

// Delay specifies how long to wait between subsequent checks for the same host.
//
// Default: 200ms
Delay time.Duration

// Timeout defines the maximum time an individual check request can take.
//
// Default: 2s
Timeout time.Duration

// Concurrency defines the number of hosts which may be checked simultaneously.
//
// Default: 5
Concurrency int
}

// AddFlags sets the command line flags for the permissions-api client.
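
`initTransport` wraps the base transport with `selecthost.NewTransport`, whose implementation is not shown in this excerpt. One plausible shape for such a wrapper, sketched here purely as an assumption, is an `http.RoundTripper` that rewrites each request's URL host to the currently selected target before delegating to the base transport. The types `rewriteTransport` and `roundTripFunc`, the helper `sentHost`, and the `/check` path are all hypothetical.

```go
package main

import (
	"fmt"
	"net/http"
)

// roundTripFunc lets a plain function act as an http.RoundTripper.
type roundTripFunc func(*http.Request) (*http.Response, error)

func (f roundTripFunc) RoundTrip(r *http.Request) (*http.Response, error) { return f(r) }

// rewriteTransport sends each request to the host returned by pick.
type rewriteTransport struct {
	pick func() string // returns "host:port" of the currently selected target
	base http.RoundTripper
}

func (t *rewriteTransport) RoundTrip(req *http.Request) (*http.Response, error) {
	// A RoundTripper must not modify the caller's request, so work on a clone.
	out := req.Clone(req.Context())
	out.URL.Host = t.pick()

	return t.base.RoundTrip(out)
}

// sentHost reports which host a GET for rawURL is actually sent to
// when the selector has picked selected.
func sentHost(rawURL, selected string) (string, error) {
	var got string

	// The stub stands in for a real network transport.
	stub := roundTripFunc(func(r *http.Request) (*http.Response, error) {
		got = r.URL.Host
		return &http.Response{StatusCode: http.StatusOK, Body: http.NoBody, Request: r}, nil
	})

	client := &http.Client{Transport: &rewriteTransport{
		pick: func() string { return selected },
		base: stub,
	}}

	resp, err := client.Get(rawURL)
	if err != nil {
		return "", err
	}
	resp.Body.Close()

	return got, nil
}

func main() {
	host, err := sentHost("http://iam.example.com/check", "10.0.0.5:8080")
	fmt.Println(host, err)
}
```

Wrapping at the transport layer means callers keep using the configured URL, and host selection stays invisible to the rest of the client.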
5 changes: 5 additions & 0 deletions internal/selecthost/doc.go
@@ -0,0 +1,5 @@
// Package selecthost handles host discovery via DNS SRV records, keeps track of healthy hosts,
// and selects the optimal host for use.
//
// An HTTP [Transport] is provided which simplifies using this package with any http client.
package selecthost
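
The body of the `selecthost` package is not shown in this excerpt. The chart documentation mentions that hosts are ordered by priority, weight, and duration, and that `prefer` skips this ordering. The sketch below illustrates one way such a selection could work; it is an assumption, not the package's algorithm. In particular, treating a higher weight as "better" is a simplification (RFC 2782 uses weight for proportional random selection), and `candidate` and `selectHost` are hypothetical names.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// candidate is a discovered SRV target with its latest health check result.
type candidate struct {
	Target   string
	Priority uint16
	Weight   uint16
	Latency  time.Duration
	Healthy  bool
}

// selectHost returns the preferred target if it is healthy, otherwise the best
// healthy target by priority (lower first), weight (higher first), then latency
// (lower first). It returns fallback when no target is healthy.
func selectHost(cands []candidate, prefer, fallback string) string {
	healthy := make([]candidate, 0, len(cands))

	for _, c := range cands {
		if !c.Healthy {
			continue
		}
		if prefer != "" && c.Target == prefer {
			return c.Target
		}
		healthy = append(healthy, c)
	}

	if len(healthy) == 0 {
		return fallback
	}

	sort.SliceStable(healthy, func(i, j int) bool {
		a, b := healthy[i], healthy[j]
		if a.Priority != b.Priority {
			return a.Priority < b.Priority
		}
		if a.Weight != b.Weight {
			return a.Weight > b.Weight
		}
		return a.Latency < b.Latency
	})

	return healthy[0].Target
}

func main() {
	cands := []candidate{
		{Target: "a.iam.example.com", Priority: 10, Weight: 5, Latency: 30 * time.Millisecond, Healthy: true},
		{Target: "b.iam.example.com", Priority: 10, Weight: 5, Latency: 10 * time.Millisecond, Healthy: true},
		{Target: "c.iam.example.com", Priority: 1, Weight: 5, Latency: 5 * time.Millisecond, Healthy: false},
	}

	fmt.Println(selectHost(cands, "", "iam.example.com"))
}
```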