consul/connect: implement initial support for connect native #8011
Conversation
I will try to do a more in-depth review retroactively, but it looks good on an initial skim.
api/services.go
Outdated
```go
@@ -140,7 +140,7 @@ func (s *Service) Canonicalize(t *Task, tg *TaskGroup, job *Job) {

// ConsulConnect represents a Consul Connect jobspec stanza.
type ConsulConnect struct {
	Native bool
```
Did Native work at all before? Just want to make sure we don't break jobfiles without good reason
```go
}

h.logger.Debug("share consul TLS configuration for connect native", "enabled", h.consulShareTLS, "task", request.Task.Name)
if h.consulShareTLS {
```
I remember you mentioning this shouldn't be enabled in production. Is it worth dropping a warn line when the hook is created so it's emitted at startup?
My understanding is that it is safe to enable in production when Consul ACLs are enabled. If that's true and clearly documented, I don't think we need to log every time.
This PR adds the capability of running Connect Native tasks on Nomad, particularly when TLS and ACLs are enabled on Consul.

The `connect` stanza now includes a `native` parameter, which can be set to the name of the task that backs the Connect Native Consul service.

There is a new Client configuration parameter for the `consul` stanza called `share_ssl`. Like `allow_unauthenticated`, the default value is true, but it is recommended to disable it in production environments. When enabled, the Nomad Client's Consul TLS information is shared with Connect Native tasks through the normal Consul environment variables. This does NOT include auth or token information.

If Consul ACLs are enabled, Service Identity Tokens are automatically injected into the Connect Native task through the `CONSUL_HTTP_TOKEN` environment variable.

Any of the automatically set environment variables can be overridden by the Connect Native task using the `env` stanza.

Fixes #6083
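For a condensed picture, here is a minimal sketch of the jobspec side of this, distilled from the full job file later in this thread (names and the port label are illustrative):

```hcl
# Group-level service backed by a Connect Native task.
service {
  name = "uuid-api"
  port = "${NOMAD_PORT_uuidapi}"
  task = "apitask" # the task that implements the Connect Native service

  connect {
    native = true
  }
}
```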
Looks great! No blockers, but some stuff we should follow up on someday... when isn't there though? 😅
```
its Consul SSL configuration with Connect Native applications. Includes values
of `ca_file`, `cert_file`, `key_file`, `auth`, `ssl`, and `verify_ssl`. Does
not include the value for the ACL `token`. This option should be disabled in
an untrusted environment.
```
I don't like "untrusted environment" as we don't define what that means and are moving toward trusting individual jobs less and less.
My understanding is that as long as Consul ACLs are enabled, sharing TLS certificates does not meaningfully reduce the overall security of Nomad+Consul as the native task will only have its SI token's permissions.
Perhaps stating that this option should be disabled if Consul ACLs are disabled might be less vague and more prescriptive?
When `share_ssl` is enabled, Connect Native tasks are granted the same level of authenticity as the Nomad Client agent (i.e. sharing certificates). They are scoped down in authorization to the default Consul Service Identity [Virtual] Policy tokens, or whatever Consul token is injected through `env` or `template`.

Turning this feature off makes sense if the security of your cluster depends entirely on authentication (you have HTTPS enabled but not ACLs).

I'll try to clarify that better.
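As a rough sketch (field values are assumptions, not from this PR's diff), the Client-side knob under discussion sits alongside the existing Consul TLS settings in the Nomad agent's `consul` stanza:

```hcl
# Nomad client agent configuration (sketch; address and paths are placeholders).
consul {
  address    = "127.0.0.1:8501"
  ssl        = true
  verify_ssl = true
  ca_file    = "/etc/nomad.d/tls/consul-ca.pem"
  cert_file  = "/etc/nomad.d/tls/consul.pem"
  key_file   = "/etc/nomad.d/tls/consul-key.pem"

  # Added by this PR; defaults to true. Per the discussion above,
  # consider disabling this when Consul ACLs are not enabled.
  share_ssl = false
}
```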
```go
m := make(map[string]string)

if _, exists := env["CONSUL_CACERT"]; !exists && h.consulConfig.CAFile != "" {
	m["CONSUL_CACERT"] = filepath.Join("/secrets", secretCAFilename)
```
These paths won't work for raw_exec. Dropping the leading slash so that it's relative should work. Not sure if we have any prior art to work from here though. 🤔
We can definitely fix raw_exec later and it shouldn't hold up this PR. Just file a ticket if you think it's a concern but don't have time to verify or fix now.
Unfortunately just dropping the root slash breaks in Docker. Will open a follow-up ticket to find a way to make everything happy.
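Relatedly, since the PR lets tasks override any of the injected variables via `env`, a workaround sketch for a driver where the injected `/secrets` path doesn't resolve might look like this (the task name and path are placeholders):

```hcl
task "native-app" {
  env {
    # Takes precedence over the automatically injected CONSUL_CACERT.
    CONSUL_CACERT = "/opt/certs/consul-ca.pem"
  }
}
```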
jobspec/parse_service.go
Outdated
```go
@@ -47,6 +47,7 @@ func parseService(o *ast.ObjectItem) (*api.Service, error) {
	"address_mode",
	"check_restart",
	"connect",
	"task", // todo
```
I think this is done?
```go
return fmt.Errorf("Consul Connect sidecars require exactly 1 network, found %d in group %q", n, g.Name)
}

if g.Networks[0].Mode != "bridge" {
```
@nickethier will this check need to change due to CNI?
nomad/structs/services.go
Outdated
```go
// if service is connect native, service task must be set
if s.Connect.IsNative() && len(s.TaskName) == 0 {
	mErr.Errors = append(mErr.Errors, fmt.Errorf("Service %s is connect native and requires setting the task", s.Name))
```
We title case Connect elsewhere in errors.
```diff
-mErr.Errors = append(mErr.Errors, fmt.Errorf("Service %s is connect native and requires setting the task", s.Name))
+mErr.Errors = append(mErr.Errors, fmt.Errorf("Service %s is Connect Native and requires setting the task", s.Name))
```
```diff
@@ -328,7 +328,7 @@ dashes (`-`) are converted to underscores (`_`) in environment variables so

 - The `consul` binary must be present in Nomad's `$PATH` to run the Envoy
   proxy sidecar on client nodes.
-- Consul Connect Native is not yet supported ([#6083][gh6083]).
+- Consul Connect Native requires host networking.
```
Nice. Let's file a follow-up for non-host-networking Native to track comments and votes even if we haven't roadmapped it yet.
In theory, if you configure Consul with a unix socket addr listener, volume mount that into your task, and point CONSUL_HTTP_ADDR at it (using a capable library)... bridge mode should Just Work. Not sure if we'd want to recommend that pattern though. I'll file the follow-up ticket for more discussion.
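A hedged sketch of that idea, assuming Consul's unix socket listener support (the socket path and the volume wiring into the task are placeholders; none of this is implemented by the PR):

```hcl
# Consul agent: serve the HTTP API on a unix socket instead of TCP.
addresses {
  http = "unix:///var/run/consul/http.sock"
}

# Task: point the Connect Native app at the socket mounted into its
# filesystem, overriding the injected address.
env {
  CONSUL_HTTP_ADDR = "unix:///var/run/consul/http.sock"
}
```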
```
the service definition must provide the name of the implementing task in the
[task][service_task] field.
```
We shouldn't require this if there's only 1 task.
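To illustrate that suggestion, a group with a single task could infer the backing task, making the `task` field optional (hypothetical behavior per the comment above, not what this PR implements):

```hcl
group "api" {
  service {
    name = "uuid-api"
    # task = "apitask"  # could be inferred: only one task in the group

    connect {
      native = true
    }
  }

  task "apitask" {
    driver = "docker"
  }
}
```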
```diff
 ### Limitations

-[Consul Connect Native services][native] and [Nomad variable
-interpolation][interpolation] are _not_ yet supported.
+[Nomad variable interpolation][interpolation] is _not_ yet supported.
```
Issue link
**A quick demo of using Vault policies to generate Policies**

```
$ cat consul-agent-policy.hcl
acl = "write"
node_prefix "" {
  policy = "write"
}
agent_prefix "" {
  policy = "write"
}
catalog_prefix "" {
  policy = "write"
}
operator_prefix "" {
  policy = "write"
}

$ cat nomad-agent-policy.hcl
acl = "write"
service_prefix "" {
  policy = "write"
}
agent_prefix "" {
  policy = "read"
}
node_prefix "" {
  policy = "read"
}

$ cat vault-server-policy.hcl
acl = "write"
service_prefix "" {
  policy = "write"
}
key_prefix "" {
  policy = "write"
}
agent_prefix "" {
  policy = "read"
}
node_prefix "" {
  policy = "read"
}

$ cat si-policy.hcl
path "consul/creds/*" {
  capabilities = ["create", "read", "update", "delete", "list"]
}

$ cat si-open.hcl
service_prefix "" {
  policy = "write"
}
node_prefix "" {
  policy = "read"
}
```
**Consul Config**

```hcl
log_level = "TRACE"
data_dir = "/mnt/big/hashiconfigs/connect-native-tls/consul2"
server = true
bootstrap_expect = 1
advertise_addr = "127.0.0.1"
bind_addr = "0.0.0.0"

addresses {
  http = "127.0.0.1"
}

ports {
  http = 8500
}

connect {
  enabled = true
}

acl {
  enabled = true
  default_policy = "deny"
  down_policy = "extend-cache"
  enable_token_persistence = true
}
```

**Nomad Config**

```hcl
log_level = "INFO"
data_dir = "/tmp/nomad2"

server {
  enabled = true
  bootstrap_expect = 1
}

# needs public interface
bind_addr = "0.0.0.0"

advertise {
  http = "127.0.0.1:4646"
  rpc = "127.0.0.1:4647"
  serf = "127.0.0.1:4648"
}

client {
  enabled = true
}

vault {
  enabled = true
  address = "http://localhost:8200"
  token = "s.9x1EgSXnB964mDQ0OwjPkONO"
}

consul {
  ssl = false
  address = "127.0.0.1:8500"
  token = "e9c27530-1155-0158-5906-9ff480721512"
}
```

**Vault Config**

```hcl
# -dev mode
disable_mlock = true
api_addr = "http://127.0.0.1:8200"

storage "inmem" {
}
```

**Vault Consul Secrets Backend**

```
$ vault secrets enable consul
Success! Enabled the consul secrets engine at: consul/

$ vault write consul/roles/si-open policies=si-open-policy
Success! Data written to: consul/roles/si-open

$ vault write consul/config/access scheme=http address=127.0.0.1:8500 token=a8db31da-3224-7f30-0eba-b3a099806269
Success! Data written to: consul/config/access
```

**Nomad Job File**

```hcl
# v.nomad
job "v" {
datacenters = ["dc1"]
group "api" {
network {
mode = "host"
port "uuidapi" {
// let nomad pick
}
}
service {
name = "uuid-api"
port = "${NOMAD_PORT_uuidapi}"
task = "apitask"
connect {
native = true
}
}
task "apitask" {
driver = "docker"
config {
image = "shoenig/uuid-api:v0.0.5"
network_mode = "host"
}
env {
PORT = "${NOMAD_PORT_uuidapi}"
BIND = "0.0.0.0"
}
vault {
policies = ["si-policy"]
}
template {
data = <<EOH
CONSUL_HTTP_TOKEN="{{with secret "consul/creds/si-open"}}{{.Data.token}}{{end}}"
EOH
destination = "secrets/consul_token.env"
env = true
}
}
}
group "frontend" {
network {
mode = "host"
port "http" {
static = 9800
}
}
service {
name = "uuid-fe"
port = "9800"
task = "frontend"
connect {
native = true
}
}
task "frontend" {
driver = "docker"
config {
image = "shoenig/uuid-fe:v0.0.5"
network_mode = "host"
}
env {
UPSTREAM = "uuid-api"
BIND = "0.0.0.0"
PORT = "9800"
}
vault {
policies = ["si-policy"]
}
template {
data = <<EOH
CONSUL_HTTP_TOKEN="{{with secret "consul/creds/si-open"}}{{.Data.token}}{{end}}"
EOH
destination = "secrets/consul_token.env"
env = true
}
}
}
} Launch$ nomad job run v.nomad
==> Monitoring evaluation "9c393e5b"
    Evaluation triggered by job "v"
    Evaluation within deployment: "91a99e2c"
    Allocation "0ef82d75" created: node "890e1db8", group "frontend"
    Allocation "35b0e79c" created: node "890e1db8", group "api"
    Evaluation status changed: "pending" -> "complete"
==> Evaluation "9c393e5b" finished with status "complete"

$ nomad job status v
ID            = v
Name          = v
Submit Date   = 2020-06-24T08:28:49-05:00
Type          = service
Priority      = 50
Datacenters   = dc1
Namespace     = default
Status        = running
Periodic      = false
Parameterized = false

Summary
Task Group  Queued  Starting  Running  Failed  Complete  Lost
api         0       0         1        0       0         0
frontend    0       0         1        0       0         0

Latest Deployment
ID          = 91a99e2c
Status      = successful
Description = Deployment completed successfully

Deployed
Task Group  Desired  Placed  Healthy  Unhealthy  Progress Deadline
api         1        1       1        0          2020-06-24T08:38:59-05:00
frontend    1        1       1        0          2020-06-24T08:39:02-05:00

Allocations
ID        Node ID   Task Group  Version  Desired  Status   Created    Modified
0ef82d75  890e1db8  frontend    0        run      running  3m26s ago  3m13s ago
35b0e79c  890e1db8  api         0        run      running  3m26s ago  3m15s ago
```

**Verify Token is from our Vault Policy**

```
$ nomad alloc exec 0ef8 /usr/bin/env | grep CONSUL_HTTP_TOKEN
CONSUL_HTTP_TOKEN=b33638e5-0443-f407-a66a-79a99248e1c5
$ CONSUL_HTTP_TOKEN=b33638e5-0443-f407-a66a-79a99248e1c5 consul acl token read -self
AccessorID:   e5629c0e-51ef-e19b-ee51-bea0cb4f9f5b
SecretID:     b33638e5-0443-f407-a66a-79a99248e1c5
Description:  Vault si-open token-0ef82d75-ea09-5a19-d4f1-11bcdb049af5-frontend 1593005329545929801
Local:        false
Create Time:  2020-06-24 08:28:49.546430511 -0500 CDT
Policies:
   2c68431f-dbb5-155e-820c-6c28b51515f3 - si-open-policy
```

**Connect Intention Exists**

```
$ envy exec ctls:consul2 consul intention check uuid-fe uuid-api
Allowed
```

**Connect Working**

```
$ curl localhost:9800
<html>
<head>
</head>
<body>
<center>uuid: 283838da-7b82-cd25-b757-d21f31231d65</center>
</body>
</html>
```
Will follow up with issues for
consul/connect: add support for running connect native tasks

This PR adds the capability of running Connect Native Tasks on Nomad,
particularly when TLS and ACLs are enabled on Consul.

The `connect` stanza now includes a `native` parameter, which can be
set to the name of the task that backs the Connect Native Consul service.

There is a new Client configuration parameter for the `consul` stanza
called `share_ssl`. Like `allow_unauthenticated`, the default value is
true, but it is recommended to disable it in production environments.
When enabled, the Nomad Client's Consul TLS information is shared with
Connect Native tasks through the normal Consul environment variables.
This does not include auth or token information.

If Consul ACLs are enabled, Service Identity Tokens are automatically
injected into the Connect Native task through the `CONSUL_HTTP_TOKEN`
environment variable.

Any of the automatically set environment variables can be overridden by
the Connect Native task using the `env` stanza.

Fixes #6083