Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

consul/connect: implement initial support for connect native #8011

Merged
merged 7 commits into from
Jun 24, 2020

Conversation

shoenig
Copy link
Member

@shoenig shoenig commented May 19, 2020

consul/connect: add support for running connect native tasks

This PR adds the capability of running Connect Native Tasks on Nomad,
particularly when TLS and ACLs are enabled on Consul.

The connect stanza now includes a native parameter, which can be
set to the name of task that backs the Connect Native Consul service.

There is a new Client configuration parameter for the consul stanza
called share_ssl. Like allow_unauthenticated the default value is
true, but recommended to be disabled in production environments. When
enabled, the Nomad Client's Consul TLS information is shared with
Connect Native tasks through the normal Consul environment variables.
This does not include auth or token information.

If Consul ACLs are enabled, Service Identity Tokens are automatically
and injected into the Connect Native task through the CONSUL_HTTP_TOKEN
environment variable.

Any of the automatically set environment variables can be overridden by
the Connect Native task using the env stanza.

Fixes #6083

@github-actions
Copy link

github-actions bot commented Jun 5, 2020

Ember Asset Size action

As of 881d875

Files that got Bigger 🚨:

File raw gzip
auto-import-fastboot.js +296 kB +64.4 kB
nomad-ui.css +91 B +24 B

Files that got Smaller 🎉:

File raw gzip
nomad-ui.js -13.9 kB +709 B
vendor.js -363 kB -91.2 kB

Files that stayed the same size 🤷‍:

File raw gzip
vendor.css 0 B 0 B

@shoenig shoenig changed the base branch from c-use-taskkind to master June 5, 2020 15:41
@github-actions
Copy link

github-actions bot commented Jun 5, 2020

Ember Test Audit comparison

c-use-taskkind 881d875ce5d914c8cdac421702b1840f686f61cc change
passes 1311 1354 +43
failures 0 0 0
flaky 0 0 0
duration 4m 39s 243ms 5m 03s 662ms +24s 419ms

@shoenig shoenig force-pushed the f-cnative-host branch 2 times, most recently from fb4fc0e to 7997177 Compare June 8, 2020 20:55
@shoenig shoenig force-pushed the f-cnative-host branch 10 times, most recently from 9bdb5b3 to c84fd77 Compare June 17, 2020 16:12
@shoenig shoenig changed the title DO NOT MERGE: consul/connect: implement initial support for connect native consul/connect: implement initial support for connect native Jun 17, 2020
@shoenig shoenig marked this pull request as ready for review June 17, 2020 16:58
@shoenig shoenig requested review from schmichael and nickethier June 17, 2020 16:58
@shoenig shoenig added this to the 0.12.0 milestone Jun 19, 2020
@nickethier nickethier modified the milestones: 0.12.0, 0.12.0-beta1 Jun 19, 2020
Copy link
Member

@nickethier nickethier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will try and do a more in-depth review retro-actively, but looks good on initial skim.

api/services.go Outdated
@@ -140,7 +140,7 @@ func (s *Service) Canonicalize(t *Task, tg *TaskGroup, job *Job) {

// ConsulConnect represents a Consul Connect jobspec stanza.
type ConsulConnect struct {
Native bool
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did Native work at all before? Just want to make sure we don't break jobfiles without good reason

}

h.logger.Debug("share consul TLS configuration for connect native", "enabled", h.consulShareTLS, "task", request.Task.Name)
if h.consulShareTLS {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I remember you mentioning this shouldn't be enabled in production. Is it worth dropping a warn line when the hook is created so its emitted at startup?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My understanding is that it is safe to enable in production when Consul ACLs are enabled. If that's true and clearly documented, I don't think we need to log every time.

@shoenig
Copy link
Member Author

shoenig commented Jun 19, 2020

Working on e2e and caught a bug feature

    TestE2E/ConnectACLs/*connect.ConnectACLsE2ETest/TestConnectACLsConnectNativeDemo: acls.go:310: 
        	Error Trace:	acls.go:310
        	Error:      	Not equal: 
        	            	expected: 1
        	            	actual  : 0
        	Test:       	TestE2E/ConnectACLs/*connect.ConnectACLsE2ETest/TestConnectACLsConnectNativeDemo
        	Messages:   	expected 1 SI token for uuid-fe: map[frontend:1 generate:1]

Looks like it's using the task name for the SI token name instead of the service name (which I think we want, to stay consistent with the proxy analog) This is working as intended - this value is the in the Token's .Description field which for these Nomad-requested SI tokens is defined to be in the form _nomad_si [<clusterID>] [<allocID>] [<taskName>], and this test should be asserting the expected <taskName> is present.

@schmichael schmichael modified the milestones: 0.12.0-beta1, 0.12.0-beta2 Jun 19, 2020
This PR adds the capability of running Connect Native Tasks on Nomad,
particularly when TLS and ACLs are enabled on Consul.

The `connect` stanza now includes a `native` parameter, which can be
set to the name of task that backs the Connect Native Consul service.

There is a new Client configuration parameter for the `consul` stanza
called `share_ssl`. Like `allow_unauthenticated` the default value is
true, but recommended to be disabled in production environments. When
enabled, the Nomad Client's Consul TLS information is shared with
Connect Native tasks through the normal Consul environment variables.
This does NOT include auth or token information.

If Consul ACLs are enabled, Service Identity Tokens are automatically
and injected into the Connect Native task through the CONSUL_HTTP_TOKEN
environment variable.

Any of the automatically set environment variables can be overridden by
the Connect Native task using the `env` stanza.

Fixes #6083
Copy link
Member

@schmichael schmichael left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! No blockers but some stuff we should followup on someday... when isn't there though? 😅

}

h.logger.Debug("share consul TLS configuration for connect native", "enabled", h.consulShareTLS, "task", request.Task.Name)
if h.consulShareTLS {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My understanding is that it is safe to enable in production when Consul ACLs are enabled. If that's true and clearly documented, I don't think we need to log every time.

its Consul SSL configuration with Connect Native applications. Includes values
of `ca_file`, `cert_file`, `key_file`, `auth`, `ssl`, and `verify_ssl`. Does
not include the value for the ACL `token`. This option should be disabled in
an untrusted environment.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like "untrusted environment" as we don't define what that means and are moving toward trusting individual jobs less and less.

My understanding is that as long as Consul ACLs are enabled, sharing TLS certificates does not meaningfully reduce the overall security of Nomad+Consul as the native task will only have its SI token's permissions.

Perhaps stating that this option should be disabled if Consul ACLs are disabled might be less vague and more prescriptive?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When share_ssl is enabled, Connect Native tasks are granted the same level of authenticity as the Nomad Client agent (i.e. sharing certificates). They are scoped down in authorization to the default Consul Service Identity [Virtual] Policy tokens - or whatever consul token is injected through env or template.

Turning this feature off makes sense if the security of your cluster depends entirely on authentication (you have HTTPS enabled but not ACLs).

I'll try to clarify that better.

m := make(map[string]string)

if _, exists := env["CONSUL_CACERT"]; !exists && h.consulConfig.CAFile != "" {
m["CONSUL_CACERT"] = filepath.Join("/secrets", secretCAFilename)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These paths won't work for raw_exec. Dropping the leading slash so that it's relative should work. Not sure if we have any prior art to work from here though. 🤔

We can definitely fix raw_exec later and it shouldn't hold up this PR. Just file a ticket if you think it's a concern but don't have time to verify or fix now.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately just dropping the root slash breaks in docker. Will open a follow up ticket to find a way to make everything happy.

@@ -47,6 +47,7 @@ func parseService(o *ast.ObjectItem) (*api.Service, error) {
"address_mode",
"check_restart",
"connect",
"task", // todo
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is done?

return fmt.Errorf("Consul Connect sidecars require exactly 1 network, found %d in group %q", n, g.Name)
}

if g.Networks[0].Mode != "bridge" {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nickethier will this check need to change due to CNI?


// if service is connect native, service task must be set
if s.Connect.IsNative() && len(s.TaskName) == 0 {
mErr.Errors = append(mErr.Errors, fmt.Errorf("Service %s is connect native and requires setting the task", s.Name))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We title case Connect elsewhere in errors.

Suggested change
mErr.Errors = append(mErr.Errors, fmt.Errorf("Service %s is connect native and requires setting the task", s.Name))
mErr.Errors = append(mErr.Errors, fmt.Errorf("Service %s is Connect Native and requires setting the task", s.Name))

@@ -328,7 +328,7 @@ dashes (`-`) are converted to underscores (`_`) in environment variables so

- The `consul` binary must be present in Nomad's `$PATH` to run the Envoy
proxy sidecar on client nodes.
- Consul Connect Native is not yet supported ([#6083][gh6083]).
- Consul Connect Native requires host networking.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice. Let's file a followup for non-host-networking Native to track comments and votes even if we haven't roadmapped it yet.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In theory if you configure Consul with a unix socket addr listener, volume mount that into your task, and point CONSUL_HTTP_ADDR at it (using a capable library)... bridge mode should Just Work. Not sure if we'd want to recommend that pattern though. I'll file the followup ticket for more discussion.

Comment on lines +52 to +53
the service definition must provide the name of the implementing task in the
[task][service_task] field.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We shouldn't require this if there's only 1 task.

### Limitations

[Consul Connect Native services][native] and [Nomad variable
interpolation][interpolation] are _not_ yet supported.
[Nomad variable interpolation][interpolation] is _not_ yet supported.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Issue link

@shoenig
Copy link
Member Author

shoenig commented Jun 24, 2020

A quick demo of using Vault policies to generate CONSUL_HTTP_TOKEN

Policies
$ cat consul-agent-policy.hcl 
acl = "write"
node_prefix "" {
  policy = "write"
}
agent_prefix "" {
  policy = "write"
}
catalog_prefix "" {
  policy = "write"
}
operator_prefix "" {
  polic = "write"
}

$ cat nomad-agent-policy.hcl 
acl = "write"
service_prefix "" {
  policy = "write"
}
agent_prefix "" {
  policy = "read"
}
node_prefix "" {
  policy = "read"
}

$ cat vault-server-policy.hcl 
acl = "write"
service_prefix "" {
  policy = "write"
}
key_prefix "" {
  policy = "write"
}
agent_prefix "" {
  policy = "read"
}
node_prefix "" {
  policy = "read"
}

$ cat si-policy.hcl 
path "consul/creds/*" {
  capabilities = ["create", "read", "update", "delete", "list"]
}

$ cat si-open.hcl 
service_prefix "" {
    policy = "write"
}
node_prefix "" {
    policy = "read"
}
Consul Config
log_level = "TRACE"

data_dir = "/mnt/big/hashiconfigs/connect-native-tls/consul2"

server = true

bootstrap_expect = 1

advertise_addr = "127.0.0.1"

bind_addr = "0.0.0.0"

addresses {
  http = "127.0.0.1"
}

ports {
  http = 8500
}

connect {
  enabled = true
}

acl {
  enabled = true
  default_policy = "deny"
  down_policy = "extend-cache"
  enable_token_persistence = true
}
Nomad Config
log_level = "INFO"

data_dir = "/tmp/nomad2"

server {
  enabled          = true
  bootstrap_expect = 1
}

# needs public interface
bind_addr = "0.0.0.0"

advertise {
  http = "127.0.0.1:4646"
  rpc  = "127.0.0.1:4647"
  serf = "127.0.0.1:4648"
}

client {
  enabled = true
}

vault {
  enabled = true
  address = "http://localhost:8200"
  token = "s.9x1EgSXnB964mDQ0OwjPkONO"
}

consul {
  ssl = false
  address = "127.0.0.1:8500"
  token = "e9c27530-1155-0158-5906-9ff480721512"
}
Vault Config
# -dev mode

disable_mlock = true

api_addr = "http://127.0.0.1:8200"

storage "inmem" {
}
Vault Consul Secrets Backend
$ vault secrets enable consul
Success! Enabled the consul secrets engine at: consul/
$ vault write consul/roles/si-open policies=si-open-policy
Success! Data written to: consul/roles/si-open
$ vault write consul/config/access scheme=http address=127.0.0.1:8500 token=a8db31da-3224-7f30-0eba-b3a099806269 
Success! Data written to: consul/config/access
Nomad Job File
# v.nomad
job "v" {
  datacenters = ["dc1"]

  group "api" {
    network {
      mode = "host"
      port "uuidapi" {
	// let nomad pick
      }
    }

    service {
      name = "uuid-api"
      port = "${NOMAD_PORT_uuidapi}"
      task = "apitask"

      connect {
	native = true
      }
    }

    task "apitask" {
      driver = "docker"

      config {
	image = "shoenig/uuid-api:v0.0.5"
	network_mode = "host"
      }

      env {
	PORT = "${NOMAD_PORT_uuidapi}"
	BIND = "0.0.0.0"
      }

      vault {
	policies = ["si-policy"]
      }

      template {
	data = <<EOH
CONSUL_HTTP_TOKEN="{{with secret "consul/creds/si-open"}}{{.Data.token}}{{end}}"
EOH
	destination = "secrets/consul_token.env"
	env         = true
      }
    }
  }

  group "frontend" {
    network {
      mode = "host"

      port "http" {
	static = 9800
      }
    }

    service {
      name = "uuid-fe"
      port = "9800"
      task = "frontend"

      connect {
	native = true
      }
    }

    task "frontend" {
      driver = "docker"

      config {
	image = "shoenig/uuid-fe:v0.0.5"
	network_mode = "host"
      }

      env {
	UPSTREAM = "uuid-api"
	BIND     = "0.0.0.0"
	PORT     = "9800"
      }

      vault {
	policies = ["si-policy"]
      }

      template {
	data = <<EOH
CONSUL_HTTP_TOKEN="{{with secret "consul/creds/si-open"}}{{.Data.token}}{{end}}"
EOH
	destination = "secrets/consul_token.env"
	env         = true
      }
    }
  }
}
Launch
$ nomad job run v.nomad
==> Monitoring evaluation "9c393e5b"
    Evaluation triggered by job "v"
    Evaluation within deployment: "91a99e2c"
    Allocation "0ef82d75" created: node "890e1db8", group "frontend"
    Allocation "35b0e79c" created: node "890e1db8", group "api"
    Evaluation status changed: "pending" -> "complete"
==> Evaluation "9c393e5b" finished with status "complete"

$ nomad job status v
ID            = v
Name          = v
Submit Date   = 2020-06-24T08:28:49-05:00
Type          = service
Priority      = 50
Datacenters   = dc1
Namespace     = default
Status        = running
Periodic      = false
Parameterized = false

Summary
Task Group  Queued  Starting  Running  Failed  Complete  Lost
api         0       0         1        0       0         0
frontend    0       0         1        0       0         0

Latest Deployment
ID          = 91a99e2c
Status      = successful
Description = Deployment completed successfully

Deployed
Task Group  Desired  Placed  Healthy  Unhealthy  Progress Deadline
api         1        1       1        0          2020-06-24T08:38:59-05:00
frontend    1        1       1        0          2020-06-24T08:39:02-05:00

Allocations
ID        Node ID   Task Group  Version  Desired  Status   Created    Modified
0ef82d75  890e1db8  frontend    0        run      running  3m26s ago  3m13s ago
35b0e79c  890e1db8  api         0        run      running  3m26s ago  3m15s ago
Verify Token is from our Vault Policy
$ nomad alloc exec 0ef8 /usr/bin/env | grep CONSUL_HTTP_TOKEN
CONSUL_HTTP_TOKEN=b33638e5-0443-f407-a66a-79a99248e1c5

$ CONSUL_HTTP_TOKEN=b33638e5-0443-f407-a66a-79a99248e1c5 consul acl token read -self
AccessorID:       e5629c0e-51ef-e19b-ee51-bea0cb4f9f5b
SecretID:         b33638e5-0443-f407-a66a-79a99248e1c5
Description:      Vault si-open token-0ef82d75-ea09-5a19-d4f1-11bcdb049af5-frontend 1593005329545929801
Local:            false
Create Time:      2020-06-24 08:28:49.546430511 -0500 CDT
Policies:
   2c68431f-dbb5-155e-820c-6c28b51515f3 - si-open-policy
Connect Intention Exists
$ envy exec ctls:consul2 consul intention check uuid-fe uuid-api
Allowed
Connect Working
$ curl localhost:9800

<html>
<head>
</head>
<body>
  <center>uuid: 283838da-7b82-cd25-b757-d21f31231d65</center>
</body>
</html>

@shoenig
Copy link
Member Author

shoenig commented Jun 24, 2020

Will follow up with issues for

  • improve connect native story with bridge networking
  • do not require naming the task for connect native if 1 task
  • inherit task name from service stanza into check stanza
  • ensure tls sharing works with raw_exec driver
  • e2e tests

@shoenig shoenig merged commit 41b94ee into master Jun 24, 2020
@shoenig shoenig deleted the f-cnative-host branch June 24, 2020 15:33
@github-actions
Copy link

github-actions bot commented Jan 2, 2023

I'm going to lock this pull request because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active contributions.
If you have found a problem that seems related to this change, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jan 2, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Consul Connect Native support
3 participants