Nomad Connect doesn't manage TLS Consul endpoints #6594

Closed
vvanholl opened this issue Oct 30, 2019 · 9 comments · Fixed by #7602
Labels
theme/consul/connect Consul Connect integration type/bug

@vvanholl

Hi,

Some context:

I am using Nomad 0.10.0 and Consul 1.6.1. Both Nomad and Consul run with TLS and ACLs enabled.

I'm trying to run my Nomad jobs with Connect, but the logs always show this error message:

2019-10-30T20:34:42.894Z [ERROR] client.alloc_runner.task_runner.task_hook.envoy_bootstrap: error creating bootstrap configuration for Connect proxy sidecar: alloc_id=4660d74d-c834-9219-e8ee-c0fbd6911732 task=connect-proxy-test error="exit status 1" stderr="==> Failed looking up sidecar proxy info for _nomad-task-4660d74d-c834-9219-e8ee-c0fbd6911732-group-test_group-test-1313: Unexpected response code: 400 (Client sent an HTTP request to an HTTPS server.
Digging further, I noticed that Nomad runs this command, which fails:
consul connect envoy -grpc-addr unix://alloc/tmp/consul_grpc.sock -http-addr endpoint.local.compuscene.net:8500 -bootstrap -sidecar-for _nomad-task-4660d74d-c834-9219-e8ee-c0fbd6911732-group-test_group-test-131

Running it by hand fails with exactly the same error message.

But if I prefix endpoint.local.compuscene.net:8500 with https://, the command works.

It seems Nomad ignores its own Consul configuration, in particular the ssl = true option:
"consul": {
  "address": "endpoint.local.compuscene.net:8500",
  "auto_advertise": true,
  "checks_use_advertise": true,
  "ssl": true,
  "token": "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
},
Moreover, when I dig into the Nomad code, I see no reference to the Consul ssl option when the Connect sidecar bootstrap is set up; only the address is used.

I hope this is clear. If you have any questions, don't hesitate to ask.

Vincent

@rkettelerij
Contributor

rkettelerij commented Oct 31, 2019

What a coincidence. I was about to create a ticket for this, since I'm running into the same issue. As @vvanholl says, Nomad currently assumes the local Consul agent is reachable over plain HTTP. Our configuration has TLS enabled on both the Consul clients and the Consul servers, and we don't expose a plain HTTP endpoint on the Consul agent.

The problem is that Nomad starts the Consul Envoy proxy without any TLS-related HTTP flags (it passes only -http-addr, with no scheme): https://github.com/hashicorp/nomad/blob/master/client/allocrunner/taskrunner/envoybootstrap_hook.go#L89

Therefore the Consul proxy fails to connect to the local Consul agent: https://github.com/hashicorp/consul/blob/cc9a6f79934a6da58b7aec63c057681d82aded5a/command/connect/proxy/proxy.go#L221

What Nomad should do is read the Consul client configuration (the consul stanza in the Nomad config) and pass its TLS settings along when starting the Consul proxy binary, which already accepts them.

@tgross tgross added the theme/consul/connect Consul Connect integration label Oct 31, 2019
@tgross
Member

tgross commented Oct 31, 2019

Thanks for reporting this @vvanholl and @rkettelerij !

As of right now Consul ACL support is one of the known limitations of our implementation but is in the works. For TLS, I do see that we have an open issue for testing that properly (#6502) but this looks like a bug in how we look up the Consul address.

@tgross tgross added this to the near-term milestone Nov 7, 2019
@tgross tgross modified the milestones: near-term, unscheduled, 0.10.4 Jan 9, 2020
@schmichael schmichael modified the milestones: 0.10.4, 0.10.3 Jan 30, 2020
@shoenig shoenig self-assigned this Feb 20, 2020
@angrycub
Contributor

angrycub commented Apr 1, 2020

There is a short-term workaround: provide the necessary Consul values as environment variables in your init script or systemd unit. I was able to work around this by adding the following values to the Nomad systemd unit on my Nomad client:

Environment="CONSUL_HTTP_SSL=true"
Environment="CONSUL_CACERT=/path/to/cacert.pem"
Environment="CONSUL_CLIENT_CERT=/path/to/clientcert.pem"
Environment="CONSUL_CLIENT_KEY=/path/to/clientkey.pem"

Replace the paths above with the paths to your actual certificates.

shoenig added a commit that referenced this issue Apr 2, 2020
Fixes #6594 #6711 #6714 #7567

e2e testing is still TBD in #6502

Before, we only passed the Nomad agent's configured Consul HTTP
address onto the `consul connect envoy ...` bootstrap command.
This meant any Consul setup with TLS enabled would not work with
Nomad's Connect integration.

This change now sets CLI args and Environment Variables for
configuring TLS options for communicating with Consul when doing
the envoy bootstrap, as described in
https://www.consul.io/docs/commands/connect/envoy.html#usage
@crizstian

There is still an issue with Nomad Consul Connect jobs when Consul has TLS enabled:

#7715

These are my environment variables:

export DATACENTER=dc1

export VAULT_CACERT=/var/vault/config/ca.crt.pem
export VAULT_CLIENT_CERT=/var/vault/config/server.crt.pem
export VAULT_CLIENT_KEY=/var/vault/config/server.key.pem
export VAULT_ADDR=https://${HOST_IP}:8200

export NOMAD_ADDR=https://${HOST_IP}:4646
export NOMAD_CACERT=/var/vault/config/ca.crt.pem
export NOMAD_CLIENT_CERT=/var/vault/config/server.crt.pem
export NOMAD_CLIENT_KEY=/var/vault/config/server.key.pem

export CONSUL_SCHEME=https
export CONSUL_PORT=8500
export CONSUL_HTTP_ADDR=${CONSUL_SCHEME}://${HOST_IP}:${CONSUL_PORT}
export CONSUL_CACERT=/var/vault/config/ca.crt.pem
export CONSUL_CLIENT_CERT=/var/vault/config/server.crt.pem
export CONSUL_CLIENT_KEY=/var/vault/config/server.key.pem
export CONSUL_HTTP_SSL=true

@spuder
Contributor

spuder commented May 11, 2020

I enabled TLS on Consul and am also seeing this problem. I've made sure that /etc/sysconfig/nomad contains the following:

Environment="CONSUL_HTTP_SSL=true"
Environment="CONSUL_CACERT=/path/to/cacert.pem"
Environment="CONSUL_CLIENT_CERT=/path/to/clientcert.pem"
Environment="CONSUL_CLIENT_KEY=/path/to/clientkey.pem"

I also have in my systemd unit file

[Service]
EnvironmentFile=-/etc/sysconfig/nomad

Nomad = 0.11.1
Consul = 1.7.2

@angrycub
Contributor

@spuder, if you're talking about the deployment issue that Crizstian mentioned, I'd encourage you to head over to #7715 and chime in there. If you are experiencing something else, you might want to open a fresh issue.

As an aside, as of Nomad 0.11 you do not need to provide the Consul TLS environment variables. That workaround is only necessary for Nomad 0.10.4.

picatz added a commit to picatz/terraform-google-nomad that referenced this issue Jul 27, 2020
Should have Nomad and Consul deployed and configured with mTLS. ACLs are currently not enabled on Consul, only Nomad.

This should provide a minimal working example using mTLS to get the count dashboard working, after a ton of tinkering. 😭

The links I used during my investigation/debugging session:
* hashicorp/nomad#6463
* https://learn.hashicorp.com/nomad/consul-integration/nomad-connect-acl#run-a-connect-enabled-job
* hashicorp/nomad#6594
* hashicorp/nomad#4276
* hashicorp/nomad#7715
* https://www.consul.io/docs/agent/options
* ⭐ hashicorp/nomad#7602
@cgthayer

cgthayer commented Apr 9, 2021

@shoenig Since this is closed, someone should update https://learn.hashicorp.com/tutorials/nomad/consul-service-mesh#create-the-job-specification

If you are using Nomad version 0.10 and your Consul cluster is TLS-enabled, you will need to provide additional Consul configurations as environment variables to the Nomad process. This is to work around a known issue in Nomad—#6594. Refer to the TLS-enabled Consul environment section in the "Advanced considerations" of this tutorial for details. You will be able to return to here after you read that material.

@suikast42
Contributor

Quoting @angrycub: "I was able to work around this by adding the following values to the Nomad systemd unit on my nomad client."

What about auto_config? The Consul client certificates are dynamic there, or am I wrong?

@github-actions

github-actions bot commented Jan 1, 2023

I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jan 1, 2023

10 participants