
Document Consul ACL requirements and integration #11962

Closed
ygersie opened this issue Jan 31, 2022 · 5 comments
Labels
stage/accepted (Confirmed, and intend to work on. No timeline commitment though.) · theme/consul · theme/docs (Documentation issues and enhancements) · type/bug

Comments

@ygersie
Contributor

ygersie commented Jan 31, 2022

Nomad version

v1.2.3

Issue

The Consul token passed either in the job via consul_token, through the CLI via -consul-token=<token>, or via the CONSUL_HTTP_TOKEN environment variable doesn't seem to be used by Nomad to register services and health checks.
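For reference, the job-level field looks like this (a sketch; the token value is a placeholder):

job "example" {
  # Job-level Consul token field, the same value otherwise passed via
  # -consul-token or CONSUL_HTTP_TOKEN
  consul_token = "<consul token>"
  # ...
}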

Reproduction steps

I've set up a Nomad cluster with a Consul token that has the following policy associated:

$ consul acl policy read -name nomad
ID:           a5b7c7c6-215e-9a2b-a94a-8983f3880b38
Name:         nomad
Description:
Datacenters:
Rules:
agent_prefix   "agent-"                 { policy = "write" }
node_prefix    "agent-"                 { policy = "write" }
service        "nomad"                  { policy = "write" }
service        "nomad-server"           { policy = "write" }
service        "nomad-client"           { policy = "write" }
session_prefix ""                       { policy = "write" }

acl = "write"
operator = "write"
mesh = "write"

And passed the following Consul config to Nomad:

consul {
  allow_unauthenticated = false
  token                 = "21838960-6219-d97c-0340-7512e77419fe" # has above policy associated
}

I then set up a Consul ACL policy for my example service:

$ consul acl policy create -name "example-netcat" -rules - <<EOF
node_prefix    "agent-"                 { policy = "write" }
service        "example-netcat"         { policy = "write" }
EOF

And a token for my example service:

$ consul acl token create -description "Nomad service example-netcat" -policy-name example-netcat
AccessorID:       922b018f-3ff9-a68b-dfe7-3480b92e80ba
SecretID:         9be5dc8a-ac4b-4f47-92f1-fa188483f0dc
Description:      Nomad service example-netcat
Local:            false
Create Time:      2022-01-31 14:41:33.90314 +0100 CET
Policies:
   9e17fca3-31cd-0346-e6fd-e48419f68d25 - example-netcat

To verify this token works, I used the following Consul service definition:

service {
  name = "example-netcat"
  port = 9999
  checks = [
    {
      tcp      = "localhost:9999"
      interval = "3s"
      timeout  = "1s"
    }
  ]
}

Now, register and later deregister this service to test:

$ export CONSUL_HTTP_TOKEN=9be5dc8a-ac4b-4f47-92f1-fa188483f0dc

$ consul services register service.hcl
Registered service: example-netcat

$ curl -s -H "X-Consul-Token: ${CONSUL_HTTP_TOKEN}" localhost:8500/v1/agent/services?pretty
{
    "example-netcat": {
        "ID": "example-netcat",
        "Service": "example-netcat",
        "Tags": [],
        "Meta": {},
        "Port": 9999,
        "Address": "",
        "Weights": {
            "Passing": 1,
            "Warning": 1
        },
        "EnableTagOverride": false,
        "Datacenter": "dev"
    }
}

$ curl -s -H "X-Consul-Token: ${CONSUL_HTTP_TOKEN}" localhost:8500/v1/agent/checks?pretty
{
    "service:example-netcat": {
        "Node": "agent-1",
        "CheckID": "service:example-netcat",
        "Name": "Service 'example-netcat' check",
        "Status": "critical",
        "Notes": "",
        "Output": "dial tcp [::1]:9999: connect: connection refused",
        "ServiceID": "example-netcat",
        "ServiceName": "example-netcat",
        "ServiceTags": [],
        "Type": "tcp",
        "Interval": "",
        "Timeout": "",
        "ExposedPort": 0,
        "Definition": {},
        "CreateIndex": 0,
        "ModifyIndex": 0
    }
}

$ consul services deregister service.hcl
Deregistered service: example-netcat

So the newly created token works as expected. Now deploy the following job:

job "example" {
  region      = "dev"
  datacenters = ["dc1", "dc2", "dc3"]
  namespace   = "default"

  group "example" {
    network {
      mode = "host"
      port "nc" {
        static = 9999
        to     = 9999
      }
    }

    service {
      name = "example-netcat"
      port = "nc"

      check {
        type     = "tcp"
        interval = "3s"
        timeout  = "1s"
      }
    }

    task "example" {
      driver = "docker"
      config {
        image = "alpine"
        ports = ["nc"]
        args  = ["nc", "-lk", "-p", "${NOMAD_PORT_nc}", "-e", "cat"]
      }
    }
  }
}

$ nomad job run -consul-token=9be5dc8a-ac4b-4f47-92f1-fa188483f0dc example.hcl

I'd expect this to register the Consul service example-netcat with the provided token, but instead I get the following in the Nomad log file:

2022-01-31T14:45:04.486+0100 [WARN] consul.sync: failed to update services in Consul: error="Unexpected response code: 403 (Permission denied: Missing service:write on example-netcat)"

So, as you would expect based on the above log entry, no services or health checks are registered:

$ curl -s -H "X-Consul-Token: ${CONSUL_HTTP_TOKEN}" localhost:8500/v1/agent/services?pretty
{}

$ curl -s -H "X-Consul-Token: ${CONSUL_HTTP_TOKEN}" localhost:8500/v1/agent/checks?pretty
{}

To make it even more unexpected, the job deployment succeeded anyway, even though the default update stanza should ensure that health checks pass for min_healthy_time. In this case the health check was never created, so I'd expect the deployment to fail as well.
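For reference, the documented defaults I'm relying on here are roughly the following (a sketch; exact defaults may differ per Nomad version):

update {
  max_parallel      = 1
  health_check      = "checks"  # deployment health driven by the Consul checks
  min_healthy_time  = "10s"
  healthy_deadline  = "5m"
  progress_deadline = "10m"
}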

$ nomad deployment list
ID        Job ID   Job Version  Status      Description
de27aaf6  example  0            successful  Deployment completed successfully

TL;DR

It seems that when the Nomad agent is configured with a Consul token, that token will always be used to register the services and health checks that are part of a Nomad job. The workaround is to give the Nomad agent a Consul token with a policy that allows registering any service, which is not a secure way of setting up a HashiCorp cluster.
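The kind of broad policy that workaround requires looks something like this (sketch):

# Grants write on every service name, so the Nomad agent token can
# register whatever any job asks for.
service_prefix "" { policy = "write" }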

Am I misunderstanding how this should work or is this a bug?

@lgfa29
Contributor

lgfa29 commented Feb 2, 2022

Hi @ygersie 👋

That's actually working as intended.

Nomad will use the Consul token set in the agent to register services. The token passed using -consul-token is not actually stored and just sets the value of the job's consul_token field. From the docs:

The run command will set the consul_token of the job based on the following precedence, going from highest to lowest: the -consul-token flag, the $CONSUL_HTTP_TOKEN environment variable and finally the value in the job file.
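That is, roughly (token values are placeholders):

# Highest precedence: the -consul-token flag
$ nomad job run -consul-token=<token> example.hcl

# Next: the CONSUL_HTTP_TOKEN environment variable
$ CONSUL_HTTP_TOKEN=<token> nomad job run example.hcl

# Lowest: the consul_token field in the job file itself
$ nomad job run example.hcl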

If you need to isolate access to services in Consul you can use Consul Namespaces, which is a Consul Enterprise feature.
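For example, something along these lines at the group level (a sketch; the namespace name is made up, and the group-level consul parameters may vary by Nomad version):

group "example" {
  consul {
    # Registrations from this group land in a dedicated Consul
    # namespace, isolating them from services in other namespaces.
    namespace = "team-a"
  }
  # ...
}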

We have an internal ticket to track improvements to Consul registration, so I'm going to close this one, but feel free to ask any other questions you may have 🙂

@ygersie
Contributor Author

ygersie commented Feb 2, 2022

Hey @lgfa29, thanks for that clarification, although I have to say the docs aren't very descriptive about the use of consul_token. They mention that this token is only used for Consul Connect enabled services, but how exactly? Even in Consul Enterprise, how would one prevent a random job from registering a Consul service it shouldn't? I had assumed that passing a Consul token with an ACL policy limiting service registration to a specific service name would provide that functionality. That's why I thought this was a bug.

@ygersie
Contributor Author

ygersie commented Feb 2, 2022

Okay, after some experimentation I understand this better: the Consul token is used to verify that a user is allowed to register a new service. The docs could definitely use a bit more clarification. The Nomad agent's Consul token needs the acl = "read" capability to query the Consul ACL policy associated with the token supplied by the user. Nomad then checks whether the service names in the job are allowed by what that policy grants. This way you can prevent users from supplying any random service name during job registration.
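So the agent token policy needs at least something like this on top of its registration permissions (a sketch of my understanding, not official guidance):

# Lets the Nomad agent resolve the job submitter's token and verify it
# grants service:write on each service name in the job.
acl = "read"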

There is however a bug in there when using interpolation.

$ nomad job run example-service.hcl
Error submitting job: Unexpected response code: 500 (rpc error: job-submitter consul token denied: insufficient Consul ACL permissions to write service "${NOMAD_JOB_NAME}")

The ${NOMAD_JOB_NAME} variable as given in the service definition:

    service {
      name = "${NOMAD_JOB_NAME}"
      port = "nc"

      check {
        type     = "tcp"
        interval = "3s"
        timeout  = "1s"
      }
    }

isn't expanded correctly.
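As a workaround, a literal service name passes the submit-time check, as the earlier example-netcat job showed (sketch):

    service {
      # Literal name instead of "${NOMAD_JOB_NAME}" avoids the
      # unexpanded-variable check at submit time.
      name = "example"
      port = "nc"
    }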

@lgfa29
Contributor

lgfa29 commented Feb 3, 2022

Ah, good catch! I believe this is the same issue described in #9741, which is also in our backlog. Would you mind giving it a 👍?

I will re-open this issue and rename it to focus on improving our docs around Consul ACL integration. The context you provided will be very valuable.

@lgfa29 lgfa29 reopened this Feb 3, 2022
@lgfa29 lgfa29 changed the title Job configured Consul token not used Document Consul ACL requirements and integration Feb 3, 2022
@lgfa29 lgfa29 added the stage/accepted and theme/docs labels and removed the stage/not-a-bug label Feb 3, 2022
@tgross
Member

tgross commented Nov 1, 2023

Starting in Nomad 1.7.0-beta.1 we've deprecated the use of Consul tokens in the Nomad agent configuration for giving workloads access to Consul KV. Nomad now uses workload identities to sign in to Consul and obtain Consul tokens for those workloads.

As part of that work we've documented the new requirements.
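For example, the agent-side configuration now looks along these lines (a sketch; see the linked docs for the authoritative parameter names and defaults):

consul {
  # Nomad exchanges signed workload identities for Consul tokens via a
  # Consul JWT auth method instead of one shared agent token.
  service_identity {
    aud = ["consul.io"]
    ttl = "1h"
  }
  task_identity {
    aud = ["consul.io"]
    ttl = "1h"
  }
}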
