NOMAD_PORT_<label> and NOMAD_HOST_PORT_<label> have the same value #4880

Closed
kcwong-verseon opened this issue Nov 15, 2018 · 9 comments

Comments

@kcwong-verseon
Contributor

Nomad version

Nomad v0.8.6 (fcc4149)

Operating system and Environment details

CentOS 7.5

Issue

For the inline template in this job spec:

job "test-vault" {
  region = "us"
  datacenters = [ "hq" ]
  type = "service"
  priority = 60

  group "service" {
    count = 2
    constraint {
      distinct_property = "${meta.rack}"
      value = "1"
    }
    task "vault" {
      driver = "docker"
      config {
        image = "vault:0.11.4"
        hostname = "${NOMAD_ALLOC_INDEX}.${NOMAD_JOB_NAME}"
        port_map {
          vault_http = 8200
          vault_int = 8201
        }
        cap_add = [ "IPC_LOCK" ]
        labels {
          host = "${node.unique.name}"
          class = "mgmt"
        }
        volumes = [
          "local/config.hcl:/vault/config/config.hcl:ro",
          "local/ssl:/vault/config/ssl:ro"
        ]
        args = [
          "vault",
          "server",
          "-config=/vault/config",
          "-log-level=debug"
        ]
      } # config
      artifact {
        source = "http://artifacts.example.com/${NOMAD_JOB_NAME}.crt"
        destination = "local/ssl/server.crt"
        mode = "file"
      }
      artifact {
        source = "http://artifacts.example.com/${NOMAD_JOB_NAME}.key"
        destination = "local/ssl/server.key"
        mode = "file"
      }
      service {
        name = "${NOMAD_JOB_NAME}-${NOMAD_ALLOC_INDEX}"
        port = "vault_http"
        address_mode = "driver"
        tags = [
          "urlprefix-${NOMAD_JOB_NAME}.example.com:80/ redirect=301,https://${NOMAD_JOB_NAME}.example.com/ui",
          "urlprefix-${NOMAD_JOB_NAME}.verseon.com:443/ proto=https tlsskipverify=true"
        ]
        check {
          type = "http"
          protocol = "https"
          address_mode = "driver"
          method = "HEAD"
          tls_skip_verify = true
          port = "vault_http"
          path = "/sys/health"
          interval = "10s"
          timeout = "2s"
        }
      }
      resources {
        cpu = 2000
        memory = 1000
        network {
          port "vault_http" { }
          port "vault_int" { }
        }
      }
      template {
        change_mode = "restart"
        data = <<EOF
cluster_name = "{{ env "NOMAD_JOB_NAME" }}.{{ env "node.datacenter" }}"
ui = true
default_lease_ttl = "12h"
storage "consul" {
  address = "{{ env "meta.pub_ip" }}:8500"
  scheme = "http"
  path = "test_vault/"
}
listener "tcp" {
  address = "0.0.0.0:8200"
  cluster_address = "0.0.0.0:8201"
  tls_cert_file = "/vault/config/ssl/server.crt"
  tls_key_file = "/vault/config/ssl/server.key"
  tls_disable = false
}
telemetry {
  statsite_address = "{{ env "meta.pub_ip" }}:8125"
  disable_hostname = true
}
api_addr = "https://{{ env "NOMAD_JOB_NAME" }}-{{ env "NOMAD_ALLOC_INDEX" }}.service.consul.verseon.{{ env "NOMAD_REGION" }}:8200"
cluster_addr = "https://{{ env "NOMAD_JOB_NAME" }}-{{ env "NOMAD_ALLOC_INDEX" }}.service.consul.verseon.{{ env "NOMAD_REGION" }}:8201"
# {{ env "NOMAD_PORT_vault_http" }}
# {{ env "NOMAD_PORT_vault_int" }}
# {{ env "NOMAD_HOST_PORT_vault_http" }}
# {{ env "NOMAD_HOST_PORT_vault_int" }}
EOF
        destination = "local/config.hcl"
      } # template
    } # task
  } # group
}

Reproduction steps

Run the above job and see the content of local/config.hcl. This is the output I got:

cluster_name = "test-vault.hq"
ui = true
default_lease_ttl = "12h"
storage "consul" {
  address = "172.20.1.46:8500"
  scheme = "http"
  path = "test_vault/"
}
listener "tcp" {
  address = "0.0.0.0:8200"
  cluster_address = "0.0.0.0:8201"
  tls_cert_file = "/vault/config/ssl/server.crt"
  tls_key_file = "/vault/config/ssl/server.key"
  tls_disable = false
}
telemetry {
  statsite_address = "172.20.1.46:8125"
  disable_hostname = true
}
api_addr = "https://test-vault-0.service.consul.verseon.us:8200"
cluster_addr = "https://test-vault-0.service.consul.verseon.us:8201"
# 30722
# 27764
# 30722
# 27764

I expect the last 4 lines to be:

# 8200
# 8201
# 30722
# 27764

This is kind of the opposite of #1391.
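
For reference, the expectation above follows the documented meaning of the two variables: when a Docker `port_map` is used, `NOMAD_PORT_<label>` is the port inside the container, while `NOMAD_HOST_PORT_<label>` is the port allocated on the host. Below is a minimal Go sketch of that rule. The helper `portEnv` is hypothetical and only models the expected values; it is not Nomad's actual code.

```go
package main

import "fmt"

// portEnv models the expected values of the two variables for one task.
// hostPorts: label -> port allocated on the host (from the network block).
// portMap:   label -> port inside the container (from the Docker port_map).
// Hypothetical helper for illustration only, not Nomad's implementation.
func portEnv(hostPorts, portMap map[string]int) map[string]string {
	env := make(map[string]string)
	for label, hostPort := range hostPorts {
		// The host port variable always carries the host-side allocation.
		env["NOMAD_HOST_PORT_"+label] = fmt.Sprint(hostPort)

		// With a port_map entry, the plain port variable is the mapped
		// container port; without one it falls back to the host port.
		port := hostPort
		if mapped, ok := portMap[label]; ok {
			port = mapped
		}
		env["NOMAD_PORT_"+label] = fmt.Sprint(port)
	}
	return env
}

func main() {
	// Values taken from the job spec and the rendered output in this issue.
	env := portEnv(
		map[string]int{"vault_http": 30722, "vault_int": 27764},
		map[string]int{"vault_http": 8200, "vault_int": 8201},
	)
	keys := []string{
		"NOMAD_PORT_vault_http",
		"NOMAD_PORT_vault_int",
		"NOMAD_HOST_PORT_vault_http",
		"NOMAD_HOST_PORT_vault_int",
	}
	for _, k := range keys {
		fmt.Printf("# %s\n", env[k])
	}
}
```

With the ports from this report, the sketch prints the four expected lines (8200, 8201, 30722, 27764) rather than the host ports twice.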

@preetapan
Contributor

@kcwong-verseon Thanks for the report.
@notnoop helped reproduce this. Inside the container, the env var NOMAD_PORT_<label> is set correctly, but when the template is executed it doesn't pick up the correct value. We'll investigate further and update; this could be an envconsul/consul-template issue.
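
To illustrate the symptom described above, here is a minimal sketch using Go's standard `text/template` package with an `env` function backed by a map. It only shows that the rendered output depends on which environment the renderer is handed. The `render` helper and both maps are assumptions made for illustration; this is not consul-template's or Nomad's implementation, and the thread does not establish the actual root cause.

```go
package main

import (
	"bytes"
	"fmt"
	"text/template"
)

// render executes tmpl with an `env` function that looks keys up in the
// given map instead of the process environment.
// Hypothetical helper for illustration only.
func render(tmpl string, env map[string]string) (string, error) {
	funcs := template.FuncMap{
		"env": func(key string) string { return env[key] },
	}
	t, err := template.New("config").Funcs(funcs).Parse(tmpl)
	if err != nil {
		return "", err
	}
	var buf bytes.Buffer
	if err := t.Execute(&buf, nil); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	const tmpl = `# {{ env "NOMAD_PORT_vault_http" }} {{ env "NOMAD_HOST_PORT_vault_http" }}`

	// Environment as observed inside the container (port_map applied).
	containerEnv := map[string]string{
		"NOMAD_PORT_vault_http":      "8200",
		"NOMAD_HOST_PORT_vault_http": "30722",
	}
	// Assumed environment without the port_map applied, which would
	// reproduce the values seen in the rendered config.hcl.
	templateEnv := map[string]string{
		"NOMAD_PORT_vault_http":      "30722",
		"NOMAD_HOST_PORT_vault_http": "30722",
	}

	for _, env := range []map[string]string{containerEnv, templateEnv} {
		out, err := render(tmpl, env)
		if err != nil {
			panic(err)
		}
		fmt.Println(out)
	}
}
```

The first line printed matches what the container sees; the second matches the reported template output, where both variables carry the host port.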

@kcwong-verseon
Contributor Author

Sweet! I'm looking forward to the fix.

@ryanmickler
Contributor

Any updates on this? This certainly seems like a bug. I'm hitting it too.

@kcwong-verseon
Contributor Author

@preetapan @notnoop I'm aware you guys are working hard on 0.10.0, but do you know if this is resolved or not? I'm not ready to upgrade to 0.9.x quite yet (still on 0.8.7) so I have no idea if the issue is resolved in 0.9.x.

@jippi
Contributor

jippi commented Sep 11, 2019

@preetapan @notnoop @endocrimes If this is indeed a bug in 0.9, I would like to put in a strong request to fix it in a 0.9.x release as well as in 0.10.x. It would entirely block Nomad users from adopting the 0.9.x release and force an additional 3-6 month wait for 0.10.x to become fully stable.

It's a hard blocker for SeatGeek adopting 0.9.x (something we had planned to do in Q4) and will force a risky "skip release" to 0.10.x when it's feature complete and stable sometime in Q1.

@notnoop
Contributor

notnoop commented Sep 11, 2019

The 0.9 regression was fixed in #6251, which we aim to release as part of 0.9.6. After HashiConf, we aim to do some thorough testing and cut a release.

We haven't yet been able to reproduce it in 0.8.7, the version this ticket was filed against.

@jippi
Contributor

jippi commented Sep 12, 2019

Thank you @notnoop! :)

I can also verify that 0.8.x is not affected in our environment.

@tgross added the stage/needs-verification and theme/networking labels Mar 4, 2021
@tgross
Member

tgross commented Jun 28, 2021

Looks like the regression was closed out in 0.9 and the whole networking configuration has been reworked since this was opened. Going to close this issue out.

@tgross tgross closed this as completed Jun 28, 2021
@github-actions

I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Oct 18, 2022
6 participants