Nomad v1.2.4 template source path escapes alloc directory #11902

Closed
pgporada opened this issue Jan 22, 2022 · 14 comments · Fixed by #11930
Labels: stage/accepted, theme/consul-template, type/bug

@pgporada
Contributor

Nomad version

Nomad v1.2.4

Operating system and Environment details

Linux node99 5.4.0-91-generic #102-Ubuntu SMP Fri Nov 5 16:31:28 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

Issue

I specify several variables for files spread across my host filesystem. The variables are used for the source in a template block in a task. Each client is currently running with disable_file_sandbox = true in raw_exec mode.

Reproduction steps

Run a job with a variable in template source.

Expected Result

The job spec gets run.

Actual Result

The jobs fail to start with this error:

 template: template source path escapes alloc directory 

I then downgraded servers and clients to Nomad v1.2.3 and did not encounter this error.
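
For readers unfamiliar with the check behind this error, the expected behaviour can be sketched in shell. This is only an illustration under stated assumptions, not Nomad's implementation: the function name `check_template_source`, its arguments, and the choice to resolve relative sources against the alloc directory are all hypothetical, and it relies on GNU coreutils `realpath -m`. The point is that a path outside the alloc directory should be rejected only when the sandbox is enabled.

```shell
#!/usr/bin/env bash
# Illustrative sketch only; this is NOT how Nomad implements the check.
# Usage: check_template_source ALLOC_DIR SOURCE SANDBOX_DISABLED
# Prints "allowed" or "denied".
check_template_source() {
  local alloc_dir="$1" source_path="$2" sandbox_disabled="$3"

  # With disable_file_sandbox = true, any host path should be acceptable.
  if [ "$sandbox_disabled" = "true" ]; then
    echo "allowed"
    return
  fi

  # Assumption: relative sources are resolved against the alloc directory.
  case "$source_path" in
    /*) ;;
    *) source_path="$alloc_dir/$source_path" ;;
  esac

  # realpath -m (GNU coreutils) normalizes ".." without requiring the path to exist.
  local base resolved
  base="$(realpath -m -- "$alloc_dir")"
  resolved="$(realpath -m -- "$source_path")"

  case "$resolved" in
    "$base" | "$base"/*) echo "allowed" ;;
    *) echo "denied" ;;
  esac
}
```

Under this sketch, `check_template_source /var/nomad/alloc/abc /etc/app/password.txt true` is accepted, which is the behaviour seen on v1.2.3, while the same call with `false` is rejected. On v1.2.4 the rejection happened even though the sandbox was disabled.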

@axsuul
Contributor

axsuul commented Jan 22, 2022

Also seeing this issue on 1.2.4. Downgrading to 1.2.3 makes the error go away.

@jrasell
Member

jrasell commented Jan 24, 2022

Hi @pgporada and @axsuul and thanks for reporting this. It does indeed seem like a regression due to the work undertaken within #11606. I'll raise this as a priority internally so we can look into raising a fix.

@jrasell added the stage/accepted and theme/consul-template labels Jan 24, 2022
@DerekStrickland DerekStrickland self-assigned this Jan 24, 2022
@DerekStrickland
Contributor

Hi @pgporada & @axsuul. Thanks for reporting/confirming this. I'm sorry that the new feature is causing you issues!

Would it be possible for you to share your jobspec and client configs with secrets redacted so that I can make sure I'm reproducing/testing exactly your configuration?

@pgporada
Contributor Author

pgporada commented Jan 24, 2022

Here's my nomad client config.

client {
  enabled = true
  network_interface = "eth1"

  # TODO: Switch this to volume mounts with multi-node-read-only so that we don't
  #       have to disable the sandbox.
  template {
    # Allow nomad to access arbitrary files on disk, instead of just in the task working directory.
    disable_file_sandbox = true
  }
}

datacenter = "dev"
region = "dev"

data_dir  = "/var/nomad"
bind_addr = "10.4.13.82"

# Metrics will be exported at /v1/metrics?format=prometheus
telemetry {
  prometheus_metrics = true
}

# We can enable ACLs when we get it setup in consul
# https://learn.hashicorp.com/tutorials/nomad/consul-service-mesh
acl {
  enabled = false
  token_ttl = "30s"
  policy_ttl = "60s"
}


addresses {
  http = "10.4.13.82"
  rpc  = "10.4.13.82"
  serf = "10.4.13.82"
}

advertise {
  http = "10.4.13.82" # This may need to change?
  rpc  = "10.4.13.82"
  serf = "10.4.13.82"
}

ports {
    http = 4646
    rpc = 4647
    serf = 4648
}

plugin "raw_exec" {
  config {
    enabled = true
  }
}

disable_update_check = true
log_level = "INFO"
enable_syslog = true
leave_on_interrupt = false
leave_on_terminate = false

tls {
    http = true
    rpc = true

    ca_file = "/etc/nomad.d/tls/ca.crt"
    cert_file = "/etc/nomad.d/tls/nomad.cert"
    key_file = "/etc/nomad.d/tls/nomad.key"

    verify_server_hostname = true

    # There is a tradeoff if we set this to true.
    # https://github.com/hashicorp/nomad/issues/6923
    verify_https_client = false

    rpc_upgrade_mode = false

    tls_prefer_server_cipher_suites = true
    tls_min_version = "tls12"
    tls_cipher_suites = "TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305,TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305,TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256"
}

consul {
    # The local consul client is running on loopback, so let's ensure we connect to the exact
    # same loopback address. This is a bit confusing at first because, "why can't we just
    # connect to 127.0.0.1?" Well dear admin, because we're using Ubuntu. Ubuntu, and probably
    # Debian by default will increment the 3rd octet of the loopback address by the number of
    # configured interfaces that exist on the server. In stg/prod we only have one interface
    # which would give us 127.0.1.1, but dev has two interfaces which gives us 127.0.2.1.
    # Additionally, Ubuntu will assign the short hostname and FQDN to that 127.0.x.1 address
    # to workaround this bug: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=316099
    address = "fqdn.of.devbox:8501"
    ssl = true
    ca_file = "/etc/nomad.d/tls/ca.crt"
    cert_file = "/etc/nomad.d/tls/consul.cert"
    key_file = "/etc/nomad.d/tls/consul.key"
}

# Enable CORS, retrieving logs is done via IP so we need CORS
http_api_response_headers {
	"Access-Control-Allow-Origin" = "*"
}

My job spec is from a private repository so I can't give that verbatim, but this is the gist of it. My syntax may be off here because I'm copying and pasting from a gigantic job spec.

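variable "datacenter" {
  type = string
}

variable "password-file" {
  type = string
}

locals {
  config-template = <<-EOF
    authtoken {{ file "/etc/app/password.txt" }}
  EOF
}

job "somejob" {
  datacenters = [var.datacenter]
  type        = "service"

  # The group name is a placeholder; tasks must live inside a group.
  group "server" {
    task "server" {
      ...
      driver = "raw_exec"

      # Renders a host file that lives outside the alloc directory.
      # This relies on disable_file_sandbox = true in the client config.
      template {
        source      = var.password-file
        destination = "${NOMAD_ALLOC_DIR}/data/password.txt"
        change_mode = "restart"
      }

      # Inline template content is passed with data rather than source.
      template {
        data        = local.config-template
        destination = "${NOMAD_ALLOC_DIR}/data/app.conf"
        change_mode = "restart"
      }

      config {
        command = "/usr/local/bin/app"
        args = [
          "-config", "${NOMAD_ALLOC_DIR}/data/app.conf"
        ]
      }
    }
  }
}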

To run the job I do:

nomad job run \
	-address https://server.dev.nomad:4646 \
	-tls-server-name client.dev.nomad \
	-ca-cert /etc/nomad.d/tls/ca.crt \
	-client-cert /etc/nomad.d/tls/nomad.cert \
	-client-key /etc/nomad.d/tls/nomad.key \
	-var="datacenter=dev"  \
	-var="password-file=/etc/app/password.txt" \
	/etc/nomad.d/job_specs/cluster.hcl 

@lgfa29 lgfa29 added this to the 1.2.5 milestone Jan 24, 2022
@DerekStrickland
Contributor

@pgporada I was able to replicate this on 1.2.4. Thanks for the extra configuration to help me get there faster!

@bubejur

bubejur commented Jan 27, 2022

@DerekStrickland hi! Can you please tell me whether we need to wait for 1.2.5 to get this fix?

@DerekStrickland
Contributor

Hi @bubejur! Yes, I'm afraid you will either have to wait for 1.2.5 or build from source. My deepest apologies for any inconvenience this causes you.

@bubejur

bubejur commented Jan 27, 2022

@DerekStrickland can you also give me an approximate ETA for 1.2.5?

@DerekStrickland
Contributor

I can't give you a specific date but we are actively working on getting a release out for this and a couple other patches soon.

@bubejur

bubejur commented Jan 27, 2022

thanks a lot, will be waiting for it!

@bubejur

bubejur commented Feb 2, 2022

@DerekStrickland hi! I updated Nomad to 1.2.5 yesterday, but this issue still exists:

Template failed: /data/nomad/alloc/57050f46-2928-12ed-aedf-e51fb2891834/worker-mpi-resolver/local/platformConfig/nl3.tmpl: execute: template: :1:36: executing "" at <plugin "/data/tools/consul.php">: error calling plugin: function is disabled

[screenshot not preserved]

nomad servers: [screenshot not preserved]

@axsuul
Contributor

axsuul commented Feb 2, 2022

@bubejur that looks to be a different error than the original issue

I updated to 1.2.5 and no longer have the error

template: template source path escapes alloc directory 

Thanks for the fix!
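
Going by the reports in this thread (1.2.3 fine, 1.2.4 broken, 1.2.5 fixed for the original error), a quick way to check whether a node runs the affected release might look like the following. It is a rough sketch: the function name `template_escape_status` is hypothetical, and it assumes the first line of `nomad version` has the form `Nomad v1.2.4`, as shown at the top of this issue.

```shell
#!/usr/bin/env bash
# Illustrative helper, not part of Nomad.
# Takes the first line of `nomad version` output and reports whether it is
# 1.2.4, the only release reported as affected in this thread.
template_escape_status() {
  local version
  version="$(printf '%s\n' "$1" | sed -n 's/^Nomad v\([0-9][0-9.]*\).*/\1/p')"
  if [ -z "$version" ]; then
    echo "unknown"
  elif [ "$version" = "1.2.4" ]; then
    echo "affected"
  else
    echo "not reported as affected"
  fi
}
```

On a client node this could be called as `template_escape_status "$(nomad version | head -n 1)"`.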

@DerekStrickland
Contributor

Hi @axsuul

@bubejur had logged a separate issue that I closed as being a duplicate of this one. At the time, it did look like the same root cause. In that issue, he did cite this error message about plugins. Thanks for helping though! I initially had the same thought until I double checked his other issue. I've re-opened issue #11923 to track those efforts.

@github-actions

I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Oct 12, 2022