Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Nomad Jobs unable to update Consul Services after undetermined amount of time. #20159

Closed
JoaoPPinto opened this issue Mar 19, 2024 · 19 comments · Fixed by #24166
Closed

Nomad Jobs unable to update Consul Services after undetermined amount of time. #20159

JoaoPPinto opened this issue Mar 19, 2024 · 19 comments · Fixed by #24166

Comments

@JoaoPPinto
Copy link

Hello,
We recently upgraded one of our Nomad clusters to 1.7.5 and enabled Workload Identities for both Consul and Vault.
We noticed a behavior where nomad services apparently cannot update/register services in consul after an undetermined amount of time (from our tests it could range from minutes to hours).
Restarting the Nomad Client Agents appears to mitigate the issue temporarily.
We managed to reproduce the problem in a newly created Nomad/Consul/Vault, with 1 Server agent and 1 Client Agent for each.
The workload identity configuration was done with the help of nomad setup consul and the default values were used for policies/binding-rules/roles.
I have since changed the Nomad log level to DEBUG, but haven't managed to reproduce the issue again after restarting the agents.

Nomad version

Nomad v1.7.5
BuildDate 2024-02-13T15:10:13Z
Revision 5f5d4646198d09b8f4f6cb90fb5d50b53fa328b8

Operating system and Environment details

Distributor ID:	Debian
Description:	Debian GNU/Linux 12 (bookworm)
Release:	12
Codename:	bookworm

Issue

After an undetermined amount of time, Nomad jobs no longer can correctly update Services in Consul. (eg. if a new allocation of a service replaces an existing, no health checks are created for the new allocation, and the health checks for old allocation remain in place)

Reproduction steps

Stand up new Nomad cluster with Consul Workload Integration, Deploy test job, and trigger a new allocation placement after initial service registration.

Expected Result

Services are properly updated

Actual Result

Services are not updated at all. New services are not registered, Old services are not deregistered, existing services are not updated.

Job file (if appropriate)

job "test-job-1" {
  datacenters = ["dc1"]
  type        = "service"

  namespace = "default"

  group "test-group-1" {
    count = 1

    network {
      port "traffic-port" {
        to = 80
      }
    }

    service {
      name = "test-service-1"
      port = "traffic-port"
      check {
        name     = "status"
        type     = "http"
        path     = "/"
        interval = "10s"
        timeout  = "2s"
      }
    }

    task "test-task-1" {
      driver = "docker"

      config {
        image = "php:8.3.0-apache"
        volumes = [
          "local/index.php:/var/www/html/index.php"
        ]
        ports = ["traffic-port"]
      }

      template {
        destination = "local/index.php"
        change_mode = "restart"
        data        = <<EOF
<?php
print "{{ key "var_consul" }}";
?>
EOF
      }
    }
  }
}

Nomad Server logs (if appropriate)

Nomad Client logs (if appropriate)

2024-03-18T14:39:45.564Z [ERROR] consul.sync: still unable to update services in Consul: failures=60 error="Unexpected response code: 403 (ACL not found)"

This is the only error in the log. I have since changed the log level to DEBUG in both Server and Client agents in an attempt to get more logs, but haven't been able to reproduce the problem since restarted the agents.

@JoaoPPinto
Copy link
Author

To add some more information, we are seeing these logs on the consul clients when the issue is happening.

2024-03-19T13:22:53.775Z [WARN]  agent: Service registration blocked by ACLs: service=_nomad-task-<uuid>-group-<task> accessorID=""
2024-03-19T13:23:13.700Z [ERROR] agent.client: RPC failed to server: method=ACL.TokenRead server=w.x.y.z:8300 error="rpc error making call: token does not exist: ACL not found"
2024-03-19T13:23:13.700Z [ERROR] agent.http: Request error: method=PUT url=/v1/agent/service/deregister/_nomad-task-<uuid>-group-<task> from=127.0.0.1:52162 error="ACL not found"

These logs where taken from one of the clusters where we are experiencing the issue. I've not been able to reproduce the issue since restarting the nomad server and client yet.

@lgfa29
Copy link
Contributor

lgfa29 commented Mar 20, 2024

Hi @JoaoPPinto 👋

I was kind of able to reproduce this issue by manually deleting the Consul ACL token created for the Nomad service.

By any chance, do you have some kind of background process that may have deleted the token? Would you be able to check the Consul logs for messages such as this?

2024-03-20T18:48:24.485-0400 [DEBUG] agent.http: Request finished: method=DELETE url=/v1/acl/token/78122603-34cb-c1c4-54df-594dec981708?dc=dc1 from=127.0.0.1:52351 latency=1.306708ms

And, if possible, would you be able to share more lines around the error log line? Having the full log could be useful as well. You can email them to [email protected] with the GitHub issue ID in the subject to avoid leaking sensitive data.

Thanks!

@JoaoPPinto
Copy link
Author

Thank you for your reply @lgfa29 !

I can confirm that manually deleting the service token in consul reproduces the problem in the test environment I created.
However, I don't believe we have anything deleting consul tokens in the cluster we originally detected the issue. In fact, while debugging the issue originally, I noticed the cluster has thousands of Consul tokens, if not more.
I'll change the Consul log level to DEBUG and check if I can find any such log, however I do notice I have a few errors failing to delete tokens

2024-03-20T16:40:15.714Z [ERROR] agent.http: Request error: method=DELETE url=/v1/acl/token/aa3019b0-ebaf-4cd7-5d9d-1760b5308266 from=127.0.0.1:55442 error="Cannot find token to delete"
2024-03-20T16:40:15.866Z [ERROR] agent.http: Request error: method=DELETE url=/v1/acl/token/d7a95319-3394-635e-23a6-1d5d1d13b338 from=127.0.0.1:46324 error="Cannot find token to delete"

We also noticed after opening the issue that in that cluster, when the problem is manifesting itself Nomad clients themselves are not able to register the Nomad service in Consul.

2024-03-20T11:38:00.286Z [ERROR] agent.client: RPC failed to server: method=Catalog.Register server=<address>:8300 error="rpc error making call: ACL not found"
2024-03-20T11:38:00.289Z [ERROR] agent.client: RPC failed to server: method=ACL.TokenRead server=<address>:8300 error="rpc error making call: token does not exist: ACL not found"
2024-03-20T11:38:00.289Z [WARN]  agent: Service registration blocked by ACLs: service=_nomad-client-5ihtip6u7lak5gq3ya4z7geymvbdxbls accessorID=""
2024-03-20T11:39:41.998Z [ERROR] agent.client: RPC failed to server: method=Catalog.Register server=<address>:8300 error="rpc error making call: ACL not found"
2024-03-20T11:39:42.000Z [ERROR] agent.client: RPC failed to server: method=ACL.TokenRead server=<address>:8300 error="rpc error making call: token does not exist: ACL not found"
2024-03-20T11:39:42.001Z [WARN]  agent: Service registration blocked by ACLs: service=_nomad-client-5ihtip6u7lak5gq3ya4z7geymvbdxbls accessorID=""

The token the agent's are using does exist and has a pretty permissive policy in this environment, including write permission in agent, node and service fully.
We replaced the token in use with the bootstrap token and the problem hasn't happened in the last ~24h.

What doesn't make sense to me is that I'm using the exact same policy (without bootstrap token) in the separate test cluster and haven't seen that issue yet.

In the meantime, I'll try to collect additional logs.

@lgfa29
Copy link
Contributor

lgfa29 commented Mar 21, 2024

In fact, while debugging the issue originally, I noticed the cluster has thousands of Consul tokens, if not more.

Thanks for the extra info. I suspect this may be related to #20184, which I noticed while investigating this issue.

Do you know if clients have been restarted?

We also noticed after opening the issue that in that cluster, when the problem is manifesting itself Nomad clients themselves are not able to register the Nomad service in Consul.
The token the agent's are using does exist and has a pretty permissive policy in this environment, including write permission in agent, node and service fully.
We replaced the token in use with the bootstrap token and the problem hasn't happened in the last ~24h.

With the new workload identity workflow, the token used to call most of Consul API endpoints are the ones derived for each task, not the agent token. The integration docs has a diagram that kind of illustrates this, so I think the Consul given to the Nomad shouldn't affect this 🤔

image

My biggest hunch so far is that there seems to be a few places in the Consul integration that relies on ephemeral storage, which may be lost if a Nomad agent restarts or crashes.

With the information we have so far I find the accessorID="" part quite suspicious here:

2024-03-19T13:22:53.775Z [WARN]  agent: Service registration blocked by ACLs: service=_nomad-task-<uuid>-group-<task> accessorID=""

Looking through Consul's code base, we find that this message is supposed to have the accessor ID of the secret ID used in the API call.
https://github.com/hashicorp/consul/blob/fea6926de39ed0615d73b7671288fe1af11493db/agent/local/state.go#L1500-L1503

But the aclAccessorID method returns an empty string if the token is not found, which is what we're seeing in the log.
https://github.com/hashicorp/consul/blob/fea6926de39ed0615d73b7671288fe1af11493db/agent/local/state.go#L1626-L1639

From the other logs you've shared, I think Nomad is trying to register a service that is missing from the Consul catalog.

token := c.getServiceToken(id)
if err = c.agentAPI.ServiceRegisterOpts(
serviceInNomad, api.ServiceRegisterOpts{
ReplaceExistingChecks: exists,
Token: token,
}); err != nil {
metrics.IncrCounter([]string{"client", "consul", "sync_failure"}, 1)
return err
}

We can see that the token used is fetched from an local map.

// getServiceToken returns the Consul token for a specific service ID
func (c *ServiceClient) getServiceToken(serviceID string) string {
c.serviceTokensLock.RLock()
defer c.serviceTokensLock.RUnlock()
return c.serviceTokens[serviceID]
}

And this map is updated every time we register a new workload or update it.

// Save the workloads tokens in the token store; must be done before we
// commit or start watches
c.setServiceTokens(tokens)

// Save the workloads tokens in the token store; must be done before we
// commit or start watches
c.setServiceTokens(tokens)

This points to either of two scenarios:

  1. Nomad has a Consul ACL token that doesn't exist in its internal state, and it's using that for the request.
  2. The token that Nomad was supposed to be using disappeared from its internal state, so it's passing an empty value to the Consul API.

So it seems like there's some data consistency issue inside Nomad, but I have not been able to pinpoint exactly where, and it may take us a while to find it....I have placed this into our board so we can properly roadmap it and find some time for a more in-depth investigation.

It's not ideal, but if this is something that is happening frequently you may want to keep an eye on the metric nomad.client.consul.sync_failure, which I think will increase whenever an error happens in the sync loop.

And if you have more logs that could be helpful as well, the more the better, so if you could look for the allocation ID, or the service name so we can understand their entire lifecycle that could help us.

@lgfa29 lgfa29 removed their assignment Mar 21, 2024
@JoaoPPinto
Copy link
Author

Thanks again for the reply.

| Do you know if clients have been restarted?

Yes. We noticed that restarting the Nomad Client would result in the services being able to be registered, so our workaround for the moment is restarting the Nomad Clients when we detect that problem has manifested itself.

| nomad.client.consul.sync_failure

I have checked our metrics and it seems we started seeing an increase when we changed the cluster configuration to workload identity.

image

I believe the problem has happened in the test cluster I stood up, I'll collect all the information I can and submit it hopefully today still.

@JoaoPPinto
Copy link
Author

I'll send the full log files via the nomad-oss-debug address as they are quite large, but I'll post what I believe are the most relevant bits here.

The error was noticed on 21 Mar at ~09:18, when creating a new allocation for a job already executing (by stopping the running allocation via the UI)

Nomad Version

Nomad v1.7.5
BuildDate 2024-02-13T15:10:13Z
Revision 5f5d4646198d09b8f4f6cb90fb5d50b53fa328b8

Operating System

Distributor ID:	Debian
Description:	Debian GNU/Linux 12 (bookworm)
Release:	12
Codename:	bookworm

Neither Client nor Server were restarted in the couple days prior to the error.

pm@devops-temp01:~$ sudo systemctl status nomad
● nomad.service - nomad agent
     Loaded: loaded (/etc/systemd/system/nomad.service; disabled; preset: enabled)
     Active: active (running) since Mon 2024-03-18 15:14:32 WET; 3 days ago
       Docs: https://nomadproject.io/docs/
   Main PID: 6561 (nomad)
      Tasks: 9
     Memory: 48.5M
        CPU: 23min 38.282s
     CGroup: /system.slice/nomad.service
             └─6561 /home/pm/nomad/bin/nomad agent -config /home/pm/nomad/nomad.d
pm@devops-temp02:~$ sudo systemctl status nomad
● nomad.service - nomad agent
     Loaded: loaded (/etc/systemd/system/nomad.service; disabled; preset: enabled)
     Active: active (running) since Mon 2024-03-18 14:59:34 WET; 3 days ago
       Docs: https://nomadproject.io/docs/
   Main PID: 3723 (nomad)
      Tasks: 46
     Memory: 122.6M
        CPU: 41min 15.467s
     CGroup: /system.slice/nomad.service
             ├─  3723 /home/pm/nomad/bin/nomad agent -config /home/pm/nomad/nomad.d
             ├─ 74157 /home/pm/nomad/bin/nomad logmon
             ├─ 74280 /home/pm/nomad/bin/nomad docker_logger
             ├─148537 /home/pm/nomad/bin/nomad logmon
             └─148647 /home/pm/nomad/bin/nomad docker_logger

Nomad Server Logs

I took the liberty to remove some debug http request logs to improve readibility

Mar 21 09:18:03 devops-temp01 nomad[6561]:     2024-03-21T09:18:03.415Z [DEBUG] nomad: memberlist: Stream connection from=127.0.0.1:60530
Mar 21 09:18:13 devops-temp01 nomad[6561]:     2024-03-21T09:18:13.416Z [DEBUG] nomad: memberlist: Stream connection from=127.0.0.1:57496
Mar 21 09:18:23 devops-temp01 nomad[6561]:     2024-03-21T09:18:23.037Z [DEBUG] worker: dequeued evaluation: worker_id=e13da1d0-7ee1-8a82-d2d7-8c03598d2f21 eval_id=b1a6f1bb-6727-11f5-c9d9-a9a49ed334ff type=service namespace=default job_id=test-job-2 node_id="" triggered_by=job-deregister
Mar 21 09:18:23 devops-temp01 nomad[6561]:     2024-03-21T09:18:23.037Z [DEBUG] worker.service_sched: reconciled current state with desired state: eval_id=b1a6f1bb-6727-11f5-c9d9-a9a49ed334ff job_id=test-job-2 namespace=default worker_id=e13da1d0-7ee1-8a82-d2d7-8c03598d2f21
Mar 21 09:18:23 devops-temp01 nomad[6561]:   results=
Mar 21 09:18:23 devops-temp01 nomad[6561]:   | Total changes: (place 0) (destructive 0) (inplace 0) (stop 1) (disconnect 0) (reconnect 0)
Mar 21 09:18:23 devops-temp01 nomad[6561]:   | Desired Changes for "test-group-2": (place 0) (inplace 0) (destructive 0) (stop 1) (migrate 0) (ignore 0) (canary 0)
Mar 21 09:18:23 devops-temp01 nomad[6561]:   
Mar 21 09:18:23 devops-temp01 nomad[6561]:     2024-03-21T09:18:23.044Z [DEBUG] worker: submitted plan for evaluation: worker_id=e13da1d0-7ee1-8a82-d2d7-8c03598d2f21 eval_id=b1a6f1bb-6727-11f5-c9d9-a9a49ed334ff
Mar 21 09:18:23 devops-temp01 nomad[6561]:     2024-03-21T09:18:23.044Z [DEBUG] worker.service_sched: setting eval status: eval_id=b1a6f1bb-6727-11f5-c9d9-a9a49ed334ff job_id=test-job-2 namespace=default worker_id=e13da1d0-7ee1-8a82-d2d7-8c03598d2f21 status=complete
Mar 21 09:18:23 devops-temp01 nomad[6561]:     2024-03-21T09:18:23.048Z [DEBUG] worker: updated evaluation: worker_id=e13da1d0-7ee1-8a82-d2d7-8c03598d2f21 eval="<Eval \"b1a6f1bb-6727-11f5-c9d9-a9a49ed334ff\" JobID: \"test-job-2\" Namespace: \"default\">"
Mar 21 09:18:23 devops-temp01 nomad[6561]:     2024-03-21T09:18:23.048Z [DEBUG] worker: ack evaluation: worker_id=e13da1d0-7ee1-8a82-d2d7-8c03598d2f21 eval_id=b1a6f1bb-6727-11f5-c9d9-a9a49ed334ff type=service namespace=default job_id=test-job-2 node_id="" triggered_by=job-deregister
Mar 21 09:18:23 devops-temp01 nomad[6561]:     2024-03-21T09:18:23.418Z [DEBUG] nomad: memberlist: Stream connection from=127.0.0.1:45884
Mar 21 09:18:25 devops-temp01 nomad[6561]:     2024-03-21T09:18:25.748Z [DEBUG] worker: dequeued evaluation: worker_id=e13da1d0-7ee1-8a82-d2d7-8c03598d2f21 eval_id=c4c39059-7525-fc9c-a155-abdaf0be04ea type=service namespace=default job_id=test-job-2 node_id="" triggered_by=job-register
Mar 21 09:18:25 devops-temp01 nomad[6561]:     2024-03-21T09:18:25.748Z [DEBUG] worker.service_sched: reconciled current state with desired state: eval_id=c4c39059-7525-fc9c-a155-abdaf0be04ea job_id=test-job-2 namespace=default worker_id=e13da1d0-7ee1-8a82-d2d7-8c03598d2f21
Mar 21 09:18:25 devops-temp01 nomad[6561]:   results=
Mar 21 09:18:25 devops-temp01 nomad[6561]:   | Total changes: (place 1) (destructive 0) (inplace 0) (stop 0) (disconnect 0) (reconnect 0)
Mar 21 09:18:25 devops-temp01 nomad[6561]:   | Created Deployment: "283b0fb7-78d8-53ca-74ed-cafe2fe1c1cc"
Mar 21 09:18:25 devops-temp01 nomad[6561]:   | Desired Changes for "test-group-2": (place 1) (inplace 0) (destructive 0) (stop 0) (migrate 0) (ignore 0) (canary 0)
Mar 21 09:18:25 devops-temp01 nomad[6561]:   
Mar 21 09:18:25 devops-temp01 nomad[6561]:     2024-03-21T09:18:25.756Z [DEBUG] worker: submitted plan for evaluation: worker_id=e13da1d0-7ee1-8a82-d2d7-8c03598d2f21 eval_id=c4c39059-7525-fc9c-a155-abdaf0be04ea
Mar 21 09:18:25 devops-temp01 nomad[6561]:     2024-03-21T09:18:25.756Z [DEBUG] worker.service_sched: setting eval status: eval_id=c4c39059-7525-fc9c-a155-abdaf0be04ea job_id=test-job-2 namespace=default worker_id=e13da1d0-7ee1-8a82-d2d7-8c03598d2f21 status=complete
Mar 21 09:18:25 devops-temp01 nomad[6561]:     2024-03-21T09:18:25.759Z [DEBUG] worker: updated evaluation: worker_id=e13da1d0-7ee1-8a82-d2d7-8c03598d2f21 eval="<Eval \"c4c39059-7525-fc9c-a155-abdaf0be04ea\" JobID: \"test-job-2\" Namespace: \"default\">"
Mar 21 09:18:25 devops-temp01 nomad[6561]:     2024-03-21T09:18:25.759Z [DEBUG] worker: ack evaluation: worker_id=e13da1d0-7ee1-8a82-d2d7-8c03598d2f21 eval_id=c4c39059-7525-fc9c-a155-abdaf0be04ea type=service namespace=default job_id=test-job-2 node_id="" triggered_by=job-register
Mar 21 09:18:33 devops-temp01 nomad[6561]:     2024-03-21T09:18:33.418Z [DEBUG] nomad: memberlist: Stream connection from=127.0.0.1:57182
Mar 21 09:18:43 devops-temp01 nomad[6561]:     2024-03-21T09:18:43.420Z [DEBUG] nomad: memberlist: Stream connection from=127.0.0.1:33410
Mar 21 09:18:44 devops-temp01 nomad[6561]:     2024-03-21T09:18:44.982Z [DEBUG] worker: dequeued evaluation: worker_id=e13da1d0-7ee1-8a82-d2d7-8c03598d2f21 eval_id=14151ddf-62a7-d515-b13e-a2d49a6992a5 type=service namespace=default job_id=test-job-2 node_id="" triggered_by=deployment-watcher
Mar 21 09:18:44 devops-temp01 nomad[6561]:     2024-03-21T09:18:44.983Z [DEBUG] worker.service_sched: reconciled current state with desired state: eval_id=14151ddf-62a7-d515-b13e-a2d49a6992a5 job_id=test-job-2 namespace=default worker_id=e13da1d0-7ee1-8a82-d2d7-8c03598d2f21
Mar 21 09:18:44 devops-temp01 nomad[6561]:   results=
Mar 21 09:18:44 devops-temp01 nomad[6561]:   | Total changes: (place 0) (destructive 0) (inplace 0) (stop 0) (disconnect 0) (reconnect 0)
Mar 21 09:18:44 devops-temp01 nomad[6561]:   | Deployment Update for ID "283b0fb7-78d8-53ca-74ed-cafe2fe1c1cc": Status "successful"; Description "Deployment completed successfully"
Mar 21 09:18:44 devops-temp01 nomad[6561]:   | Desired Changes for "test-group-2": (place 0) (inplace 0) (destructive 0) (stop 0) (migrate 0) (ignore 1) (canary 0)
Mar 21 09:18:44 devops-temp01 nomad[6561]:   
Mar 21 09:18:44 devops-temp01 nomad[6561]:     2024-03-21T09:18:44.986Z [DEBUG] worker: submitted plan for evaluation: worker_id=e13da1d0-7ee1-8a82-d2d7-8c03598d2f21 eval_id=14151ddf-62a7-d515-b13e-a2d49a6992a5
Mar 21 09:18:44 devops-temp01 nomad[6561]:     2024-03-21T09:18:44.986Z [DEBUG] worker.service_sched: setting eval status: eval_id=14151ddf-62a7-d515-b13e-a2d49a6992a5 job_id=test-job-2 namespace=default worker_id=e13da1d0-7ee1-8a82-d2d7-8c03598d2f21 status=complete
Mar 21 09:18:44 devops-temp01 nomad[6561]:     2024-03-21T09:18:44.988Z [DEBUG] worker: updated evaluation: worker_id=e13da1d0-7ee1-8a82-d2d7-8c03598d2f21 eval="<Eval \"14151ddf-62a7-d515-b13e-a2d49a6992a5\" JobID: \"test-job-2\" Namespace: \"default\">"
Mar 21 09:18:44 devops-temp01 nomad[6561]:     2024-03-21T09:18:44.988Z [DEBUG] worker: ack evaluation: worker_id=e13da1d0-7ee1-8a82-d2d7-8c03598d2f21 eval_id=14151ddf-62a7-d515-b13e-a2d49a6992a5 type=service namespace=default job_id=test-job-2 node_id="" triggered_by=deployment-watcher
Mar 21 09:18:53 devops-temp01 nomad[6561]:     2024-03-21T09:18:53.421Z [DEBUG] nomad: memberlist: Stream connection from=127.0.0.1:46004
Mar 21 09:19:03 devops-temp01 nomad[6561]:     2024-03-21T09:19:03.033Z [DEBUG] worker: dequeued evaluation: worker_id=e13da1d0-7ee1-8a82-d2d7-8c03598d2f21 eval_id=2215b0b0-b0f9-a455-c42a-45fd31fb033d type=service namespace=default job_id=test-job-2 node_id="" triggered_by=alloc-stop
Mar 21 09:19:03 devops-temp01 nomad[6561]:     2024-03-21T09:19:03.033Z [DEBUG] worker.service_sched: reconciled current state with desired state: eval_id=2215b0b0-b0f9-a455-c42a-45fd31fb033d job_id=test-job-2 namespace=default worker_id=e13da1d0-7ee1-8a82-d2d7-8c03598d2f21
Mar 21 09:19:03 devops-temp01 nomad[6561]:   results=
Mar 21 09:19:03 devops-temp01 nomad[6561]:   | Total changes: (place 1) (destructive 0) (inplace 0) (stop 1) (disconnect 0) (reconnect 0)
Mar 21 09:19:03 devops-temp01 nomad[6561]:   | Desired Changes for "test-group-2": (place 0) (inplace 0) (destructive 0) (stop 0) (migrate 1) (ignore 0) (canary 0)
Mar 21 09:19:03 devops-temp01 nomad[6561]:   
Mar 21 09:19:03 devops-temp01 nomad[6561]:     2024-03-21T09:19:03.046Z [DEBUG] worker: submitted plan for evaluation: worker_id=e13da1d0-7ee1-8a82-d2d7-8c03598d2f21 eval_id=2215b0b0-b0f9-a455-c42a-45fd31fb033d
Mar 21 09:19:03 devops-temp01 nomad[6561]:     2024-03-21T09:19:03.046Z [DEBUG] worker.service_sched: setting eval status: eval_id=2215b0b0-b0f9-a455-c42a-45fd31fb033d job_id=test-job-2 namespace=default worker_id=e13da1d0-7ee1-8a82-d2d7-8c03598d2f21 status=complete
Mar 21 09:19:03 devops-temp01 nomad[6561]:     2024-03-21T09:19:03.049Z [DEBUG] worker: updated evaluation: worker_id=e13da1d0-7ee1-8a82-d2d7-8c03598d2f21 eval="<Eval \"2215b0b0-b0f9-a455-c42a-45fd31fb033d\" JobID: \"test-job-2\" Namespace: \"default\">"
Mar 21 09:19:03 devops-temp01 nomad[6561]:     2024-03-21T09:19:03.049Z [DEBUG] worker: ack evaluation: worker_id=e13da1d0-7ee1-8a82-d2d7-8c03598d2f21 eval_id=2215b0b0-b0f9-a455-c42a-45fd31fb033d type=service namespace=default job_id=test-job-2 node_id="" triggered_by=alloc-stop
Mar 21 09:19:03 devops-temp01 nomad[6561]:     2024-03-21T09:19:03.422Z [DEBUG] nomad: memberlist: Stream connection from=127.0.0.1:34968
Mar 21 09:19:13 devops-temp01 nomad[6561]:     2024-03-21T09:19:13.423Z [DEBUG] nomad: memberlist: Stream connection from=127.0.0.1:35370
Mar 21 09:19:23 devops-temp01 nomad[6561]:     2024-03-21T09:19:23.424Z [DEBUG] nomad: memberlist: Stream connection from=127.0.0.1:51740
Mar 21 09:19:33 devops-temp01 nomad[6561]:     2024-03-21T09:19:33.425Z [DEBUG] nomad: memberlist: Stream connection from=127.0.0.1:47934
Mar 21 09:19:34 devops-temp01 nomad[6561]:     2024-03-21T09:19:34.348Z [DEBUG] worker: dequeued evaluation: worker_id=e13da1d0-7ee1-8a82-d2d7-8c03598d2f21 eval_id=8d078571-69d2-2d62-5d9f-a02f39f7fbca type=_core namespace=- job_id=eval-gc node_id="" triggered_by=scheduled
Mar 21 09:19:34 devops-temp01 nomad[6561]:     2024-03-21T09:19:34.348Z [DEBUG] core.sched: eval GC scanning before cutoff index: index=4891 eval_gc_threshold=1h0m0s
Mar 21 09:19:34 devops-temp01 nomad[6561]:     2024-03-21T09:19:34.348Z [DEBUG] core.sched: eval GC scanning before cutoff index: index=3349 batch_eval_gc_threshold=24h0m0s
Mar 21 09:19:34 devops-temp01 nomad[6561]:     2024-03-21T09:19:34.348Z [DEBUG] worker: ack evaluation: worker_id=e13da1d0-7ee1-8a82-d2d7-8c03598d2f21 eval_id=8d078571-69d2-2d62-5d9f-a02f39f7fbca type=_core namespace=- job_id=eval-gc node_id="" triggered_by=scheduled
Mar 21 09:19:34 devops-temp01 nomad[6561]:     2024-03-21T09:19:34.348Z [DEBUG] worker: dequeued evaluation: worker_id=e13da1d0-7ee1-8a82-d2d7-8c03598d2f21 eval_id=4e665053-e951-3413-b14f-26a0ae5c5dfe type=_core namespace=- job_id=csi-plugin-gc node_id="" triggered_by=scheduled
Mar 21 09:19:34 devops-temp01 nomad[6561]:     2024-03-21T09:19:34.348Z [DEBUG] core.sched: CSI plugin GC scanning before cutoff index: index=4891 csi_plugin_gc_threshold=1h0m0s
Mar 21 09:19:34 devops-temp01 nomad[6561]:     2024-03-21T09:19:34.349Z [DEBUG] worker: ack evaluation: worker_id=e13da1d0-7ee1-8a82-d2d7-8c03598d2f21 eval_id=4e665053-e951-3413-b14f-26a0ae5c5dfe type=_core namespace=- job_id=csi-plugin-gc node_id="" triggered_by=scheduled
Mar 21 09:19:34 devops-temp01 nomad[6561]:     2024-03-21T09:19:34.349Z [DEBUG] worker: dequeued evaluation: worker_id=e13da1d0-7ee1-8a82-d2d7-8c03598d2f21 eval_id=405da6c8-76bd-75d3-7d48-659660395b27 type=_core namespace=- job_id=csi-volume-claim-gc node_id="" triggered_by=scheduled
Mar 21 09:19:34 devops-temp01 nomad[6561]:     2024-03-21T09:19:34.349Z [DEBUG] core.sched: CSI volume claim GC scanning before cutoff index: index=4946 csi_volume_claim_gc_threshold=5m0s
Mar 21 09:19:34 devops-temp01 nomad[6561]:     2024-03-21T09:19:34.349Z [DEBUG] worker: ack evaluation: worker_id=e13da1d0-7ee1-8a82-d2d7-8c03598d2f21 eval_id=405da6c8-76bd-75d3-7d48-659660395b27 type=_core namespace=- job_id=csi-volume-claim-gc node_id="" triggered_by=scheduled
Mar 21 09:19:34 devops-temp01 nomad[6561]:     2024-03-21T09:19:34.349Z [DEBUG] worker: dequeued evaluation: worker_id=e13da1d0-7ee1-8a82-d2d7-8c03598d2f21 eval_id=c10a0d82-3a29-d00c-04e7-09ca2151b970 type=_core namespace=- job_id=node-gc node_id="" triggered_by=scheduled
Mar 21 09:19:34 devops-temp01 nomad[6561]:     2024-03-21T09:19:34.349Z [DEBUG] core.sched: node GC scanning before cutoff index: index=3349 node_gc_threshold=24h0m0s
Mar 21 09:19:34 devops-temp01 nomad[6561]:     2024-03-21T09:19:34.349Z [DEBUG] worker: ack evaluation: worker_id=e13da1d0-7ee1-8a82-d2d7-8c03598d2f21 eval_id=c10a0d82-3a29-d00c-04e7-09ca2151b970 type=_core namespace=- job_id=node-gc node_id="" triggered_by=scheduled
Mar 21 09:19:34 devops-temp01 nomad[6561]:     2024-03-21T09:19:34.349Z [DEBUG] worker: dequeued evaluation: worker_id=e13da1d0-7ee1-8a82-d2d7-8c03598d2f21 eval_id=a012ccbd-33a4-8d6f-a0c5-ec9ff9762103 type=_core namespace=- job_id=deployment-gc node_id="" triggered_by=scheduled
Mar 21 09:19:34 devops-temp01 nomad[6561]:     2024-03-21T09:19:34.349Z [DEBUG] core.sched: deployment GC scanning before cutoff index: index=0 deployment_gc_threshold=120h0m0s
Mar 21 09:19:34 devops-temp01 nomad[6561]:     2024-03-21T09:19:34.349Z [DEBUG] worker: ack evaluation: worker_id=e13da1d0-7ee1-8a82-d2d7-8c03598d2f21 eval_id=a012ccbd-33a4-8d6f-a0c5-ec9ff9762103 type=_core namespace=- job_id=deployment-gc node_id="" triggered_by=scheduled
Mar 21 09:19:34 devops-temp01 nomad[6561]:     2024-03-21T09:19:34.819Z [DEBUG] worker: dequeued evaluation: worker_id=e13da1d0-7ee1-8a82-d2d7-8c03598d2f21 eval_id=e88ce352-8a4f-e6ab-59d9-b12539037ff5 type=_core namespace=- job_id=local-token-expired-gc node_id="" triggered_by=scheduled
Mar 21 09:19:34 devops-temp01 nomad[6561]:     2024-03-21T09:19:34.819Z [DEBUG] core.sched: local expired ACL tokens GC scanning before cutoff index: index=4891 acl_token_expiration_gc_threshold=1h0m0s
Mar 21 09:19:34 devops-temp01 nomad[6561]:     2024-03-21T09:19:34.820Z [DEBUG] worker: ack evaluation: worker_id=e13da1d0-7ee1-8a82-d2d7-8c03598d2f21 eval_id=e88ce352-8a4f-e6ab-59d9-b12539037ff5 type=_core namespace=- job_id=local-token-expired-gc node_id="" triggered_by=scheduled
Mar 21 09:19:34 devops-temp01 nomad[6561]:     2024-03-21T09:19:34.821Z [DEBUG] worker: dequeued evaluation: worker_id=e13da1d0-7ee1-8a82-d2d7-8c03598d2f21 eval_id=0628fc14-f952-ab5f-8a88-56c6de063174 type=_core namespace=- job_id=global-token-expired-gc node_id="" triggered_by=scheduled
Mar 21 09:19:34 devops-temp01 nomad[6561]:     2024-03-21T09:19:34.821Z [DEBUG] core.sched: global expired ACL tokens GC scanning before cutoff index: index=4891 acl_token_expiration_gc_threshold=1h0m0s
Mar 21 09:19:34 devops-temp01 nomad[6561]:     2024-03-21T09:19:34.821Z [DEBUG] worker: ack evaluation: worker_id=e13da1d0-7ee1-8a82-d2d7-8c03598d2f21 eval_id=0628fc14-f952-ab5f-8a88-56c6de063174 type=_core namespace=- job_id=global-token-expired-gc node_id="" triggered_by=scheduled
Mar 21 09:19:43 devops-temp01 nomad[6561]:     2024-03-21T09:19:43.426Z [DEBUG] nomad: memberlist: Stream connection from=127.0.0.1:41452
Mar 21 09:19:53 devops-temp01 nomad[6561]:     2024-03-21T09:19:53.426Z [DEBUG] nomad: memberlist: Stream connection from=127.0.0.1:39154
Mar 21 09:20:03 devops-temp01 nomad[6561]:     2024-03-21T09:20:03.428Z [DEBUG] nomad: memberlist: Stream
...
Mar 21 09:24:23 devops-temp01 nomad[6561]:     2024-03-21T09:24:23.453Z [DEBUG] nomad: memberlist: Stream connection from=127.0.0.1:39610
Mar 21 09:24:33 devops-temp01 nomad[6561]:     2024-03-21T09:24:33.454Z [DEBUG] nomad: memberlist: Stream connection from=127.0.0.1:41682
Mar 21 09:24:34 devops-temp01 nomad[6561]:     2024-03-21T09:24:34.348Z [DEBUG] worker: dequeued evaluation: worker_id=e13da1d0-7ee1-8a82-d2d7-8c03598d2f21 eval_id=b9c84562-dc36-5fc9-de6c-c44f77df6125 type=_core namespace=- job_id=eval-gc node_id="" triggered_by=scheduled
Mar 21 09:24:34 devops-temp01 nomad[6561]:     2024-03-21T09:24:34.348Z [DEBUG] core.sched: eval GC scanning before cutoff index: index=4891 eval_gc_threshold=1h0m0s
Mar 21 09:24:34 devops-temp01 nomad[6561]:     2024-03-21T09:24:34.348Z [DEBUG] core.sched: eval GC scanning before cutoff index: index=3349 batch_eval_gc_threshold=24h0m0s
Mar 21 09:24:34 devops-temp01 nomad[6561]:     2024-03-21T09:24:34.348Z [DEBUG] worker: ack evaluation: worker_id=e13da1d0-7ee1-8a82-d2d7-8c03598d2f21 eval_id=b9c84562-dc36-5fc9-de6c-c44f77df6125 type=_core namespace=- job_id=eval-gc node_id="" triggered_by=scheduled
Mar 21 09:24:34 devops-temp01 nomad[6561]:     2024-03-21T09:24:34.348Z [DEBUG] worker: dequeued evaluation: worker_id=e13da1d0-7ee1-8a82-d2d7-8c03598d2f21 eval_id=8e597abc-e610-00e4-2dc5-b09b1e640ae7 type=_core namespace=- job_id=deployment-gc node_id="" triggered_by=scheduled
Mar 21 09:24:34 devops-temp01 nomad[6561]:     2024-03-21T09:24:34.348Z [DEBUG] core.sched: deployment GC scanning before cutoff index: index=0 deployment_gc_threshold=120h0m0s
Mar 21 09:24:34 devops-temp01 nomad[6561]:     2024-03-21T09:24:34.348Z [DEBUG] worker: ack evaluation: worker_id=e13da1d0-7ee1-8a82-d2d7-8c03598d2f21 eval_id=8e597abc-e610-00e4-2dc5-b09b1e640ae7 type=_core namespace=- job_id=deployment-gc node_id="" triggered_by=scheduled
Mar 21 09:24:34 devops-temp01 nomad[6561]:     2024-03-21T09:24:34.348Z [DEBUG] worker: dequeued evaluation: worker_id=e13da1d0-7ee1-8a82-d2d7-8c03598d2f21 eval_id=17d5517e-37aa-27be-fa08-4bb0b798aede type=_core namespace=- job_id=csi-volume-claim-gc node_id="" triggered_by=scheduled
Mar 21 09:24:34 devops-temp01 nomad[6561]:     2024-03-21T09:24:34.348Z [DEBUG] core.sched: CSI volume claim GC scanning before cutoff index: index=4957 csi_volume_claim_gc_threshold=5m0s
Mar 21 09:24:34 devops-temp01 nomad[6561]:     2024-03-21T09:24:34.348Z [DEBUG] worker: ack evaluation: worker_id=e13da1d0-7ee1-8a82-d2d7-8c03598d2f21 eval_id=17d5517e-37aa-27be-fa08-4bb0b798aede type=_core namespace=- job_id=csi-volume-claim-gc node_id="" triggered_by=scheduled
Mar 21 09:24:34 devops-temp01 nomad[6561]:     2024-03-21T09:24:34.349Z [DEBUG] worker: dequeued evaluation: worker_id=e13da1d0-7ee1-8a82-d2d7-8c03598d2f21 eval_id=d9719eb7-72b7-32b2-fb3b-e4e970ce08e0 type=_core namespace=- job_id=one-time-token-gc node_id="" triggered_by=scheduled
Mar 21 09:24:34 devops-temp01 nomad[6561]:     2024-03-21T09:24:34.351Z [DEBUG] worker: ack evaluation: worker_id=e13da1d0-7ee1-8a82-d2d7-8c03598d2f21 eval_id=d9719eb7-72b7-32b2-fb3b-e4e970ce08e0 type=_core namespace=- job_id=one-time-token-gc node_id="" triggered_by=scheduled
Mar 21 09:24:34 devops-temp01 nomad[6561]:     2024-03-21T09:24:34.351Z [DEBUG] worker: dequeued evaluation: worker_id=e13da1d0-7ee1-8a82-d2d7-8c03598d2f21 eval_id=7c54f709-52ca-d59f-8375-7d2289cd2cfd type=_core namespace=- job_id=variables-rekey node_id="" triggered_by=scheduled
Mar 21 09:24:34 devops-temp01 nomad[6561]:     2024-03-21T09:24:34.351Z [DEBUG] worker: ack evaluation: worker_id=e13da1d0-7ee1-8a82-d2d7-8c03598d2f21 eval_id=7c54f709-52ca-d59f-8375-7d2289cd2cfd type=_core namespace=- job_id=variables-rekey node_id="" triggered_by=scheduled
Mar 21 09:24:34 devops-temp01 nomad[6561]:     2024-03-21T09:24:34.352Z [DEBUG] worker: dequeued evaluation: worker_id=e13da1d0-7ee1-8a82-d2d7-8c03598d2f21 eval_id=a93524e7-899d-2232-67ec-f1a1686cca98 type=_core namespace=- job_id=csi-plugin-gc node_id="" triggered_by=scheduled
Mar 21 09:24:34 devops-temp01 nomad[6561]:     2024-03-21T09:24:34.352Z [DEBUG] core.sched: CSI plugin GC scanning before cutoff index: index=4891 csi_plugin_gc_threshold=1h0m0s
Mar 21 09:24:34 devops-temp01 nomad[6561]:     2024-03-21T09:24:34.352Z [DEBUG] worker: ack evaluation: worker_id=e13da1d0-7ee1-8a82-d2d7-8c03598d2f21 eval_id=a93524e7-899d-2232-67ec-f1a1686cca98 type=_core namespace=- job_id=csi-plugin-gc node_id="" triggered_by=scheduled
Mar 21 09:24:34 devops-temp01 nomad[6561]:     2024-03-21T09:24:34.352Z [DEBUG] worker: dequeued evaluation: worker_id=e13da1d0-7ee1-8a82-d2d7-8c03598d2f21 eval_id=d368dd30-8858-ad65-b0f9-1525f868768d type=_core namespace=- job_id=root-key-rotate-gc node_id="" triggered_by=scheduled
Mar 21 09:24:34 devops-temp01 nomad[6561]:     2024-03-21T09:24:34.352Z [DEBUG] core.sched: root key GC scanning before cutoff index: index=0 root_key_rotation_threshold=720h0m0s
Mar 21 09:24:34 devops-temp01 nomad[6561]:     2024-03-21T09:24:34.352Z [DEBUG] core.sched: root key GC scanning before cutoff index: index=4891 root_key_gc_threshold=1h0m0s
Mar 21 09:24:34 devops-temp01 nomad[6561]:     2024-03-21T09:24:34.352Z [DEBUG] worker: ack evaluation: worker_id=e13da1d0-7ee1-8a82-d2d7-8c03598d2f21 eval_id=d368dd30-8858-ad65-b0f9-1525f868768d type=_core namespace=- job_id=root-key-rotate-gc node_id="" triggered_by=scheduled
Mar 21 09:24:34 devops-temp01 nomad[6561]:     2024-03-21T09:24:34.352Z [DEBUG] worker: dequeued evaluation: worker_id=e13da1d0-7ee1-8a82-d2d7-8c03598d2f21 eval_id=7cb53139-2d56-3f66-8e37-897a5cf3a29c type=_core namespace=- job_id=node-gc node_id="" triggered_by=scheduled
Mar 21 09:24:34 devops-temp01 nomad[6561]:     2024-03-21T09:24:34.352Z [DEBUG] core.sched: node GC scanning before cutoff index: index=3360 node_gc_threshold=24h0m0s
Mar 21 09:24:34 devops-temp01 nomad[6561]:     2024-03-21T09:24:34.352Z [DEBUG] worker: ack evaluation: worker_id=e13da1d0-7ee1-8a82-d2d7-8c03598d2f21 eval_id=7cb53139-2d56-3f66-8e37-897a5cf3a29c type=_core namespace=- job_id=node-gc node_id="" triggered_by=scheduled
Mar 21 09:24:34 devops-temp01 nomad[6561]:     2024-03-21T09:24:34.820Z [DEBUG] worker: dequeued evaluation: worker_id=e13da1d0-7ee1-8a82-d2d7-8c03598d2f21 eval_id=4af65668-ac49-396a-a119-55581cbf0828 type=_core namespace=- job_id=local-token-expired-gc node_id="" triggered_by=scheduled
Mar 21 09:24:34 devops-temp01 nomad[6561]:     2024-03-21T09:24:34.820Z [DEBUG] core.sched: local expired ACL tokens GC scanning before cutoff index: index=4902 acl_token_expiration_gc_threshold=1h0m0s
Mar 21 09:24:34 devops-temp01 nomad[6561]:     2024-03-21T09:24:34.820Z [DEBUG] worker: ack evaluation: worker_id=e13da1d0-7ee1-8a82-d2d7-8c03598d2f21 eval_id=4af65668-ac49-396a-a119-55581cbf0828 type=_core namespace=- job_id=local-token-expired-gc node_id="" triggered_by=scheduled
Mar 21 09:24:34 devops-temp01 nomad[6561]:     2024-03-21T09:24:34.822Z [DEBUG] worker: dequeued evaluation: worker_id=e13da1d0-7ee1-8a82-d2d7-8c03598d2f21 eval_id=3a81ce1a-2bff-3762-dddb-01626fc00425 type=_core namespace=- job_id=global-token-expired-gc node_id="" triggered_by=scheduled
Mar 21 09:24:34 devops-temp01 nomad[6561]:     2024-03-21T09:24:34.822Z [DEBUG] core.sched: global expired ACL tokens GC scanning before cutoff index: index=4902 acl_token_expiration_gc_threshold=1h0m0s
Mar 21 09:24:34 devops-temp01 nomad[6561]:     2024-03-21T09:24:34.822Z [DEBUG] worker: ack evaluation: worker_id=e13da1d0-7ee1-8a82-d2d7-8c03598d2f21 eval_id=3a81ce1a-2bff-3762-dddb-01626fc00425 type=_core namespace=- job_id=global-token-expired-gc node_id="" triggered_by=scheduled

Nomad Client Log

Again, I removed some http request logs to improve readability

Mar 21 09:18:18 devops-temp02 nomad[3723]:     2024-03-21T09:18:18.315Z [DEBUG] client: received stale allocation information; retrying: index=3518 min_index=3597
Mar 21 09:18:23 devops-temp02 nomad[3723]:     2024-03-21T09:18:23.047Z [DEBUG] client: updated allocations: index=4963 total=2 pulled=1 filtered=1
Mar 21 09:18:23 devops-temp02 nomad[3723]:     2024-03-21T09:18:23.048Z [DEBUG] client: allocation updates: added=0 removed=0 updated=1 ignored=1
Mar 21 09:18:23 devops-temp02 nomad[3723]:     2024-03-21T09:18:23.051Z [DEBUG] client: allocation updates applied: added=0 removed=0 updated=1 ignored=1 errors=0
Mar 21 09:18:23 devops-temp02 nomad[3723]:     2024-03-21T09:18:23.052Z [INFO]  client.alloc_runner.task_runner: Task event: alloc_id=742bf378-0d3b-eaf7-7d11-5c832c28c7c0 task=test-task-2 type=Killing msg="Sent interrupt. Waiting 5s before force killing" failed=false
Mar 21 09:18:23 devops-temp02 nomad[3723]:     2024-03-21T09:18:23.059Z [DEBUG] consul.sync: sync complete: registered_services=0 deregistered_services=1 registered_checks=0 deregistered_checks=0
Mar 21 09:18:23 devops-temp02 nomad[3723]:     2024-03-21T09:18:23.706Z [DEBUG] client: updated allocations: index=4965 total=2 pulled=0 filtered=2
Mar 21 09:18:23 devops-temp02 nomad[3723]:     2024-03-21T09:18:23.706Z [DEBUG] client: allocation updates: added=0 removed=0 updated=0 ignored=2
Mar 21 09:18:23 devops-temp02 nomad[3723]:     2024-03-21T09:18:23.707Z [DEBUG] client: allocation updates applied: added=0 removed=0 updated=0 ignored=2 errors=0
Mar 21 09:18:24 devops-temp02 nomad[3723]:     2024-03-21T09:18:24.340Z [INFO]  client.driver_mgr.docker: stopped container: container_id=766926d809e2b6dd6006e424a844fd2a848db5393ca594da9f488cad5dfd4af0 driver=docker
Mar 21 09:18:24 devops-temp02 nomad[3723]:     2024-03-21T09:18:24.349Z [INFO]  client.alloc_runner.task_runner: Task event: alloc_id=742bf378-0d3b-eaf7-7d11-5c832c28c7c0 task=test-task-2 type=Terminated msg="Exit Code: 0" failed=false
Mar 21 09:18:24 devops-temp02 nomad[3723]:     2024-03-21T09:18:24.351Z [DEBUG] client.driver_mgr.docker.docker_logger.stdio: received EOF, stopping recv loop: driver=docker err="rpc error: code = Unavailable desc = error reading from server: EOF"
Mar 21 09:18:24 devops-temp02 nomad[3723]:     2024-03-21T09:18:24.356Z [INFO]  client.driver_mgr.docker.docker_logger: plugin process exited: driver=docker plugin=/home/pm/nomad/bin/nomad id=74069
Mar 21 09:18:24 devops-temp02 nomad[3723]:     2024-03-21T09:18:24.356Z [DEBUG] client.driver_mgr.docker.docker_logger: plugin exited: driver=docker
Mar 21 09:18:24 devops-temp02 nomad[3723]:     2024-03-21T09:18:24.365Z [DEBUG] client.driver_mgr.docker: image id reference count decremented: driver=docker image_id=sha256:3bbddfeb929989067da15efacb0913e66756fc755a3e30c664ff64571590a4ee references=1
Mar 21 09:18:24 devops-temp02 nomad[3723]:     2024-03-21T09:18:24.365Z [INFO]  client.alloc_runner.task_runner: Task event: alloc_id=742bf378-0d3b-eaf7-7d11-5c832c28c7c0 task=test-task-2 type=Killed msg="Task successfully killed" failed=false
Mar 21 09:18:24 devops-temp02 nomad[3723]:     2024-03-21T09:18:24.372Z [DEBUG] client.alloc_runner.task_runner.task_hook.logmon.stdio: received EOF, stopping recv loop: alloc_id=742bf378-0d3b-eaf7-7d11-5c832c28c7c0 task=test-task-2 err="rpc error: code = Unavailable desc = error reading from server: EOF"
Mar 21 09:18:24 devops-temp02 nomad[3723]:     2024-03-21T09:18:24.377Z [INFO]  client.alloc_runner.task_runner.task_hook.logmon: plugin process exited: alloc_id=742bf378-0d3b-eaf7-7d11-5c832c28c7c0 task=test-task-2 plugin=/home/pm/nomad/bin/nomad id=73955
Mar 21 09:18:24 devops-temp02 nomad[3723]:     2024-03-21T09:18:24.377Z [DEBUG] client.alloc_runner.task_runner.task_hook.logmon: plugin exited: alloc_id=742bf378-0d3b-eaf7-7d11-5c832c28c7c0 task=test-task-2
Mar 21 09:18:24 devops-temp02 nomad[3723]:     2024-03-21T09:18:24.378Z [INFO]  agent: (runner) stopping
Mar 21 09:18:24 devops-temp02 nomad[3723]:     2024-03-21T09:18:24.378Z [DEBUG] agent: (runner) stopping watcher
Mar 21 09:18:24 devops-temp02 nomad[3723]:     2024-03-21T09:18:24.378Z [DEBUG] agent: (watcher) stopping all views
Mar 21 09:18:24 devops-temp02 nomad[3723]:     2024-03-21T09:18:24.378Z [DEBUG] client.alloc_runner.task_runner: task run loop exiting: alloc_id=742bf378-0d3b-eaf7-7d11-5c832c28c7c0 task=test-task-2
Mar 21 09:18:24 devops-temp02 nomad[3723]:     2024-03-21T09:18:24.380Z [INFO]  client.gc: marking allocation for GC: alloc_id=742bf378-0d3b-eaf7-7d11-5c832c28c7c0
Mar 21 09:18:24 devops-temp02 nomad[3723]:     2024-03-21T09:18:24.378Z [INFO]  agent: (runner) received finish
Mar 21 09:18:24 devops-temp02 nomad[3723]:     2024-03-21T09:18:24.567Z [DEBUG] client: updated allocations: index=4966 total=2 pulled=0 filtered=2
Mar 21 09:18:24 devops-temp02 nomad[3723]:     2024-03-21T09:18:24.568Z [DEBUG] client: allocation updates: added=0 removed=0 updated=0 ignored=2
Mar 21 09:18:24 devops-temp02 nomad[3723]:     2024-03-21T09:18:24.568Z [DEBUG] client: allocation updates applied: added=0 removed=0 updated=0 ignored=2 errors=0
Mar 21 09:18:25 devops-temp02 nomad[3723]:     2024-03-21T09:18:25.760Z [DEBUG] client: updated allocations: index=4968 total=3 pulled=1 filtered=2
Mar 21 09:18:25 devops-temp02 nomad[3723]:     2024-03-21T09:18:25.760Z [DEBUG] client: allocation updates: added=1 removed=0 updated=0 ignored=2
Mar 21 09:18:25 devops-temp02 nomad[3723]:     2024-03-21T09:18:25.764Z [INFO]  client.alloc_runner.task_runner: Task event: alloc_id=fbc348e0-a932-a653-fc87-9366b128eec5 task=test-task-2 type=Received msg="Task received by client" failed=false
Mar 21 09:18:25 devops-temp02 nomad[3723]:     2024-03-21T09:18:25.764Z [DEBUG] client: allocation updates applied: added=1 removed=0 updated=0 ignored=2 errors=0
Mar 21 09:18:25 devops-temp02 nomad[3723]:     2024-03-21T09:18:25.764Z [DEBUG] client.alloc_runner.widmgr: retrieving and renewing workload identities: alloc_id=fbc348e0-a932-a653-fc87-9366b128eec5 num_identities=3
Mar 21 09:18:25 devops-temp02 nomad[3723]:     2024-03-21T09:18:25.775Z [DEBUG] client.alloc_runner.widmgr: waiting to renew identities: alloc_id=fbc348e0-a932-a653-fc87-9366b128eec5 num=3 wait=32m26.514421166s
Mar 21 09:18:25 devops-temp02 nomad[3723]:     2024-03-21T09:18:25.786Z [DEBUG] client.alloc_runner.task_runner: lifecycle start condition has been met, proceeding: alloc_id=fbc348e0-a932-a653-fc87-9366b128eec5 task=test-task-2
Mar 21 09:18:25 devops-temp02 nomad[3723]:     2024-03-21T09:18:25.788Z [INFO]  client.alloc_runner.task_runner: Task event: alloc_id=fbc348e0-a932-a653-fc87-9366b128eec5 task=test-task-2 type="Task Setup" msg="Building Task Directory" failed=false
Mar 21 09:18:25 devops-temp02 nomad[3723]:     2024-03-21T09:18:25.798Z [DEBUG] client.alloc_runner.task_runner.task_hook.logmon: starting plugin: alloc_id=fbc348e0-a932-a653-fc87-9366b128eec5 task=test-task-2 path=/home/pm/nomad/bin/nomad args=["/home/pm/nomad/bin/nomad", "logmon"]
Mar 21 09:18:25 devops-temp02 nomad[3723]:     2024-03-21T09:18:25.799Z [DEBUG] client.alloc_runner.task_runner.task_hook.logmon: plugin started: alloc_id=fbc348e0-a932-a653-fc87-9366b128eec5 task=test-task-2 path=/home/pm/nomad/bin/nomad pid=107630
Mar 21 09:18:25 devops-temp02 nomad[3723]:     2024-03-21T09:18:25.799Z [DEBUG] client.alloc_runner.task_runner.task_hook.logmon: waiting for RPC address: alloc_id=fbc348e0-a932-a653-fc87-9366b128eec5 task=test-task-2 plugin=/home/pm/nomad/bin/nomad
Mar 21 09:18:25 devops-temp02 nomad[3723]:     2024-03-21T09:18:25.801Z [DEBUG] consul.sync: sync complete: registered_services=1 deregistered_services=0 registered_checks=0 deregistered_checks=0
Mar 21 09:18:25 devops-temp02 nomad[3723]:     2024-03-21T09:18:25.843Z [DEBUG] client.alloc_runner.task_runner.task_hook.logmon: using plugin: alloc_id=fbc348e0-a932-a653-fc87-9366b128eec5 task=test-task-2 version=2
Mar 21 09:18:25 devops-temp02 nomad[3723]:     2024-03-21T09:18:25.843Z [DEBUG] client.alloc_runner.task_runner.task_hook.logmon.nomad: plugin address: alloc_id=fbc348e0-a932-a653-fc87-9366b128eec5 task=test-task-2 address=/tmp/plugin2295051249 network=unix @module=logmon timestamp=2024-03-21T09:18:25.843Z
Mar 21 09:18:25 devops-temp02 nomad[3723]:     2024-03-21T09:18:25.846Z [DEBUG] client.alloc_runner.task_runner.task_hook.logmon.nomad: opening fifo: alloc_id=fbc348e0-a932-a653-fc87-9366b128eec5 task=test-task-2 @module=logmon path=/home/pm/nomad/data/alloc/fbc348e0-a932-a653-fc87-9366b128eec5/alloc/logs/.test-task-2.stdout.fifo timestamp=2024-03-21T09:18:25.845Z
Mar 21 09:18:25 devops-temp02 nomad[3723]:     2024-03-21T09:18:25.846Z [DEBUG] client.alloc_runner.task_runner.task_hook.logmon.nomad: opening fifo: alloc_id=fbc348e0-a932-a653-fc87-9366b128eec5 task=test-task-2 path=/home/pm/nomad/data/alloc/fbc348e0-a932-a653-fc87-9366b128eec5/alloc/logs/.test-task-2.stderr.fifo @module=logmon timestamp=2024-03-21T09:18:25.846Z
Mar 21 09:18:25 devops-temp02 nomad[3723]:     2024-03-21T09:18:25.892Z [INFO]  agent: (runner) creating new runner (dry: false, once: false)
Mar 21 09:18:25 devops-temp02 nomad[3723]:     2024-03-21T09:18:25.892Z [DEBUG] agent: (runner) final config: {"Consul":{"Address":"127.0.0.1:8500","Namespace":"","Auth":{"Enabled":false,"Username":""},"Retry":{"Attempts":12,"Backoff":250000000,"MaxBackoff":60000000000,"Enabled":true},"SSL":{"CaCert":"","CaCertBytes":"","CaPath":"","Cert":"","Enabled":false,"Key":"","ServerName":"","Verify":true},"Token":"60673e58-b449-3276-5833-dcf395093f5c","TokenFile":"","Transport":{"CustomDialer":null,"DialKeepAlive":30000000000,"DialTimeout":30000000000,"DisableKeepAlives":false,"IdleConnTimeout":5000000000,"MaxIdleConns":0,"MaxIdleConnsPerHost":100,"MaxConnsPerHost":0,"TLSHandshakeTimeout":10000000000}},"Dedup":{"Enabled":false,"MaxStale":2000000000,"Prefix":"consul-template/dedup/","TTL":15000000000,"BlockQueryWaitTime":60000000000},"DefaultDelims":{"Left":null,"Right":null},"Exec":{"Command":[],"Enabled":false,"Env":{"Denylist":[],"Custom":[],"Pristine":false,"Allowlist":[]},"KillSignal":2,"KillTimeout":30000000000,"ReloadSignal":null,"Splay":0,"Timeout":0},"KillSignal":2,"LogLevel":"WARN","FileLog":{"LogFilePath":"","LogRotateBytes":0,"LogRotateDuration":86400000000000,"LogRotateMaxFiles":0},"MaxStale":2000000000,"PidFile":"","ReloadSignal":1,"Syslog":{"Enabled":false,"Facility":"LOCAL0","Name":"consul-template"},"Templates":[{"Backup":false,"Command":[],"CommandTimeout":30000000000,"Contents":"\u003c?php\nprint \"{{ key \"var_consul\" }}\";\n?\u003e\n","CreateDestDirs":true,"Destination":"/home/pm/nomad/data/alloc/fbc348e0-a932-a653-fc87-9366b128eec5/test-task-2/local/index.php","ErrMissingKey":false,"ErrFatal":true,"Exec":{"Command":[],"Enabled":false,"Env":{"Denylist":[],"Custom":[],"Pristine":false,"Allowlist":[]},"KillSignal":2,"KillTimeout":30000000000,"ReloadSignal":null,"Splay":0,"Timeout":30000000000},"Perms":420,"User":null,"Uid":null,"Group":null,"Gid":null,"Source":"","Wait":{"Enabled":false,"Min":0,"Max":0},"LeftDelim":"{{","RightDelim":"}}","FunctionDenylist":["plugin","writeToFile"],"SandboxPath":"/home/pm/nomad/data/alloc/fbc348e0-a932-a653-fc87-9366b128eec5/test-task-2","MapToEnvironmentVariable":""}],"TemplateErrFatal":null,"Vault":{"Address":"http://10.101.14.30:8200","Enabled":true,"Namespace":"","RenewToken":false,"Retry":{"Attempts":12,"Backoff":250000000,"MaxBackoff":60000000000,"Enabled":true},"SSL":{"CaCert":"","CaCertBytes":"","CaPath":"","Cert":"","Enabled":false,"Key":"","ServerName":"","Verify":false},"Transport":{"CustomDialer":null,"DialKeepAlive":30000000000,"DialTimeout":30000000000,"DisableKeepAlives":false,"IdleConnTimeout":5000000000,"MaxIdleConns":0,"MaxIdleConnsPerHost":100,"MaxConnsPerHost":0,"TLSHandshakeTimeout":10000000000},"UnwrapToken":false,"ClientUserAgent":null,"DefaultLeaseDuration":300000000000,"LeaseRenewalThreshold":0.9,"K8SAuthRoleName":"","K8SServiceAccountTokenPath":"/run/secrets/kubernetes.io/serviceaccount/token","K8SServiceAccountToken":"","K8SServiceMountPath":"kubernetes"},"Nomad":{"Address":"","Enabled":true,"Namespace":"default","SSL":{"CaCert":"","CaCertBytes":"","CaPath":"","Cert":"","Enabled":false,"Key":"","ServerName":"","Verify":true},"AuthUsername":"","AuthPassword":"","Transport":{"CustomDialer":{},"DialKeepAlive":30000000000,"DialTimeout":30000000000,"DisableKeepAlives":false,"IdleConnTimeout":5000000000,"MaxIdleConns":0,"MaxIdleConnsPerHost":100,"MaxConnsPerHost":0,"TLSHandshakeTimeout":10000000000},"Retry":{"Attempts":12,"Backoff":250000000,"MaxBackoff":60000000000,"Enabled":true}},"Wait":{"Enabled":false,"Min":0,"Max":0},"Once":false,"ParseOnly":false,"BlockQueryWaitTime":60000000000,"ErrOnFailedLookup":false}
Mar 21 09:18:25 devops-temp02 nomad[3723]:     2024-03-21T09:18:25.892Z [INFO]  agent: (runner) creating watcher
Mar 21 09:18:25 devops-temp02 nomad[3723]:     2024-03-21T09:18:25.892Z [INFO]  agent: (runner) starting
Mar 21 09:18:25 devops-temp02 nomad[3723]:     2024-03-21T09:18:25.892Z [DEBUG] agent: (runner) running initial templates
Mar 21 09:18:25 devops-temp02 nomad[3723]:     2024-03-21T09:18:25.892Z [DEBUG] agent: (runner) initiating run
Mar 21 09:18:25 devops-temp02 nomad[3723]:     2024-03-21T09:18:25.892Z [DEBUG] agent: (runner) checking template 314b87fb017a9c4e5eaadece8b4ec85b
Mar 21 09:18:25 devops-temp02 nomad[3723]:     2024-03-21T09:18:25.893Z [DEBUG] agent: (runner) missing data for 1 dependencies
Mar 21 09:18:25 devops-temp02 nomad[3723]:     2024-03-21T09:18:25.893Z [DEBUG] agent: (runner) missing dependency: kv.block(var_consul)
Mar 21 09:18:25 devops-temp02 nomad[3723]:     2024-03-21T09:18:25.893Z [DEBUG] agent: (runner) add used dependency kv.block(var_consul) to missing since isLeader but do not have a watcher
Mar 21 09:18:25 devops-temp02 nomad[3723]:     2024-03-21T09:18:25.893Z [DEBUG] agent: (runner) was not watching 1 dependencies
Mar 21 09:18:25 devops-temp02 nomad[3723]:     2024-03-21T09:18:25.893Z [DEBUG] agent: (watcher) adding kv.block(var_consul)
Mar 21 09:18:25 devops-temp02 nomad[3723]:     2024-03-21T09:18:25.893Z [DEBUG] agent: (runner) diffing and updating dependencies
Mar 21 09:18:25 devops-temp02 nomad[3723]:     2024-03-21T09:18:25.893Z [DEBUG] agent: (runner) watching 1 dependencies
Mar 21 09:18:25 devops-temp02 nomad[3723]:     2024-03-21T09:18:25.895Z [DEBUG] agent: (runner) receiving dependency kv.block(var_consul)
Mar 21 09:18:25 devops-temp02 nomad[3723]:     2024-03-21T09:18:25.895Z [DEBUG] agent: (runner) initiating run
Mar 21 09:18:25 devops-temp02 nomad[3723]:     2024-03-21T09:18:25.895Z [DEBUG] agent: (runner) checking template 314b87fb017a9c4e5eaadece8b4ec85b
Mar 21 09:18:25 devops-temp02 nomad[3723]:     2024-03-21T09:18:25.895Z [DEBUG] agent: (runner) rendering "(dynamic)" => "/home/pm/nomad/data/alloc/fbc348e0-a932-a653-fc87-9366b128eec5/test-task-2/local/index.php"
Mar 21 09:18:25 devops-temp02 nomad[3723]:     2024-03-21T09:18:25.924Z [INFO]  agent: (runner) rendered "(dynamic)" => "/home/pm/nomad/data/alloc/fbc348e0-a932-a653-fc87-9366b128eec5/test-task-2/local/index.php"
Mar 21 09:18:25 devops-temp02 nomad[3723]:     2024-03-21T09:18:25.924Z [DEBUG] agent: (runner) diffing and updating dependencies
Mar 21 09:18:25 devops-temp02 nomad[3723]:     2024-03-21T09:18:25.924Z [DEBUG] agent: (runner) kv.block(var_consul) is still needed
Mar 21 09:18:25 devops-temp02 nomad[3723]:     2024-03-21T09:18:25.924Z [DEBUG] agent: (runner) watching 1 dependencies
Mar 21 09:18:25 devops-temp02 nomad[3723]:     2024-03-21T09:18:25.924Z [DEBUG] agent: (runner) all templates rendered
Mar 21 09:18:25 devops-temp02 nomad[3723]:     2024-03-21T09:18:25.930Z [DEBUG] client.driver_mgr.docker: image reference count incremented: driver=docker image_name=php:8.3.0-apache image_id=sha256:3bbddfeb929989067da15efacb0913e66756fc755a3e30c664ff64571590a4ee references=2
Mar 21 09:18:25 devops-temp02 nomad[3723]:     2024-03-21T09:18:25.930Z [DEBUG] client.driver_mgr.docker: configured resources: driver=docker task_name=test-task-2 memory=314572800 memory_reservation=0 cpu_shares=100 cpu_quota=0 cpu_period=0
Mar 21 09:18:25 devops-temp02 nomad[3723]:     2024-03-21T09:18:25.930Z [DEBUG] client.driver_mgr.docker: binding directories: driver=docker task_name=test-task-2 binds="[]string{\"/home/pm/nomad/data/alloc/fbc348e0-a932-a653-fc87-9366b128eec5/alloc:/alloc\", \"/home/pm/nomad/data/alloc/fbc348e0-a932-a653-fc87-9366b128eec5/test-task-2/local:/local\", \"/home/pm/nomad/data/alloc/fbc348e0-a932-a653-fc87-9366b128eec5/test-task-2/secrets:/secrets\", \"/home/pm/nomad/data/alloc/fbc348e0-a932-a653-fc87-9366b128eec5/test-task-2/local/index.php:/var/www/html/index.php\"}"
Mar 21 09:18:25 devops-temp02 nomad[3723]:     2024-03-21T09:18:25.930Z [DEBUG] client.driver_mgr.docker: networking mode not specified; using default: driver=docker task_name=test-task-2
Mar 21 09:18:25 devops-temp02 nomad[3723]:     2024-03-21T09:18:25.930Z [DEBUG] client.driver_mgr.docker: allocated static port: driver=docker task_name=test-task-2 ip=10.101.14.31 port=21412 label=traffic-port
Mar 21 09:18:25 devops-temp02 nomad[3723]:     2024-03-21T09:18:25.930Z [DEBUG] client.driver_mgr.docker: exposed port: driver=docker task_name=test-task-2 port=21412 label=traffic-port
Mar 21 09:18:25 devops-temp02 nomad[3723]:     2024-03-21T09:18:25.930Z [DEBUG] client.driver_mgr.docker: applied labels on the container: driver=docker task_name=test-task-2 labels=map[com.hashicorp.nomad.alloc_id:fbc348e0-a932-a653-fc87-9366b128eec5]
Mar 21 09:18:25 devops-temp02 nomad[3723]:     2024-03-21T09:18:25.930Z [DEBUG] client.driver_mgr.docker: setting container name: driver=docker task_name=test-task-2 container_name=test-task-2-fbc348e0-a932-a653-fc87-9366b128eec5
Mar 21 09:18:25 devops-temp02 nomad[3723]:     2024-03-21T09:18:25.993Z [INFO]  client.driver_mgr.docker: created container: driver=docker container_id=319b1f1eb40858c99b5721b7e7ce1de8d156a0159bf4005a08d9e3f598a92056
Mar 21 09:18:26 devops-temp02 nomad[3723]:     2024-03-21T09:18:26.352Z [INFO]  client.driver_mgr.docker: started container: driver=docker container_id=319b1f1eb40858c99b5721b7e7ce1de8d156a0159bf4005a08d9e3f598a92056
Mar 21 09:18:26 devops-temp02 nomad[3723]:     2024-03-21T09:18:26.352Z [DEBUG] client.driver_mgr.docker.docker_logger: starting plugin: driver=docker path=/home/pm/nomad/bin/nomad args=["/home/pm/nomad/bin/nomad", "docker_logger"]
Mar 21 09:18:26 devops-temp02 nomad[3723]:     2024-03-21T09:18:26.353Z [DEBUG] client.driver_mgr.docker.docker_logger: plugin started: driver=docker path=/home/pm/nomad/bin/nomad pid=107742
Mar 21 09:18:26 devops-temp02 nomad[3723]:     2024-03-21T09:18:26.353Z [DEBUG] client.driver_mgr.docker.docker_logger: waiting for RPC address: driver=docker plugin=/home/pm/nomad/bin/nomad
Mar 21 09:18:26 devops-temp02 nomad[3723]:     2024-03-21T09:18:26.414Z [DEBUG] client.driver_mgr.docker.docker_logger.nomad: plugin address: driver=docker @module=docker_logger address=/tmp/plugin39314117 network=unix timestamp=2024-03-21T09:18:26.413Z
Mar 21 09:18:26 devops-temp02 nomad[3723]:     2024-03-21T09:18:26.414Z [DEBUG] client.driver_mgr.docker.docker_logger: using plugin: driver=docker version=2
Mar 21 09:18:26 devops-temp02 nomad[3723]:     2024-03-21T09:18:26.416Z [DEBUG] client.driver_mgr.docker.docker_logger.nomad: using client connection initialized from environment: driver=docker @module=docker_logger timestamp=2024-03-21T09:18:26.416Z
Mar 21 09:18:26 devops-temp02 nomad[3723]:     2024-03-21T09:18:26.421Z [INFO]  client.alloc_runner.task_runner: Task event: alloc_id=fbc348e0-a932-a653-fc87-9366b128eec5 task=test-task-2 type=Started msg="Task started by client" failed=false
Mar 21 09:18:26 devops-temp02 nomad[3723]:     2024-03-21T09:18:26.629Z [DEBUG] client: updated allocations: index=4970 total=3 pulled=0 filtered=3
Mar 21 09:18:26 devops-temp02 nomad[3723]:     2024-03-21T09:18:26.629Z [DEBUG] client: allocation updates: added=0 removed=0 updated=0 ignored=3
Mar 21 09:18:26 devops-temp02 nomad[3723]:     2024-03-21T09:18:26.629Z [DEBUG] client: allocation updates applied: added=0 removed=0 updated=0 ignored=3 errors=0
Mar 21 09:18:43 devops-temp02 nomad[3723]:     2024-03-21T09:18:43.731Z [DEBUG] client: updated allocations: index=4972 total=3 pulled=0 filtered=3
Mar 21 09:18:43 devops-temp02 nomad[3723]:     2024-03-21T09:18:43.731Z [DEBUG] client: allocation updates: added=0 removed=0 updated=0 ignored=3
Mar 21 09:18:43 devops-temp02 nomad[3723]:     2024-03-21T09:18:43.731Z [DEBUG] client: allocation updates applied: added=0 removed=0 updated=0 ignored=3 errors=0
Mar 21 09:19:03 devops-temp02 nomad[3723]:     2024-03-21T09:19:03.044Z [DEBUG] client: updated allocations: index=4976 total=3 pulled=1 filtered=2
Mar 21 09:19:03 devops-temp02 nomad[3723]:     2024-03-21T09:19:03.044Z [DEBUG] client: allocation updates: added=0 removed=0 updated=1 ignored=2
Mar 21 09:19:03 devops-temp02 nomad[3723]:     2024-03-21T09:19:03.047Z [DEBUG] client: allocation updates applied: added=0 removed=0 updated=1 ignored=2 errors=0
Mar 21 09:19:03 devops-temp02 nomad[3723]:     2024-03-21T09:19:03.050Z [DEBUG] client: updated allocations: index=4977 total=4 pulled=2 filtered=2
Mar 21 09:19:03 devops-temp02 nomad[3723]:     2024-03-21T09:19:03.050Z [DEBUG] client: allocation updates: added=1 removed=0 updated=1 ignored=2
Mar 21 09:19:03 devops-temp02 nomad[3723]:     2024-03-21T09:19:03.057Z [WARN]  consul.sync: failed to update services in Consul: error="Unexpected response code: 403 (ACL not found)"
Mar 21 09:19:03 devops-temp02 nomad[3723]:     2024-03-21T09:19:03.057Z [INFO]  client.alloc_runner.task_runner: Task event: alloc_id=fbc348e0-a932-a653-fc87-9366b128eec5 task=test-task-2 type=Killing msg="Sent interrupt. Waiting 5s before force killing" failed=false
Mar 21 09:19:03 devops-temp02 nomad[3723]:     2024-03-21T09:19:03.057Z [INFO]  client.alloc_runner.task_runner: Task event: alloc_id=416f4a8b-a1c3-4c2a-0660-bb0c0244b7b8 task=test-task-2 type=Received msg="Task received by client" failed=false
Mar 21 09:19:03 devops-temp02 nomad[3723]:     2024-03-21T09:19:03.057Z [DEBUG] client: allocation updates applied: added=1 removed=0 updated=1 ignored=2 errors=0
Mar 21 09:19:03 devops-temp02 nomad[3723]:     2024-03-21T09:19:03.059Z [DEBUG] client.alloc_runner.widmgr: retrieving and renewing workload identities: alloc_id=416f4a8b-a1c3-4c2a-0660-bb0c0244b7b8 num_identities=3
Mar 21 09:19:03 devops-temp02 nomad[3723]:     2024-03-21T09:19:03.070Z [DEBUG] client.alloc_runner.widmgr: waiting to renew identities: alloc_id=416f4a8b-a1c3-4c2a-0660-bb0c0244b7b8 num=3 wait=31m45.398694275s
Mar 21 09:19:03 devops-temp02 nomad[3723]:     2024-03-21T09:19:03.089Z [DEBUG] client.alloc_migrator: waiting for previous alloc to terminate: alloc_id=416f4a8b-a1c3-4c2a-0660-bb0c0244b7b8 previous_alloc=fbc348e0-a932-a653-fc87-9366b128eec5
Mar 21 09:19:03 devops-temp02 nomad[3723]:     2024-03-21T09:19:03.841Z [DEBUG] client: updated allocations: index=4979 total=4 pulled=0 filtered=4
Mar 21 09:19:03 devops-temp02 nomad[3723]:     2024-03-21T09:19:03.841Z [DEBUG] client: allocation updates: added=0 removed=0 updated=0 ignored=4
Mar 21 09:19:03 devops-temp02 nomad[3723]:     2024-03-21T09:19:03.841Z [DEBUG] client: allocation updates applied: added=0 removed=0 updated=0 ignored=4 errors=0
Mar 21 09:19:04 devops-temp02 nomad[3723]:     2024-03-21T09:19:04.405Z [INFO]  client.driver_mgr.docker: stopped container: container_id=319b1f1eb40858c99b5721b7e7ce1de8d156a0159bf4005a08d9e3f598a92056 driver=docker
Mar 21 09:19:04 devops-temp02 nomad[3723]:     2024-03-21T09:19:04.414Z [INFO]  client.alloc_runner.task_runner: Task event: alloc_id=fbc348e0-a932-a653-fc87-9366b128eec5 task=test-task-2 type=Terminated msg="Exit Code: 0" failed=false
Mar 21 09:19:04 devops-temp02 nomad[3723]:     2024-03-21T09:19:04.415Z [DEBUG] client.driver_mgr.docker.docker_logger.stdio: received EOF, stopping recv loop: driver=docker err="rpc error: code = Unavailable desc = error reading from server: EOF"
Mar 21 09:19:04 devops-temp02 nomad[3723]:     2024-03-21T09:19:04.420Z [INFO]  client.driver_mgr.docker.docker_logger: plugin process exited: driver=docker plugin=/home/pm/nomad/bin/nomad id=107742
Mar 21 09:19:04 devops-temp02 nomad[3723]:     2024-03-21T09:19:04.420Z [DEBUG] client.driver_mgr.docker.docker_logger: plugin exited: driver=docker
Mar 21 09:19:04 devops-temp02 nomad[3723]:     2024-03-21T09:19:04.431Z [DEBUG] client.driver_mgr.docker: image id reference count decremented: driver=docker image_id=sha256:3bbddfeb929989067da15efacb0913e66756fc755a3e30c664ff64571590a4ee references=1
Mar 21 09:19:04 devops-temp02 nomad[3723]:     2024-03-21T09:19:04.431Z [INFO]  client.alloc_runner.task_runner: Task event: alloc_id=fbc348e0-a932-a653-fc87-9366b128eec5 task=test-task-2 type=Killed msg="Task successfully killed" failed=false
Mar 21 09:19:04 devops-temp02 nomad[3723]:     2024-03-21T09:19:04.439Z [DEBUG] client.alloc_runner.task_runner.task_hook.logmon.stdio: received EOF, stopping recv loop: alloc_id=fbc348e0-a932-a653-fc87-9366b128eec5 task=test-task-2 err="rpc error: code = Unavailable desc = error reading from server: EOF"
Mar 21 09:19:04 devops-temp02 nomad[3723]:     2024-03-21T09:19:04.443Z [INFO]  client.alloc_runner.task_runner.task_hook.logmon: plugin process exited: alloc_id=fbc348e0-a932-a653-fc87-9366b128eec5 task=test-task-2 plugin=/home/pm/nomad/bin/nomad id=107630
Mar 21 09:19:04 devops-temp02 nomad[3723]:     2024-03-21T09:19:04.443Z [DEBUG] client.alloc_runner.task_runner.task_hook.logmon: plugin exited: alloc_id=fbc348e0-a932-a653-fc87-9366b128eec5 task=test-task-2
Mar 21 09:19:04 devops-temp02 nomad[3723]:     2024-03-21T09:19:04.444Z [INFO]  agent: (runner) stopping
Mar 21 09:19:04 devops-temp02 nomad[3723]:     2024-03-21T09:19:04.444Z [DEBUG] agent: (runner) stopping watcher
Mar 21 09:19:04 devops-temp02 nomad[3723]:     2024-03-21T09:19:04.444Z [DEBUG] agent: (watcher) stopping all views
Mar 21 09:19:04 devops-temp02 nomad[3723]:     2024-03-21T09:19:04.444Z [DEBUG] client.alloc_runner.task_runner: task run loop exiting: alloc_id=fbc348e0-a932-a653-fc87-9366b128eec5 task=test-task-2
Mar 21 09:19:04 devops-temp02 nomad[3723]:     2024-03-21T09:19:04.444Z [INFO]  agent: (runner) received finish
Mar 21 09:19:04 devops-temp02 nomad[3723]:     2024-03-21T09:19:04.444Z [INFO]  client.gc: marking allocation for GC: alloc_id=fbc348e0-a932-a653-fc87-9366b128eec5
Mar 21 09:19:04 devops-temp02 nomad[3723]:     2024-03-21T09:19:04.444Z [DEBUG] client.alloc_migrator: waiting for previous alloc to terminate: alloc_id=416f4a8b-a1c3-4c2a-0660-bb0c0244b7b8 previous_alloc=fbc348e0-a932-a653-fc87-9366b128eec5
Mar 21 09:19:04 devops-temp02 nomad[3723]:     2024-03-21T09:19:04.448Z [DEBUG] client.alloc_runner.task_runner: lifecycle start condition has been met, proceeding: alloc_id=416f4a8b-a1c3-4c2a-0660-bb0c0244b7b8 task=test-task-2
Mar 21 09:19:04 devops-temp02 nomad[3723]:     2024-03-21T09:19:04.450Z [ERROR] client.alloc_runner: postrun failed: alloc_id=fbc348e0-a932-a653-fc87-9366b128eec5
Mar 21 09:19:04 devops-temp02 nomad[3723]:   error=
Mar 21 09:19:04 devops-temp02 nomad[3723]:   | hook "consul" failed: 1 error occurred:
Mar 21 09:19:04 devops-temp02 nomad[3723]:   | \t* 1 error occurred:
Mar 21 09:19:04 devops-temp02 nomad[3723]:   | \t* Unexpected response code: 403 (rpc error making call: ACL not found)
Mar 21 09:19:04 devops-temp02 nomad[3723]:   |
Mar 21 09:19:04 devops-temp02 nomad[3723]:   |
Mar 21 09:19:04 devops-temp02 nomad[3723]:   |
Mar 21 09:19:04 devops-temp02 nomad[3723]:   
Mar 21 09:19:04 devops-temp02 nomad[3723]:     2024-03-21T09:19:04.450Z [INFO]  client.alloc_runner.task_runner: Task event: alloc_id=416f4a8b-a1c3-4c2a-0660-bb0c0244b7b8 task=test-task-2 type="Task Setup" msg="Building Task Directory" failed=false
Mar 21 09:19:04 devops-temp02 nomad[3723]:     2024-03-21T09:19:04.460Z [DEBUG] client.alloc_runner.task_runner.task_hook.logmon: starting plugin: alloc_id=416f4a8b-a1c3-4c2a-0660-bb0c0244b7b8 task=test-task-2 path=/home/pm/nomad/bin/nomad args=["/home/pm/nomad/bin/nomad", "logmon"]
Mar 21 09:19:04 devops-temp02 nomad[3723]:     2024-03-21T09:19:04.460Z [DEBUG] client.alloc_runner.task_runner.task_hook.logmon: plugin started: alloc_id=416f4a8b-a1c3-4c2a-0660-bb0c0244b7b8 task=test-task-2 path=/home/pm/nomad/bin/nomad pid=107832
Mar 21 09:19:04 devops-temp02 nomad[3723]:     2024-03-21T09:19:04.460Z [DEBUG] client.alloc_runner.task_runner.task_hook.logmon: waiting for RPC address: alloc_id=416f4a8b-a1c3-4c2a-0660-bb0c0244b7b8 task=test-task-2 plugin=/home/pm/nomad/bin/nomad
Mar 21 09:19:04 devops-temp02 nomad[3723]:     2024-03-21T09:19:04.504Z [DEBUG] client.alloc_runner.task_runner.task_hook.logmon.nomad: plugin address: alloc_id=416f4a8b-a1c3-4c2a-0660-bb0c0244b7b8 task=test-task-2 @module=logmon address=/tmp/plugin1238857226 network=unix timestamp=2024-03-21T09:19:04.504Z
Mar 21 09:19:04 devops-temp02 nomad[3723]:     2024-03-21T09:19:04.504Z [DEBUG] client.alloc_runner.task_runner.task_hook.logmon: using plugin: alloc_id=416f4a8b-a1c3-4c2a-0660-bb0c0244b7b8 task=test-task-2 version=2
Mar 21 09:19:04 devops-temp02 nomad[3723]:     2024-03-21T09:19:04.506Z [DEBUG] client.alloc_runner.task_runner.task_hook.logmon.nomad: opening fifo: alloc_id=416f4a8b-a1c3-4c2a-0660-bb0c0244b7b8 task=test-task-2 @module=logmon path=/home/pm/nomad/data/alloc/416f4a8b-a1c3-4c2a-0660-bb0c0244b7b8/alloc/logs/.test-task-2.stdout.fifo timestamp=2024-03-21T09:19:04.505Z
Mar 21 09:19:04 devops-temp02 nomad[3723]:     2024-03-21T09:19:04.506Z [DEBUG] client.alloc_runner.task_runner.task_hook.logmon.nomad: opening fifo: alloc_id=416f4a8b-a1c3-4c2a-0660-bb0c0244b7b8 task=test-task-2 @module=logmon path=/home/pm/nomad/data/alloc/416f4a8b-a1c3-4c2a-0660-bb0c0244b7b8/alloc/logs/.test-task-2.stderr.fifo timestamp=2024-03-21T09:19:04.506Z
Mar 21 09:19:04 devops-temp02 nomad[3723]:     2024-03-21T09:19:04.549Z [INFO]  agent: (runner) creating new runner (dry: false, once: false)
Mar 21 09:19:04 devops-temp02 nomad[3723]:     2024-03-21T09:19:04.549Z [DEBUG] agent: (runner) final config: {"Consul":{"Address":"127.0.0.1:8500","Namespace":"","Auth":{"Enabled":false,"Username":""},"Retry":{"Attempts":12,"Backoff":250000000,"MaxBackoff":60000000000,"Enabled":true},"SSL":{"CaCert":"","CaCertBytes":"","CaPath":"","Cert":"","Enabled":false,"Key":"","ServerName":"","Verify":true},"Token":"1dd892c0-eb72-e708-f1bb-2b8ca9a9d49a","TokenFile":"","Transport":{"CustomDialer":null,"DialKeepAlive":30000000000,"DialTimeout":30000000000,"DisableKeepAlives":false,"IdleConnTimeout":5000000000,"MaxIdleConns":0,"MaxIdleConnsPerHost":100,"MaxConnsPerHost":0,"TLSHandshakeTimeout":10000000000}},"Dedup":{"Enabled":false,"MaxStale":2000000000,"Prefix":"consul-template/dedup/","TTL":15000000000,"BlockQueryWaitTime":60000000000},"DefaultDelims":{"Left":null,"Right":null},"Exec":{"Command":[],"Enabled":false,"Env":{"Denylist":[],"Custom":[],"Pristine":false,"Allowlist":[]},"KillSignal":2,"KillTimeout":30000000000,"ReloadSignal":null,"Splay":0,"Timeout":0},"KillSignal":2,"LogLevel":"WARN","FileLog":{"LogFilePath":"","LogRotateBytes":0,"LogRotateDuration":86400000000000,"LogRotateMaxFiles":0},"MaxStale":2000000000,"PidFile":"","ReloadSignal":1,"Syslog":{"Enabled":false,"Facility":"LOCAL0","Name":"consul-template"},"Templates":[{"Backup":false,"Command":[],"CommandTimeout":30000000000,"Contents":"\u003c?php\nprint \"{{ key \"var_consul\" }}\";\n?\u003e\n","CreateDestDirs":true,"Destination":"/home/pm/nomad/data/alloc/416f4a8b-a1c3-4c2a-0660-bb0c0244b7b8/test-task-2/local/index.php","ErrMissingKey":false,"ErrFatal":true,"Exec":{"Command":[],"Enabled":false,"Env":{"Denylist":[],"Custom":[],"Pristine":false,"Allowlist":[]},"KillSignal":2,"KillTimeout":30000000000,"ReloadSignal":null,"Splay":0,"Timeout":30000000000},"Perms":420,"User":null,"Uid":null,"Group":null,"Gid":null,"Source":"","Wait":{"Enabled":false,"Min":0,"Max":0},"LeftDelim":"{{","RightDelim":"}}","FunctionDenylist":["plugin","writeToFile"],"SandboxPath":"/home/pm/nomad/data/alloc/416f4a8b-a1c3-4c2a-0660-bb0c0244b7b8/test-task-2","MapToEnvironmentVariable":""}],"TemplateErrFatal":null,"Vault":{"Address":"http://10.101.14.30:8200","Enabled":true,"Namespace":"","RenewToken":false,"Retry":{"Attempts":12,"Backoff":250000000,"MaxBackoff":60000000000,"Enabled":true},"SSL":{"CaCert":"","CaCertBytes":"","CaPath":"","Cert":"","Enabled":false,"Key":"","ServerName":"","Verify":false},"Transport":{"CustomDialer":null,"DialKeepAlive":30000000000,"DialTimeout":30000000000,"DisableKeepAlives":false,"IdleConnTimeout":5000000000,"MaxIdleConns":0,"MaxIdleConnsPerHost":100,"MaxConnsPerHost":0,"TLSHandshakeTimeout":10000000000},"UnwrapToken":false,"ClientUserAgent":null,"DefaultLeaseDuration":300000000000,"LeaseRenewalThreshold":0.9,"K8SAuthRoleName":"","K8SServiceAccountTokenPath":"/run/secrets/kubernetes.io/serviceaccount/token","K8SServiceAccountToken":"","K8SServiceMountPath":"kubernetes"},"Nomad":{"Address":"","Enabled":true,"Namespace":"default","SSL":{"CaCert":"","CaCertBytes":"","CaPath":"","Cert":"","Enabled":false,"Key":"","ServerName":"","Verify":true},"AuthUsername":"","AuthPassword":"","Transport":{"CustomDialer":{},"DialKeepAlive":30000000000,"DialTimeout":30000000000,"DisableKeepAlives":false,"IdleConnTimeout":5000000000,"MaxIdleConns":0,"MaxIdleConnsPerHost":100,"MaxConnsPerHost":0,"TLSHandshakeTimeout":10000000000},"Retry":{"Attempts":12,"Backoff":250000000,"MaxBackoff":60000000000,"Enabled":true}},"Wait":{"Enabled":false,"Min":0,"Max":0},"Once":false,"ParseOnly":false,"BlockQueryWaitTime":60000000000,"ErrOnFailedLookup":false}
Mar 21 09:19:04 devops-temp02 nomad[3723]:     2024-03-21T09:19:04.550Z [INFO]  agent: (runner) creating watcher
Mar 21 09:19:04 devops-temp02 nomad[3723]:     2024-03-21T09:19:04.550Z [INFO]  agent: (runner) starting
Mar 21 09:19:04 devops-temp02 nomad[3723]:     2024-03-21T09:19:04.550Z [DEBUG] agent: (runner) running initial templates
Mar 21 09:19:04 devops-temp02 nomad[3723]:     2024-03-21T09:19:04.550Z [DEBUG] agent: (runner) initiating run
Mar 21 09:19:04 devops-temp02 nomad[3723]:     2024-03-21T09:19:04.550Z [DEBUG] agent: (runner) checking template 314b87fb017a9c4e5eaadece8b4ec85b
Mar 21 09:19:04 devops-temp02 nomad[3723]:     2024-03-21T09:19:04.551Z [DEBUG] agent: (runner) missing data for 1 dependencies
Mar 21 09:19:04 devops-temp02 nomad[3723]:     2024-03-21T09:19:04.551Z [DEBUG] agent: (runner) missing dependency: kv.block(var_consul)
Mar 21 09:19:04 devops-temp02 nomad[3723]:     2024-03-21T09:19:04.551Z [DEBUG] agent: (runner) add used dependency kv.block(var_consul) to missing since isLeader but do not have a watcher
Mar 21 09:19:04 devops-temp02 nomad[3723]:     2024-03-21T09:19:04.551Z [DEBUG] agent: (runner) was not watching 1 dependencies
Mar 21 09:19:04 devops-temp02 nomad[3723]:     2024-03-21T09:19:04.552Z [DEBUG] agent: (watcher) adding kv.block(var_consul)
Mar 21 09:19:04 devops-temp02 nomad[3723]:     2024-03-21T09:19:04.552Z [DEBUG] agent: (runner) diffing and updating dependencies
Mar 21 09:19:04 devops-temp02 nomad[3723]:     2024-03-21T09:19:04.552Z [DEBUG] agent: (runner) watching 1 dependencies
Mar 21 09:19:04 devops-temp02 nomad[3723]:     2024-03-21T09:19:04.553Z [DEBUG] agent: (runner) receiving dependency kv.block(var_consul)
Mar 21 09:19:04 devops-temp02 nomad[3723]:     2024-03-21T09:19:04.554Z [DEBUG] agent: (runner) initiating run
Mar 21 09:19:04 devops-temp02 nomad[3723]:     2024-03-21T09:19:04.554Z [DEBUG] agent: (runner) checking template 314b87fb017a9c4e5eaadece8b4ec85b
Mar 21 09:19:04 devops-temp02 nomad[3723]:     2024-03-21T09:19:04.554Z [DEBUG] agent: (runner) rendering "(dynamic)" => "/home/pm/nomad/data/alloc/416f4a8b-a1c3-4c2a-0660-bb0c0244b7b8/test-task-2/local/index.php"
Mar 21 09:19:04 devops-temp02 nomad[3723]:     2024-03-21T09:19:04.580Z [INFO]  agent: (runner) rendered "(dynamic)" => "/home/pm/nomad/data/alloc/416f4a8b-a1c3-4c2a-0660-bb0c0244b7b8/test-task-2/local/index.php"
Mar 21 09:19:04 devops-temp02 nomad[3723]:     2024-03-21T09:19:04.580Z [DEBUG] agent: (runner) diffing and updating dependencies
Mar 21 09:19:04 devops-temp02 nomad[3723]:     2024-03-21T09:19:04.580Z [DEBUG] agent: (runner) kv.block(var_consul) is still needed
Mar 21 09:19:04 devops-temp02 nomad[3723]:     2024-03-21T09:19:04.580Z [DEBUG] agent: (runner) watching 1 dependencies
Mar 21 09:19:04 devops-temp02 nomad[3723]:     2024-03-21T09:19:04.580Z [DEBUG] agent: (runner) all templates rendered
Mar 21 09:19:04 devops-temp02 nomad[3723]:     2024-03-21T09:19:04.588Z [DEBUG] client.driver_mgr.docker: image reference count incremented: driver=docker image_name=php:8.3.0-apache image_id=sha256:3bbddfeb929989067da15efacb0913e66756fc755a3e30c664ff64571590a4ee references=2
Mar 21 09:19:04 devops-temp02 nomad[3723]:     2024-03-21T09:19:04.588Z [DEBUG] client.driver_mgr.docker: configured resources: driver=docker task_name=test-task-2 memory=314572800 memory_reservation=0 cpu_shares=100 cpu_quota=0 cpu_period=0
Mar 21 09:19:04 devops-temp02 nomad[3723]:     2024-03-21T09:19:04.589Z [DEBUG] client.driver_mgr.docker: binding directories: driver=docker task_name=test-task-2 binds="[]string{\"/home/pm/nomad/data/alloc/416f4a8b-a1c3-4c2a-0660-bb0c0244b7b8/alloc:/alloc\", \"/home/pm/nomad/data/alloc/416f4a8b-a1c3-4c2a-0660-bb0c0244b7b8/test-task-2/local:/local\", \"/home/pm/nomad/data/alloc/416f4a8b-a1c3-4c2a-0660-bb0c0244b7b8/test-task-2/secrets:/secrets\", \"/home/pm/nomad/data/alloc/416f4a8b-a1c3-4c2a-0660-bb0c0244b7b8/test-task-2/local/index.php:/var/www/html/index.php\"}"
Mar 21 09:19:04 devops-temp02 nomad[3723]:     2024-03-21T09:19:04.589Z [DEBUG] client.driver_mgr.docker: networking mode not specified; using default: driver=docker task_name=test-task-2
Mar 21 09:19:04 devops-temp02 nomad[3723]:     2024-03-21T09:19:04.589Z [DEBUG] client.driver_mgr.docker: allocated static port: driver=docker task_name=test-task-2 ip=10.101.14.31 port=31989 label=traffic-port
Mar 21 09:19:04 devops-temp02 nomad[3723]:     2024-03-21T09:19:04.589Z [DEBUG] client.driver_mgr.docker: exposed port: driver=docker task_name=test-task-2 port=31989 label=traffic-port
Mar 21 09:19:04 devops-temp02 nomad[3723]:     2024-03-21T09:19:04.589Z [DEBUG] client.driver_mgr.docker: applied labels on the container: driver=docker task_name=test-task-2 labels=map[com.hashicorp.nomad.alloc_id:416f4a8b-a1c3-4c2a-0660-bb0c0244b7b8]
Mar 21 09:19:04 devops-temp02 nomad[3723]:     2024-03-21T09:19:04.589Z [DEBUG] client.driver_mgr.docker: setting container name: driver=docker task_name=test-task-2 container_name=test-task-2-416f4a8b-a1c3-4c2a-0660-bb0c0244b7b8
Mar 21 09:19:04 devops-temp02 nomad[3723]:     2024-03-21T09:19:04.648Z [INFO]  client.driver_mgr.docker: created container: driver=docker container_id=ca8377bbd79488a16a0f0f0a715abd3cb99585e4a22ca3871dff720d7b63d64f
Mar 21 09:19:04 devops-temp02 nomad[3723]:     2024-03-21T09:19:04.701Z [DEBUG] client: updated allocations: index=4980 total=4 pulled=0 filtered=4
Mar 21 09:19:04 devops-temp02 nomad[3723]:     2024-03-21T09:19:04.701Z [DEBUG] client: allocation updates: added=0 removed=0 updated=0 ignored=4
Mar 21 09:19:04 devops-temp02 nomad[3723]:     2024-03-21T09:19:04.702Z [DEBUG] client: allocation updates applied: added=0 removed=0 updated=0 ignored=4 errors=0
Mar 21 09:19:05 devops-temp02 nomad[3723]:     2024-03-21T09:19:05.010Z [INFO]  client.driver_mgr.docker: started container: driver=docker container_id=ca8377bbd79488a16a0f0f0a715abd3cb99585e4a22ca3871dff720d7b63d64f
Mar 21 09:19:05 devops-temp02 nomad[3723]:     2024-03-21T09:19:05.010Z [DEBUG] client.driver_mgr.docker.docker_logger: starting plugin: driver=docker path=/home/pm/nomad/bin/nomad args=["/home/pm/nomad/bin/nomad", "docker_logger"]
Mar 21 09:19:05 devops-temp02 nomad[3723]:     2024-03-21T09:19:05.011Z [DEBUG] client.driver_mgr.docker.docker_logger: plugin started: driver=docker path=/home/pm/nomad/bin/nomad pid=107942
Mar 21 09:19:05 devops-temp02 nomad[3723]:     2024-03-21T09:19:05.011Z [DEBUG] client.driver_mgr.docker.docker_logger: waiting for RPC address: driver=docker plugin=/home/pm/nomad/bin/nomad
Mar 21 09:19:05 devops-temp02 nomad[3723]:     2024-03-21T09:19:05.060Z [DEBUG] client.driver_mgr.docker.docker_logger.nomad: plugin address: driver=docker address=/tmp/plugin402562303 network=unix @module=docker_logger timestamp=2024-03-21T09:19:05.060Z
Mar 21 09:19:05 devops-temp02 nomad[3723]:     2024-03-21T09:19:05.060Z [DEBUG] client.driver_mgr.docker.docker_logger: using plugin: driver=docker version=2
Mar 21 09:19:05 devops-temp02 nomad[3723]:     2024-03-21T09:19:05.064Z [DEBUG] client.driver_mgr.docker.docker_logger.nomad: using client connection initialized from environment: driver=docker @module=docker_logger timestamp=2024-03-21T09:19:05.063Z
Mar 21 09:19:05 devops-temp02 nomad[3723]:     2024-03-21T09:19:05.066Z [INFO]  client.alloc_runner.task_runner: Task event: alloc_id=416f4a8b-a1c3-4c2a-0660-bb0c0244b7b8 task=test-task-2 type=Started msg="Task started by client" failed=false
Mar 21 09:19:05 devops-temp02 nomad[3723]:     2024-03-21T09:19:05.161Z [DEBUG] client: updated allocations: index=4981 total=4 pulled=0 filtered=4
Mar 21 09:19:05 devops-temp02 nomad[3723]:     2024-03-21T09:19:05.161Z [DEBUG] client: allocation updates: added=0 removed=0 updated=0 ignored=4
Mar 21 09:19:05 devops-temp02 nomad[3723]:     2024-03-21T09:19:05.161Z [DEBUG] client: allocation updates applied: added=0 removed=0 updated=0 ignored=4 errors=0
Mar 21 09:19:56 devops-temp02 nomad[3723]:     2024-03-21T09:19:56.477Z [ERROR] consul.sync: still unable to update services in Consul: failures=10 error="Unexpected response code: 403 (ACL not found)"
Mar 21 09:20:06 devops-temp02 nomad[3723]:     2024-03-21T09:20:06.329Z [DEBUG] client: received stale allocation information; retrying: index=4981 min_index=4981
Mar 21 09:21:05 devops-temp02 nomad[3723]:     2024-03-21T09:21:05.537Z [DEBUG] client: received stale allocation information; retrying: index=4981 min_index=4981
Mar 21 09:22:04 devops-temp02 nomad[3723]:     2024-03-21T09:22:04.981Z [DEBUG] client: received stale allocation information; retrying: index=4981 min_index=4981
Mar 21 09:22:17 devops-temp02 nomad[3723]:     2024-03-21T09:22:17.329Z [DEBUG] client.alloc_runner.widmgr: waiting to renew identities: alloc_id=fa9956ea-81a8-57e1-276f-b4820fc52ec0 num=3 wait=32m52.853496262s
Mar 21 09:22:31 devops-temp02 nomad[3723]:     2024-03-21T09:22:31.509Z [ERROR] consul.sync: still unable to update services in Consul: failures=20 error="Unexpected response code: 403 (ACL not found)"
Mar 21 09:23:04 devops-temp02 nomad[3723]:     2024-03-21T09:23:04.998Z [DEBUG] client: received stale allocation information; retrying: index=4981 min_index=4981
Mar 21 09:24:04 devops-temp02 nomad[3723]:     2024-03-21T09:24:04.592Z [DEBUG] client: received stale allocation information; retrying: index=4981 min_index=4981
Mar 21 09:24:05 devops-temp02 nomad[3723]:     2024-03-21T09:24:05.060Z [DEBUG] client: updated allocations: index=4987 total=4 pulled=0 filtered=4
Mar 21 09:24:05 devops-temp02 nomad[3723]:     2024-03-21T09:24:05.060Z [DEBUG] client: allocation updates: added=0 removed=0 updated=0 ignored=4
Mar 21 09:24:05 devops-temp02 nomad[3723]:     2024-03-21T09:24:05.060Z [DEBUG] client: allocation updates applied: added=0 removed=0 updated=0 ignored=4 errors=0

Consul Client Logs

For completeness' sake, here are the consul client logs from the agent on the same machine as the nomad client

Mar 21 02:56:50 devops-temp02 consul[890]: 2024-03-21T02:56:50.084Z [INFO]  agent: Newer Consul version available: new_version=1.18.0 current_version=1.17.3
Mar 21 09:18:23 devops-temp02 consul[890]: 2024-03-21T09:18:23.058Z [INFO]  agent: Deregistered service: service=_nomad-task-742bf378-0d3b-eaf7-7d11-5c832c28c7c0-group-test-group-2-test-service-2-traffic-port
Mar 21 09:18:25 devops-temp02 consul[890]: 2024-03-21T09:18:25.799Z [INFO]  agent: Synced service: service=_nomad-task-fbc348e0-a932-a653-fc87-9366b128eec5-group-test-group-2-test-service-2-traffic-port
Mar 21 09:18:33 devops-temp02 consul[890]: 2024-03-21T09:18:33.249Z [INFO]  agent: Synced check: check=_nomad-check-faed31a621ed7967b47dc60b89ab0eff14ad085e
Mar 21 09:19:03 devops-temp02 consul[890]: 2024-03-21T09:19:03.056Z [ERROR] agent.client: RPC failed to server: method=ACL.TokenRead server=10.101.14.30:8300 error="rpc error making call: token does not exist: ACL not found"
Mar 21 09:19:03 devops-temp02 consul[890]: 2024-03-21T09:19:03.056Z [ERROR] agent.http: Request error: method=PUT url=/v1/agent/service/deregister/_nomad-task-fbc348e0-a932-a653-fc87-9366b128eec5-group-test-group-2-test-service-2-traffic-port from=127.0.0.1:45160 error="ACL not found"
Mar 21 09:19:03 devops-temp02 consul[890]: 2024-03-21T09:19:03.251Z [WARN]  agent: Check is now critical: check=_nomad-check-faed31a621ed7967b47dc60b89ab0eff14ad085e
Mar 21 09:19:03 devops-temp02 consul[890]: 2024-03-21T09:19:03.262Z [ERROR] agent.client: RPC failed to server: method=Catalog.Register server=10.101.14.30:8300 error="rpc error making call: ACL not found"
Mar 21 09:19:03 devops-temp02 consul[890]: 2024-03-21T09:19:03.263Z [ERROR] agent.client: RPC failed to server: method=ACL.TokenRead server=10.101.14.30:8300 error="rpc error making call: token does not exist: ACL not found"
Mar 21 09:19:03 devops-temp02 consul[890]: 2024-03-21T09:19:03.263Z [WARN]  agent: Check registration blocked by ACLs: check=_nomad-check-faed31a621ed7967b47dc60b89ab0eff14ad085e accessorID=""
Mar 21 09:19:04 devops-temp02 consul[890]: 2024-03-21T09:19:04.059Z [ERROR] agent.client: RPC failed to server: method=ACL.TokenRead server=10.101.14.30:8300 error="rpc error making call: token does not exist: ACL not found"
Mar 21 09:19:04 devops-temp02 consul[890]: 2024-03-21T09:19:04.059Z [ERROR] agent.http: Request error: method=PUT url=/v1/agent/service/deregister/_nomad-task-fbc348e0-a932-a653-fc87-9366b128eec5-group-test-group-2-test-service-2-traffic-port from=127.0.0.1:45160 error="ACL not found"
Mar 21 09:19:04 devops-temp02 consul[890]: 2024-03-21T09:19:04.447Z [ERROR] agent.client: RPC failed to server: method=ACL.Logout server=10.101.14.30:8300 error="rpc error making call: ACL not found"
Mar 21 09:19:04 devops-temp02 consul[890]: 2024-03-21T09:19:04.447Z [ERROR] agent.http: Request error: method=POST url=/v1/acl/logout from=127.0.0.1:57824 error="rpc error making call: ACL not found"
Mar 21 09:19:04 devops-temp02 consul[890]: 2024-03-21T09:19:04.449Z [ERROR] agent.client: RPC failed to server: method=ACL.TokenRead server=10.101.14.30:8300 error="rpc error making call: token does not exist: ACL not found"
Mar 21 09:19:04 devops-temp02 consul[890]: 2024-03-21T09:19:04.449Z [ERROR] agent.http: Request error: method=PUT url=/v1/agent/service/deregister/_nomad-task-fbc348e0-a932-a653-fc87-9366b128eec5-group-test-group-2-test-service-2-traffic-port from=127.0.0.1:45160 error="ACL not found"
Mar 21 09:19:07 devops-temp02 consul[890]: 2024-03-21T09:19:07.452Z [ERROR] agent.client: RPC failed to server: method=ACL.TokenRead server=10.101.14.30:8300 error="rpc error making call: token does not exist: ACL not found"
Mar 21 09:19:07 devops-temp02 consul[890]: 2024-03-21T09:19:07.452Z [ERROR] agent.http: Request error: method=PUT url=/v1/agent/service/deregister/_nomad-task-fbc348e0-a932-a653-fc87-9366b128eec5-group-test-group-2-test-service-2-traffic-port from=127.0.0.1:45160 error="ACL not found"
Mar 21 09:19:11 devops-temp02 consul[890]: 2024-03-21T09:19:11.454Z [ERROR] agent.client: RPC failed to server: method=ACL.TokenRead server=10.101.14.30:8300 error="rpc error making call: token does not exist: ACL not found"
Mar 21 09:19:11 devops-temp02 consul[890]: 2024-03-21T09:19:11.454Z [ERROR] agent.http: Request error: method=PUT url=/v1/agent/service/deregister/_nomad-task-fbc348e0-a932-a653-fc87-9366b128eec5-group-test-group-2-test-service-2-traffic-port from=127.0.0.1:45160 error="ACL not found"
Mar 21 09:19:13 devops-temp02 consul[890]: 2024-03-21T09:19:13.252Z [WARN]  agent: Check is now critical: check=_nomad-check-faed31a621ed7967b47dc60b89ab0eff14ad085e
Mar 21 09:19:16 devops-temp02 consul[890]: 2024-03-21T09:19:16.457Z [ERROR] agent.client: RPC failed to server: method=ACL.TokenRead server=10.101.14.30:8300 error="rpc error making call: token does not exist: ACL not found"
Mar 21 09:19:16 devops-temp02 consul[890]: 2024-03-21T09:19:16.457Z [ERROR] agent.http: Request error: method=PUT url=/v1/agent/service/deregister/_nomad-task-fbc348e0-a932-a653-fc87-9366b128eec5-group-test-group-2-test-service-2-traffic-port from=127.0.0.1:45160 error="ACL not found"
Mar 21 09:19:22 devops-temp02 consul[890]: 2024-03-21T09:19:22.460Z [ERROR] agent.client: RPC failed to server: method=ACL.TokenRead server=10.101.14.30:8300 error="rpc error making call: token does not exist: ACL not found"
Mar 21 09:19:22 devops-temp02 consul[890]: 2024-03-21T09:19:22.460Z [ERROR] agent.http: Request error: method=PUT url=/v1/agent/service/deregister/_nomad-task-fbc348e0-a932-a653-fc87-9366b128eec5-group-test-group-2-test-service-2-traffic-port from=127.0.0.1:45160 error="ACL not found"
Mar 21 09:19:23 devops-temp02 consul[890]: 2024-03-21T09:19:23.253Z [WARN]  agent: Check is now critical: check=_nomad-check-faed31a621ed7967b47dc60b89ab0eff14ad085e
Mar 21 09:19:29 devops-temp02 consul[890]: 2024-03-21T09:19:29.463Z [ERROR] agent.client: RPC failed to server: method=ACL.TokenRead server=10.101.14.30:8300 error="rpc error making call: token does not exist: ACL not found"
Mar 21 09:19:29 devops-temp02 consul[890]: 2024-03-21T09:19:29.463Z [ERROR] agent.http: Request error: method=PUT url=/v1/agent/service/deregister/_nomad-task-fbc348e0-a932-a653-fc87-9366b128eec5-group-test-group-2-test-service-2-traffic-port from=127.0.0.1:45160 error="ACL not found"
Mar 21 09:19:33 devops-temp02 consul[890]: 2024-03-21T09:19:33.254Z [WARN]  agent: Check is now critical: check=_nomad-check-faed31a621ed7967b47dc60b89ab0eff14ad085e
Mar 21 09:19:37 devops-temp02 consul[890]: 2024-03-21T09:19:37.467Z [ERROR] agent.client: RPC failed to server: method=ACL.TokenRead server=10.101.14.30:8300 error="rpc error making call: token does not exist: ACL not found"
Mar 21 09:19:37 devops-temp02 consul[890]: 2024-03-21T09:19:37.467Z [ERROR] agent.http: Request error: method=PUT url=/v1/agent/service/deregister/_nomad-task-fbc348e0-a932-a653-fc87-9366b128eec5-group-test-group-2-test-service-2-traffic-port from=127.0.0.1:45160 error="ACL not found"
Mar 21 09:19:43 devops-temp02 consul[890]: 2024-03-21T09:19:43.255Z [WARN]  agent: Check is now critical: check=_nomad-check-faed31a621ed7967b47dc60b89ab0eff14ad085e
Mar 21 09:19:46 devops-temp02 consul[890]: 2024-03-21T09:19:46.471Z [ERROR] agent.client: RPC failed to server: method=ACL.TokenRead server=10.101.14.30:8300 error="rpc error making call: token does not exist: ACL not found"
Mar 21 09:19:46 devops-temp02 consul[890]: 2024-03-21T09:19:46.472Z [ERROR] agent.http: Request error: method=PUT url=/v1/agent/service/deregister/_nomad-task-fbc348e0-a932-a653-fc87-9366b128eec5-group-test-group-2-test-service-2-traffic-port from=127.0.0.1:45160 error="ACL not found"
Mar 21 09:19:53 devops-temp02 consul[890]: 2024-03-21T09:19:53.257Z [WARN]  agent: Check is now critical: check=_nomad-check-faed31a621ed7967b47dc60b89ab0eff14ad085e
Mar 21 09:19:56 devops-temp02 consul[890]: 2024-03-21T09:19:56.477Z [ERROR] agent.client: RPC failed to server: method=ACL.TokenRead server=10.101.14.30:8300 error="rpc error making call: token does not exist: ACL not found"
Mar 21 09:19:56 devops-temp02 consul[890]: 2024-03-21T09:19:56.477Z [ERROR] agent.http: Request error: method=PUT url=/v1/agent/service/deregister/_nomad-task-fbc348e0-a932-a653-fc87-9366b128eec5-group-test-group-2-test-service-2-traffic-port from=127.0.0.1:45160 error="ACL not found"
Mar 21 09:20:03 devops-temp02 consul[890]: 2024-03-21T09:20:03.258Z [WARN]  agent: Check is now critical: check=_nomad-check-faed31a621ed7967b47dc60b89ab0eff14ad085e
Mar 21 09:20:07 devops-temp02 consul[890]: 2024-03-21T09:20:07.482Z [ERROR] agent.client: RPC failed to server: method=ACL.TokenRead server=10.101.14.30:8300 error="rpc error making call: token does not exist: ACL not found"
Mar 21 09:20:07 devops-temp02 consul[890]: 2024-03-21T09:20:07.482Z [ERROR] agent.http: Request error: method=PUT url=/v1/agent/service/deregister/_nomad-task-fbc348e0-a932-a653-fc87-9366b128eec5-group-test-group-2-test-service-2-traffic-port from=127.0.0.1:45160 error="ACL not found"
Mar 21 09:20:13 devops-temp02 consul[890]: 2024-03-21T09:20:13.259Z [WARN]  agent: Check is now critical: check=_nomad-check-faed31a621ed7967b47dc60b89ab0eff14ad085e
Mar 21 09:20:19 devops-temp02 consul[890]: 2024-03-21T09:20:19.485Z [ERROR] agent.client: RPC failed to server: method=ACL.TokenRead server=10.101.14.30:8300 error="rpc error making call: token does not exist: ACL not found"
Mar 21 09:20:19 devops-temp02 consul[890]: 2024-03-21T09:20:19.485Z [ERROR] agent.http: Request error: method=PUT url=/v1/agent/service/deregister/_nomad-task-fbc348e0-a932-a653-fc87-9366b128eec5-group-test-group-2-test-service-2-traffic-port from=127.0.0.1:45160 error="ACL not found"
Mar 21 09:20:23 devops-temp02 consul[890]: 2024-03-21T09:20:23.260Z [WARN]  agent: Check is now critical: check=_nomad-check-faed31a621ed7967b47dc60b89ab0eff14ad085e
Mar 21 09:20:25 devops-temp02 consul[890]: 2024-03-21T09:20:25.319Z [ERROR] agent.client: RPC failed to server: method=Catalog.Register server=10.101.14.30:8300 error="rpc error making call: ACL not found"
Mar 21 09:20:25 devops-temp02 consul[890]: 2024-03-21T09:20:25.320Z [ERROR] agent.client: RPC failed to server: method=ACL.TokenRead server=10.101.14.30:8300 error="rpc error making call: token does not exist: ACL not found"
Mar 21 09:20:25 devops-temp02 consul[890]: 2024-03-21T09:20:25.320Z [WARN]  agent: Check registration blocked by ACLs: check=_nomad-check-faed31a621ed7967b47dc60b89ab0eff14ad085e accessorID=""
Mar 21 09:20:32 devops-temp02 consul[890]: 2024-03-21T09:20:32.487Z [ERROR] agent.client: RPC failed to server: method=ACL.TokenRead server=10.101.14.30:8300 error="rpc error making call: token does not exist: ACL not found"
Mar 21 09:20:32 devops-temp02 consul[890]: 2024-03-21T09:20:32.488Z [ERROR] agent.http: Request error: method=PUT url=/v1/agent/service/deregister/_nomad-task-fbc348e0-a932-a653-fc87-9366b128eec5-group-test-group-2-test-service-2-traffic-port from=127.0.0.1:45160 error="ACL not found"
Mar 21 09:20:33 devops-temp02 consul[890]: 2024-03-21T09:20:33.260Z [WARN]  agent: Check is now critical: check=_nomad-check-faed31a621ed7967b47dc60b89ab0eff14ad085e
Mar 21 09:20:43 devops-temp02 consul[890]: 2024-03-21T09:20:43.261Z [WARN]  agent: Check is now critical: check=_nomad-check-faed31a621ed7967b47dc60b89ab0eff14ad085e
Mar 21 09:20:46 devops-temp02 consul[890]: 2024-03-21T09:20:46.490Z [ERROR] agent.client: RPC failed to server: method=ACL.TokenRead server=10.101.14.30:8300 error="rpc error making call: token does not exist: ACL not found"
Mar 21 09:20:46 devops-temp02 consul[890]: 2024-03-21T09:20:46.491Z [ERROR] agent.http: Request error: method=PUT url=/v1/agent/service/deregister/_nomad-task-fbc348e0-a932-a653-fc87-9366b128eec5-group-test-group-2-test-service-2-traffic-port from=127.0.0.1:45160 error="ACL not found"
Mar 21 09:20:53 devops-temp02 consul[890]: 2024-03-21T09:20:53.263Z [WARN]  agent: Check is now critical: check=_nomad-check-faed31a621ed7967b47dc60b89ab0eff14ad085e
Mar 21 09:21:01 devops-temp02 consul[890]: 2024-03-21T09:21:01.493Z [ERROR] agent.client: RPC failed to server: method=ACL.TokenRead server=10.101.14.30:8300 error="rpc error making call: token does not exist: ACL not found"
Mar 21 09:21:01 devops-temp02 consul[890]: 2024-03-21T09:21:01.493Z [ERROR] agent.http: Request error: method=PUT url=/v1/agent/service/deregister/_nomad-task-fbc348e0-a932-a653-fc87-9366b128eec5-group-test-group-2-test-service-2-traffic-port from=127.0.0.1:45160 error="ACL not found"
Mar 21 09:21:03 devops-temp02 consul[890]: 2024-03-21T09:21:03.264Z [WARN]  agent: Check is now critical: check=_nomad-check-faed31a621ed7967b47dc60b89ab0eff14ad085e
Mar 21 09:21:13 devops-temp02 consul[890]: 2024-03-21T09:21:13.265Z [WARN]  agent: Check is now critical: check=_nomad-check-faed31a621ed7967b47dc60b89ab0eff14ad085e
Mar 21 09:21:17 devops-temp02 consul[890]: 2024-03-21T09:21:17.496Z [ERROR] agent.client: RPC failed to server: method=ACL.TokenRead server=10.101.14.30:8300 error="rpc error making call: token does not exist: ACL not found"
Mar 21 09:21:17 devops-temp02 consul[890]: 2024-03-21T09:21:17.497Z [ERROR] agent.http: Request error: method=PUT url=/v1/agent/service/deregister/_nomad-task-fbc348e0-a932-a653-fc87-9366b128eec5-group-test-group-2-test-service-2-traffic-port from=127.0.0.1:45160 error="ACL not found"
Mar 21 09:21:23 devops-temp02 consul[890]: 2024-03-21T09:21:23.266Z [WARN]  agent: Check is now critical: check=_nomad-check-faed31a621ed7967b47dc60b89ab0eff14ad085e
Mar 21 09:21:33 devops-temp02 consul[890]: 2024-03-21T09:21:33.268Z [WARN]  agent: Check is now critical: check=_nomad-check-faed31a621ed7967b47dc60b89ab0eff14ad085e
Mar 21 09:21:34 devops-temp02 consul[890]: 2024-03-21T09:21:34.499Z [ERROR] agent.client: RPC failed to server: method=ACL.TokenRead server=10.101.14.30:8300 error="rpc error making call: token does not exist: ACL not found"
Mar 21 09:21:34 devops-temp02 consul[890]: 2024-03-21T09:21:34.500Z [ERROR] agent.http: Request error: method=PUT url=/v1/agent/service/deregister/_nomad-task-fbc348e0-a932-a653-fc87-9366b128eec5-group-test-group-2-test-service-2-traffic-port from=127.0.0.1:45160 error="ACL not found"
Mar 21 09:21:43 devops-temp02 consul[890]: 2024-03-21T09:21:43.269Z [WARN]  agent: Check is now critical: check=_nomad-check-faed31a621ed7967b47dc60b89ab0eff14ad085e
Mar 21 09:21:52 devops-temp02 consul[890]: 2024-03-21T09:21:52.502Z [ERROR] agent.client: RPC failed to server: method=ACL.TokenRead server=10.101.14.30:8300 error="rpc error making call: token does not exist: ACL not found"
Mar 21 09:21:52 devops-temp02 consul[890]: 2024-03-21T09:21:52.503Z [ERROR] agent.http: Request error: method=PUT url=/v1/agent/service/deregister/_nomad-task-fbc348e0-a932-a653-fc87-9366b128eec5-group-test-group-2-test-service-2-traffic-port from=127.0.0.1:45160 error="ACL not found"
Mar 21 09:21:53 devops-temp02 consul[890]: 2024-03-21T09:21:53.270Z [WARN]  agent: Check is now critical: check=_nomad-check-faed31a621ed7967b47dc60b89ab0eff14ad085e
Mar 21 09:22:03 devops-temp02 consul[890]: 2024-03-21T09:22:03.271Z [WARN]  agent: Check is now critical: check=_nomad-check-faed31a621ed7967b47dc60b89ab0eff14ad085e
Mar 21 09:22:06 devops-temp02 consul[890]: 2024-03-21T09:22:06.391Z [ERROR] agent.client: RPC failed to server: method=Catalog.Register server=10.101.14.30:8300 error="rpc error making call: ACL not found"
Mar 21 09:22:06 devops-temp02 consul[890]: 2024-03-21T09:22:06.392Z [ERROR] agent.client: RPC failed to server: method=ACL.TokenRead server=10.101.14.30:8300 error="rpc error making call: token does not exist: ACL not found"
Mar 21 09:22:06 devops-temp02 consul[890]: 2024-03-21T09:22:06.392Z [WARN]  agent: Check registration blocked by ACLs: check=_nomad-check-faed31a621ed7967b47dc60b89ab0eff14ad085e accessorID=""
Mar 21 09:22:11 devops-temp02 consul[890]: 2024-03-21T09:22:11.506Z [ERROR] agent.client: RPC failed to server: method=ACL.TokenRead server=10.101.14.30:8300 error="rpc error making call: token does not exist: ACL not found"
Mar 21 09:22:11 devops-temp02 consul[890]: 2024-03-21T09:22:11.506Z [ERROR] agent.http: Request error: method=PUT url=/v1/agent/service/deregister/_nomad-task-fbc348e0-a932-a653-fc87-9366b128eec5-group-test-group-2-test-service-2-traffic-port from=127.0.0.1:45160 error="ACL not found"
Mar 21 09:22:13 devops-temp02 consul[890]: 2024-03-21T09:22:13.272Z [WARN]  agent: Check is now critical: check=_nomad-check-faed31a621ed7967b47dc60b89ab0eff14ad085e
Mar 21 09:22:23 devops-temp02 consul[890]: 2024-03-21T09:22:23.273Z [WARN]  agent: Check is now critical: check=_nomad-check-faed31a621ed7967b47dc60b89ab0eff14ad085e
Mar 21 09:22:31 devops-temp02 consul[890]: 2024-03-21T09:22:31.509Z [ERROR] agent.client: RPC failed to server: method=ACL.TokenRead server=10.101.14.30:8300 error="rpc error making call: token does not exist: ACL not found"
Mar 21 09:22:31 devops-temp02 consul[890]: 2024-03-21T09:22:31.509Z [ERROR] agent.http: Request error: method=PUT url=/v1/agent/service/deregister/_nomad-task-fbc348e0-a932-a653-fc87-9366b128eec5-group-test-group-2-test-service-2-traffic-port from=127.0.0.1:45160 error="ACL not found"
Mar 21 09:22:33 devops-temp02 consul[890]: 2024-03-21T09:22:33.274Z [WARN]  agent: Check is now critical: check=_nomad-check-faed31a621ed7967b47dc60b89ab0eff14ad085e
Mar 21 09:22:43 devops-temp02 consul[890]: 2024-03-21T09:22:43.276Z [WARN]  agent: Check is now critical: check=_nomad-check-faed31a621ed7967b47dc60b89ab0eff14ad085e
Mar 21 09:22:52 devops-temp02 consul[890]: 2024-03-21T09:22:52.511Z [ERROR] agent.client: RPC failed to server: method=ACL.TokenRead server=10.101.14.30:8300 error="rpc error making call: token does not exist: ACL not found"
Mar 21 09:22:52 devops-temp02 consul[890]: 2024-03-21T09:22:52.511Z [ERROR] agent.http: Request error: method=PUT url=/v1/agent/service/deregister/_nomad-task-fbc348e0-a932-a653-fc87-9366b128eec5-group-test-group-2-test-service-2-traffic-port from=127.0.0.1:45160 error="ACL not found"
Mar 21 09:22:53 devops-temp02 consul[890]: 2024-03-21T09:22:53.278Z [WARN]  agent: Check is now critical: check=_nomad-check-faed31a621ed7967b47dc60b89ab0eff14ad085e
Mar 21 09:23:03 devops-temp02 consul[890]: 2024-03-21T09:23:03.279Z [WARN]  agent: Check is now critical: check=_nomad-check-faed31a621ed7967b47dc60b89ab0eff14ad085e
Mar 21 09:23:13 devops-temp02 consul[890]: 2024-03-21T09:23:13.280Z [WARN]  agent: Check is now critical: check=_nomad-check-faed31a621ed7967b47dc60b89ab0eff14ad085e
Mar 21 09:23:14 devops-temp02 consul[890]: 2024-03-21T09:23:14.513Z [ERROR] agent.client: RPC failed to server: method=ACL.TokenRead server=10.101.14.30:8300 error="rpc error making call: token does not exist: ACL not found"
Mar 21 09:23:14 devops-temp02 consul[890]: 2024-03-21T09:23:14.513Z [ERROR] agent.http: Request error: method=PUT url=/v1/agent/service/deregister/_nomad-task-fbc348e0-a932-a653-fc87-9366b128eec5-group-test-group-2-test-service-2-traffic-port from=127.0.0.1:45160 error="ACL not found"
Mar 21 09:23:23 devops-temp02 consul[890]: 2024-03-21T09:23:23.281Z [WARN]  agent: Check is now critical: check=_nomad-check-faed31a621ed7967b47dc60b89ab0eff14ad085e
Mar 21 09:23:33 devops-temp02 consul[890]: 2024-03-21T09:23:33.282Z [WARN]  agent: Check is now critical: check=_nomad-check-faed31a621ed7967b47dc60b89ab0eff14ad085e
Mar 21 09:23:37 devops-temp02 consul[890]: 2024-03-21T09:23:37.517Z [ERROR] agent.client: RPC failed to server: method=ACL.TokenRead server=10.101.14.30:8300 error="rpc error making call: token does not exist: ACL not found"
Mar 21 09:23:37 devops-temp02 consul[890]: 2024-03-21T09:23:37.517Z [ERROR] agent.http: Request error: method=PUT url=/v1/agent/service/deregister/_nomad-task-fbc348e0-a932-a653-fc87-9366b128eec5-group-test-group-2-test-service-2-traffic-port from=127.0.0.1:45160 error="ACL not found"
Mar 21 09:23:43 devops-temp02 consul[890]: 2024-03-21T09:23:43.283Z [WARN]  agent: Check is now critical: check=_nomad-check-faed31a621ed7967b47dc60b89ab0eff14ad085e
Mar 21 09:23:53 devops-temp02 consul[890]: 2024-03-21T09:23:53.284Z [WARN]  agent: Check is now critical: check=_nomad-check-faed31a621ed7967b47dc60b89ab0eff14ad085e
Mar 21 09:24:01 devops-temp02 consul[890]: 2024-03-21T09:24:01.520Z [ERROR] agent.client: RPC failed to server: method=ACL.TokenRead server=10.101.14.30:8300 error="rpc error making call: token does not exist: ACL not found"
Mar 21 09:24:01 devops-temp02 consul[890]: 2024-03-21T09:24:01.520Z [ERROR] agent.http: Request error: method=PUT url=/v1/agent/service/deregister/_nomad-task-fbc348e0-a932-a653-fc87-9366b128eec5-group-test-group-2-test-service-2-traffic-port from=127.0.0.1:45160 error="ACL not found"
Mar 21 09:24:03 devops-temp02 consul[890]: 2024-03-21T09:24:03.286Z [WARN]  agent: Check is now critical: check=_nomad-check-faed31a621ed7967b47dc60b89ab0eff14ad085e
Mar 21 09:24:04 devops-temp02 consul[890]: 2024-03-21T09:24:04.171Z [ERROR] agent.client: RPC failed to server: method=Catalog.Register server=10.101.14.30:8300 error="rpc error making call: ACL not found"
Mar 21 09:24:04 devops-temp02 consul[890]: 2024-03-21T09:24:04.172Z [ERROR] agent.client: RPC failed to server: method=ACL.TokenRead server=10.101.14.30:8300 error="rpc error making call: token does not exist: ACL not found"
Mar 21 09:24:04 devops-temp02 consul[890]: 2024-03-21T09:24:04.172Z [WARN]  agent: Check registration blocked by ACLs: check=_nomad-check-faed31a621ed7967b47dc60b89ab0eff14ad085e accessorID=""
Mar 21 09:24:13 devops-temp02 consul[890]: 2024-03-21T09:24:13.288Z [WARN]  agent: Check is now critical: check=_nomad-check-faed31a621ed7967b47dc60b89ab0eff14ad085e
Mar 21 09:24:23 devops-temp02 consul[890]: 2024-03-21T09:24:23.288Z [WARN]  agent: Check is now critical: check=_nomad-check-faed31a621ed7967b47dc60b89ab0eff14ad085e
Mar 21 09:24:26 devops-temp02 consul[890]: 2024-03-21T09:24:26.522Z [ERROR] agent.client: RPC failed to server: method=ACL.TokenRead server=10.101.14.30:8300 error="rpc error making call: token does not exist: ACL not found"
Mar 21 09:24:26 devops-temp02 consul[890]: 2024-03-21T09:24:26.523Z [ERROR] agent.http: Request error: method=PUT url=/v1/agent/service/deregister/_nomad-task-fbc348e0-a932-a653-fc87-9366b128eec5-group-test-group-2-test-service-2-traffic-port from=127.0.0.1:45160 error="ACL not found"
Mar 21 09:24:33 devops-temp02 consul[890]: 2024-03-21T09:24:33.289Z [WARN]  agent: Check is now critical: check=_nomad-check-faed31a621ed7967b47dc60b89ab0eff14ad085e
Mar 21 09:24:43 devops-temp02 consul[890]: 2024-03-21T09:24:43.291Z [WARN]  agent: Check is now critical: check=_nomad-check-faed31a621ed7967b47dc60b89ab0eff14ad085e
Mar 21 09:24:52 devops-temp02 consul[890]: 2024-03-21T09:24:52.525Z [ERROR] agent.client: RPC failed to server: method=ACL.TokenRead server=10.101.14.30:8300 error="rpc error making call: token does not exist: ACL not found"
Mar 21 09:24:52 devops-temp02 consul[890]: 2024-03-21T09:24:52.525Z [ERROR] agent.http: Request error: method=PUT url=/v1/agent/service/deregister/_nomad-task-fbc348e0-a932-a653-fc87-9366b128eec5-group-test-group-2-test-service-2-traffic-port from=127.0.0.1:45160 error="ACL not found"
Mar 21 09:24:53 devops-temp02 consul[890]: 2024-03-21T09:24:53.292Z [WARN]  agent: Check is now critical: check=_nomad-check-faed31a621ed7967b47dc60b89ab0eff14ad085e
Mar 21 09:25:03 devops-temp02 consul[890]: 2024-03-21T09:25:03.293Z [WARN]  agent: Check is now critical: check=_nomad-check-faed31a621ed7967b47dc60b89ab0eff14ad085e
Mar 21 09:25:06 devops-temp02 consul[890]: 2024-03-21T09:25:06.351Z [ERROR] agent.client: RPC failed to server: method=Catalog.Register server=10.101.14.30:8300 error="rpc error making call: ACL not found"
Mar 21 09:25:06 devops-temp02 consul[890]: 2024-03-21T09:25:06.351Z [ERROR] agent.client: RPC failed to server: method=ACL.TokenRead server=10.101.14.30:8300 error="rpc error making call: token does not exist: ACL not found"
Mar 21 09:25:06 devops-temp02 consul[890]: 2024-03-21T09:25:06.351Z [WARN]  agent: Check registration blocked by ACLs: check=_nomad-check-faed31a621ed7967b47dc60b89ab0eff14ad085e accessorID=""
Mar 21 09:25:13 devops-temp02 consul[890]: 2024-03-21T09:25:13.294Z [WARN]  agent: Check is now critical: check=_nomad-check-faed31a621ed7967b47dc60b89ab0eff14ad085e
Mar 21 09:25:19 devops-temp02 consul[890]: 2024-03-21T09:25:19.528Z [ERROR] agent.client: RPC failed to server: method=ACL.TokenRead server=10.101.14.30:8300 error="rpc error making call: token does not exist: ACL not found"
Mar 21 09:25:19 devops-temp02 consul[890]: 2024-03-21T09:25:19.529Z [ERROR] agent.http: Request error: method=PUT url=/v1/agent/service/deregister/_nomad-task-fbc348e0-a932-a653-fc87-9366b128eec5-group-test-group-2-test-service-2-traffic-port from=127.0.0.1:45160 error="ACL not found"
Mar 21 09:25:23 devops-temp02 consul[890]: 2024-03-21T09:25:23.295Z [WARN]  agent: Check is now critical: check=_nomad-check-faed31a621ed7967b47dc60b89ab0eff14ad085e
Mar 21 09:25:33 devops-temp02 consul[890]: 2024-03-21T09:25:33.296Z [WARN]  agent: Check is now critical: check=_nomad-check-faed31a621ed7967b47dc60b89ab0eff14ad085e
Mar 21 09:25:43 devops-temp02 consul[890]: 2024-03-21T09:25:43.297Z [WARN]  agent: Check is now critical: check=_nomad-check-faed31a621ed7967b47dc60b89ab0eff14ad085e
Mar 21 09:25:47 devops-temp02 consul[890]: 2024-03-21T09:25:47.532Z [ERROR] agent.client: RPC failed to server: method=ACL.TokenRead server=10.101.14.30:8300 error="rpc error making call: token does not exist: ACL not found"
Mar 21 09:25:47 devops-temp02 consul[890]: 2024-03-21T09:25:47.533Z [ERROR] agent.http: Request error: method=PUT url=/v1/agent/service/deregister/_nomad-task-fbc348e0-a932-a653-fc87-9366b128eec5-group-test-group-2-test-service-2-traffic-port from=127.0.0.1:45160 error="ACL not found"
Mar 21 09:25:53 devops-temp02 consul[890]: 2024-03-21T09:25:53.299Z [WARN]  agent: Check is now critical: check=_nomad-check-faed31a621ed7967b47dc60b89ab0eff14ad085e
Mar 21 09:26:03 devops-temp02 consul[890]: 2024-03-21T09:26:03.300Z [WARN]  agent: Check is now critical: check=_nomad-check-faed31a621ed7967b47dc60b89ab0eff14ad085e
Mar 21 09:26:13 devops-temp02 consul[890]: 2024-03-21T09:26:13.301Z [WARN]  agent: Check is now critical: check=_nomad-check-faed31a621ed7967b47dc60b89ab0eff14ad085e
Mar 21 09:26:16 devops-temp02 consul[890]: 2024-03-21T09:26:16.536Z [ERROR] agent.client: RPC failed to server: method=ACL.TokenRead server=10.101.14.30:8300 error="rpc error making call: token does not exist: ACL not found"
Mar 21 09:26:16 devops-temp02 consul[890]: 2024-03-21T09:26:16.536Z [ERROR] agent.http: Request error: method=PUT url=/v1/agent/service/deregister/_nomad-task-fbc348e0-a932-a653-fc87-9366b128eec5-group-test-group-2-test-service-2-traffic-port from=127.0.0.1:45160 error="ACL not found"
Mar 21 09:26:23 devops-temp02 consul[890]: 2024-03-21T09:26:23.302Z [WARN]  agent: Check is now critical: check=_nomad-check-faed31a621ed7967b47dc60b89ab0eff14ad085e
Mar 21 09:26:33 devops-temp02 consul[890]: 2024-03-21T09:26:33.303Z [WARN]  agent: Check is now critical: check=_nomad-check-faed31a621ed7967b47dc60b89ab0eff14ad085e
Mar 21 09:26:43 devops-temp02 consul[890]: 2024-03-21T09:26:43.304Z [WARN]  agent: Check is now critical: check=_nomad-check-faed31a621ed7967b47dc60b89ab0eff14ad085e
Mar 21 09:26:46 devops-temp02 consul[890]: 2024-03-21T09:26:46.539Z [ERROR] agent.client: RPC failed to server: method=ACL.TokenRead server=10.101.14.30:8300 error="rpc error making call: token does not exist: ACL not found"
Mar 21 09:26:46 devops-temp02 consul[890]: 2024-03-21T09:26:46.539Z [ERROR] agent.http: Request error: method=PUT url=/v1/agent/service/deregister/_nomad-task-fbc348e0-a932-a653-fc87-9366b128eec5-group-test-group-2-test-service-2-traffic-port from=127.0.0.1:45160 error="ACL not found"

It intrigues me that the Consul Client log states that the check _nomad-check-faed31a621ed7967b47dc60b89ab0eff14ad085e is critical, however in the Consul UI it shows as healthy. It's not possible for the check to be healthy, since that allocation has been terminated, and the new service was assigned a different dynamic port.

@lgfa29
Copy link
Contributor

lgfa29 commented Mar 22, 2024

Thanks for the extra info.

Another thing to check is to make sure the tokens created with the workload identity really don't have an expiration date. You can use the consul acl token list command and the WI tokens from Nomad would have something like Description: token created via login: {"requested_by":"nomad_service_redis"}. The should not have an expiration date.

@Legogris
Copy link

Legogris commented Mar 23, 2024

Seeing similar behavior, except:

  • We're not using Workload Identities and never did
  • nomad v1.6.9
  • consul v1.16.6
  • Jobs are mix of Consul Connect and vanilla Consul + Nomad.

Also seeing logs like this:

[WARN]  agent: Service registration blocked by ACLs: service=_nomad-task-deadbeef accessorID=""
[ERROR] agent.client: RPC failed to server: method=Catalog.Register server=consul-server:8300 error="rpc error making call: rpc error making call: ACL not found"
[ERROR] agent.client: RPC failed to server: method=ACL.TokenRead server=consul-server:8300 error="rpc error making call: token does not exist: ACL not found"
[WARN]  agent: Service registration blocked by ACLs: service=_nomad-task-deadbeef accessorID=""

The services and other consul interaction for the nomad client itself work fine and the configured tokens (nomad, consul client default, consul client agent) do have sufficient ACLs. Service registration for individual nomad jobs will not work at all, however, and result in the above error messages. Also noting the accessorID="".

We have had a long-standing (years) issue of similar behavior where this would start happen after a while. This has never been resolved properly but was addressed by scanning logs for the error and then restarting the nomad process when it happens (this would return service registration capabilities and a SIGHUP or other measures we tried would not suffice; since we're templating in certs and tokens from Vault, solving this reliably requires #4593 and #3885).

What is now new as of last week (existing job definition that worked fine previoulsy, on newly provisioned clients with config that is identical to and working fine on other clients in the same cluster) is that we now have jobs/clients which persistently fail like above from the get-go and this does not get resolved by restarts of either consul or nomad clients and we never seem to get those failing instances working at all.

These issues look like they could be related:

@JoaoPPinto
Copy link
Author

| Another thing to check is to make sure the tokens created with the workload identity really don't have an expiration date.
Just to confirm that the tokens do not have expiration dates.

@EugenKon
Copy link

EugenKon commented Apr 2, 2024

I have similar issue.

root@ip-172-31-18-211:/home/ubuntu# nomad --version
Nomad v1.7.6
BuildDate 2024-03-12T07:27:36Z
Revision 594fedbfbc4f0e532b65e8a69b28ff9403eb822e
root@ip-172-31-18-211:/home/ubuntu# consul --version
Consul v1.18.0
Revision 349cec17
Build Date 2024-02-26T22:05:50Z
Protocol 2 spoken by default, understands 2 to 3 (agent will automatically use protocol >2 when speaking to compatible agents)

I noticed this issue when rebootsrapped nomad.

Name             Type        Global  Accessor ID                           Expired
read-token       client      false   cf439f7c-16da-0411-46d7-a8f6d721cb0f  false
Bootstrap Token  management  true    58e2bce9-9b20-243c-f326-ceafb05eee25  false

Here was two "Bootstrap Token"s. I deleted the old one.

Apr 02 20:36:46 ip-172-31-18-211 consul[815]: 2024-04-02T20:36:46.152Z [TRACE] agent.server: rpc_server_call: method=Coordinate.Update errored=true request_type=write rpc_type=net/rpc leader=true target_datacenter=dc1 locality=local
Apr 02 20:36:46 ip-172-31-18-211 consul[815]: agent.server: rpc_server_call: method=Coordinate.Update errored=true request_type=write rpc_type=net/rpc leader=true target_datacenter=dc1 locality=local
Apr 02 20:36:46 ip-172-31-18-211 consul[815]: 2024-04-02T20:36:46.152Z [WARN]  agent: Coordinate update blocked by ACLs: accessorID="anonymous token"
Apr 02 20:36:46 ip-172-31-18-211 consul[815]: agent: Coordinate update blocked by ACLs: accessorID="anonymous token"
Apr 02 20:36:46 ip-172-31-18-211 consul[815]: 2024-04-02T20:36:46.710Z [DEBUG] agent: Check status updated: check=_nomad-check-13d1a457634299a56c70ccc3c802305bda29c3f2 status=passing
...
Apr 02 20:37:02 ip-172-31-18-211 consul[815]: 2024-04-02T20:37:02.142Z [ERROR] agent.http: Request error: method=GET url=/v1/kv/vault/core/migration from=172.31.18.211:41352 error="Permission denied: anonymous token lacks permission 'key:read' on \"vault/core/migration\". The anonymous token is used implicitly when a request does not specify a token."
Apr 02 20:37:02 ip-172-31-18-211 consul[815]: agent.http: Request error: method=GET url=/v1/kv/vault/core/migration from=172.31.18.211:41352 error="Permission denied: anonymous token lacks permission 'key:read' on \"vault/core/migration\". The anonymous token is used implicitly when a request does not specify a token."

After each nomad job stop nomad job plan nomad job run the list of registered services just grows up (it should be one here):
image

@JoaoPPinto
Copy link
Author

I noticed today when the issue manifested itself, that when nomad was trying to deregister a service (and failling), the healthcheck for that service was still showing as healthy several hours after the allocation was replaced (and thus impossible to be healthy). The consul client does throw a warning that the check is critical, but in the UI there is no error indication. I suppose that if anything this is a different issue in the consul repo, but I figured I should throw this info here.

@heipei
Copy link

heipei commented Jun 6, 2024

We are experiencing a similar problem: A single nomad client + consul client agent consistently loses "sync" with the rest of cluster somehow and as a result nomad stops being able to write to the consul agent. The node this is running one has above-average latency and occasional packet loss to the rest of the cluster for what it's worth.

This is what we're seeing:

nomad[1172471]:     2024-06-06T07:57:07.668Z [ERROR] consul.sync: still unable to update services in Consul: failures=6060 error="Unexpected response code: 403 (Permission denied: token with AccessorID '8f81763c-xxx' lacks permission 'service:write' on \"xxx\")"

Only a restart of consul + nomad allows the nomad client to once again write to the consul client. Any help would be appreciated.

  • Consul 1.17.3
  • Nomad 1.7.5

@jorgemarey
Copy link
Contributor

Hi, we're testing workload identities and have encountering this also:

  • Consul 1.18.2
  • Nomad 1.7.7

We have set the consul auth method TTL to 24h.

We started to see problems to sync with consul from nomad in the client logs. Going to the point where it starts to fail. We can see this on the client:

Jun  5 21:49:01 wrkr-58e81a8f-b8aa consul[1366]: {"@level":"error","@message":"subscribe call failed","@module":"agent.rpcclient.health","@timestamp":"2024-06-05T21:49:01.588994Z","err":"stream reset requested","failure_count":1,"key":"kernel-redis","topic":2}
Jun  5 21:49:01 wrkr-58e81a8f-b8aa consul[1366]: {"@level":"error","@message":"subscribe call failed","@module":"agent.rpcclient.health","@timestamp":"2024-06-05T21:49:01.589520Z","err":"stream reset requested","failure_count":1,"key":"kernel-crm","topic":2}
Jun  5 21:49:01 wrkr-58e81a8f-b8aa consul[1366]: {"@level":"error","@message":"subscribe call failed","@module":"agent.rpcclient.health","@timestamp":"2024-06-05T21:49:01.591784Z","err":"stream reset requested","failure_count":1,"key":"kernel-mongodb","topic":2}
Jun  5 21:49:01 wrkr-58e81a8f-b8aa consul[1366]: {"@level":"error","@message":"subscribe call failed","@module":"agent.rpcclient.health","@timestamp":"2024-06-05T21:49:01.592003Z","err":"stream reset requested","failure_count":1,"key":"kernel-directory","topic":2}
Jun  5 21:49:01 wrkr-58e81a8f-b8aa consul[1366]: {"@level":"error","@message":"subscribe call failed","@module":"agent.rpcclient.health","@timestamp":"2024-06-05T21:49:01.592020Z","err":"stream reset requested","failure_count":1,"key":"kernel-liam","topic":2}
Jun  5 21:49:06 wrkr-58e81a8f-b8aa consul[1366]: {"@level":"error","@message":"subscribe call failed","@module":"agent.rpcclient.health","@timestamp":"2024-06-05T21:49:06.123711Z","err":"rpc error: code = Unknown desc = ACL not found","failure_count":2,"key":"kernel-mongodb","topic":2}
Jun  5 21:49:06 wrkr-58e81a8f-b8aa consul[1366]: {"@level":"warn","@message":"handling error in Store.Notify","@module":"agent.viewstore","@timestamp":"2024-06-05T21:49:06.123753Z","error":"rpc error: code = Unknown desc = ACL not found","index":1401393192,"request-type":"agent.rpcclient.health.serviceRequest"}
Jun  5 21:49:06 wrkr-58e81a8f-b8aa consul[1366]: {"@level":"error","@message":"Data source in an irrecoverable state; exiting","@module":"agent.proxycfg","@timestamp":"2024-06-05T21:49:06.123915Z","correlationID":"upstream-target:kernel-mongodb.default.default.play-es-1:default/default/kernel-mongodb","error":"rpc error: code = Unknown desc = ACL not found","kind":"connect-proxy","proxy":{"ID":"_nomad-task-019bb0b9-5978-9185-3a29-487797c8837a-group-apigw-kernel-apigw-http-sidecar-proxy","Partition":"default","Namespace":"default","NodeName":"wrkr-58e81a8f-b8aa","Token":""},"service_id":"default/default/_nomad-task-019bb0b9-5978-9185-3a29-487797c8837a-group-apigw-kernel-apigw-http-sidecar-proxy"}
Jun  5 21:49:06 wrkr-58e81a8f-b8aa consul[1366]: {"@level":"error","@message":"Error handling ADS delta stream","@module":"agent.envoy","@timestamp":"2024-06-05T21:49:06.124030Z","error":"rpc error: code = Aborted desc = xDS stream terminated due to an irrecoverable error, please try again","xdsVersion":"v3"}
Jun  5 21:49:06 wrkr-58e81a8f-b8aa consul[1366]: {"@level":"error","@message":"Error handling ADS delta stream","@module":"agent.envoy","@timestamp":"2024-06-05T21:49:06.573042Z","error":"rpc error: code = Unauthenticated desc = unauthenticated: ACL not found","xdsVersion":"v3"}
Jun  5 21:49:07 wrkr-58e81a8f-b8aa consul[1366]: {"@level":"error","@message":"subscribe call failed","@module":"agent.rpcclient.health","@timestamp":"2024-06-05T21:49:07.058629Z","err":"rpc error: code = Unknown desc = ACL not found","failure_count":2,"key":"kernel-liam","topic":2}
Jun  5 21:49:07 wrkr-58e81a8f-b8aa consul[1366]: {"@level":"error","@message":"subscribe call failed","@module":"agent.rpcclient.health","@timestamp":"2024-06-05T21:49:07.201426Z","err":"rpc error: code = Unknown desc = ACL not found","failure_count":2,"key":"kernel-redis","topic":2}
Jun  5 21:49:07 wrkr-58e81a8f-b8aa consul[1366]: {"@level":"error","@message":"Error handling ADS delta stream","@module":"agent.envoy","@timestamp":"2024-06-05T21:49:07.353640Z","error":"rpc error: code = Unauthenticated desc = unauthenticated: ACL not found","xdsVersion":"v3"}
Jun  5 21:49:07 wrkr-58e81a8f-b8aa consul[1366]: {"@level":"error","@message":"Error handling ADS delta stream","@module":"agent.envoy","@timestamp":"2024-06-05T21:49:07.626305Z","error":"rpc error: code = Unauthenticated desc = unauthenticated: ACL not found","xdsVersion":"v3"}
Jun  5 21:49:08 wrkr-58e81a8f-b8aa consul[1366]: {"@level":"error","@message":"subscribe call failed","@module":"agent.rpcclient.health","@timestamp":"2024-06-05T21:49:08.441435Z","err":"rpc error: code = Unknown desc = ACL not found","failure_count":2,"key":"kernel-crm","topic":2}
Jun  5 21:49:09 wrkr-58e81a8f-b8aa consul[1366]: {"@level":"error","@message":"Error handling ADS delta stream","@module":"agent.envoy","@timestamp":"2024-06-05T21:49:09.092131Z","error":"rpc error: code = Unauthenticated desc = unauthenticated: ACL not found","xdsVersion":"v3"}
Jun  5 21:49:09 wrkr-58e81a8f-b8aa consul[1366]: {"@level":"error","@message":"Request error","@module":"agent.http","@timestamp":"2024-06-05T21:49:09.312529Z","error":"ACL not found","from":"127.0.0.1:58596","method":"PUT","url":"/v1/agent/check/update/_nomad-check-9d5c7d66ce96a9092c93202d558873ba326f96dc"}
Jun  5 21:49:09 wrkr-58e81a8f-b8aa consul[1366]: {"@level":"error","@message":"subscribe call failed","@module":"agent.rpcclient.health","@timestamp":"2024-06-05T21:49:09.488017Z","err":"rpc error: code = Unknown desc = ACL not found","failure_count":3,"key":"kernel-redis","topic":2}
Jun  5 21:49:09 wrkr-58e81a8f-b8aa consul[1366]: {"@level":"error","@message":"Error handling ADS delta stream","@module":"agent.envoy","@timestamp":"2024-06-05T21:49:09.565366Z","error":"rpc error: code = Unauthenticated desc = unauthenticated: ACL not found","xdsVersion":"v3"}
Jun  5 21:49:09 wrkr-58e81a8f-b8aa consul[1366]: {"@level":"error","@message":"subscribe call failed","@module":"agent.rpcclient.health","@timestamp":"2024-06-05T21:49:09.625739Z","err":"rpc error: code = Unknown desc = ACL not found","failure_count":2,"key":"kernel-directory","topic":2}
Jun  5 21:49:10 wrkr-58e81a8f-b8aa consul[1366]: {"@level":"error","@message":"Request error","@module":"agent.http","@timestamp":"2024-06-05T21:49:10.315130Z","error":"ACL not found","from":"127.0.0.1:58596","method":"PUT","url":"/v1/agent/check/update/_nomad-check-9d5c7d66ce96a9092c93202d558873ba326f96dc"}
Jun  5 21:49:10 wrkr-58e81a8f-b8aa nomad[1413]: {"@level":"warn","@message":"updating check failed","@module":"client.alloc_runner.task_runner.task_hook.script_checks","@timestamp":"2024-06-05T21:49:10.315619Z","alloc_id":"019bb0b9-5978-9185-3a29-487797c8837a","error":"Unexpected response code: 403 (ACL not found)","task":"apigw"}
Jun  5 21:49:11 wrkr-58e81a8f-b8aa consul[1366]: {"@level":"error","@message":"subscribe call failed","@module":"agent.rpcclient.health","@timestamp":"2024-06-05T21:49:11.516362Z","err":"rpc error: code = Unknown desc = ACL not found","failure_count":3,"key":"kernel-liam","topic":2}

It matches with this event on the consul server:

Jun  5 21:49:01 ctrl-7d7d2d05-2g consul[1264]: {"@level":"info","@message":"deleting expired ACL tokens","@module":"agent.server","@timestamp":"2024-06-05T21:49:01.501035Z","amount":1,"locality":"local"}

I guess after the 24h mark from where the allocation started (the token derived using the jwt), consul deletes the token and nomad starts complaining about the consul sync as now it's using that token instead of the configuration token to perform registrations.

Looking at the code I saw that consul tokens are only derived in the alloc Prerun() (if I'm not mistaken). So I guess that during the lifetime of the allocation a new token it's not derived.

@tgross
Copy link
Member

tgross commented Oct 9, 2024

Hey folks, just a heads up that I'm picking this back up.

Just wanted to point out to folks that if your Consul Auth Method has a TTL then that is not the issue described here but instead is #20185. We do not currently expect that the tokens being issues for Workload Identities have a TTL (but I will fix that as well).

This issue is for the case where you definitely don't have a TTL on the Auth Method and aren't otherwise deleting the token out-of-band somehow, and we don't yet know why that is. I'll report back soon on the results of more investigation.

@tgross
Copy link
Member

tgross commented Oct 9, 2024

I'm digging through the logs posted and the logs on the client illustrate the problem nicely. We see the allocation 742bf378 get stopped and consul.sync successfully deregister its services, then the allocation fbc348e0 starts and we see consul.sync register its services.

2024-03-21T09:18:23.052Z [INFO]  client.alloc_runner.task_runner: Task event: alloc_id=742bf378-0d3b-eaf7-7d11-5c832c28c7c0 task=test-task-2 type=Killing msg="Sent interrupt. Waiting 5s before force killing" failed=false
2024-03-21T09:18:23.059Z [DEBUG] consul.sync: sync complete: registered_services=0 deregistered_services=1 registered_checks=0 deregistered_checks=0
...
2024-03-21T09:18:25.764Z [INFO]  client.alloc_runner.task_runner: Task event: alloc_id=fbc348e0-a932-a653-fc87-9366b128eec5 task=test-task-2 type=Received msg="Task received by client" failed=false
...
2024-03-21T09:18:25.801Z [DEBUG] consul.sync: sync complete: registered_services=1 deregistered_services=0 registered_checks=0 deregistered_checks=0
...
2024-03-21T09:18:26.421Z [INFO]  client.alloc_runner.task_runner: Task event: alloc_id=fbc348e0-a932-a653-fc87-9366b128eec5 task=test-task-2 type=Started msg="Task started by client" failed=false

Then only 37s later we see the client gets an updated version of the alloc from the server and stops it. At that point we get the failed consul.sync:

2024-03-21T09:19:03.044Z [DEBUG] client: updated allocations: index=4976 total=3 pulled=1 filtered=2
2024-03-21T09:19:03.044Z [DEBUG] client: allocation updates: added=0 removed=0 updated=1 ignored=2
2024-03-21T09:19:03.047Z [DEBUG] client: allocation updates applied: added=0 removed=0 updated=1 ignored=2 errors=0
2024-03-21T09:19:03.050Z [DEBUG] client: updated allocations: index=4977 total=4 pulled=2 filtered=2
2024-03-21T09:19:03.050Z [DEBUG] client: allocation updates: added=1 removed=0 updated=1 ignored=2
2024-03-21T09:19:03.057Z [WARN]  consul.sync: failed to update services in Consul: error="Unexpected response code: 403 (ACL not found)"
2024-03-21T09:19:03.057Z [INFO]  client.alloc_runner.task_runner: Task event: alloc_id=fbc348e0-a932-a653-fc87-9366b128eec5 task=test-task-2 type=Killing msg="Sent interrupt. Waiting 5s before force killing" failed=false

If we look at the Consul agent logs from the same time period, we can see we're deregistering the service for fbc348e0.

2024-03-21T09:18:33.249Z [INFO]  agent: Synced check: check=_nomad-check-faed31a621ed7967b47dc60b89ab0eff14ad085e
2024-03-21T09:19:03.056Z [ERROR] agent.client: RPC failed to server: method=ACL.TokenRead server=10.101.14.30:8300 error="rpc error making call: token does not exist: ACL not found"
2024-03-21T09:19:03.056Z [ERROR] agent.http: Request error: method=PUT url=/v1/agent/service/deregister/_nomad-task-fbc348e0-a932-a653-fc87-9366b128eec5-group-test-group-2-test-service-2-traffic-port from=127.0.0.1:45160 error="ACL not found"
2024-03-21T09:19:03.251Z [WARN]  agent: Check is now critical: check=_nomad-check-faed31a621ed7967b47dc60b89ab0eff14ad085e
2024-03-21T09:19:03.262Z [ERROR] agent.client: RPC failed to server: method=Catalog.Register server=10.101.14.30:8300 error="rpc error making call: ACL not found"

I dug in from there and found that the client gets "stuck" like this thereafter because consul.sync bails out on any error. Once we have a missing token we're never going to recover even for all the other services! Deregistrations happen before registrations, so a single broken deregistration will prevent any new registrations, and cascade down to Envoy bootstrap for Consul Service Mesh workloads. The work we did in #23381 can't help because the service will never get registered. So that's got to get fixed.

That still doesn't explain where the token went in the first place!

I suspect that at least for some folks reporting here that they don't have the Consul agent token set with appropriate permissions, as described here: #23800. But I'm sure there are other cases as well. Still investigating...

@tgross
Copy link
Member

tgross commented Oct 10, 2024

Because I just got an internal message on this issue that was confused on the subject again: if you think you're hitting this, check the auth method you've configured in Consul. It should look something like this and not have a field like MaxTokenTTL: 1h0m0s.

$ consul acl auth-method read -name nomad-workloads
Name:          nomad-workloads
Type:          jwt
Partition:     default
Namespace:     default
Description:   auth/nomad-workloads (jwt)
Config:
{
  "BoundAudiences": [
    "consul.io"
  ],
  "ClaimMappings": {
    "consul_namespace": "consul_namespace",
    "nomad_job_id": "nomad_job_id",
    "nomad_namespace": "nomad_namespace",
    "nomad_service": "nomad_service",
    "nomad_task": "nomad_task"
  },
  "JWKSURL": "https://nomad.example.com:4646/.well-known/jwks.json",
  "JWTSupportedAlgs": [
    "RS256"
  ]
}

Nomad cannot currently work with MaxTokenTTL set (that's #20185, which may not be solvable at all for connect without blowing away the whole Envoy proxy)

@tgross
Copy link
Member

tgross commented Oct 10, 2024

After an extended discussion with the Consul folks to make sure what we were doing was reasonable, I've identified two core problems with the current workflow:

  • We can't use the service identity token reliably for deregistering services because of a race between deregistration and client GC. So even if there is some other bug lurking we haven't identified, using the SI token is a no-go short of a full rearchitecture of the Consul sync we do. So we're going to move all deregistration code paths back to using the Nomad agent's own token.
  • As noted above, if there's any failure the entire sync fails. We can fix that by accumulating errors and attempting to push forward with the sync.

Both of these fixes are in #24166. We missed the boat on the 1.9.0 release train, but this will ship in 1.9.1 (with Enterprise backports). I've also added documentation around the auth method TTL in #24167.

@philek
Copy link

philek commented Oct 11, 2024

I would like to clarify one point. When using the Workload Identity workflow for consul, does the nomad server policy still require acl = "write"?

@tgross
Copy link
Member

tgross commented Oct 11, 2024

@philek we've left that in there until 1.10 LTS ships because anyone doing an upgrade to WI will still need acl = "write" across that upgrade. If you've only ever used WI, it should be safe to remove. Test in pre-prod first.

tgross added a commit that referenced this issue Oct 11, 2024
When the local Consul agent receives a deregister request, it performs a
pre-flight check using the locally cached ACL token. The agent then sends the
request upstream to the Consul servers as part of anti-entropy, using its own
token. This requires that the token we use for deregistration is valid even
though that's not the token used to write to the Consul server.

There are several cases where the service identity token might no longer exist
at the time of deregistration:
* A race condition between the sync and destroying the allocation.
* Misconfiguration of the Consul auth method with a TTL.
* Out-of-band destruction of the token.

Additionally, Nomad's sync with Consul returns early if there are any errors,
which means that a single broken token can prevent any other service on the
Nomad agent from being registered or deregistered.

Update Nomad's sync with Consul to use the Nomad agent's own Consul token for
deregistration, regardless of which token the service was registered
with. Accumulate errors from the sync so that they no longer block
deregistration of other services.

Fixes: #20159
@tgross tgross closed this as completed in 4de1665 Oct 11, 2024
@github-project-automation github-project-automation bot moved this from In Progress to Done in Nomad - Community Issues Triage Oct 11, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Development

Successfully merging a pull request may close this issue.

8 participants