Cannot fetch logs for tasks in Nomad UI #8136

Closed
rberlind opened this issue Jun 9, 2020 · 11 comments

Comments

@rberlind
Contributor

rberlind commented Jun 9, 2020

Nomad version

Nomad v0.11.2 (807cfeb)

Operating system and Environment details

Ubuntu 18.04

Issue

The Nomad UI cannot fetch allocation logs any more. It used to be able to do so. The problem started after upgrading to Nomad 0.11.x.

Reproduction steps

Run https://play.instruqt.com/hashicorp/tracks/nomad-simple-cluster and try to view logs for the redis task of the standard example.nomad job in the Nomad UI.

Job file (if appropriate)

job "example" {
  datacenters = ["dc1"]

  group "cache" {
    task "redis" {
      driver = "docker"

      config {
        image = "redis:3.2"

        port_map {
          db = 6379
        }
      }

      resources {
        cpu    = 500
        memory = 256

        network {
          mbits = 10
          port  "db"  {}
        }
      }
    }
  }
}

Nomad Client logs (if appropriate)

If possible please post relevant logs in the issue.
==> Loaded configuration from client1.hcl
==> Starting Nomad agent...
==> Nomad agent configuration:

   Advertise Addrs: HTTP: 10.132.0.248:4646
        Bind Addrs: HTTP: 0.0.0.0:4646
            Client: true
         Log Level: INFO
            Region: global (DC: dc1)
            Server: false
           Version: 0.11.2

==> Nomad agent started! Log data will stream in below:

2020-06-09T15:29:22.414Z [WARN]  agent.plugin_loader: skipping external plugins since plugin_dir doesn't exist: plugin_dir=/tmp/nomad/client1/plugins
2020-06-09T15:29:22.422Z [INFO]  agent: detected plugin: name=qemu type=driver plugin_version=0.1.0
2020-06-09T15:29:22.422Z [INFO]  agent: detected plugin: name=java type=driver plugin_version=0.1.0
2020-06-09T15:29:22.422Z [INFO]  agent: detected plugin: name=docker type=driver plugin_version=0.1.0
2020-06-09T15:29:22.422Z [INFO]  agent: detected plugin: name=raw_exec type=driver plugin_version=0.1.0
2020-06-09T15:29:22.422Z [INFO]  agent: detected plugin: name=exec type=driver plugin_version=0.1.0
2020-06-09T15:29:22.422Z [INFO]  agent: detected plugin: name=nvidia-gpu type=device plugin_version=0.1.0
2020-06-09T15:29:22.423Z [INFO]  client: using state directory: state_dir=/tmp/nomad/client1/client
2020-06-09T15:29:22.444Z [INFO]  client: using alloc directory: alloc_dir=/tmp/nomad/client1/alloc
2020-06-09T15:29:22.447Z [INFO]  client.fingerprint_mgr.cgroup: cgroups are available
2020-06-09T15:29:22.450Z [WARN]  client.fingerprint_mgr.network: unable to parse speed: path=/sbin/ethtool device=ens4
2020-06-09T15:29:22.477Z [INFO]  client.plugin: starting plugin manager: plugin-type=csi
2020-06-09T15:29:22.479Z [INFO]  client.plugin: starting plugin manager: plugin-type=driver
2020-06-09T15:29:22.480Z [INFO]  client.plugin: starting plugin manager: plugin-type=device
2020-06-09T15:29:22.559Z [INFO]  client: started client: node_id=be336876-453c-6fb3-ffe5-181875bb53a6

2020-06-09T15:29:22.566Z [ERROR] client.rpc: error performing RPC to server: error="rpc error: failed to get conn: dial tcp 10.132.0.249:4647: connect: connection refused" rpc=Node.Register server=10.132.0.249:4647
2020-06-09T15:29:22.567Z [ERROR] client: error registering: error="rpc error: failed to get conn: dial tcp 10.132.0.249:4647: connect: connection refused"
2020-06-09T15:29:22.567Z [ERROR] client.rpc: error performing RPC to server: error="rpc error: failed to get conn: rpc error: lead thread didn't get connection" rpc=Node.GetClientAllocs server=10.132.0.249:4647
2020-06-09T15:29:22.567Z [ERROR] client: error querying node allocations: error="rpc error: failed to get conn: rpc error: lead thread didn't get connection"
2020-06-09T15:29:39.629Z [INFO]  client: node registration complete
2020-06-09T15:29:39.702Z [INFO]  client.alloc_runner.task_runner.task_hook.logmon.nomad: opening fifo: alloc_id=fc0baa2d-283c-76d8-317d-ad799d683023 task=redis @module=logmon path=/tmp/nomad/client1/alloc/fc0baa2d-283c-76d8-317d-ad799d683023/alloc/logs/.redis.stdout.fifo timestamp=2020-06-09T15:29:39.702Z
2020-06-09T15:29:39.703Z [INFO]  client.alloc_runner.task_runner.task_hook.logmon.nomad: opening fifo: alloc_id=fc0baa2d-283c-76d8-317d-ad799d683023 task=redis @module=logmon path=/tmp/nomad/client1/alloc/fc0baa2d-283c-76d8-317d-ad799d683023/alloc/logs/.redis.stderr.fifo timestamp=2020-06-09T15:29:39.703Z

==> Newer Nomad version available: 0.11.3 (currently running: 0.11.2)
2020-06-09T15:29:45.511Z [INFO] client.driver_mgr.docker: created container: driver=docker container_id=893fc4311350d50154ceb4cee46f3144cb426ac5ec6036934ec7ed5129712398
2020-06-09T15:29:45.824Z [INFO] client: node registration complete
2020-06-09T15:29:46.399Z [INFO] client.driver_mgr.docker: started container: driver=docker container_id=893fc4311350d50154ceb4cee46f3144cb426ac5ec6036934ec7ed5129712398

Nomad Server logs (if appropriate)

I did see the following at the end of the server log:
2020-06-09T15:29:38.649Z [ERROR] http: request failed: method=GET path=/v1/namespaces error="Nomad Enterprise only endpoint" code=501
2020-06-09T15:29:42.130Z [ERROR] http: request failed: method=GET path=/v1/namespace/default error="Nomad Enterprise only endpoint" code=501
2020-06-09T15:36:52.629Z [ERROR] http: request failed: method=GET path=/v1/namespaces error="Nomad Enterprise only endpoint" code=501
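
For anyone triaging similar output, here is a minimal sketch that tallies failed HTTP requests by path from agent log lines of this shape (one entry per line). It is not part of Nomad; the helper names `parse_line` and `count_failed_paths` and both regular expressions are illustrative assumptions about the log layout shown in this issue.

```python
import re
from collections import Counter

# Assumed layout: "<timestamp> [LEVEL] <source>: <message with key=value pairs>"
LINE_RE = re.compile(
    r'^(?P<ts>\S+)\s+\[(?P<level>[A-Z]+)\]\s+(?P<source>[^:]+):\s+(?P<msg>.*)$'
)
# key=value pairs, where the value is either "quoted text" or a bare token
PAIR_RE = re.compile(r'(\w+)=("[^"]*"|\S+)')


def parse_line(line):
    """Parse one log line into a dict, or return None if it does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    record = match.groupdict()
    record["fields"] = {
        key: value.strip('"') for key, value in PAIR_RE.findall(record["msg"])
    }
    return record


def count_failed_paths(lines):
    """Count ERROR-level entries per request path, skipping lines without one."""
    counts = Counter()
    for line in lines:
        record = parse_line(line)
        if record and record["level"] == "ERROR" and "path" in record["fields"]:
            counts[record["fields"]["path"]] += 1
    return counts


if __name__ == "__main__":
    sample = [
        '2020-06-09T15:29:38.649Z [ERROR] http: request failed: method=GET '
        'path=/v1/namespaces error="Nomad Enterprise only endpoint" code=501',
        '2020-06-09T15:29:42.130Z [ERROR] http: request failed: method=GET '
        'path=/v1/namespace/default error="Nomad Enterprise only endpoint" code=501',
    ]
    for path, count in count_failed_paths(sample).items():
        print(path, count)
```

Run against the server log excerpt here, every failed request is a namespace lookup and none is a log request.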

@rberlind
Contributor Author

rberlind commented Jun 9, 2020

Here is a screenshot.
NomadUI-AllocationLog-Error

@DingoEatingFuzz
Contributor

Can you reproduce this outside of Instruqt?

@DingoEatingFuzz
Contributor

Also, is this fixed in 0.11.3?

@rberlind
Contributor Author

rberlind commented Jun 9, 2020

@DingoEatingFuzz : I have not tried outside Instruqt yet and have not yet tried Nomad 0.11.3. But are you aware of changes in 0.11.3 that would have affected the display of logs in the Nomad UI?

@rberlind
Contributor Author

@DingoEatingFuzz : I tested in Instruqt with Nomad 0.11.3 and still cannot access allocation/task logs in the Nomad UI. It would require more effort for me to test outside Instruqt. But you should be able to easily test with any Nomad 0.11.x cluster of your own by running any job. Note that I tested today with this track, https://play.instruqt.com/hashicorp/tracks/nomad-consul-connect, that is based on https://www.nomadproject.io/docs/integrations/consul-connect/#nomad-with-consul-connect-integration. So, perhaps you could test yourself by following that guide?

@chenjpu

chenjpu commented Jun 23, 2020

When I tried out v0.12.0-beta1, there was a log display problem. On version 0.11.3, there is no problem.
image

@rberlind
Contributor Author

I believe my problem was caused by using Instruqt. What appeared to be happening was that the Nomad UI was using a private IP within Instruqt that corresponded to the Nomad client that the allocation was running on and could not access that IP. The reason is that in Instruqt the various tabs go through a proxy and are effectively "outside" the Instruqt network.
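
To check this kind of reachability problem outside the UI, the same logs can be requested over Nomad's HTTP API from both the client and a server address. Below is a minimal sketch, not the UI's actual code: it builds a `/v1/client/fs/logs/:alloc_id` URL for a given agent address and reports whether that URL answers. The client address, allocation ID, and task name come from the client log above; the server HTTP address (port 4646 on 10.132.0.249) is an assumption, as are the helper names `logs_url` and `can_fetch`.

```python
import urllib.parse
import urllib.request


def logs_url(base_addr, alloc_id, task, log_type="stdout"):
    """Build a Nomad logs API URL against the given agent address."""
    query = urllib.parse.urlencode(
        {"task": task, "type": log_type, "plain": "true"}
    )
    return f"{base_addr.rstrip('/')}/v1/client/fs/logs/{alloc_id}?{query}"


def can_fetch(url, timeout=3.0):
    """Return True if the URL answers with HTTP 200 within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # Covers URLError, HTTPError, refused connections, and timeouts.
        return False


if __name__ == "__main__":
    alloc_id = "fc0baa2d-283c-76d8-317d-ad799d683023"  # from the client log
    client_addr = "http://10.132.0.248:4646"  # advertised client HTTP address
    server_addr = "http://10.132.0.249:4646"  # assumed server HTTP address
    # Only the URLs are printed here; call can_fetch(url) from the machine
    # running the browser to test reachability.
    for addr in (client_addr, server_addr):
        print(logs_url(addr, alloc_id, "redis"))
```

If `can_fetch` returns False for the client URL but True for the server URL when run from the machine hosting the browser, that would be consistent with the proxy explanation above.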

I have not had a chance to test outside Instruqt. But I'm glad to hear from @chenjpu that they did see allocation logs in Nomad 0.11.3. It's unfortunate that Nomad 0.12.0-beta1 is not working for you. I suggest reporting this as a separate issue with release set to Nomad 0.12.0-beta1, so that the Nomad engineering team knows your issue is specific to the beta.

@DingoEatingFuzz
Contributor

@rberlind I'm glad you got to the bottom of that!

@chenjpu This is a known issue with the current beta (see known issues here). I'm sorry you ran into it, but we're working on a beta2 which will include that fix among other things!

@chenjpu

chenjpu commented Jun 24, 2020

@DingoEatingFuzz Thanks for your reply, I already know. :)

@DingoEatingFuzz
Contributor

Since the Instruqt issue ended up being unrelated and the 0.12.0-beta2 and 0.12.0-rc builds have fixed log streaming, I'm closing this.

@github-actions

github-actions bot commented Nov 5, 2022

I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Nov 5, 2022