Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Podman doesn't pull images on nomad job schedule #90

Closed
ecky-l opened this issue Feb 16, 2021 · 6 comments
Closed

Podman doesn't pull images on nomad job schedule #90

ecky-l opened this issue Feb 16, 2021 · 6 comments

Comments

@ecky-l
Copy link

ecky-l commented Feb 16, 2021

Hi,

I have a fresh podman installation on fedora coreos, with no local images. plugin is correctly configured and I try to run the example.nomad job (which is generated by nomad job init) with minor modifications:

...
task "redis" {
    driver = "podman"
    config {
        image = "docker.io/library/redis:4.0"
        # no ports
    }
...
}

When running nomad job run example.nomad, the job is schedued but fails to start. But it starts when I run podman pull docker.io/library/redis:4.0 on the nomad host. So apparently the image is not automatically pulled if it doesn't exist.

Is this a misconfiguration on my side or a bug?

@towe75
Copy link
Collaborator

towe75 commented Feb 16, 2021

@ecky-l can you add a few more details please? What's your podman version, plugin version? Does the (debug) log reveal something useful?

@ecky-l
Copy link
Author

ecky-l commented Feb 16, 2021

Sure:

[core@n1 ~]$ podman version
Version:      2.2.1
API Version:  2.1.0
Go Version:   go1.15.5
Built:        Tue Dec  8 14:37:50 2020
OS/Arch:      linux/amd64
[core@n1 ~]$ nomad version
Nomad v1.0.3 (08741d9f2003ec26e44c72a2c0e27cdf0eadb6ee)

Plugin Version is 0.2.0.

[core@n1 ~]$ cat /etc/os-release 
NAME=Fedora
VERSION="33.20210117.3.2 (CoreOS)"
ID=fedora
VERSION_ID=33
VERSION_CODENAME=""
PLATFORM_ID="platform:f33"
PRETTY_NAME="Fedora CoreOS 33.20210117.3.2"
ANSI_COLOR="0;38;2;60;110;180"
LOGO=fedora-logo-icon
CPE_NAME="cpe:/o:fedoraproject:fedora:33"
HOME_URL="https://getfedora.org/coreos/"
DOCUMENTATION_URL="https://docs.fedoraproject.org/en-US/fedora-coreos/"
SUPPORT_URL="https://github.com/coreos/fedora-coreos-tracker/"
BUG_REPORT_URL="https://github.com/coreos/fedora-coreos-tracker/"
REDHAT_BUGZILLA_PRODUCT="Fedora"
REDHAT_BUGZILLA_PRODUCT_VERSION=33
REDHAT_SUPPORT_PRODUCT="Fedora"
REDHAT_SUPPORT_PRODUCT_VERSION=33
PRIVACY_POLICY_URL="https://fedoraproject.org/wiki/Legal:PrivacyPolicy"
VARIANT="CoreOS"
VARIANT_ID=coreos
OSTREE_VERSION='33.20210117.3.2'

Below is an extensive debug log of the nomad client whe I try to start the job with no image present. nomad status example reports "failed" for the allocations.

Feb 16 10:15:26 n1.local.vlan nomad[3486]:     2021-02-16T10:15:26.177Z [DEBUG] client: updated allocations: index=817 total=57 pulled=32 filtered=25
Feb 16 10:15:26 n1.local.vlan nomad[3486]: client: updated allocations: index=817 total=57 pulled=32 filtered=25
Feb 16 10:15:26 n1.local.vlan nomad[3486]:     2021-02-16T10:15:26.178Z [DEBUG] client: allocation updates: added=1 removed=0 updated=31 ignored=25
Feb 16 10:15:26 n1.local.vlan nomad[3486]: client: allocation updates: added=1 removed=0 updated=31 ignored=25
Feb 16 10:15:26 n1.local.vlan nomad[3486]:     2021-02-16T10:15:26.344Z [INFO]  client.gc: garbage collecting allocation: alloc_id=15e0e45a-9c20-3f67-e634-a37df202642b reason="new allocations and over max (50)"
Feb 16 10:15:26 n1.local.vlan nomad[3486]:  client.gc: garbage collecting allocation: alloc_id=15e0e45a-9c20-3f67-e634-a37df202642b reason="new allocations and over max (50)"
Feb 16 10:15:26 n1.local.vlan nomad[3486]:     2021-02-16T10:15:26.354Z [INFO]  client.gc: marking allocation for GC: alloc_id=15e0e45a-9c20-3f67-e634-a37df202642b
Feb 16 10:15:26 n1.local.vlan nomad[3486]:  client.gc: marking allocation for GC: alloc_id=15e0e45a-9c20-3f67-e634-a37df202642b
Feb 16 10:15:26 n1.local.vlan nomad[3486]:     2021-02-16T10:15:26.361Z [DEBUG] client.gc: alloc garbage collected: alloc_id=15e0e45a-9c20-3f67-e634-a37df202642b
Feb 16 10:15:26 n1.local.vlan nomad[3486]: client.gc: alloc garbage collected: alloc_id=15e0e45a-9c20-3f67-e634-a37df202642b
Feb 16 10:15:26 n1.local.vlan nomad[3486]:     2021-02-16T10:15:26.367Z [DEBUG] client: allocation updates applied: added=1 removed=0 updated=31 ignored=25 errors=0
Feb 16 10:15:26 n1.local.vlan nomad[3486]: client: allocation updates applied: added=1 removed=0 updated=31 ignored=25 errors=0
Feb 16 10:15:26 n1.local.vlan nomad[3486]:     2021-02-16T10:15:26.373Z [DEBUG] client.alloc_runner.task_runner: lifecycle start condition has been met, proceeding: alloc_id=48f0f4e8-b0fe-0788-d7a5-3c16de4517d9 task=redis
Feb 16 10:15:26 n1.local.vlan nomad[3486]: client.alloc_runner.task_runner: lifecycle start condition has been met, proceeding: alloc_id=48f0f4e8-b0fe-0788-d7a5-3c16de4517d9 task=redis
Feb 16 10:15:26 n1.local.vlan nomad[3486]:     2021-02-16T10:15:26.414Z [DEBUG] client.alloc_runner.task_runner.task_hook.logmon: starting plugin: alloc_id=48f0f4e8-b0fe-0788-d7a5-3c16de4517d9 task=redis path=/var/usrlocal/bin/nomad args=[/var/usrlocal/bin/nomad, logmon]
Feb 16 10:15:26 n1.local.vlan nomad[3486]: client.alloc_runner.task_runner.task_hook.logmon: starting plugin: alloc_id=48f0f4e8-b0fe-0788-d7a5-3c16de4517d9 task=redis path=/var/usrlocal/bin/nomad args=[/var/usrlocal/bin/nomad, logmon]
Feb 16 10:15:26 n1.local.vlan nomad[3486]:     2021-02-16T10:15:26.418Z [DEBUG] client.alloc_runner.task_runner.task_hook.logmon: plugin started: alloc_id=48f0f4e8-b0fe-0788-d7a5-3c16de4517d9 task=redis path=/var/usrlocal/bin/nomad pid=4451
Feb 16 10:15:26 n1.local.vlan nomad[3486]: client.alloc_runner.task_runner.task_hook.logmon: plugin started: alloc_id=48f0f4e8-b0fe-0788-d7a5-3c16de4517d9 task=redis path=/var/usrlocal/bin/nomad pid=4451
Feb 16 10:15:26 n1.local.vlan nomad[3486]:     2021-02-16T10:15:26.419Z [DEBUG] client.alloc_runner.task_runner.task_hook.logmon: waiting for RPC address: alloc_id=48f0f4e8-b0fe-0788-d7a5-3c16de4517d9 task=redis path=/var/usrlocal/bin/nomad
Feb 16 10:15:26 n1.local.vlan nomad[3486]: client.alloc_runner.task_runner.task_hook.logmon: waiting for RPC address: alloc_id=48f0f4e8-b0fe-0788-d7a5-3c16de4517d9 task=redis path=/var/usrlocal/bin/nomad
Feb 16 10:15:26 n1.local.vlan nomad[3486]:     2021-02-16T10:15:26.433Z [DEBUG] client.alloc_runner.task_runner.task_hook.logmon: using plugin: alloc_id=48f0f4e8-b0fe-0788-d7a5-3c16de4517d9 task=redis version=2
Feb 16 10:15:26 n1.local.vlan nomad[3486]: client.alloc_runner.task_runner.task_hook.logmon: using plugin: alloc_id=48f0f4e8-b0fe-0788-d7a5-3c16de4517d9 task=redis version=2
Feb 16 10:15:26 n1.local.vlan nomad[3486]:     2021-02-16T10:15:26.434Z [DEBUG] client.alloc_runner.task_runner.task_hook.logmon.nomad: plugin address: alloc_id=48f0f4e8-b0fe-0788-d7a5-3c16de4517d9 task=redis @module=logmon address=/tmp/plugin555195511 network=unix timestamp=2021-02-16T10:15:26.433Z
Feb 16 10:15:26 n1.local.vlan nomad[3486]: client.alloc_runner.task_runner.task_hook.logmon.nomad: plugin address: alloc_id=48f0f4e8-b0fe-0788-d7a5-3c16de4517d9 task=redis @module=logmon address=/tmp/plugin555195511 network=unix timestamp=2021-02-16T10:15:26.433Z
Feb 16 10:15:26 n1.local.vlan nomad[3486]:     2021-02-16T10:15:26.437Z [INFO]  client.alloc_runner.task_runner.task_hook.logmon.nomad: opening fifo: alloc_id=48f0f4e8-b0fe-0788-d7a5-3c16de4517d9 task=redis @module=logmon path=/mnt/data/nomad/client/alloc/48f0f4e8-b0fe-0788-d7a5-3c16de4517d9/alloc/logs/.redis.stdout.fifo timestamp=2021-02-16T10:15:26.436Z
Feb 16 10:15:26 n1.local.vlan nomad[3486]:  client.alloc_runner.task_runner.task_hook.logmon.nomad: opening fifo: alloc_id=48f0f4e8-b0fe-0788-d7a5-3c16de4517d9 task=redis @module=logmon path=/mnt/data/nomad/client/alloc/48f0f4e8-b0fe-0788-d7a5-3c16de4517d9/alloc/logs/.redis.stdout.fifo timestamp=2021-02-16T10:15:26.436Z
Feb 16 10:15:26 n1.local.vlan nomad[3486]:     2021-02-16T10:15:26.438Z [INFO]  client.alloc_runner.task_runner.task_hook.logmon.nomad: opening fifo: alloc_id=48f0f4e8-b0fe-0788-d7a5-3c16de4517d9 task=redis @module=logmon path=/mnt/data/nomad/client/alloc/48f0f4e8-b0fe-0788-d7a5-3c16de4517d9/alloc/logs/.redis.stderr.fifo timestamp=2021-02-16T10:15:26.438Z
Feb 16 10:15:26 n1.local.vlan nomad[3486]:  client.alloc_runner.task_runner.task_hook.logmon.nomad: opening fifo: alloc_id=48f0f4e8-b0fe-0788-d7a5-3c16de4517d9 task=redis @module=logmon path=/mnt/data/nomad/client/alloc/48f0f4e8-b0fe-0788-d7a5-3c16de4517d9/alloc/logs/.redis.stderr.fifo timestamp=2021-02-16T10:15:26.438Z
Feb 16 10:15:26 n1.local.vlan nomad[3486]:     2021-02-16T10:15:26.469Z [ERROR] client.alloc_runner.task_runner: running driver failed: alloc_id=48f0f4e8-b0fe-0788-d7a5-3c16de4517d9 task=redis error="rpc error: code = Unknown desc = failed to start task, unable to check for local image: unknown error, status code: 500"
Feb 16 10:15:26 n1.local.vlan nomad[3486]: client.alloc_runner.task_runner: running driver failed: alloc_id=48f0f4e8-b0fe-0788-d7a5-3c16de4517d9 task=redis error="rpc error: code = Unknown desc = failed to start task, unable to check for local image: unknown error, status code: 500"
Feb 16 10:15:26 n1.local.vlan nomad[3486]:     2021-02-16T10:15:26.470Z [INFO]  client.alloc_runner.task_runner: not restarting task: alloc_id=48f0f4e8-b0fe-0788-d7a5-3c16de4517d9 task=redis reason="Error was unrecoverable"
Feb 16 10:15:26 n1.local.vlan nomad[3486]:  client.alloc_runner.task_runner: not restarting task: alloc_id=48f0f4e8-b0fe-0788-d7a5-3c16de4517d9 task=redis reason="Error was unrecoverable"
Feb 16 10:15:26 n1.local.vlan nomad[3486]:     2021-02-16T10:15:26.475Z [INFO]  client.gc: marking allocation for GC: alloc_id=48f0f4e8-b0fe-0788-d7a5-3c16de4517d9
Feb 16 10:15:26 n1.local.vlan nomad[3486]:  client.gc: marking allocation for GC: alloc_id=48f0f4e8-b0fe-0788-d7a5-3c16de4517d9
Feb 16 10:15:26 n1.local.vlan nomad[3486]:     2021-02-16T10:15:26.636Z [DEBUG] client: updated allocations: index=819 total=57 pulled=31 filtered=26
Feb 16 10:15:26 n1.local.vlan nomad[3486]: client: updated allocations: index=819 total=57 pulled=31 filtered=26
Feb 16 10:15:26 n1.local.vlan nomad[3486]:     2021-02-16T10:15:26.639Z [DEBUG] client: allocation updates: added=0 removed=0 updated=31 ignored=26
Feb 16 10:15:26 n1.local.vlan nomad[3486]: client: allocation updates: added=0 removed=0 updated=31 ignored=26
Feb 16 10:15:26 n1.local.vlan nomad[3486]:     2021-02-16T10:15:26.765Z [DEBUG] client: allocation updates applied: added=0 removed=0 updated=31 ignored=26 errors=0
Feb 16 10:15:26 n1.local.vlan nomad[3486]: client: allocation updates applied: added=0 removed=0 updated=31 ignored=26 errors=0
Feb 16 10:15:27 n1.local.vlan nomad[3486]:     2021-02-16T10:15:27.838Z [DEBUG] client: updated allocations: index=821 total=57 pulled=31 filtered=26
Feb 16 10:15:27 n1.local.vlan nomad[3486]: client: updated allocations: index=821 total=57 pulled=31 filtered=26
Feb 16 10:15:27 n1.local.vlan nomad[3486]:     2021-02-16T10:15:27.839Z [DEBUG] client: allocation updates: added=0 removed=0 updated=31 ignored=26
Feb 16 10:15:27 n1.local.vlan nomad[3486]: client: allocation updates: added=0 removed=0 updated=31 ignored=26
Feb 16 10:15:27 n1.local.vlan nomad[3486]:     2021-02-16T10:15:27.850Z [DEBUG] client: updated allocations: index=823 total=57 pulled=32 filtered=25
Feb 16 10:15:27 n1.local.vlan nomad[3486]: client: updated allocations: index=823 total=57 pulled=32 filtered=25
Feb 16 10:15:27 n1.local.vlan nomad[3486]:     2021-02-16T10:15:27.892Z [DEBUG] client: allocation updates applied: added=0 removed=0 updated=31 ignored=26 errors=0
Feb 16 10:15:27 n1.local.vlan nomad[3486]: client: allocation updates applied: added=0 removed=0 updated=31 ignored=26 errors=0
Feb 16 10:15:27 n1.local.vlan nomad[3486]:     2021-02-16T10:15:27.892Z [DEBUG] client: allocation updates: added=0 removed=0 updated=32 ignored=25
Feb 16 10:15:27 n1.local.vlan nomad[3486]: client: allocation updates: added=0 removed=0 updated=32 ignored=25
Feb 16 10:15:27 n1.local.vlan nomad[3486]:     2021-02-16T10:15:27.943Z [DEBUG] client: allocation updates applied: added=0 removed=0 updated=32 ignored=25 errors=0
Feb 16 10:15:27 n1.local.vlan nomad[3486]: client: allocation updates applied: added=0 removed=0 updated=32 ignored=25 errors=0
Feb 16 10:15:28 n1.local.vlan nomad[3486]:     2021-02-16T10:15:28.037Z [DEBUG] client: updated allocations: index=825 total=57 pulled=31 filtered=26
Feb 16 10:15:28 n1.local.vlan nomad[3486]: client: updated allocations: index=825 total=57 pulled=31 filtered=26
Feb 16 10:15:28 n1.local.vlan nomad[3486]:     2021-02-16T10:15:28.038Z [DEBUG] client: allocation updates: added=0 removed=0 updated=31 ignored=26
Feb 16 10:15:28 n1.local.vlan nomad[3486]: client: allocation updates: added=0 removed=0 updated=31 ignored=26
Feb 16 10:15:28 n1.local.vlan nomad[3486]:     2021-02-16T10:15:28.083Z [DEBUG] client: allocation updates applied: added=0 removed=0 updated=31 ignored=26 errors=0
Feb 16 10:15:28 n1.local.vlan nomad[3486]: client: allocation updates applied: added=0 removed=0 updated=31 ignored=26 errors=0
Feb 16 10:15:30 n1.local.vlan nomad[3486]:     2021-02-16T10:15:30.479Z [WARN]  client.alloc_runner.task_runner.task_hook.logmon.nomad: timed out waiting for read-side of process output pipe to close: alloc_id=48f0f4e8-b0fe-0788-d7a5-3c16de4517d9 task=redis @module=logmon timestamp=2021-02-16T10:15:30.478Z
Feb 16 10:15:30 n1.local.vlan nomad[3486]:     2021-02-16T10:15:30.479Z [WARN]  client.alloc_runner.task_runner.task_hook.logmon.nomad: timed out waiting for read-side of process output pipe to close: alloc_id=48f0f4e8-b0fe-0788-d7a5-3c16de4517d9 task=redis @module=logmon timestamp=2021-02-16T10:15:30.478Z
Feb 16 10:15:30 n1.local.vlan nomad[3486]:  client.alloc_runner.task_runner.task_hook.logmon.nomad: timed out waiting for read-side of process output pipe to close: alloc_id=48f0f4e8-b0fe-0788-d7a5-3c16de4517d9 task=redis @module=logmon timestamp=2021-02-16T10:15:30.478Z
Feb 16 10:15:30 n1.local.vlan nomad[3486]:  client.alloc_runner.task_runner.task_hook.logmon.nomad: timed out waiting for read-side of process output pipe to close: alloc_id=48f0f4e8-b0fe-0788-d7a5-3c16de4517d9 task=redis @module=logmon timestamp=2021-02-16T10:15:30.478Z
Feb 16 10:15:30 n1.local.vlan nomad[3486]:     2021-02-16T10:15:30.482Z [DEBUG] client.alloc_runner.task_runner.task_hook.logmon: plugin process exited: alloc_id=48f0f4e8-b0fe-0788-d7a5-3c16de4517d9 task=redis path=/var/usrlocal/bin/nomad pid=4451
Feb 16 10:15:30 n1.local.vlan nomad[3486]:     2021-02-16T10:15:30.482Z [DEBUG] client.alloc_runner.task_runner.task_hook.logmon: plugin exited: alloc_id=48f0f4e8-b0fe-0788-d7a5-3c16de4517d9 task=redis
Feb 16 10:15:30 n1.local.vlan nomad[3486]:     2021-02-16T10:15:30.483Z [DEBUG] client.alloc_runner.task_runner: task run loop exiting: alloc_id=48f0f4e8-b0fe-0788-d7a5-3c16de4517d9 task=redis
Feb 16 10:15:30 n1.local.vlan nomad[3486]: client.alloc_runner.task_runner.task_hook.logmon: plugin process exited: alloc_id=48f0f4e8-b0fe-0788-d7a5-3c16de4517d9 task=redis path=/var/usrlocal/bin/nomad pid=4451
Feb 16 10:15:30 n1.local.vlan nomad[3486]: client.alloc_runner.task_runner.task_hook.logmon: plugin exited: alloc_id=48f0f4e8-b0fe-0788-d7a5-3c16de4517d9 task=redis
Feb 16 10:15:30 n1.local.vlan nomad[3486]: client.alloc_runner.task_runner: task run loop exiting: alloc_id=48f0f4e8-b0fe-0788-d7a5-3c16de4517d9 task=redis
Feb 16 10:15:30 n1.local.vlan nomad[3486]:     2021-02-16T10:15:30.709Z [DEBUG] client: updated allocations: index=826 total=57 pulled=31 filtered=26
Feb 16 10:15:30 n1.local.vlan nomad[3486]: client: updated allocations: index=826 total=57 pulled=31 filtered=26
Feb 16 10:15:30 n1.local.vlan nomad[3486]:     2021-02-16T10:15:30.710Z [DEBUG] client: allocation updates: added=0 removed=0 updated=31 ignored=26
Feb 16 10:15:30 n1.local.vlan nomad[3486]: client: allocation updates: added=0 removed=0 updated=31 ignored=26
Feb 16 10:15:30 n1.local.vlan nomad[3486]:     2021-02-16T10:15:30.782Z [DEBUG] client: allocation updates applied: added=0 removed=0 updated=31 ignored=26 errors=0
Feb 16 10:15:30 n1.local.vlan nomad[3486]: client: allocation updates applied: added=0 removed=0 updated=31 ignored=26 errors=0

@towe75
Copy link
Collaborator

towe75 commented Feb 16, 2021

IC

The problem is related to a API handling bug in podman 2.x which should be fixed in podman 3.x via containers/podman@3fa61f0

But we're not ready to run the plugin against podman 3.x because of other regressions, see #89

I added a workaround in our non-hashicorp repo via pascomnet@8fd1872 and did not PR it back to hashicorp because of the upcoming podman 3.x release.

A somewhat related PR is #88 and i will enhance it with the API workaround, soon.

Thank you for the report.

@ecky-l
Copy link
Author

ecky-l commented Feb 16, 2021

Thanks for the info.

For me this means I would need to wait for podman 3.x to arrive in coreos. And hope that the API has a little more stabilized until then. I'll use docker in the mean time...

@towe75
Copy link
Collaborator

towe75 commented Feb 16, 2021

@ecky-l : IMHO you can "mitigate the one bug by exploiting the other" if you want.
To do so, you have to simply prefix your image name with "docker://". Downside is that the plugin will always pull the image and thus job startup can be a little bit slower.

@towe75
Copy link
Collaborator

towe75 commented Mar 10, 2021

@ecky-l your problem should be fixed. Can you check with recent podman version and a build from main branch?

@towe75 towe75 closed this as completed Mar 21, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants