[Bug]: podman stop results in panic: runtime error: invalid memory address or nil pointer dereference #17069
Comments
When I try the same command I get:
I am using:
Just looking at the stack trace, I am surprised a …
Not sure about that; maybe you have a config file somewhere overriding the arguments being used? I spun up a fresh VM, simply installed the packages, ran the commands, and was able to reproduce.
In case it helps, I just spun up a fresh VM from scratch again; this is the literal bash history from the moment the OS was installed to the reproduction of the issue:
I can provide this VM image if you want.
Not sure what you mean by restart; from what I see in the logs it is only trying to stop with a delay of 10 seconds, but I'm not entirely sure what I'm looking at. In a second step I manually bring the containers back up, and the network seems to be in a bad state, making it fail.
Speaking of the fresh VM I mention in my latest reply, there's another variant of this issue I forgot to mention.
I have no clue what causes one issue or the other (the one in the first post, with the panic) to happen, or why there seem to be two different possible outcomes. I updated my original post to include the other error message.
@Luap99 Yeah, this doesn't look like a restart policy restart. An explicit stop by the user (either …
No, I read this as: stop was called, but it decided it has to restart the container due to its restart policy.
The flow is …
Damn, you're right. That's definitely not correct. Stop explicitly sets a bool in the database to prevent the restart action from taking place.
Aha, got it, we're calling cleanup before we set StoppedByUser.
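To make the ordering problem concrete, here is a minimal, self-contained Go sketch of the flow being described; the type, field, and function names are simplified stand-ins for illustration, not Podman's actual libpod code:

```go
package main

import "fmt"

// Simplified stand-in for a container record; "stoppedByUser" plays the role
// of the StoppedByUser flag stored in the database.
type container struct {
	restartPolicy string
	stoppedByUser bool
}

// handleRestartPolicy restarts the container unless the stop was user-requested.
func (c *container) handleRestartPolicy() {
	if c.restartPolicy == "always" && !c.stoppedByUser {
		fmt.Println("restart policy fires: bringing the container back up")
		return
	}
	fmt.Println("restart policy skipped: container was stopped by the user")
}

// cleanup tears things down and, as part of that, evaluates the restart policy.
func (c *container) cleanup() {
	c.handleRestartPolicy()
}

// stopBuggy mirrors the ordering described above: cleanup runs before the flag
// is set, so the restart-policy check still sees stoppedByUser == false.
func (c *container) stopBuggy() {
	c.cleanup()
	c.stoppedByUser = true
}

// stopFixed mirrors the fix: record the user-requested stop first, then clean up.
func (c *container) stopFixed() {
	c.stoppedByUser = true
	c.cleanup()
}

func main() {
	(&container{restartPolicy: "always"}).stopBuggy() // container restarts anyway
	(&container{restartPolicy: "always"}).stopFixed() // restart correctly skipped
}
```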
Okay, still on the VM, after a fresh reboot: if I start the services with `podman-compose --podman-run-args="--userns=keep-id" up` and force a graceful shutdown with a single ^C, I get:

Traceback (most recent call last):
File "/usr/bin/podman-compose", line 8, in <module>
sys.exit(main())
File "/usr/lib/python3.10/site-packages/podman_compose.py", line 1775, in main
podman_compose.run()
File "/usr/lib/python3.10/site-packages/podman_compose.py", line 1024, in run
cmd(self, args)
File "/usr/lib/python3.10/site-packages/podman_compose.py", line 1248, in wrapped
return func(*args, **kw)
File "/usr/lib/python3.10/site-packages/podman_compose.py", line 1442, in compose_up
thread.join(timeout=1.0)
File "/usr/lib/python3.10/threading.py", line 1100, in join
self._wait_for_tstate_lock(timeout=max(timeout, 0))
File "/usr/lib/python3.10/threading.py", line 1116, in _wait_for_tstate_lock
if lock.acquire(block, timeout):
KeyboardInterrupt
2023-01-11T15:06:17Z [SERVER] INFO: Shutdown requested
2023-01-11T15:06:17Z [SERVER] INFO: Called signal: SIGINT
2023-01-11T15:06:17Z [SERVER] INFO: Stopping all monitors
[user@demovm juke]$ 2023/01/11 16:06:17 ...eful/manager_unix.go:147:handleSignals() [W] [63bed03e-4] PID 2. Received SIGINT. Shutting down...
2023/01/11 16:06:17 cmd/web.go:271:listen() [I] [63bed03e-6] HTTP Listener: 0.0.0.0:1024 Closed
2023/01/11 16:06:17 ...eful/server_hooks.go:47:doShutdown() [I] [63bed03e-6] PID: 2 Listener ([::]:1024) closed.
2023/01/11 16:06:17 .../graceful/manager.go:206:doHammerTime() [W] Setting Hammer condition
2023-01-11 16:06:17.434 CET [1] LOG: received fast shutdown request
2023-01-11 16:06:17.442 CET [1] LOG: aborting any active transactions
2023-01-11 16:06:17.444 CET [1] LOG: background worker "logical replication launcher" (PID 14) exited with exit code 1
2023-01-11 16:06:17.444 CET [9] LOG: shutting down
[Wed Jan 11 16:06:17.449373 2023] [mpm_prefork:notice] [pid 1] AH00169: caught SIGTERM, shutting down
2023-01-11 16:06:17.450 CET [9] LOG: checkpoint starting: shutdown immediate
2023-01-11 16:06:17.513 CET [9] LOG: checkpoint complete: wrote 3 buffers (0.0%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.015 s, sync=0.006 s, total=0.070 s; sync files=2, longest=0.004 s, average=0.003 s; distance=0 kB, estimate=0 kB
2023-01-11 16:06:17.516 CET [1] LOG: database system is shut down
time="2023-01-11T16:06:17+01:00" level=error msg="accept tcp [::]:8080: use of closed network connection" entryPointName=traefik
time="2023-01-11T16:06:17+01:00" level=error msg="accept tcp [::]:1024: use of closed network connection" entryPointName=web
time="2023-01-11T16:06:17+01:00" level=error msg="accept tcp [::]:1025: use of closed network connection" entryPointName=ssh
time="2023-01-11T16:06:17+01:00" level=error msg="close tcp [::]:8080: use of closed network connection" entryPointName=traefik
time="2023-01-11T16:06:17+01:00" level=error msg="close tcp [::]:1025: use of closed network connection" entryPointName=ssh
time="2023-01-11T16:06:17+01:00" level=error msg="close tcp [::]:1024: use of closed network connection" entryPointName=web
2023-01-11 16:06:17.734 CET [1] LOG: received fast shutdown request
2023-01-11 16:06:17.745 CET [1] LOG: aborting any active transactions
2023-01-11 16:06:17.746 CET [1] LOG: background worker "logical replication launcher" (PID 14) exited with exit code 1
2023-01-11 16:06:17.746 CET [9] LOG: shutting down
2023-01-11 16:06:17.755 CET [9] LOG: checkpoint starting: shutdown immediate
2023-01-11 16:06:17.830 CET [9] LOG: checkpoint complete: wrote 3 buffers (0.0%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.018 s, sync=0.010 s, total=0.084 s; sync files=2, longest=0.006 s, average=0.005 s; distance=0 kB, estimate=0 kB
2023-01-11 16:06:17.833 CET [1] LOG: database system is shut down
2023/01/11 16:06:18 .../graceful/manager.go:225:doTerminate() [W] Terminating
2023/01/11 16:06:18 ...eful/manager_unix.go:158:handleSignals() [W] PID: 2. Background context for manager closed - context canceled - Shutting down...
2023/01/11 16:06:18 cmd/web.go:140:runWeb() [I] PID: 2 Gitea Web Finished
2023-01-11T15:06:19Z [DB] INFO: Closing the database
2023-01-11T15:06:21Z [DB] INFO: SQLite closed
2023-01-11T15:06:21Z [CLOUDFLARED] INFO: Stop cloudflared
2023-01-11T15:06:21Z [SERVER] INFO: Graceful shutdown successful!

However, when I check with lsof:

[user@demovm juke]$ sudo lsof -i -P -n | grep 1024
rootlessp 3127 user 10u IPv6 27132 0t0 TCP *:1024 (LISTEN)

The rootlessport process seems to still be there.
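As a side note, an alternative to eyeballing lsof is to simply try binding the port yourself. A small, Podman-independent sketch (port 1024 is just the one from this setup):

```go
package main

import (
	"fmt"
	"net"
)

// portFree reports whether the given TCP port can be bound right now. If the
// leftover rootlessport process still holds the port, the bind fails.
func portFree(port int) bool {
	ln, err := net.Listen("tcp", fmt.Sprintf(":%d", port))
	if err != nil {
		return false
	}
	ln.Close()
	return true
}

func main() {
	fmt.Println("port 1024 free:", portFree(1024))
}
```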
#17077 should fix the incorrect firing of the restart policy. Not the panic, though, but that shouldn't happen if the restart policy isn't restarting the container.
Thank you very much for your responsiveness on this!

Edit: removing the restart policy from all the services in the compose file does in fact seem to make the errors go away, and the port frees up whether I try daemon or foreground mode with a graceful shutdown. Nice catch!
Okay, so after trying some more things: running all of the commands that podman-compose runs, but manually, with a restart policy, seems to always work (as in no error). Here are the commands I used:

podman run --userns=keep-id --name=juke_traefik_1 --label io.containers.autoupdate=registry --label io.podman.compose.config-hash=123 --label io.podman.compose.project=juke --label io.podman.compose.version=0.0.1 --label com.docker.compose.project=juke --label com.docker.compose.project.working_dir=/home/user/juke --label com.docker.compose.project.config_files=docker-compose.yml --label com.docker.compose.container-number=1 --label com.docker.compose.service=traefik -v /etc/timezone:/etc/timezone:ro -v /usr/share/zoneinfo/Europe/Paris:/etc/localtime:ro -v /run/user/1000/podman/podman.sock:/var/run/docker.sock:ro --net juke_default --network-alias traefik -p 1024:1024 -p 1025:1025 -p 1026:8080 -u 1000:1001 --restart always docker.io/library/traefik --api.insecure=true --providers.docker=true --providers.docker.exposedbydefault=false --entrypoints.web.address=:1024 --entrypoints.ssh.address=:1025 &\
podman run --userns=keep-id --name=juke_nextcloud_database_1 --label io.containers.autoupdate=registry --label io.podman.compose.config-hash=123 --label io.podman.compose.project=juke --label io.podman.compose.version=0.0.1 --label com.docker.compose.project=juke --label com.docker.compose.project.working_dir=/home/user/juke --label com.docker.compose.project.config_files=docker-compose.yml --label com.docker.compose.container-number=1 --label com.docker.compose.service=nextcloud_database -e POSTGRES_DB=database -e POSTGRES_USER=user -e POSTGRES_PASSWORD=password -v /etc/timezone:/etc/timezone:ro -v /usr/share/zoneinfo/Europe/Paris:/etc/localtime:ro -v /home/user/juke/resources/postgres_alpine_passwd:/etc/passwd:ro -v /home/user/juke/volumes/nextcloud_database:/var/lib/postgresql/data:Z --net juke_default --network-alias nextcloud_database -u 1000:1001 --restart always docker.io/library/postgres:alpine &\
podman run --userns=keep-id --name=juke_gitea_database_1 --label io.containers.autoupdate=registry --label io.podman.compose.config-hash=123 --label io.podman.compose.project=juke --label io.podman.compose.version=0.0.1 --label com.docker.compose.project=juke --label com.docker.compose.project.working_dir=/home/user/juke --label com.docker.compose.project.config_files=docker-compose.yml --label com.docker.compose.container-number=1 --label com.docker.compose.service=gitea_database -e POSTGRES_DB=database -e POSTGRES_USER=user -e POSTGRES_PASSWORD=password -v /etc/timezone:/etc/timezone:ro -v /usr/share/zoneinfo/Europe/Paris:/etc/localtime:ro -v /home/user/juke/resources/postgres_alpine_passwd:/etc/passwd:ro -v /home/user/juke/volumes/gitea_database:/var/lib/postgresql/data:Z --net juke_default --network-alias gitea_database -u 1000:1001 --restart always docker.io/library/postgres:alpine &\
podman run --userns=keep-id --name=juke_nextcloud_server_1 --label io.containers.autoupdate=registry --label traefik.enable=true --label traefik.http.routers.nextcloud_server.rule="Host(\`cloud.localhost\`)" --label traefik.http.routers.nextcloud_server.entrypoints=web --label traefik.http.services.nextcloud_server-juke.loadbalancer.server.port=1024 --label io.podman.compose.config-hash=123 --label io.podman.compose.project=juke --label io.podman.compose.version=0.0.1 --label com.docker.compose.project=juke --label com.docker.compose.project.working_dir=/home/user/juke --label com.docker.compose.project.config_files=docker-compose.yml --label com.docker.compose.container-number=1 --label com.docker.compose.service=nextcloud_server -e POSTGRES_PASSWORD=password -e POSTGRES_DB=database -e POSTGRES_USER=user -e POSTGRES_HOST=nextcloud_database -e NEXTCLOUD_TRUSTED_DOMAINS=cloud.localhost -v /etc/timezone:/etc/timezone:ro -v /usr/share/zoneinfo/Europe/Paris:/etc/localtime:ro -v /home/user/juke/resources/nextcloud_server_passwd:/etc/passwd:ro -v /home/user/juke/resources/nextcloud_server_ports.conf:/etc/apache2/ports.conf:ro -v /home/user/juke/volumes/nextcloud_server:/var/www/html:Z --net juke_default --network-alias nextcloud_server -u 1000:1001 --restart always --hostname cloud.localhost docker.io/library/nextcloud &\
podman run --userns=keep-id --name=juke_gitea_server_1 --label io.containers.autoupdate=registry --label traefik.enable=true --label traefik.http.routers.gitea_server.rule="Host(\`code.localhost\`)" --label traefik.http.routers.gitea_server.entrypoints=web --label traefik.http.services.gitea_server-juke.loadbalancer.server.port=1024 --label traefik.tcp.routers.gitea_server_ssh.rule="HostSNI(\`*\`)" --label traefik.tcp.routers.gitea_server_ssh.entrypoints=ssh --label traefik.tcp.services.girea_server_ssh-juke.loadbalancer.server.port=1025 --label io.podman.compose.config-hash=123 --label io.podman.compose.project=juke --label io.podman.compose.version=0.0.1 --label com.docker.compose.project=juke --label com.docker.compose.project.working_dir=/home/user/juke --label com.docker.compose.project.config_files=docker-compose.yml --label com.docker.compose.container-number=1 --label com.docker.compose.service=gitea_server -e HTTP_PORT=1024 -e DEFAULT_BRANCH=main -e RUN_MODE=prod -e DISABLE_SSH=false -e START_SSH_SERVER=true -e SSH_PORT=1025 -e SSH_LISTEN_PORT=1025 -e ROOT_URL=http://code.localhost -e GITEA__database__DB_TYPE=postgres -e GITEA__database__HOST=gitea_database:5432 -e GITEA__database__NAME=database -e GITEA__database__USER=user -e GITEA__database__PASSWD=password -e GITEA__service__DISABLE_REGISTRATION=true -v /etc/timezone:/etc/timezone:ro -v /usr/share/zoneinfo/Europe/Paris:/etc/localtime:ro -v /home/user/juke/resources/gitea_server_passwd:/etc/passwd:ro -v /home/user/juke/volumes/gitea_server:/data:Z --net juke_default --network-alias gitea_server -u 1000:1001 --restart always docker.io/gitea/gitea:latest-rootless &\
podman run --userns=keep-id --name=juke_uptime_kuma_server_1 --label io.containers.autoupdate=registry --label traefik.enable=true --label traefik.http.routers.uptime_kuma_server.rule="Host(\`status.localhost\`)" --label traefik.http.routers.uptime_kuma_server.entrypoints=web --label traefik.http.services.uptime_kuma_server-juke.loadbalancer.server.port=1024 --label io.podman.compose.config-hash=123 --label io.podman.compose.project=juke --label io.podman.compose.version=0.0.1 --label com.docker.compose.project=juke --label com.docker.compose.project.working_dir=/home/user/juke --label com.docker.compose.project.config_files=docker-compose.yml --label com.docker.compose.container-number=1 --label com.docker.compose.service=uptime_kuma_server -e PUID=1000 -e PGID=1001 -e PORT=1024 -v /etc/timezone:/etc/timezone:ro -v /usr/share/zoneinfo/Europe/Paris:/etc/localtime:ro -v /home/user/juke/resources/uptime_kuma_server_passwd:/etc/passwd:ro -v /home/user/juke/volumes/uptime_kuma_server:/app/data:Z --net juke_default --network-alias uptime_kuma_server -u 1000:1001 --restart always --entrypoint '["node", "/app/server/server.js"]' docker.io/louislam/uptime-kuma &
podman stop -t 10 juke_uptime_kuma_server_1
podman stop -t 10 juke_gitea_server_1
podman stop -t 10 juke_nextcloud_server_1
podman stop -t 10 juke_gitea_database_1
podman stop -t 10 juke_nextcloud_database_1
podman stop -t 10 juke_traefik_1
podman rm juke_uptime_kuma_server_1
podman rm juke_gitea_server_1
podman rm juke_nextcloud_server_1
podman rm juke_gitea_database_1
podman rm juke_nextcloud_database_1
podman rm juke_traefik_1

However, as soon as I do a `podman-compose --podman-run-args="--userns=keep-id" up`, press CTRL-C, and then run `podman-compose --podman-run-args="--userns=keep-id" down -v`, I get:
['podman', '--version', '']
using podman version: 4.3.1
** excluding: set()
podman stop -t 10 juke_uptime_kuma_server_1
juke_uptime_kuma_server_1
exit code: 0
podman stop -t 10 juke_gitea_server_1
juke_gitea_server_1
exit code: 0
podman stop -t 10 juke_nextcloud_server_1
juke_nextcloud_server_1
exit code: 0
podman stop -t 10 juke_gitea_database_1
juke_gitea_database_1
exit code: 0
podman stop -t 10 juke_nextcloud_database_1
juke_nextcloud_database_1
exit code: 0
podman stop -t 10 juke_traefik_1
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x55c1faa0695b]
goroutine 13 [running]:
os.(*File).Name(...)
os/file.go:56
github.com/containers/podman/v4/pkg/errorhandling.CloseQuiet(0xc00028a420?)
github.com/containers/podman/v4/pkg/errorhandling/errorhandling.go:74 +0x5b
github.com/containers/podman/v4/libpod.(*Runtime).setupRootlessPortMappingViaRLK(0xc00026a540, 0xc000441200, {0xc00021df00, 0x3f}, 0xc0003e1601?)
github.com/containers/podman/v4/libpod/networking_slirp4netns.go:581 +0x105d
github.com/containers/podman/v4/libpod.(*Container).setupRootlessNetwork(0xc000441200)
github.com/containers/podman/v4/libpod/container_internal_linux.go:414 +0x13c
github.com/containers/podman/v4/libpod.(*Container).handleRestartPolicy(0xc000441200, {0x55c1fba89510, 0xc000130020})
github.com/containers/podman/v4/libpod/container_internal.go:296 +0x445
github.com/containers/podman/v4/libpod.(*Container).Cleanup(0xc000441200, {0x55c1fba89510, 0xc000130020})
github.com/containers/podman/v4/libpod/container_api.go:726 +0x3dd
github.com/containers/podman/v4/pkg/domain/infra/abi.(*ContainerEngine).ContainerStop.func1(0xc000441200)
github.com/containers/podman/v4/pkg/domain/infra/abi/containers.go:248 +0x24e
github.com/containers/podman/v4/pkg/parallel/ctr.ContainerOp.func1()
github.com/containers/podman/v4/pkg/parallel/ctr/ctr.go:28 +0x22
github.com/containers/podman/v4/pkg/parallel.Enqueue.func1()
github.com/containers/podman/v4/pkg/parallel/parallel.go:67 +0x1ac
created by github.com/containers/podman/v4/pkg/parallel.Enqueue
github.com/containers/podman/v4/pkg/parallel/parallel.go:56 +0xbe
exit code: 2

So this might actually be caused by podman-compose, or I am very unlucky and it only happens when using the podman-compose commands (the error does not occur 100% of the time in the first place). Should this maybe be moved to the podman-compose repo?
That probably explains why I can't reproduce. Might be related to how podman-compose is calling Podman - maybe they're leaking extra file descriptors in, which is messing with rootlessport config? |
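For context, the panic in the trace above happens when a method is called on a nil *os.File (os.(*File).Name dereferences its receiver). A standalone sketch of that failure mode and the defensive guard that avoids it, using a hypothetical closeQuiet helper rather than Podman's actual errorhandling.CloseQuiet:

```go
package main

import (
	"fmt"
	"os"
)

// closeQuiet is a hypothetical stand-in for the kind of helper seen in the
// stack trace: it closes a file and logs failures using the file's name.
// Without the nil guard, a nil *os.File would panic inside f.Name().
func closeQuiet(f *os.File) {
	if f == nil { // guard against callers passing a file that was never opened
		return
	}
	if err := f.Close(); err != nil {
		fmt.Fprintf(os.Stderr, "unable to close file %s: %v\n", f.Name(), err)
	}
}

func main() {
	var never *os.File // nil, e.g. because an earlier setup step failed
	closeQuiet(never)  // safe with the guard; would panic in Name() without it

	if tmp, err := os.CreateTemp("", "demo"); err == nil {
		closeQuiet(tmp) // normal case: closes the file and stays quiet
	}
}
```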
The StoppedByUser variable indicates that the container was requested to stop by a user. It's used to prevent restart policy from firing (so that a restart=always container won't restart if the user does a `podman stop`).

The problem is we were setting it *very* late in the stop() function. Originally, this was fine, but after the changes to add the new Stopping state, the logic that triggered restart policy was firing before StoppedByUser was even set, so the container would still restart.

Setting it earlier shouldn't hurt anything and guarantees that checks will see that the container was stopped manually.

Fixes containers#17069

Signed-off-by: Matthew Heon <[email protected]>
Issue Description
Trying to use traefik with podman (more specifically podman-compose); not sure what specifically the issue is, or whether it is even related to traefik.
Here is what the compose file looks like:
Everything seems to work fine when I run
podman-compose --podman-run-args="--userns=keep-id" up -d
However, when I run
podman-compose --podman-run-args="--userns=keep-id" down -v
I get the following error:
Sometimes, however, it will look like this:
At this point, if I run lsof, I see a process that I can kill.
But doing so apparently still leaves the system thinking that IP addresses are allocated when they shouldn't be, because trying to spin up the services again with
podman-compose --podman-run-args="--userns=keep-id" up -d
results in the following:
Saying that it's failing to find any free IPs.
Steps to reproduce the issue

1. Start the services, press CTRL-C; the error happens.
2. Alternatively, start the services in daemon mode (`up -d`) and destroy them and their volumes in a separate step (`down -v`); the same error happens.

Describe the results you received
Stopping is not clean: it leaves hung processes behind, and IP addresses stay unavailable. The only way I found to fix it properly is to reboot the entire host.
Describe the results you expected
A clean stop: no hung processes that make it impossible to restart the pods because no more IPs are available, and no need to reboot to get it working again.
podman info output
podman information output log
Podman in a container
No
Privileged Or Rootless
Rootless
Upstream Latest Release
Yes
Additional environment details
Happens both locally and inside a fresh VM.
Additional information
Ask if anything is unclear.
None of the info in here is sensitive; passwords and such are mostly placeholders, so no need to worry.