Podman run --cidfile creates cid file containing cid in the long form but podman rm --cidfile expects the short form #11356

Closed
PavelSosin-320 opened this issue Aug 30, 2021 · 14 comments
Labels
kind/bug Categorizes issue or PR as related to a bug. locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments.

Comments

@PavelSosin-320

Is this a BUG REPORT or FEATURE REQUEST? (leave only one on its own line)

/kind bug

Description

The systemd unit generated by Podman fails because the podman stop and podman rm commands generated for the
ExecStart=, ExecStop=, and ExecStopPost= values of the service file produce and expect different CID formats.

Steps to reproduce the issue:

  1. Create container
  2. Generate Systemd Unit file
  3. Enable the Systemd unit
  4. Reload systemd daemon
  5. Run the unit using systemctl start ...
  6. Check the unit's status

Describe the results you received:
The unit fails: although the container is running and the cid file exists, podman stop and podman rm fail when given the long-format ID.

Describe the results you expected:
The Podman container is manageable using the generated systemd unit.

Additional information you deem important (e.g. issue happens only occasionally):

Output of podman version:

Podman 3.2.2

Output of podman info --debug:

(paste your output here)

Package info (e.g. output of rpm -q podman or apt list podman):

(paste your output here)

Have you tested with the latest version of Podman and have you checked the Podman Troubleshooting Guide? (https://github.com/containers/podman/blob/master/troubleshooting.md)

Yes

Additional environment details (AWS, VirtualBox, physical, etc.):
Plain Fedora 34 Workstation.
P.S. The generated timeout values have to take into account the time taken by all Pre and Post actions. For example,
google-chrome --app http://localhost:3000 --new-window ... takes time even in kiosk mode.

@openshift-ci openshift-ci bot added the kind/bug Categorizes issue or PR as related to a bug. label Aug 30, 2021
@vrothberg
Member

This looks like a red herring (i.e., we're looking at the wrong thing). What makes you conclude that rm --cidfile would need the short form? Ultimately, it looks like something else is wrong.

Can you try a simple reproducer?
rm -f ~/foo.txt; podman run --cidfile ~/foo.txt fedora ls; podman rm --cidfile ~/foo.txt
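
The one-liner above can also be split into checked steps, which makes it easier to see which step fails. This is only a sketch: the function name reproduce_cidfile and the PODMAN variable are assumptions introduced here so the binary can be swapped out, and they are not part of Podman.

```shell
#!/bin/sh
# Sketch of the one-line reproducer, split into separate steps.
# PODMAN is a hypothetical override for this sketch; it defaults to "podman".
PODMAN="${PODMAN:-podman}"

reproduce_cidfile() {
    cidfile="$1"
    # Start clean: a leftover cidfile from an earlier run would get in the way.
    rm -f "$cidfile"
    # Run a short-lived container and record its ID in the cidfile.
    "$PODMAN" run --cidfile "$cidfile" fedora ls >/dev/null || return 1
    # Show what was recorded, to compare against any later error message.
    echo "cidfile contains: $(cat "$cidfile")"
    # Remove the container using the ID recorded in the cidfile.
    "$PODMAN" rm --cidfile "$cidfile"
}
```

Calling reproduce_cidfile ~/foo.txt runs the same three steps as the one-liner and prints the recorded ID before the removal is attempted.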

@PavelSosin-320
Author

The result is:
podman rm --cidfile ~/foo.txt
Error: no container with name or ID "505284f2a260554389ab973efb56d070057347e144d321f48c74b90414db657a" found: no such container.
The same message as when using Systemd Unit

@vrothberg
Member

@PavelSosin-320, did you execute the entire command as pasted?

rm -f ~/foo.txt; podman run --cidfile ~/foo.txt fedora ls; podman rm --cidfile ~/foo.txt

@PavelSosin-320
Author

@vrothberg The above command sequence works for me on Podman-Fedora34-WSL with Podman 3.2.3, but the following doesn't work on native Podman 3.3 on Fedora 34:

[pavelsosin@fedora user]$ ls $XDG_RUNTIME_DIR/
bus container-theiaUnit-cid dconf gnome-shell gvfsd keyring netns pipewire-0.lock pulse wayland-0
containers crun gnome-session-leader-fifo gvfs ICEauthority libpod pipewire-0 podman systemd wayland-0.lock
[pavelsosin@fedora user]$ cat $XDG_RUNTIME_DIR/container-theiaUnit-cid
b7f9fc6eaa5dacf9d1a5599e9985d1ce0829a4bc820676a6fe1d04569651bfb7[pavelsosin@fedora user]$ podman ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
a81d5891a2c6 docker.io/theiaide/theia:latest 46 hours ago Up 46 hours ago 0.0.0.0:3000->3000/tcp theiaide
[pavelsosin@fedora user]$ podman stop b7f9fc6eaa5dacf9d1a5599e9985d1ce0829a4bc820676a6fe1d04569651bfb7
Error: no container with name or ID "b7f9fc6eaa5dacf9d1a5599e9985d1ce0829a4bc820676a6fe1d04569651bfb7" found: no such container
[pavelsosin@fedora user]$ podman stop --cidfile $XDG_RUNTIME_DIR/container-theiaUnit-cid
Error: no container with name or ID "b7f9fc6eaa5dacf9d1a5599e9985d1ce0829a4bc820676a6fe1d04569651bfb7" found: no such container
[pavelsosin@fedora user]$

Regression?

@PavelSosin-320
Author

@vrothberg Related: maybe ConditionPathExists=!%t/container-theiaUnit.ctr-id is more reliable than removing files for keeping the running container a singleton. It only writes to the log:
Condition check resulted in Podman container-theiaUnit.service being skipped.

@vrothberg
Member

[pavelsosin@fedora user]$ podman ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
a81d5891a2c6 docker.io/theiaide/theia:latest 46 hours ago Up 46 hours ago 0.0.0.0:3000->3000/tcp theiaide
[pavelsosin@fedora user]$ podman stop b7f9fc6eaa5dacf9d1a5599e9985d1ce0829a4bc820676a6fe1d04569651bfb7
Error: no container with name or ID "b7f9fc6eaa5dacf9d1a5599e9985d1ce0829a4bc820676a6fe1d04569651bfb7" found: no such container
[pavelsosin@fedora user]$ podman stop --cidfile $XDG_RUNTIME_DIR/container-theiaUnit-cid
Error: no container with name or ID "b7f9fc6eaa5dacf9d1a5599e9985d1ce0829a4bc820676a6fe1d04569651bfb7" found: no such container

This looks good to me. podman ps only shows container a81d5891a2c6, so there is none with ID b7f9fc6eaa5da...
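
A quick way to check this kind of mismatch, assuming (as in the podman ps output above) that the short ID is the first 12 characters of the full ID. The helper names are hypothetical and exist only for this sketch:

```shell
#!/bin/sh
# Print the 12-character short form of the ID stored in a cidfile.
short_id_from_cidfile() {
    # cut -c1-12 keeps only the first 12 characters of the recorded ID.
    cut -c1-12 < "$1"
}

# Succeed only if the ID in the cidfile starts with the given short ID.
cidfile_matches() {
    [ "$(short_id_from_cidfile "$1")" = "$2" ]
}
```

With the cidfile from the session above, short_id_from_cidfile prints b7f9fc6eaa5d, which is not the a81d5891a2c6 listed by podman ps, so cidfile_matches fails for the running container.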

Please share a minimal reproducer. IMHO rm -f ~/foo.txt; podman run --cidfile ~/foo.txt fedora ls; podman rm --cidfile ~/foo.txt is a minimal reproducer and that seems to work. Something else must be going on, but without a reproducer I do not know what.

@PavelSosin-320
Author

The differences between the minimal reproducer and my scenario are:

  1. Different Podman version 3.2.3 -> 3.3
    container-TheiaUnit-service.txt
    version
  2. Theia IDE is a long-starting, long-running, and long-stopping service with a stateful backend.
  3. I use it as a rootless WSL user.
  4. Podman service is started by a separate unit - see dependencies.
  5. Post-exec open Chrome may have side effects. On my Linux machine, I use google-chrome.

@vrothberg
Member

Sorry, I do not understand what's going on. The short-form vs. long-form theory does not hold true. If the --cidfile points to a container that doesn't exist anymore, something's wrong.

Doesn't the systemd unit already handle the case when the container has been removed? See --ignore.

@PavelSosin-320
Author

PavelSosin-320 commented Aug 31, 2021

--ignore potentially changes the state without notice: I don't know whether the container was stopped and removed. Systemd manages the unit's lifecycle but not the container's lifecycle. Theia can occasionally fail to load vscode plugins, and it does happen. A simple restart of the Theia container recovers Theia. But I can't run Theia again because the TCP port, name, and workspace volumes are in use. Podman run and podman restart are not the same thing. Sooner or later I will need systemctl restart and Restart-on-failure with a tiny restart drop-in containing only a podman restart -l line.

@PavelSosin-320
Author

Further tests pointed to the root cause of the systemd unit instability: a simple bug in the podman run command. When I use the --replace option I expect that both pid and conmon pid files are replaced unconditionally. Podman container rm must clean up pid files too without any additional actions because these files will contain incorrect values. So the rm files command as a PreExec action is redundant. The cost of the rm action is significant because it creates a separate process that has the potential to become a zombie. Since systemd sees conmon as the main process PID, and it can always be obtained using systemctl [--user] show --property=MainPID, --conmon-pidfile creates an additional file to memorize a property that is already memorized by the systemd manager.

@vrothberg
Member

when I use the --replace option I expect that both pid and conmon pid files are replaced unconditionally

That is not how --replace works (nor how it's documented to work). I like the idea but I don't see it as a bug. --replace will only replace the container but not any cid or pid files.

Podman container rm must clean up pid files too without any additional actions because these files will contain incorrect values.

That would be a breaking change. Some users may very well depend on these files to not be removed. That's why the generated unit files do it during ExecStartPre.
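
A minimal sketch of that cleanup step as a shell function. The function name is hypothetical, and the file paths in the usage note below are assumptions modelled on the unit name in this thread, not copied from an actual generated unit:

```shell
#!/bin/sh
# Remove stale state files before a start, mirroring an "rm -f" in ExecStartPre.
# With -f, rm exits 0 even if a file is already gone, so the step is safe to repeat.
clean_stale_files() {
    rm -f "$@"
}
```

For example, clean_stale_files "$XDG_RUNTIME_DIR/container-theiaUnit.pid" "$XDG_RUNTIME_DIR/container-theiaUnit.ctr-id" succeeds whether or not either file exists, and it leaves the decision to delete with the unit rather than with podman rm.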

The cost of rm action is significant because it creates a separate process that has the potential to become zombie.

I doubt that a /bin/rm -f can become a zombie. Can you elaborate on that? It is certainly not entailing a significant cost; just compare it to the loads of work needed to create a container.

Since systemd sees conmon as the main process PID, and it can always be obtained using systemctl [--user] show --property=MainPID, --conmon-pidfile creates an additional file to memorize a property that is already memorized by the systemd manager.

Systemd only sees conmon as the main PID because Podman is writing the PID file. Otherwise, systemd may choose another process in the cgroup as the main one (e.g., podman or fuse-overlayfs).
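
To compare what systemd reports with what the PID file holds, something like the following could be used. It is a sketch: both helper names are hypothetical, and it assumes only that systemctl show --property=MainPID prints a line of the form MainPID=<number>.

```shell
#!/bin/sh
# Extract the numeric PID from a "MainPID=<n>" line read on stdin.
parse_main_pid() {
    sed -n 's/^MainPID=\([0-9][0-9]*\)$/\1/p'
}

# Succeed only if the PID recorded in the given pidfile equals the given PID.
pidfile_matches() {
    [ "$(cat "$1")" = "$2" ]
}
```

A possible use is pid=$(systemctl --user show --property=MainPID container-theiaUnit.service | parse_main_pid), followed by pidfile_matches with the path passed to --conmon-pidfile and "$pid".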

Again. Please share a reproducer that supports your theories. The generated units are well tested and unless they have been changed manually, they should work. Given that WSL 2 doesn't support systemd yet, I cannot exclude the chance of the custom systemd on your machine having issues. But again, without a clear reproducer, I am unable to resolve the issue and I don't want to speculate.

@PavelSosin-320
Author

@vrothberg Hi, This is exactly the issue:

I'm testing on plain Fedora 34 Workstation with Podman 3.3 in a regular GNOME session's terminal. But nothing works as expected:
The pid and ctr-id files are not removed because they look locked.
Podman run fails because the pid and ctr-id files already exist.
podman rm --ignore fails for any reason other than the container being missing.
The same container can't be run twice because port 3000 is already taken. The run command is missing --replace.
All the adjustments I made were in the Theia unit that starts the Theia IDE container with the Podman backend on the WSL side.

@vrothberg
Member

@PavelSosin-320, can you share the exact commands to reproduce the issue? Step by step.

@PavelSosin-320
Author

@vrothberg Sorry! This is actually a dup of #4678, but with huge complications: my local WiFi network is effectively pure IPv6 because my ISP advertises this configuration via my WiFi router. The only active link from my Linux workstation to the outside world is wlo1 with an IPv6 address. My LAN has IPv6 addresses, including the DNS forwarder inside my IPv6-enabled OpenWRT router.
WSL works differently because it is a VM and its eth0 is IPv4. In both cases /etc/resolv.conf is not usable, and to get stable name resolution I use resolved.conf with public resolvers configured by their IPv6 addresses. When the networking target is reached, /etc/resolv.conf exists but is meaningless: I don't know how to define global resolvers by their IPv6 addresses in resolv.conf.
But that is a totally different topic :(.

@github-actions github-actions bot added the locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. label Sep 21, 2023
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 21, 2023