podman machine on macOS becomes unresponsive after some time #20639
Comments
@benoitf thanks for the write-up; the fact that you have a solution is a good thing. I doubt there is much podman can do about this except to use this as an anchor issue. I am curious whether you are able to observe the same behavior with vfkit directly, or whether you could try to reproduce it outside of podman? |
ah! reading closer, it clearly says this is qemu ... so vfkit and friends are off the hook! |
I found this issue as well coreos/fedora-coreos-tracker#1463 but it should have been already solved :-/ |
There are known rare network connectivity issues observed with vfkit, hyperv, and this one. Hard to say if this is the same issue as it's not easily reproducible. The common component between all of these is gvisor-tap-vsock, which happens to be the component which would be reading the data transmitted by virtio-net (the logs complain about virtio_net TX timeout). |
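To check a captured console or kernel log for the message mentioned above, a small filter can help. This is a minimal sketch: the helper names `has_tx_timeout` and `count_tx_timeouts` are hypothetical, and the pattern is based only on the single log line quoted in the issue description, so other kernels may word the message differently.

```shell
# Hypothetical helpers, not part of podman or gvproxy.
# The pattern is derived from the one log line quoted in this issue:
#   virtio_net virtio0 enp0s1: TX timeout on queue: 0, ...

# Succeeds (exit status 0) if stdin contains a virtio-net TX timeout message.
has_tx_timeout() {
    grep -q 'virtio_net .*TX timeout on queue'
}

# Prints how many lines on stdin contain a virtio-net TX timeout message.
count_tx_timeouts() {
    # grep -c exits non-zero when the count is 0; keep the status clean.
    grep -c 'virtio_net .*TX timeout on queue' || true
}
```

Inside the guest, and assuming the kernel log is readable there, something like `dmesg | count_tx_timeouts` would give a rough idea of how often the timeout fired.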
@baude my computer is probably suspending/hibernating/resume/etc during this amount of time. I would not assume the machine was always on/no suspend/etc |
This is one of the open questions for which I need to look closer at gvproxy's code. virtio-net TX/RX queues are bound somehow to the unix socket RX/TX buffers on the host. Unloading/reloading virtio-net could reset enough state to get network traffic to flow again between the guest and the host (blind guess, I really need to look closer at all this code ^^ ) Or maybe this has nothing to do with gvisor-tap-vsock and it is a kernel bug similar to coreos/fedora-coreos-tracker#1463 (which Florent mentioned) |
@cfergeau appreciate your help here! tyvm. |
When running podman with For the record, I'm running macOS Sonoma 14.1.1 (23B81) @ Apple M1 Pro. podman version 4.7.2 |
This is consistent with what Florent pointed out in the issue description, |
closing as I'm not reproducing it with latest FCOS version/latest gvproxy version |
I'm testing FCOS stable latest with my team! Also, I would like to understand what new change fixes this issue. I'm looking at the 39.20231101.3.0's release notes Edit: |
Hello @CA-Demetriade, I'm using Fedora CoreOS 39.20231119.2.0, but before that I was using your version. I'm also using the latest version of the gvproxy binary (depending on the installation method you pick, you may have an old version). |
Thank you @benoitf for your answer! We are using Podman 4.7.2 (downloaded from homebrew). Concerning FCOS, I was using the latest version of |
This issue was a networking issue, causing for example |
@baude: does this remind you of another issue: #20639 (comment)? |
@CA-Demetriade i would suggest to run podman machine connect with if it freezes you could do some inspection from the QEMU terminal (if networking is not working) |
To check if the issue you are seeing is similar to this one, once you have a qemu console and the issue occurs, you can
|
@cfergeau I confirm that |
reopening as I'm hitting the issue again with everything being up-to-date |
I'm using Symantec Endpoint Protection on my Mac, so maybe it's related to this issue on the RedHat customer portal I don't have a RedHat subscriber account, so I can't read about the issue. |
Surely I will report if problem is gone. But be aware that podman and podman desktop were also upgraded in the process |
I uninstalled everything from brew (podman-desktop and podman-cli). I reinstalled 1.6.3 from official installer.
Would you like me to collect any evidence before I stop and start my VM? |
Can you try to identify the path of the qemu process that is running ? |
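One way to answer this from a terminal is to filter the process list. A minimal sketch, assuming the QEMU binary name starts with `qemu-system` and its path contains no spaces; the helper name `qemu_paths` is hypothetical.

```shell
# Hypothetical helper: reads "PID COMMAND ARGS..." lines on stdin (the shape
# produced by `ps -A -o pid= -o args=`) and prints the executable path of
# every process whose executable name starts with "qemu-system".
# Assumption: the executable path itself contains no spaces.
qemu_paths() {
    awk '{
        n = split($2, parts, "/")          # last path component is the name
        if (parts[n] ~ /^qemu-system/) print $2
    }'
}
```

It would be used as `ps -A -o pid= -o args= | qemu_paths`; the two separate `-o` options keep the empty header of `pid=` from swallowing the rest of the argument.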
Seems OK to me:
|
If you want to try if the new
Note: someone on my team has been able to reproduce the issue even with the qemu binary we built specifically; it was hard to trigger, but it seems to be a QEMU+kernel related issue. If you can test with the vfkit/applehv setup, that would be much appreciated. For CRC, we haven't seen issues for over a year with that virtualization stack. Note 2: since it is still an early release, it seems the |
Let's go !
|
Results this morning are promising! VM is perfectly fine and operational. Like disk or network performance? |
Disk performance might be slightly improved, as might virtio shares. Network performance should be similar. I have seen a report that sharing home failed with permission denied, though I think that came from SELinux instead. |
Yes, there is an issue with relabeling read-only files from the Mac, but everything else works. |
Feedback of the day regarding the usage of First, I have an issue. Podman Desktop UI does not see my VM at all. I am unsure whether it saw it yesterday. But this morning, the Podman Desktop app acts as if I had never created any podman-machine.
Second, I just realized that UserModeNetworking is reported as false by
And then finally, I don't seem able to throw ssh commands:
|
Podman Desktop does not see your machine because it relies on CONTAINERS_MACHINE_PROVIDER being set. If you want Podman Desktop to see it, you should set the variable in your login shell; I don't know exactly how to do that on macOS. |
The machine provider should be set in the containers.conf file so that it is seen globally, without having to export an env variable before running each podman command: https://github.com/containers/common/blob/main/docs/containers.conf.5.md line |
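As a rough illustration of that suggestion, the sketch below appends a `[machine]` table with a `provider` key to a containers.conf file. The helper name `set_machine_provider` is hypothetical, and the key name and the per-user path `$HOME/.config/containers/containers.conf` reflect my reading of the linked man page, so check them against your installed version.

```shell
# Hypothetical helper, not a podman command.
# Appends a [machine] table selecting a provider to the given containers.conf.
# It refuses to touch a file that already has a [machine] table, because a
# second table with the same name would make the TOML invalid.
set_machine_provider() {
    conf_file=$1
    provider=$2
    if [ -f "$conf_file" ] && grep -q '^\[machine\]' "$conf_file"; then
        echo "$conf_file already has a [machine] table; edit it by hand" >&2
        return 1
    fi
    mkdir -p "$(dirname "$conf_file")"
    printf '\n[machine]\nprovider = "%s"\n' "$provider" >> "$conf_file"
}
```

For the setup discussed in this thread, the call would be `set_machine_provider "$HOME/.config/containers/containers.conf" applehv`.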
OK let's go
Indeed convenient not to have to set the env var. |
report may not be accurate (for user network mode) |
Please open separate issues for what you have found. We need to get to the point where we can make applehv the default. |
thanks @fabricepipart1a |
I opened #21092 to cover the main issue I have with applehv. |
We have been in touch with some people, like Sergio Lopez, to discuss whether this was related to the virtualization stack and how networking is set up, aside from the use of our gvproxy/user-mode networking. It has been confirmed that this is the case. While we have a few patches to mitigate this for QEMU, this has to be resolved for qemu+virtio, which we will do outside the scope of this issue. Instead, we have a few PRs lined up that modify the transmission buffer and 'bounce' the network. |
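For readers who want to try the bounce by hand, the sketch below replays the manual workaround from the issue description (`ifconfig enp0s1 down`, then `ifconfig enp0s1 up`). It is not the code from the PRs mentioned above; the helper name `bounce_iface` and the `DRY_RUN` variable are hypothetical, and the default interface name is simply the one reported in this issue.

```shell
# Hypothetical helper: take a network interface down and bring it back up,
# as done manually in the issue description. Run inside the guest, as root.
# With DRY_RUN=1 the commands are printed instead of executed.
bounce_iface() {
    iface=${1:-enp0s1}   # default: interface name reported in this issue
    for state in down up; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "ifconfig $iface $state"
        else
            ifconfig "$iface" "$state" || return 1
        fi
    done
}
```

Running it with `DRY_RUN=1` first shows what would be executed before touching the interface.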
FYI: cherrypicked for 4.9 |
I confirm that the patch works around the issue 👍 |
I guess this can be closed now? |
Closing: with the workaround/bouncing of the interface, it's staying alive now. Thanks to all the people involved. |
Issue Description
My podman machine is not responding after some time.
The CLI is not responding, but
podman machine ls
says it's running.
podman version 4.7.2
Steps to reproduce the issue
Describe the results you received
Connection refused or connection hanging
podman ps
is blocking
podman machine ssh
as well
Describe the results you expected
it should work
podman info output
macOS Sonoma
inspect of the machine:
podman machine inspect
Podman in a container
No
Privileged Or Rootless
None
Upstream Latest Release
Yes
Additional environment details
Additional information
I started podman with DEBUG output so I have a qemu window
we can see
virtio_net virtio0 enp0s1: TX timeout on queue: 0, sq: output.0, vq: 0x1, name: output.0
messages
I need to do
ifconfig enp0s1 down
and then
ifconfig enp0s1 up
and then network is restored in the machine
podman info output when it's working back: