Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Complete WSL implementation in Podman 5 #21597

Merged
merged 5 commits into from
Feb 12, 2024

Conversation

n1hility
Copy link
Member

Complete WSL implementation, refactor a few areas

Also addresses a number of issues:
- StopHostNetworking isn't plumbed, win-sshproxy leaks on hyperv
- Wait api and print output doesn't work properly on Windows
- API forwarding doesn't work on WSL
- Terminal corruption with after start/stop on Windows
- Gvproxy is forcefully killed vs gracefully quit
- Switching rootful/rootless does not update /var/run/docker.sock on the guest
- File already closed error on init
- HyperV backend is publishing Unix sockets when it should be named pipes
- User-mode networking doesn't always work
- Stop state outside of lock boundaries
- WSL blocks parallel machined (should be supported)

none

Copy link
Contributor

openshift-ci bot commented Feb 10, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: n1hility

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Feb 10, 2024
@n1hility
Copy link
Member Author

/hold

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Feb 10, 2024
@n1hility n1hility changed the title Comple WSL implementation in Podman 5 Complete WSL implementation in Podman 5 Feb 10, 2024
Copy link

Ephemeral COPR build failed. @containers/packit-build please check.

@n1hility n1hility force-pushed the wsl-refactor branch 5 times, most recently from 64da6b4 to ca7c087 Compare February 10, 2024 21:06
@baude
Copy link
Member

baude commented Feb 11, 2024

the stop failure is legit problem ...

@baude
Copy link
Member

baude commented Feb 11, 2024

@n1hility this patch resolves the test problem. My only pause is I am not dead sure if I am treating a symptom instead of fixing the real problem. But on that front, I do not see something like a missed .Close() ... maybe take a quick peek before applying my patch. The patch also explains whats going on where the cause/effect? is ... maybe someone else will see it too.

diff --git a/pkg/machine/qemu/machine.go b/pkg/machine/qemu/machine.go
index 1bcb46ce46..65040ed457 100644
--- a/pkg/machine/qemu/machine.go
+++ b/pkg/machine/qemu/machine.go
@@ -6,6 +6,7 @@ import (
        "encoding/json"
        "errors"
        "fmt"
+       "io"
        "io/fs"
        "os"
        "os/exec"
@@ -266,6 +267,21 @@ func (q *QEMUStubber) State(mc *vmconfigs.MachineConfig, bypass bool) (define.St
                return "", err
        }
        if err := monitor.Connect(); err != nil {
+               // There is a case where if we stop the same vm (from running) two
+               // consecutive times we can get an econnreset when trying to get the
+               // state
+               if errors.Is(err, syscall.ECONNRESET) {
+                       // try again
+                       logrus.Debug("received ECCONNRESET from QEMU monitor; trying again")
+                       secondTry := monitor.Connect()
+                       if errors.Is(secondTry, io.EOF) {
+                               return define.Stopped, nil
+                       }
+                       if secondTry != nil {
+                               logrus.Debugf("second attempt to connect to QEMU monitor failed")
+                               return "", secondTry
+                       }
+               }
                return "", err
        }
        defer func() {

baude and others added 2 commits February 11, 2024 12:58
Signed-off-by: Brent Baude <[email protected]>
Also addresses a number of issues:
- StopHostNetworking isn't plumbed, win-sshproxy leaks on hyperv
- Wait api and print output doesn't work properly on Windows
- API forwarding doesn't work on WSL
- Terminal corruption with after start/stop on Windows
- Gvproxy is forcefully killed vs gracefully quit
- Switching rootful/rootless does not update /var/run/docker.sock on the guest
- File already closed error on init
- HyperV backend is publishing Unix sockets when it should be named pipes
- User-mode networking doesn't always work
- Stop state outside of lock boundaries
- WSL blocks parallel machined (should be supported)

[NO NEW TESTS NEEDED]

Signed-off-by: Jason T. Greene <[email protected]>
@n1hility
Copy link
Member Author

@baude thanks! I applied your patch and rebased again. i mentioned in chat but just in case you miss it that i noticed some underlying raciness in qemu start/stop that we can fixup later (e.g. the logic has the netdev sock setup overwriting the qmp monitor sock which confuse client connection code)

@n1hility n1hility force-pushed the wsl-refactor branch 2 times, most recently from cffb9d0 to 10cf59b Compare February 11, 2024 19:57
n1hility and others added 2 commits February 11, 2024 17:04
Improve error reporting on ssh readiness check

Signed-off-by: Jason T. Greene <[email protected]>
@baude
Copy link
Member

baude commented Feb 12, 2024

LGTM

pkg/machine/gvproxy_windows.go Show resolved Hide resolved
pkg/machine/gvproxy_windows.go Outdated Show resolved Hide resolved
@arixmkii
Copy link
Contributor

@n1hility regarding this comment #21597 (comment) Why don't just eliminate the root cause and don't reuse QMP for VLAN in QEMU. It is what I proposed in #21594 with a bigger change. But this could be also achieved w/o removing this FD. Just use dedicated socket for the purpose.

@ashley-cui
Copy link
Member

LGTM other than the prev reviews.

Signed-off-by: Jason T. Greene <[email protected]>
@n1hility
Copy link
Member Author

@n1hility regarding this comment #21597 (comment) Why don't just eliminate the root cause and don't reuse QMP for VLAN in QEMU. It is what I proposed in #21594 with a bigger change. But this could be also achieved w/o removing this FD. Just use dedicated socket for the purpose.

I totally agree we need to fix this, and soon, but I also don't want this PR to grow beyond what it is (already pretty huge) and its holding up WSL working in 5. That and other stuff can come in other commits.

@n1hility
Copy link
Member Author

I added the small comment review charge in a separate commit to address the review, yet still show the record of CI passing on the earlier commits.

@baude
Copy link
Member

baude commented Feb 12, 2024

i think there is consensus to get this PR in after @n1hility addressed the first round of comments ...

/hold
/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Feb 12, 2024
@baude baude mentioned this pull request Feb 12, 2024
Copy link
Member

@Luap99 Luap99 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM but for the size it is hard to properly review so it is hard to catch things.
Looks like windows CI is still disabled, does that mean we can turn it on after this merges?

@baude
Copy link
Member

baude commented Feb 12, 2024

/hold cancel

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Feb 12, 2024
@openshift-merge-bot openshift-merge-bot bot merged commit c524da2 into containers:main Feb 12, 2024
70 of 84 checks passed
@stale-locking-app stale-locking-app bot added the locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. label May 13, 2024
@stale-locking-app stale-locking-app bot locked as resolved and limited conversation to collaborators May 13, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. machine release-note-none
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants