If initializing a machine fails, it should not be added to the list of machines #15154

Closed
benoitf opened this issue Aug 2, 2022 · 12 comments · Fixed by #15184
Labels: kind/bug, locked - please file new issue/PR, machine

Comments

@benoitf
Contributor

benoitf commented Aug 2, 2022

Is this a BUG REPORT or FEATURE REQUEST? (leave only one on its own line)

/kind bug

Description

On Windows, if I use the MSI and don't have all the requirements (e.g. Hyper-V installed), podman machine init fails with an error, but the machine is still registered as a podman machine:

C:\Users\User>podman machine init
Extracting compressed file
Importing operating system into WSL (this may take a few minutes on a new WSL install)...
Please enable the Virtual Machine Platform Windows feature and ensure virtualization is enabled in the BIOS.
For information please visit https://aka.ms/wsl2-install
Error: the WSL import of guest OS failed: exit status 0xffffffff

Then, if I run podman machine list, the machine is listed:

C:\Users\User>podman machine list
NAME                     VM TYPE     CREATED         LAST UP         CPUS        MEMORY      DISK SIZE
podman-machine-default*  wsl         44 seconds ago  44 seconds ago  0           0B          0B

Since it is listed as a stopped machine, we can try to start it:

C:\Users\User>podman machine start
Starting machine "podman-machine-default"
There is no distribution with the supplied name.
There is no distribution with the supplied name.
Error: the WSL bootstrap script failed: exit status 0xffffffff

This fails as well; it would be better if the machine had not been added in the first place.

Even after enabling virtualization and rebooting, the machine still fails to start, so it is corrupted and should not be listed:

C:\Users\User>podman machine start
Starting machine "podman-machine-default"
There is no distribution with the supplied name.
There is no distribution with the supplied name.
Error: the WSL bootstrap script failed: exit status 0xffffffff

Creating a new machine, however, works:

C:\Users\User>podman machine init foo
Extracting compressed file
Importing operating system into WSL (this may take a few minutes on a new WSL install)...
Configuring system...
Creating mailbox file: No such file or directory
Generating public/private ed25519 key pair.
Your identification has been saved in foo
Your public key has been saved in foo.pub
The key fingerprint is:
....

Steps to reproduce the issue:

Windows 11 without nested virtualization enabled.

  1. podman machine init

  2. podman machine list

Describe the results you received:
A machine is listed.

Describe the results you expected:
No machine is listed, since initialization failed.

Additional information you deem important (e.g. issue happens only occasionally):

Output of podman version:

4.2.0-dev

Output of podman info --debug:

(paste your output here)

Package info (e.g. output of rpm -q podman or apt list podman):

(paste your output here)

Have you tested with the latest version of Podman and have you checked the Podman Troubleshooting Guide? (https://github.com/containers/podman/blob/main/troubleshooting.md)

Yes/No

Additional environment details (AWS, VirtualBox, physical, etc.):

@benoitf benoitf added the machine and windows labels Aug 2, 2022
@openshift-ci openshift-ci bot added the kind/bug label Aug 2, 2022
@github-actions github-actions bot removed the windows label Aug 2, 2022
@rhatdan
Member

rhatdan commented Aug 2, 2022

@ashley-cui PTAL

@benoitf
Contributor Author

benoitf commented Aug 2, 2022

It also seems that the machine is added to the list of machines before it is fully ready.

The machine is listed as a podman machine early, so in another terminal we can run podman machine start even though the message "Machine init complete To start your machine run: podman machine start" only appears later.

So we can get errors like:

/bin/bash: line 1: /root/bootstrap: No such file or directory
the WSL bootstrap script failed: exit status 127

@ashley-cui
Member

I think this is because list just walks the machine directory and unmarshals the machine JSON files; if a machine's JSON file exists, podman machine treats the machine as existing. Would the solution here be to remove the machine JSON automatically if init fails? Something like: if init returns an error, automatically do an rm of the machine?

@rhatdan
Member

rhatdan commented Aug 2, 2022

That seems reasonable to me. We don't want to leave around broken configuration.

@benoitf
Contributor Author

benoitf commented Aug 2, 2022

@ashley-cui and the machine should probably be listed by podman machine list only once the init step has completed (and not before).

For example, I can launch podman machine init and a few seconds later run podman machine start in a separate terminal, but of course it fails with random errors because we are trying to start something that is not yet fully initialized.

@ashley-cui
Copy link
Member

So I guess we solve the issue by both having a "ready" state for a machine and removing the config file immediately on a failed init? I'm not sure we should check for ready status in podman machine list, though; it seems like that could create ghost machines that the user doesn't know exist.

@n1hility
Member

n1hility commented Aug 2, 2022

A few of us were discussing this on chat. For the next couple of days my bandwidth is limited and I won't be able to work on a patch, but @gbraad might have some cycles to help out if you need that, @ashley-cui.

A couple thoughts:

  1. It's always possible that a crash, system halt, or some other event leaves init incomplete, so auto-cleanup is not guaranteed. Additionally, even if writing the state file is deferred, there can be conflicting state from the WSL distribution registration and the supporting files on the filesystem. Therefore, we should probably have an "incomplete" state. In most cases the auto-rm would clean this up so the user wouldn't see it, but if it didn't work, the incomplete state would make it clear what happened.
  2. We should probably have a --keep-on-failure option. Part of the reason our existing behavior was useful is that it helped with diagnostics: if a user had a problem, we were able to direct them to run commands on the WSL instance to pull state.

@ashley-cui
Copy link
Member

SGTM, will try to take a look, but I don't have a Windows box for testing. Will try to get as far as possible with this :)

@benoitf
Copy link
Contributor Author

benoitf commented Aug 2, 2022

I reproduced it easily with a VM from https://developer.microsoft.com/en-us/windows/downloads/virtual-machines/, since nested virtualization is not enabled by default in most virtualization products.

I can also test custom podman binaries built from source.

@n1hility
Member

n1hility commented Aug 2, 2022

BTW, there were some changes to the Makefile process that make cross-building the MSI a little finicky. I can push up a build (or test something) without much effort if needed; just ping me.

@gbraad
Member

gbraad commented Aug 4, 2022

It is specific to the WSL2 provider, as Init seems to register the machine (writeConfig) before the environment is confirmed to work.

This also happens when you interrupt init during the importing stage.

Doing the following:

diff --git a/pkg/machine/wsl/machine.go b/pkg/machine/wsl/machine.go
index 189723ac7..bc0c5b400 100644
--- a/pkg/machine/wsl/machine.go
+++ b/pkg/machine/wsl/machine.go
@@ -349,14 +349,6 @@ func (v *MachineVM) Init(opts machine.InitOptions) (bool, error) {
                return false, err
        }
 
-       if err := v.writeConfig(); err != nil {
-               return false, err
-       }
-
-       if err := setupConnections(v, opts, sshDir); err != nil {
-               return false, err
-       }
-
        dist, err := provisionWSLDist(v)
        if err != nil {
                return false, err
@@ -375,6 +367,14 @@ func (v *MachineVM) Init(opts machine.InitOptions) (bool, error) {
                return false, err
        }
 
+       if err := v.writeConfig(); err != nil {
+               return false, err
+       }
+
+       if err := setupConnections(v, opts, sshDir); err != nil {
+               return false, err
+       }
+
        return true, nil
 }
 

would fix it.


gbraad added a commit to gbraad-redhat/podman that referenced this issue Aug 5, 2022
…ritten

When the break out or the WSL environment fails to start, the config
and connections should not be written. Placing them at the end of the
provisioning step will mitigate the issue.

[NO NEW TESTS NEEDED]

Signed-off-by: Gerard Braad <[email protected]>
openshift-ci bot added a commit that referenced this issue Aug 5, 2022
Fixes #15154 Change order when config and connections are written
ashley-cui pushed a commit to ashley-cui/podman that referenced this issue Aug 8, 2022
@yosiasz

yosiasz commented Jun 17, 2023

This is still an issue for me.

wsl --list shows podman-machine-default, and podman machine list shows:

NAME                    VM TYPE     CREATED        LAST UP        CPUS        MEMORY      DISK SIZE
podman-machine-default  wsl         2 minutes ago  7 seconds ago  0           0B          0B

Running wsl --unregister podman-machine-default removes this VM, and wsl --list then shows no VMs.

But when I run podman machine list, I still see this VM lingering around.

@github-actions github-actions bot added the locked - please file new issue/PR label Sep 16, 2023
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 16, 2023

6 participants