Implement virtfs volumes for podman machine #11454

Merged: 3 commits into containers:main on Jan 6, 2022


@afbjorklund (Contributor) commented Sep 5, 2021

Allow using the built-in 9pfs feature of QEMU
to mount host directories at mount points inside the VM.

https://wiki.qemu.org/Documentation/9psetup
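A rough sketch of the host side, in Go: it turns `-v host:machine[:ro]` specs into QEMU `-virtfs` arguments of the `local,path=...,mount_tag=...,security_model=...` form described on the 9psetup page. The helper names, the `vol0`, `vol1`, ... tag scheme and the `mapped-xattr` security model are assumptions for illustration, not necessarily what this PR does.

```go
package main

import (
	"fmt"
	"strings"
)

// volumeSpec is a hypothetical parsed form of "-v host:machine[:ro]".
type volumeSpec struct {
	Source   string // directory on the host
	Target   string // mount point inside the VM
	ReadOnly bool
}

// parseVolume accepts "host:machine" or "host:machine:ro".
func parseVolume(spec string) (volumeSpec, error) {
	parts := strings.Split(spec, ":")
	switch {
	case len(parts) == 2:
		return volumeSpec{Source: parts[0], Target: parts[1]}, nil
	case len(parts) == 3 && parts[2] == "ro":
		return volumeSpec{Source: parts[0], Target: parts[1], ReadOnly: true}, nil
	}
	return volumeSpec{}, fmt.Errorf("invalid volume %q, want host:machine[:ro]", spec)
}

// virtfsArgs returns QEMU arguments that export each host directory over 9p.
// The "vol<N>" mount tag is an assumed naming scheme; the guest mounts by tag.
func virtfsArgs(specs []string) ([]string, error) {
	var args []string
	for i, s := range specs {
		v, err := parseVolume(s)
		if err != nil {
			return nil, err
		}
		opts := fmt.Sprintf("local,path=%s,mount_tag=vol%d,security_model=mapped-xattr", v.Source, i)
		if v.ReadOnly {
			opts += ",readonly=on"
		}
		args = append(args, "-virtfs", opts)
	}
	return args, nil
}
```

With the example usage below, `-v /tmp/one:/mnt/one` would become `-virtfs local,path=/tmp/one,mount_tag=vol0,security_model=mapped-xattr`, and the guest would then mount the share by the tag `vol0`.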

Wait for the machine to be "running"; otherwise,
the SSH function might fail with an error.

For #8016

Example usage:

$ ./bin/podman machine init -v /tmp/one:/mnt/one -v /tmp/two:/mnt/two
Extracting compressed file
$ mkdir -p /tmp/one /tmp/two
$ ./bin/podman machine start
INFO[0000] waiting for clients...                       
INFO[0000] listening tcp://0.0.0.0:7777                 
INFO[0000] new connection from @ to /run/user/1000/podman/qemu_podman-machine-default.sock 
Waiting for VM ...
Mounting volume... /tmp/one:/mnt/one
Mounting volume... /tmp/two:/mnt/two
$ touch /tmp/one/foo
$ ./bin/podman-remote run -v /mnt/one:/mnt busybox ls /mnt/
Resolved "busybox" as an alias (/etc/containers/registries.conf.d/000-shortnames.conf)
Trying to pull docker.io/library/busybox:latest...
Getting image source signatures
Copying blob sha256:8ec32b265e94aafb0d43ab71f1d8f786122c19afb37d25532aea169f414f8881
Copying blob sha256:8ec32b265e94aafb0d43ab71f1d8f786122c19afb37d25532aea169f414f8881
Copying config sha256:42b97d3c2ae95232263a04324aaf656dc80e7792dee6629a9eff276cdfb806c0
Writing manifest to image destination
Storing signatures
foo
$ touch /tmp/two/bar
$ ./bin/podman-remote run -v /mnt/two:/mnt busybox ls /mnt/
bar

Note: tested on Linux (Ubuntu 20.04)

Edit: also tested on macOS 11.5, where it works with some problems (see below for details).


Most directories are read-only on CoreOS; you can work around this by mounting under another location:

podman machine init -v /Users:/mnt/Users

Then remember to add this extra prefix when bind-mounting from the remote filesystem into the container:

podman --remote run -v /mnt/Users:/Users
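Keeping track of that prefix by hand is error-prone. A hypothetical helper like `machinePath` below (not part of podman) shows the translation: given the host-to-machine volume pairs passed to `podman machine init -v`, it rewrites a host path into the path the VM sees, preferring the most specific matching volume.

```go
package main

import (
	"path"
	"strings"
)

// machinePath maps hostPath to its location inside the VM, given
// host->machine volume pairs such as {"/Users": "/mnt/Users"}.
// It reports false when no volume covers hostPath.
func machinePath(hostPath string, volumes map[string]string) (string, bool) {
	hostPath = path.Clean(hostPath)
	bestSrc, bestDst := "", ""
	for src, dst := range volumes {
		src = path.Clean(src)
		// Match the volume root itself or anything below it, but not a
		// sibling that merely shares a prefix ("/Users2" vs "/Users").
		if hostPath != src && !strings.HasPrefix(hostPath, src+"/") {
			continue
		}
		// Prefer the most specific (longest) matching volume.
		if len(src) > len(bestSrc) {
			bestSrc, bestDst = src, dst
		}
	}
	if bestSrc == "" {
		return "", false
	}
	return path.Join(bestDst, strings.TrimPrefix(hostPath, bestSrc)), true
}
```

For example, with `{"/Users": "/mnt/Users"}` the host path `/Users/me/src` maps to `/mnt/Users/me/src`, which is what the `-v` source of `podman --remote run` would need to be.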


This is a good paper on VirtFS: https://www.kernel.org/doc/ols/2010/ols2010-pages-109-120.pdf

By the virtue of its design VirtFS is expected to yield better performance compared to its alternatives like NFS/CIFS.

Note: CIFS is a dialect of SMB

https://en.wikipedia.org/wiki/9P_(protocol)

@rhatdan (Member) commented Sep 6, 2021

Needs tests or [NO TESTS NEEDED]

@afbjorklund (Contributor, Author)

I couldn't find any existing tests related to podman machine. Maybe I didn't look hard enough?

@rhatdan (Member) commented Sep 7, 2021

LGTM
@baude @ashley-cui @jwhonce PTAL

@afbjorklund (Contributor, Author) commented Sep 7, 2021

Apparently QEMU needs more patching before virtfs/9p will also work on Macs:

NixOS/nixpkgs#122420

https://github.com/afbjorklund/qemu/tree/9p-darwin (v6.0.0)

14 files changed, 368 insertions(+), 27 deletions(-)

Patches originally from https://lists.nongnu.org/archive/html/qemu-devel/2018-05/msg07325.html

@AkihiroSuda (Collaborator)

Has anybody tried submitting the patches upstream recently?

@afbjorklund (Contributor, Author)

I was able to try it with a patched QEMU (hvf + 9p-darwin) on Apple Silicon, and it works OK...

There was some trouble connecting, because SSH was not ready yet, and there was also a performance warning:

connect tcp 192.168.127.2:22: connection was refused

qemu-system-aarch64: warning: 9p: degraded performance

So there needs to be a sleep before doing the mkdir + mount, and msize needs to be set larger than 8192.

See https://wiki.qemu.org/Documentation/9psetup#msize for details
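On the guest side, msize is just a 9p mount option, so the fix amounts to passing a larger value when mounting. Below is a sketch of building the mkdir + mount command that would be run over SSH. It assumes `sudo` is needed in the VM and that paths need no shell quoting; `trans=virtio` and `version=9p2000.L` are the options shown on the 9psetup page, while the function and constant names are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// defaultMsize is 128 KiB, the value a later commit in this PR switches to;
// anything above 8192 avoids QEMU's "degraded performance" warning.
const defaultMsize = 128 * 1024

// guestMountCommand returns the shell command to run inside the VM: create
// the mount point, then mount the 9p share exported under tag.
// Assumptions: the login user needs sudo, and paths contain no spaces or
// shell metacharacters (nothing is quoted here).
func guestMountCommand(tag, target string, msize int) string {
	opts := []string{
		"trans=virtio",     // 9p over the virtio transport QEMU provides
		"version=9p2000.L", // Linux dialect of the protocol
		fmt.Sprintf("msize=%d", msize),
	}
	return fmt.Sprintf("sudo mkdir -p %s && sudo mount -t 9p -o %s %s %s",
		target, strings.Join(opts, ","), tag, target)
}
```

`guestMountCommand("vol0", "/mnt/one", defaultMsize)` yields `sudo mkdir -p /mnt/one && sudo mount -t 9p -o trans=virtio,version=9p2000.L,msize=131072 vol0 /mnt/one`.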

Other than that, running qemu-system-aarch64 on darwin-arm64 worked without issues. 🤗

../configure --target-list=aarch64-softmmu --enable-hvf --enable-virtfs

@afbjorklund (Contributor, Author)

[Screenshot: darwin-podman]

@afbjorklund (Contributor, Author) commented Sep 11, 2021

It looks like those SSH errors are caused by the gvproxy implementation, rather than by slowness on the Mac.

I will file a separate bug for those; "gvproxy" looks like mostly a quick hack so far (fixed path, no error handling)?

@afbjorklund (Contributor, Author) commented Sep 11, 2021

It still fails, so it should probably check SSH on the VM IP rather than through the gvproxy tunnel?

error dialing "192.168.127.2:22": connect tcp 192.168.127.2:22: connection was refused

But it still looks extremely hard-coded and early, so it is probably best to wait for the next "gvproxy"...

                Forwards: map[string]string{
                        fmt.Sprintf(":%d", sshPort): "192.168.127.2:22",
                },

Meanwhile, some arbitrary workaround such as a sleep would probably make it work.

EDIT: Opened containers/gvisor-tap-vsock#42

@TomSweeneyRedHat (Member)

LGTM, but the tests are complaining that you did not sign the PR?

@afbjorklund (Contributor, Author) commented Sep 12, 2021

but the tests are complaining that you did not sign the PR?

I guess I will squash and rebase it, just added some late night hacks 🌃

I'll see whether adding a Mac-only sleep works, then update the PR...
It works fine on Linux KVM, so I guess it just takes longer on Mac HVF?

Did you get a chance to try it yet? Eventually we will need benchmarks, too.

@afbjorklund (Contributor, Author) commented Sep 12, 2021

The sleep didn't seem to help at all, so I'm removing it. It works on Linux, so something else must be wrong?

I'll try again once machine/gvisor no longer hardcodes the MAC address and IP address... 🙄

@afbjorklund (Contributor, Author) commented Sep 13, 2021

Note that "isRunning" only checks whether the QMP socket is up, not whether the machine has started. For that, it needs to actually query the machine over QMP.

And "isListening" only checks whether gvproxy is up, not whether the SSH server has started. For that, it needs to actually connect over SSH.

@openshift-ci openshift-ci bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Sep 21, 2021
@afbjorklund afbjorklund mentioned this pull request Sep 22, 2021
@openshift-ci openshift-ci bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Sep 22, 2021
@willcohen commented Oct 13, 2021

@afbjorklund I'm trying to replicate this setup (and if it works, perhaps even submit the Nix patch set upstream, since they seem happy for that to happen but just don't have the bandwidth; if it gets in within the next few weeks, it might even make the next point release).

However, a question:
I can reproduce the mounted volume setup you've got going in that screenshot, and touch/modify files inside the podman machine and see that on the volume, so the overall functionality is clearly working. Is there still more work to be done to get this working via Dockerfiles and docker-compose?

I've got a docker-compose file with a volumes section generally following this model:

volumes:
      - "./src:/app"

Running docker-compose up results in:

ERROR: for container-foo Cannot create container for service container-foo: make cli opts(): error making volume mountpoint for volume /Users/<user>/projectpath/src: mkdir /Users: operation not permitted

I've also tried initializing the machine with the -v ./src:/app preset, in case I need to set that beforehand. I've also tried moving the target /app to /tmp/app, in case the permissions problem is on the CoreOS side. In both cases I still get permission errors, and I'm not quite sure where to go next. I don't want to start on upstream submissions until there is a clear path for docker-compose volumes, since that seems to be the big use case that users have recently been running into.

Podman mounts _host-dir_ in the host to _machine-dir_ in the Podman machine.

The root filesystem is mounted read-only in the default operating system,
so mounts must be created under the /mnt directory.
(Contributor, Author)

This is no longer needed with the latest CoreOS.

@ghost commented Dec 18, 2021

@afbjorklund I actually just filed #12650; I'm convinced the hooks behavior is wrong. It includes an example of using the hooks, with references to the docs.

@openshift-ci openshift-ci bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Dec 26, 2021
Allow using the built-in 9pfs feature of qemu,
mounting host directories into vm mountpoints.

The volumes are generic, the mounts are specific.

Wait for the machine to be "running", otherwise
the SSH function might throw an error instead.

Increase the default msize from 8 KiB to 128 KiB

[NO NEW TESTS NEEDED]

Signed-off-by: Anders F Björklund <[email protected]>
There are other mount types available, such as NFS or SMB,
or one could use reverse sshfs for better compatibility.

It could either be a global option, or it could perhaps be
overridden for each volume (like the container volumes).

Refactor the creation of the options string or array.

Allow specifying the volume as read-only, if desired.

[NO NEW TESTS NEEDED]

Signed-off-by: Anders F Björklund <[email protected]>
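To make the "global option or per volume" choice concrete, here is a hypothetical Go sketch: each volume may carry its own mount type, falling back to a global default, and the options are collected in a slice and joined at the end, so the read-only flag is just one more element. Only the 9p options are filled in; `nfs` and `sshfs` stand in for the alternatives listed above, and all names are illustrative.

```go
package main

import "strings"

// MountType names a transport for machine volumes.
type MountType string

const (
	MountType9p    MountType = "9p"
	MountTypeNFS   MountType = "nfs"   // placeholder alternative
	MountTypeSSHFS MountType = "sshfs" // placeholder alternative
)

// Volume is a hypothetical machine volume; an empty Type means
// "use the global default".
type Volume struct {
	Source, Target string
	ReadOnly       bool
	Type           MountType
}

// resolveType applies the per-volume override, if any, over the default.
func resolveType(v Volume, global MountType) MountType {
	if v.Type != "" {
		return v.Type
	}
	return global
}

// mountOptions builds the options as a slice and joins them at the end.
func mountOptions(v Volume, global MountType) string {
	var opts []string
	if resolveType(v, global) == MountType9p {
		opts = append(opts, "trans=virtio", "version=9p2000.L")
	}
	if v.ReadOnly {
		opts = append(opts, "ro")
	}
	return strings.Join(opts, ",")
}
```

A later commit in this PR uses the same type for all machine volumes, which in this sketch would correspond to never setting `Type` on a volume.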
@openshift-ci openshift-ci bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Dec 30, 2021
@afbjorklund (Contributor, Author)

@protosam: since the machine volumes are mounted long before any container is started (i.e. when the VM is booted), and since they are unrelated to the container volumes, I don't understand how those OCI hooks would apply to the VM mounts?

Use the same type of mounts for all the machine volumes.

The default could change in the future, depending on OS.

[NO NEW TESTS NEEDED]

Signed-off-by: Anders F Björklund <[email protected]>
@ghost commented Dec 30, 2021

@afbjorklund

Until very recently I wasn't aware that Docker Desktop exposes pretty much all of the host filesystem by mounting paths at the same locations in the guest, so that /Users/... (and, I'm guessing, /mnt/c/... on Windows?) match up for the runtime inside.

That's a lot of attack surface to expose from a security perspective, so I have been exploring a more surgical approach with a gRPC+FUSE-based file system that I'm writing in Go.

@afbjorklund (Contributor, Author)

I think the security and performance implications of the Docker model are well known, which is why I didn't want to copy that model at first... As you mention, it exposes not only your own user's files but also those of all other users on the host.

As a compromise, we made it a parameter (-v) so that the user can decide how much of the host to expose to the machine and to the containers. It is also possible to mount a volume read-only (like Lima), if desired.

@ghost commented Jan 2, 2022

Yeah, I agree with the thought process.

I'm curious about something: does the 9p implementation that QEMU uses on the Mac not support creating named pipes and Unix socket inodes? All of my testing with the custom QEMU build has led me to believe it's not implemented. Going beyond that, I haven't actually found any 9pfs server that supports it.

If 9p doesn't support it, NFS might be a better option for parity with Docker Desktop.

This is actually what led me to write a FUSE filesystem, and I'm also targeting gvproxy to deliver the file system services. If I can't get my work merged into master when it's done, my backup plan is to release a drop-in replacement for gvproxy for Docker Desktop parity.

@afbjorklund (Contributor, Author) commented Jan 2, 2022

I think it only supports regular files, and not even symlinks.

Creating sockets and other special files on a network filesystem is a bit weird? The user would be much better off using a local filesystem for those, and the same goes for any heavy-duty things like databases. Create a local volume for those?

i.e. under /var/lib/containers, rather than /mnt and 9pfs?

@ghost commented Jan 2, 2022

I haven't been able to figure out whether 9pfs even supports mknod (I think it is implementable), but it is desirable because you may want two containers to access the same path. The kernel in the VM does make it possible to share pipes and sockets between containers.

For example with docker-desktop, open two terminals.

In terminal 1, do the following.

[protosam@nullhost]$ docker run --rm -it -v `pwd`:/usr/src-shared -w /usr/src-shared alpine
/usr/src-shared # mkfifo testpipe
/usr/src-shared # cat testpipe 

Now go over to terminal 2 and do this.

[protosam@nullhost]$ docker run --rm -it -v `pwd`:/usr/src-shared -w /usr/src-shared alpine
/usr/src-shared # echo hello pipe > testpipe
/usr/src-shared # 

Checking back at terminal 1 you will see:

/usr/src-shared # cat testpipe 
hello pipe

Edit for clarity: pwd was a path in /Users/${USER}/... for me.

@baude (Member) commented Jan 6, 2022

/approve
/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Jan 6, 2022
@openshift-ci bot (Contributor) commented Jan 6, 2022

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: afbjorklund, baude


Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jan 6, 2022
@openshift-merge-robot openshift-merge-robot merged commit d627528 into containers:main Jan 6, 2022
@github-actions github-actions bot added the locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. label Sep 22, 2023
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 22, 2023
8 participants