podman machine init: add a --with-foreign-arch flag #13667

Closed
wants to merge 3 commits into from

zhenya-1007

The flag is qemu-specific. It causes qemu-static & friends to be installed on the machine so that, e.g., x86_64 binaries can be run on aarch64 (aka "arm64"). This is "scratching an itch" for me because I am on Mac M1, but I want to build Docker images whose Dockerfiles contain RUN instruction(s) for x86_64.

I did manual testing as follows, so (I believe it is the case that) [NO TESTS NEEDED].

  • Preparation:

    • built podman per instructions in build_osx.md
    • establish a clean starting state: ./bin/podman machine stop && ./bin/podman machine rm -f
  • Run without the new option:

    • ./bin/podman machine init --now
    • ./bin/podman machine ssh
      • sudo systemctl status install-qemu-static.service: shows up as disabled and inactive
      • sudo systemctl edit ready.service: the After line looks like this:
        # After=remove-moby.service sshd.socket sshd.service
    • ./bin/podman machine run --arch amd64 fedora:35 uname -a: fails with {"msg":"exec container process /usr/bin/uname: Exec format error","level":"error","time":"2022-03-26T08:07:03.000635889Z"} (as expected)
  • Clean up: ./bin/podman machine stop && ./bin/podman machine rm -f

  • Run with the new option:

    • ./bin/podman machine init --with-foreign-arch --now (unfortunately it's now perceptibly slower)
    • ./bin/podman machine ssh
      • sudo systemctl status install-qemu-static.service: shows up as enabled and having run successfully
      • sudo systemctl edit ready.service: the After line looks like this:
        # After=remove-moby.service sshd.socket sshd.service install-qemu-static.service
    • ./bin/podman machine run --arch amd64 fedora:35 uname -a: prints Linux 81c4b2d16270 5.15.18-200.fc35.aarch64 #1 SMP Sat Jan 29 12:44:33 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

@openshift-ci
Contributor

openshift-ci bot commented Mar 26, 2022

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: zhenya-1007
To complete the pull request process, please assign mtrmac after the PR has been reviewed.
You can assign the PR to them by writing /assign @mtrmac in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@rhatdan
Member

rhatdan commented Mar 26, 2022

@baude @n1hility Is this just fixing a bug? I.e., shouldn't these services be enabled by default?

@zhenya-1007
Author

zhenya-1007 commented Mar 26, 2022

@rhatdan Wouldn't enabling qemu-static & Co by default "make people pay for what they don't use" in many cases?
Consider a developer who is using a Linux x86_64 machine, and is only interested in building x86_64 container images.
What do you imagine would be their reaction when they found out that Podman pulled down a bunch of binaries they, as likely as not, will never use, "just in case"? This stuff isn't exactly light-weight to install...

@openshift-ci openshift-ci bot added and then removed the needs-rebase label (indicates a PR cannot be merged because it has merge conflicts with HEAD) Mar 26, 2022
@zhenya-1007
Author

/cc @baude @Luap99 @vrothberg

@openshift-ci openshift-ci bot requested review from baude, Luap99 and vrothberg March 28, 2022 16:59
# if the package is already installed. This is useful if the package is
# added to the root image in a future Fedora CoreOS release as it will
# prevent the service from failing.
ExecStart=/usr/bin/rpm-ostree install --apply-live --allow-inactive qemu qemu-user-static qemu-user-binfmt
Member

@dustymabe what happens with ^^ when an FCOS update is downloaded and applied on a reboot?

Author

I'll have to wait until an FCOS update is out, but my semi-educated guess is that /lib/binfmt.d will become empty following an FCOS update, so the installation will be triggered following an update, due to the Condition* clauses in the [Install] section. I can report that this does only run once for a given FCOS version, due to those same Condition* clauses (once the packages are installed /lib/binfmt.d is no longer empty).

Contributor

An update should be handled without an issue. You want to make sure this only runs once. Our docs use ConditionPathExists=!/var/lib/%N.stamp and ExecStart=/bin/touch /var/lib/%N.stamp for this.

Member

@dustymabe when this runs during boot up, does the entire boot sequence pause and wait for the install to complete? Or can the user begin interacting with the FCOS VM while the install occurs?

Contributor

Should be able to go ahead and start interacting with the machine.

Author

+1 I have set up FCOS VMs with a similar systemd unit "by hand", and I am able to ssh to the machine, and interact with it while the installation is running. Once I ssh to the machine, I can use systemctl status install-qemu-static.service to check on how the installation is progressing. (Obviously, I need to wait for the installation to finish before I can run "foreign" binaries).

pkg/machine/ignition.go (outdated, resolved)
@baude
Member

baude commented Mar 28, 2022

I want some time to think about this one and to see what others think. I was thinking this would eventually be something the GUI would enable rather than built into podman.

If we were to move forward with it, I'm wondering if it should be an opt-in thing vs default. From the code snippets, I could not confirm if it was ... anyways, the installation of those packages is not trivial in time.

The real goal should be that qemu-static be shipped as part of the image itself, imho.

@zhenya-1007
Author

I want some time to think about this one and to see what others think. I was thinking this would eventually be something the GUI would enable rather than built into podman.

Please do take time to think about it.

Are there any GUIs for podman that exist today? If so, could you point me to it/them?[1]

In the end, I want to have some way for users to run "foreign" arch binaries. I appreciate that users have the theoretical option of coming up with an ignition file, and providing it as the --ignition-path argument to podman machine init today -- but asking that each user who wishes to run "foreign" arch binaries with podman should create a custom ignition file and keep it up to date with whatever changes podman makes as it evolves is asking way too much of the user, IMO.

If we were to move forward with it, I'm wondering if it should be an opt-in thing vs default. From the code snippets, I could not confirm if it was ... anyways, the installation of those packages is not trivial in time.

It is an opt-in in the current implementation, precisely because I didn't want to make everyone pay for what only some people will use. Take another look at line 255 in ignition.go. I appreciate that it may have been easy to miss on first read.

The real goal should be that qemu-static be shipped as part of the image itself, imho.

What would motivate FCOS to make this change? The "I want to run binaries for 'foreign' CPU(s)" use case strikes me as very far-fetched in the context in which [I am imagining] the team that works on FCOS operates.

Footnotes

  1. I have found podman-desktop-companion, but the current state of their machine creation code doesn't exactly fill me with optimism that they will have a workable system for provisioning custom FCOS images any time soon.

@zhenya-1007
Author

zhenya-1007 commented Mar 29, 2022

/cc @rhatdan

Adding to the reviewers list since @baude says he "wants to see what people think."

@openshift-ci openshift-ci bot requested a review from rhatdan March 29, 2022 01:19
Member

@vrothberg vrothberg left a comment

Thanks for contributing!


flags.BoolVar(
&initOpts.QemuStatic,
"with-foreign-arch", false,
Member

I am not sure whether the naming was debated already. But other commands just use --arch (e.g., podman run), so I'd prefer to keep the naming consistent and rename it to --arch.

Member

This is a bool and does not take a value so it would not be consistent with --arch. But I also do not like the name, maybe --emulation would fit better?

Author

I agree it's out of step with the style in which the rest of the options are named.
I am listing below some possibilities I can think of. How do you [plural] feel about them?

  • --cross
  • --cross-arch
  • --non-native
  • --emulation
  • --emulate

FWIW, the two that start with --cross are my favourite (evocative of cross-compilation).

Incidentally, do you [plural] know off the top of your head if spf13/cobra does "unique prefix" option matching (i.e., if there's an option spelled --foobar, and no other option starts with --foo, will it accept just --foo in place of --foobar?)

IFS="|";\
for iprxy in $(/usr/bin/base64 -d ${FWCFGRAW}); do\
echo "export $iprxy" >> ${PROFILE_CONF}; done ) || \
echo "#Got nothing from QEMU FW_CFG"> ${PROFILE_CONF}'
Member

Not sure the whitespace changes above belong in this PR.

Author

Other people were stumbling over them also, so I rewrote the commit to get rid of them.

WantedBy=multi-user.target
`

if ign.QemuStatic {
Member

Please add a comment on what the regex is intended to do. Ideally, move it into a separate function with unit tests.

Author

I have added a comment. I think a separate function would be overkill.
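
For reference, the separate, unit-testable function suggested above might look something like this. It is a hypothetical sketch: the names `afterLine` and `addAfterDependency` are not from the PR, and the PR's actual regex is not shown in this thread. It appends a unit name to every `After=` line, which is the change to `ready.service` observed in the manual test.

```go
package main

import (
	"fmt"
	"regexp"
)

// afterLine matches a whole "After=..." line of a unit file. The (?m) flag
// makes ^ and $ match at line boundaries instead of only at the text ends.
var afterLine = regexp.MustCompile(`(?m)^After=.*$`)

// addAfterDependency appends dep to every After= line in unit, so that the
// unit is ordered after dep as well. Text without an After= line is returned
// unchanged. Hypothetical helper for illustration only.
func addAfterDependency(unit, dep string) string {
	return afterLine.ReplaceAllStringFunc(unit, func(line string) string {
		return line + " " + dep
	})
}

func main() {
	ready := "[Unit]\nAfter=remove-moby.service sshd.socket sshd.service\n"
	fmt.Print(addAfterDependency(ready, "install-qemu-static.service"))
}
```

Using `ReplaceAllStringFunc` rather than a replacement template avoids any `$` expansion in the dependency name, and a plain function like this is straightforward to cover with a table-driven test.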

@cevich
Member

cevich commented Mar 29, 2022

There's no need to install the binaries directly on the host. All that's needed is pointing the kernel at some available binaries to use. This is easily done by using the pre-built container images from https://github.com/multiarch/qemu-user-static or taking that as a pattern to build your own. The README.md in that repo. has all the details and documentation needed.

@zhenya-1007
Author

zhenya-1007 commented Mar 29, 2022

There's no need to install the binaries directly on the host. All that's needed is pointing the kernel at some available binaries to use. This is easily done by using the pre-built container images from https://github.com/multiarch/qemu-user-static or taking that as a pattern to build your own. The README.md in that repo. has all the details and documentation needed.

@cevich Yes, I remember coming across that project.

Is the "only for x86_64 hosts" part of the README accurate?

If so, then I am afraid your proposed solution does not address my primary use case, which is running x86_64 binaries on Mac M1 (i.e., the host is aarch64) -- or at least not "out of the box."

That said, maybe it would be more productive for me to contribute an "aarch64 -> x86_64" piece to the multiarch/qemu-user-static project than to see this PR to completion...

Parenthetically, I have found documentation for multiarch/qemu-user-static to be quite hard to follow. In fact, I was thoroughly confused as to what was going on until I found a reference to .github directory in the documentation, and looked at the workflow/actions.yml file in that directory. Then things (finally!) started to make sense -- even though, among other things, the description of the input to "the pipeline" is out of date with respect to what workflow.yml is actually doing.

@cevich
Member

cevich commented Mar 29, 2022

Those specific container images aren't actually needed, you can roll your own. All you need is an aarch64 container image with the qemu-user-static binaries available. Run that on the host (VM) with --privileged, and you can tickle /proc/sys/fs/binfmt_misc by hand if need be. Or copy and use the multiarch/qemu-user-static script that does the business + its qemu helper script.

@zhenya-1007
Author

Those specific container images aren't actually needed, you can roll your own. All you need is an aarch64 container image with the qemu-user-static binaries available. Run that on the host (VM) with --privileged, and you can tickle /proc/sys/fs/binfmt_misc by hand if need be. Or copy and use the multiarch/qemu-user-static script that does the business + its qemu helper script.

@cevich Thank you for explaining!

So, it sounds like, with a bit of copying-pasting of existing code, running amd64 binaries on Mac M1 under podman could be as simple as

user@mac $ podman machine init && podman machine ssh
core@localhost $ podman run --privileged fedora:<latest_stable> /bin/bash -c 'sh <(curl -fsSL https://raw.githubusercontent.com/.../enable.sh)' && logout
user@mac $ podman run --arch amd64 <whatever> 

One just needs to fill in the ellipses in the curl invocation above. ;-)

The flag is qemu-specific.  It causes `qemu-static` & friends to be installed on the machine so that, e.g., x86_64 binaries can be run on aarch64 (aka "arm64").

Signed-off-by: Evgeny ("Zhenya") Roubinchtein <[email protected]>

I have done manual testing as described in the PR message, so (I believe it is the case that) [NO NEW TESTS NEEDED]

Explain what manipulation of the text of the `ready` unit intends to accomplish.

Signed-off-by: Evgeny ("Zhenya") Roubinchtein <[email protected]>

Use the `/var/lib/<service_name>.stamp` convention for the `install-qemu-static` unit.

Signed-off-by: Evgeny ("Zhenya") Roubinchtein <[email protected]>

@rhatdan
Member

rhatdan commented Mar 29, 2022

I would rather document this via the container image and potentially ship the container image for qemu-user-static rather than modify the podman machine command. Since podman machine can support multiple different VM images, it seems like it would make sense to either ship qemu-user-static preinstalled in the VM (my preference) or to create a multi-platform container image on quay.io/podman with qemu-user-static and tell people how to use it.

@cevich
Member

cevich commented Mar 29, 2022

ship the container image for qemu-user-static

If we want to build and ship something multi-arch ourselves, my build-push PR brings that in to podman. All we need is a contrib/blah directory with Containerfiles for me to point it at.

@zhenya-1007
Author

I would rather document this via the container image and potentially ship the container image for qemu-user-static rather than modify the podman machine command. Since podman machine can support multiple different VM images, it seems like it would make sense to either ship qemu-user-static preinstalled in the VM (my preference) or to create a multi-platform container image on quay.io/podman with qemu-user-static and tell people how to use it.

I agree with the sentiment, actually: I just didn't recognize multiarch/qemu-user-static as a viable option until @cevich explained.[1] Let me work out the details of that, and I can either PR against that repo or make it available in some other way (quay.io/podman sounds like a good option).

Parenthetically, I don't know that you'll have much luck convincing even the FCOS team[2] to ship qemu-user-static as part of FCOS, because of the "why make every user pay for what only a few users will actually need" argument -- but I have been wrong before (occasionally). ;-)

Footnotes

  1. I blame the somewhat lacklustre (IMO) state of that project's documentation.

  2. not to even mention the Ubuntu, Debian, etc. teams

@zhenya-1007
Author

If we want to build and ship something multi-arch ourselves, my build-push PR brings that in to podman. All we need is a contrib/blah directory with Containerfiles for me to point it at.

This PR really isn't about "building multi-arch images as part of Podman/OpenShift infrastructure": it's about enabling users of podman to build/run multi-arch images in the context of their podman installation. I appreciate that your PR does good things, but, AFAICS, it's at best tangentially related to this PR.


Enable running binaries compiled for "foreign" CPUs (e.g., run x86_64 binaries on
Apple M1 silicon). This option only works for Qemu machines. It works by installing
qemu-static and qemu-binfmt packages on the machine. The initialization process will
Member

'packages on the machine.' I think you mean packages on the Qemu machine. Suggested change to remove the concern about installing it on the host machine instead.

@evgeny-roubinchtein

@cevich I am looking at it again, and, from what I have seen so far, the "have the kernel eagerly load, and then persist an interpreter for binfmt_misc" game that multiarch/qemu-user-static is playing doesn't work when the host system has SELinux enabled (as FCOS does in its default configuration). Can you point me to an example where the multiarch/qemu-user-static approach works even when SELinux is enabled on the host system?

What I have tried (prompts are intended to be suggestive of command running on the FCOS host vs the container):

fcos $ sudo -i
fcos # podman run --privileged --rm -it fedora:35 /bin/bash -i
container # dnf -y install qemu-user-static
container # mount binfmt_misc -t binfmt_misc /proc/sys/fs/binfmt_misc
container # cat /usr/lib/binfmt.d/qemu-x86_64-static.conf >/proc/sys/fs/binfmt_misc/register
container # exit
fcos # cat /proc/sys/fs/binfmt_misc/qemu-x86_64
enabled
interpreter /usr/bin/qemu-x86_64-static
flags: F
[...] 
fcos # podman run --arch amd64 --rm -it alpine:latest uname -m
fcos # exit
# <a bunch of output from podman about pulling image, but no output from `uname`>
fcos $ podman run --arch amd64 --rm -it alpine:latest uname -m
# <ditto>

Could you please provide a transcript of a working example at a similar level of detail?[1]

Footnotes

  1. I already know that copying out the binary and the registration file, and then placing the registration file in /etc/binfmt.d, where it gets picked up by systemd-binfmt.service works -- but you said that, "There's no need to install the binaries directly on the host.", so I imagine you can provide a transcript to back up that claim?

@cevich
Member

cevich commented Apr 19, 2022

Oof, I did not encounter SELinux related problems AFAIR, it basically "just worked" 😖

What I'm using right now (in automation) runs under CentOS Stream 8 and does a:

    podman run --rm --privileged \
        docker.io/multiarch/qemu-user-static:latest \
        --reset -p yes

But then it runs multi-arch operations from within a super-privileged container:

    podman run --detach --name=buildah \
        --net=host --ipc=host --pid=host \
        --cgroupns=host --privileged \
        --security-opt label=disable \
        --security-opt seccomp=unconfined \
        --device /dev/fuse:rw \
        ...

That works with SELinux enabled on the host, but obviously it's been disabled for the container, along with seccomp. I'm not an expert in either, but I think that's necessary in this case.

@cevich
Member

cevich commented Apr 19, 2022

Update: Just tried on F35, installed qemu-user-static and qemu-user-binfmt, then podman run --arch arm64 --rm -it quay.io/libpod/alpine:latest uname -m ran just fine (on my amd64 laptop). So hmmm 🤔 I see:

 $ ls -Z /usr/bin/qemu-aarch64-static
system_u:object_r:bin_t:s0 /usr/bin/qemu-aarch64-static

Have you tried running the initial container with --security-opt label=disable? Not sure if that will help or not 😢

@zhenya-1007
Author

@cevich Thank you for following up on this!

Turns out, if I add --security-opt label=disable to all the podman run invocations in my original "transcript", then I do get x86_64 printed by uname -m, so:

fcos $ sudo -i
fcos # podman run --privileged --rm -it --security-opt label=disable fedora:35 /bin/bash -i
container # dnf -y install qemu-user-static
container # mount binfmt_misc -t binfmt_misc /proc/sys/fs/binfmt_misc
container # cat /usr/lib/binfmt.d/qemu-x86_64-static.conf >/proc/sys/fs/binfmt_misc/register
container # exit
fcos # cat /proc/sys/fs/binfmt_misc/qemu-x86_64
enabled
interpreter /usr/bin/qemu-x86_64-static
flags: F
[...] 
fcos # podman run --arch amd64 --rm -it --security-opt label=disable alpine:latest uname -m
x86_64 #  :-)
fcos # exit
fcos $ podman run --arch amd64 --rm -it --security-opt label=disable  alpine:latest uname -m
x86_64 # :-)

Case closed. Thanks again for all your help with this!
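
As an aside, the registration check in the transcript (the `cat /proc/sys/fs/binfmt_misc/qemu-x86_64` step) could be automated along these lines. This is a rough sketch under stated assumptions: `binfmtEntry` and `parseBinfmtEntry` are hypothetical names, and the parser only understands the three fields visible in the transcript. The `F` ("fix binary") flag matters here because it makes the kernel open the interpreter at registration time, which is why an interpreter registered from inside one container remains usable from others.

```go
package main

import (
	"fmt"
	"strings"
)

// binfmtEntry holds the fields of a binfmt_misc status file that the
// transcript shows. Hypothetical type for illustration.
type binfmtEntry struct {
	Enabled     bool
	Interpreter string
	Flags       string
}

// parseBinfmtEntry parses the text of /proc/sys/fs/binfmt_misc/<name>.
// Lines it does not recognise (offset, magic, mask, ...) are ignored.
func parseBinfmtEntry(text string) binfmtEntry {
	var e binfmtEntry
	for _, line := range strings.Split(text, "\n") {
		line = strings.TrimSpace(line)
		switch {
		case line == "enabled":
			e.Enabled = true
		case strings.HasPrefix(line, "interpreter "):
			e.Interpreter = strings.TrimPrefix(line, "interpreter ")
		case strings.HasPrefix(line, "flags:"):
			e.Flags = strings.TrimSpace(strings.TrimPrefix(line, "flags:"))
		}
	}
	return e
}

func main() {
	// Sample text copied from the transcript above.
	sample := "enabled\ninterpreter /usr/bin/qemu-x86_64-static\nflags: F\n"
	e := parseBinfmtEntry(sample)
	usable := e.Enabled && strings.Contains(e.Flags, "F")
	fmt.Printf("interpreter=%s usable-from-containers=%v\n", e.Interpreter, usable)
}
```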

@zhenya-1007
Author

Perhaps this should be documented somewhere -- ideally somewhere that's easy to find from Podman documentation...

@rhatdan
Member

rhatdan commented Apr 20, 2022

What AVCs were you seeing?

@cevich
Member

cevich commented Apr 20, 2022

Dan, I think the exact AVC's maybe don't matter since the binaries are coming from within another container (which also has --security-opt label=disable). That is to say, I'm not sure any sane SELinux module could actually be written in this case, we probably want it to always fail out of an abundance of caution. It does work properly/safely (without --security-opt label=disable) if you install qemu-user-static and qemu-user-binfmt on the host directly. That's possible in FCOS with rpm overrides (assuming that's the right term). But you're the expert, is there a way to force a label onto binaries coming from within a container like this?

@rhatdan
Member

rhatdan commented Apr 20, 2022

SGTM

@zhenya-1007
Author

zhenya-1007 commented Apr 22, 2022

But, but... Installing qemu-user-static[1] on the FCOS host is exactly how this PR started out. At some point in the discussion, we declared that "there is no need to install anything on the host", and closed the PR. However, now it sounds like, if one wants to stick with a fairly simple system configuration, one has to choose between running with --security-opt label=disable or installing something on the host, and I am not entirely certain that one of those two options is unambiguously less drastic than the other. Granted, intermediate to advanced users have a range of options available to them, but my hope here was to provide something beginners could use. Should we consider reopening this PR?

Footnotes

  1. Upon closer examination, I see that one really ought to pick either qemu-user-static or qemu-user-binfmt, but not both; I might even try submitting a patch that makes one conflict with the other, and see what the distro people think...

@n1hility
Member

IMO bundling it on the machine OS is the right way to go. With aarch64 becoming more prevalent on developer systems and increasing usage of multi-arch deployments, usage of it is likely to be common. FWIW, Docker also bundles it on M1, so --platform just works out of the box.

@rhatdan
Member

rhatdan commented Apr 22, 2022

I agree it should be in the machine.

@zhenya-1007
Author

@n1hility @rhatdan: "bundling it on the machine OS" doesn't specify how such bundling is to be accomplished. I see at least two options:

  • podman will arrange to layer the qemu-* binaries when it provisions the VM; this is the approach this PR is taking
  • the FCOS team will arrange to include the qemu-* binaries into the OS image(s) it distributes

Are you speaking in support of one of the options above, and, if so, which one? Or are you suggesting a completely different approach?

@n1hility
Member

@zhenya-1007 I am just a contributor, so don’t take this response as any sort of authority, but I would suggest both directions: propose this to the base FCOS since it seems to be on-mission, and then consider a layer approach as a transition solution (or final if they decline). I personally think it should be included by default without a required option to init. Note that if everyone agrees this is the way to go, there would need to be a port for WSL as well (I can help with that).

@rhatdan
Member

rhatdan commented Apr 22, 2022

I would prefer that it comes by default from FCOS. The issue might be size. If not then I say Podman installs it by default from a container image. I don't think we should have it as an option, since I don't think most users would understand or care.

@github-actions github-actions bot added the "locked - please file new issue/PR" label (assist humans wanting to comment on an old issue or PR with locked comments) Sep 21, 2023
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 21, 2023
10 participants