Machine init with an explicit image path url always downloads #14388

cpolizzi · 2022-05-26T20:07:06Z

Is this a BUG REPORT or FEATURE REQUEST? (leave only one on its own line)

/kind bug

Description

Initializing a new podman machine using a specific FCOS image URL always downloads the image from the URL and does not first check if it already exists to bypass the download.

Steps to reproduce the issue:

podman machine init --image-path https://builds.coreos.fedoraproject.org/prod/streams/next/builds/36.20220522.1.0/aarch64/fedora-coreos-36.20220522.1.0-qemu.aarch64.qcow2.xz
podman machine list
ls -l ~/.local/share/containers/podman/machine/qemu/fedora-coreos-36.20220522.1.0-qemu.aarch64.qcow2.xz
podman machine rm --force
ls -l ~/.local/share/containers/podman/machine/qemu/fedora-coreos-36.20220522.1.0-qemu.aarch64.qcow2.xz
podman machine init --image-path https://builds.coreos.fedoraproject.org/prod/streams/next/builds/36.20220522.1.0/aarch64/fedora-coreos-36.20220522.1.0-qemu.aarch64.qcow2.xz

Describe the results you received:
The FCOS image is always downloaded even though it already exists on the host in the correct cached location.

Describe the results you expected:
The FCOS image is only downloaded when needed (i.e., when it is not present in the correct cached location; otherwise, the cached image is used). When the image is cached, the download should be bypassed and the FCOS image should immediately be extracted. FCOS build stream metadata should be neither checked nor verified.

Additional information you deem important (e.g. issue happens only occasionally):

Output of podman --version:

podman version 4.1.0

Package info (e.g. output of brew info podman):

podman: stable 4.1.0 (bottled), HEAD
Tool for managing OCI containers and pods
https://podman.io/
/opt/homebrew/Cellar/podman/4.1.0 (174 files, 46.3MB) *
  Poured from bottle on 2022-05-20 at 22:19:07
From: https://github.com/Homebrew/homebrew-core/blob/HEAD/Formula/podman.rb
License: Apache-2.0
==> Dependencies
Build: go ✘, go-md2man ✘
Required: qemu ✔
==> Options
--HEAD
	Install HEAD version
==> Caveats
zsh completions have been installed to:
  /opt/homebrew/share/zsh/site-functions
==> Analytics
install: 21,312 (30 days), 57,625 (90 days), 147,469 (365 days)
install-on-request: 21,263 (30 days), 57,567 (90 days), 147,397 (365 days)
build-error: 0 (30 days)

Have you tested with the latest version of Podman and have you checked the Podman Troubleshooting Guide? (https://github.com/containers/podman/blob/main/troubleshooting.md)

Yes

Additional environment details (AWS, VirtualBox, physical, etc.):

M1 Mac Mini


mheon · 2022-05-26T20:12:53Z

@baude PTAL

baude · 2022-05-31T14:29:02Z

Right now, that is the defined behavior and not a bug. That said, we could change the behavior. When we deal with official images, we can determine if an image has changed by its shasum. In the case of an individual URL like you explained, there is no mechanism for that (right now). If you happen to be developing and testing an FCOS image (finger pointing at me), then this default behavior is preferable because the filename of the test image may not change.

what say you @containers/podman-maintainers ?

cpolizzi · 2022-05-31T15:29:28Z

I think by the official images you are referring to the latest ones that can be accessed by the logical build stream names of test, stable or next, correct? Indeed, when any one of those is already present on the host system, and provided that a new version on the build stream is not available, a re-download never happens (as it is very clear that it is pulling the associated metadata from the build stream).

For the layer of automation orchestration I have on top of podman, I chose to pin the FCOS image version in use, as this is being used with a large corporate client. This eases our support activities with that client base.

With that said, it is good to know this is not a bug. But I still think the SHASUM should always be recomputed and validated on every machine init (unless pulling from a source that does not have a corresponding metadata stream associated with it; e.g., the local file system).

I suggest the best of both worlds: a new feature on the CLI for machine init indicating whether or not to use any cached image that may already be present (e.g., machine init --use-cached or machine init --ignore-cached).

github-actions · 2022-07-01T00:07:43Z

A friendly reminder that this issue had no activity for 30 days.

rhatdan · 2022-07-01T12:02:29Z

@baude @mheon Is this still an issue?

alexanderankin · 2022-07-19T18:39:53Z

@rhatdan - I have to recreate my podman machine every day that I want to use it (M1 issues), so downloading a 600 MB FCOS image every day is a pretty major problem (slow, uses a lot of bandwidth quota).

I'm not even using any specific urls or anything, just finding that the default behavior is creating an issue in my usage.

ashley-cui · 2022-07-19T18:55:42Z

@alexanderankin, if you're using the default FCOS images (i.e., next, stable) and not a custom URL, Podman should not be downloading the image every day, as we do cache the image. Otherwise, the best way to avoid pulling every day is to download the image into another directory and point --image-path to where you downloaded it.
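
The workaround described here could be scripted roughly as follows. This is a minimal sketch rather than Podman functionality: the helper names `ensure_local_image` and `init_command` and the choice of download directory are assumptions made for illustration; only the `podman machine init --image-path` invocation comes from this thread.

```python
import urllib.request
from pathlib import Path
from urllib.parse import urlparse


def ensure_local_image(url, download_dir, fetch=urllib.request.urlretrieve):
    """Download `url` into `download_dir` unless a non-empty copy already exists.

    `fetch` is injectable (hypothetical parameter) so the logic can be
    exercised without network access. Returns the local image path.
    """
    download_dir = Path(download_dir)
    download_dir.mkdir(parents=True, exist_ok=True)
    # Reuse the file name from the URL path, e.g. "...-qemu.aarch64.qcow2.xz".
    target = download_dir / Path(urlparse(url).path).name
    if not target.exists() or target.stat().st_size == 0:
        fetch(url, str(target))
    return target


def init_command(local_image):
    """Build the podman invocation that points --image-path at a local file."""
    return ["podman", "machine", "init", "--image-path", str(local_image)]
```

A wrapper script might then call `subprocess.run(init_command(ensure_local_image(url, some_dir)), check=True)`. Note that this sketch only checks that a non-empty file exists; it does not verify the file's contents.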

alexanderankin · 2022-07-19T19:05:37Z

@ashley-cui I didn't realize that --image-path could be a local path, that works, sorry for hijacking the issue

cpolizzi · 2022-07-19T19:10:33Z

@alexanderankin Good news, this is not an M1 Mac issue at all. But what you say is on point.

@ashley-cui The "custom URL" is from the FCOS build streams themselves. This is why I offered a possible resolution in my previous comment on this issue. Having some "baked in" feature for the convenience of engineers / developers, just because they need it, without an ability to override that behavior is poor thinking, design and implementation. Think globally. Sorry, but I am bringing to the table the global perspective beyond that of engineering (and I am a die-hard engineer). I have had to work around this behavior for a strategic client and partner (and we can talk about this offline if you would like). Fact is, it is time to think "enterprise".

ashley-cui · 2022-07-19T19:23:42Z

@cpolizzi I understand you are using a FCOS build stream itself, but Podman is unable to verify the shasum of the pulled file if it is straight from a URL. We're only able to cache the next/stable versions of FCOS because FCOS provides a pull package that will tell us shasums before we actually pull the new file. With a generic URL, podman is unable to verify shasums to pull, and I'm hesitant to cache based on other factors, such as filenames, because they might change, or the contents will change with the same filename. I would recommend working around the issue like @alexanderankin has, by downloading the image separately from podman and setting the pull path to the local image.

cpolizzi · 2022-07-19T19:35:00Z

@ashley-cui I beg to differ. Podman client could easily be orchestrated to pull the build metadata endpoint already present, locate the appropriate JSON node in the response from the endpoint and then compare the SHA sum for that release to what is currently and locally on disk. This is just simple behavior modification and is in no way "not possible". This is not purely caching but this is caching with validation. Yeah, I've already worked around the baked in behavior but again one should not have to.

Consider that every "custom URL" as you call it (which it is not, it is a URL from the stream metadata endpoint itself) contains everything that is needed. The "custom URL" is directly from this build stream metadata endpoint and in fact, as you know, has the name of the build stream itself embedded in the URL, predictably so: https://builds.coreos.fedoraproject.org/prod/streams/{stream-name}/builds/...

Given this fact the "custom URL" could easily be parsed to determine the build stream and from that pull the metadata endpoint reliably: https://builds.coreos.fedoraproject.org/streams/{stream-name}.json
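
As an illustration of the parsing step described above, here is a minimal sketch. The function name `parse_build_url` is hypothetical, and the positions of the release and architecture segments are inferred from the single example URL in this issue, so they should be treated as assumptions rather than a documented contract.

```python
import re
from urllib.parse import urlparse

# URL layout quoted in this thread:
#   https://builds.coreos.fedoraproject.org/prod/streams/{stream-name}/builds/...
# The release/arch segments are an assumption based on the example URL.
_BUILD_PATH = re.compile(
    r"^/prod/streams/(?P<stream>[^/]+)/builds/(?P<release>[^/]+)/(?P<arch>[^/]+)/"
)


def parse_build_url(image_url):
    """Return (stream, release, arch, metadata_url), or None if the URL does not match."""
    parsed = urlparse(image_url)
    if parsed.netloc != "builds.coreos.fedoraproject.org":
        return None
    match = _BUILD_PATH.match(parsed.path)
    if match is None:
        return None
    stream = match["stream"]
    metadata_url = f"https://builds.coreos.fedoraproject.org/streams/{stream}.json"
    return stream, match["release"], match["arch"], metadata_url
```

Any URL that does not follow this layout yields `None`, in which case a caller would presumably fall back to the current always-download behavior.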

The JSON response has all the information needed if one is willing to dig through it with, say, a JSON query (architecture + artifact + formats), which will yield the specific download URL as well as the signature, the image SHA-256 and the uncompressed image SHA-256. Alternatively, one could JSON query for the "custom URL" in this response payload and then "backstep" to the parent node defining this information. And yet a third alternative is to simply JSON query the metadata endpoint response for the value of release, matching it against the release version in the "custom URL", such as 36.20220703.3.1.
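
The first of these lookups (architecture + artifact + format) might look like the sketch below. The key names (`architectures`, `artifacts`, `formats`, `disk`, `location`, `sha256`) are assumptions about the metadata layout and should be checked against a real response; `find_disk_entry` is a hypothetical helper, not part of Podman or any FCOS library.

```python
def find_disk_entry(stream_doc, arch, artifact, fmt, image_url):
    """Return the disk entry whose location equals `image_url`, or None.

    Assumed layout (verify against a real response):
      architectures -> {arch} -> artifacts -> {artifact}
        -> formats -> {fmt} -> disk -> {location, sha256, ...}
    """
    try:
        disk = stream_doc["architectures"][arch]["artifacts"][artifact]["formats"][fmt]["disk"]
    except (KeyError, TypeError):
        return None
    # If the document describes a different release than the pinned URL,
    # return None instead of handing back a checksum for another image.
    if disk.get("location") != image_url:
        return None
    return disk
```

Returning `None` on a location mismatch matters for version pinning: a checksum is only trusted when the metadata entry refers to exactly the URL that was requested.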

So, would you be so kind as to explain why what you state is not feasible, please? Surely, it is additional logic, but it is entirely possible from where I sit.

So when you say "I'm hesitant to cache based on other factors, such as filenames, because they might change ..." is this because you are thinking "well, what if someone self-hosted the build streams privately?" Because if not, then the filenames, paths and URLs themselves have already been set in proverbial stone. Otherwise, with private self-hosting of this entire ecosystem of build automation and whatnot, it still could actually be reliably determined. All that needs to happen in that case is the simple and trivial definition of "standards".

hhellbusch · 2022-07-19T20:53:33Z

Hi all - I'll weigh in and see if what I say helps some :)
I've been working w/ @cpolizzi on the aforementioned partner/customer. Some background: more or less we've worked with them and set up podman as a replacement for docker & docker-desktop on the macOS development environment. To do this, we've had to build a fairly large wrapper to orchestrate the set-up, manage the installation of podman, and configure the podman-machine once it's up and running.

One objective we've had is "it works every day" for the end users: 100+ developers, with new developers onboarding and setting up podman (often their first exposure to containers at all). To do this we have to pin the versions of podman and the podman machine.

The wrapper we wrote is configured to pull in a specific version of the podman machine that we know works reliably, and this is done via the URL from the fedoraproject.

Maybe it would help if we constrain the scope some? I like what @baude suggested by comparing sha's.
In the case of a fedoraproject URL being provided, there ought to be a SHA we can download very quickly to confirm that the image on the filesystem matches. This would allow the system to quickly detect if it has the correct image downloaded already and save a lot of time during set up of a new podman machine. Between the combo of the corporate VPN and corporate proxy, downloads are slow to say the least.

We've also been experiencing (several times a week) cases where the resulting download is truncated for some reason (likely the fault of the corporate proxy/VPN). The image that gets created on the filesystem is 0 bytes. If the code is co-located well, perhaps we could also put in a check to "ensure that the downloaded file matches" before trying to create/init the machine itself. The errors that occur when podman tries to create a machine from a 0-byte image are cryptic, to say the least. It more or less tries to create the machine, but the machine never responds.
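
A check along these lines could be sketched as below, assuming the expected SHA-256 is already known from some trusted source. The helper names `sha256_of` and `image_is_usable` are hypothetical; this is not how Podman currently behaves.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so large images are not read into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def image_is_usable(path, expected_sha256):
    """Return True only if the file exists, is non-empty, and matches the checksum."""
    path = Path(path)
    # A truncated download that leaves a 0-byte file is rejected up front.
    if not path.is_file() or path.stat().st_size == 0:
        return False
    return sha256_of(path) == expected_sha256.lower()
```

The same predicate could serve two purposes: deciding whether a cached image can be reused without downloading, and rejecting a truncated download before machine creation is attempted.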

rhatdan · 2022-07-21T13:18:20Z

PRs welcome...

github-actions · 2022-08-21T00:08:49Z

A friendly reminder that this issue had no activity for 30 days.

cpolizzi · 2022-08-26T14:54:43Z

We handled this case in our own automation that seamlessly provides a smooth end developer experience for our customer from the state of "I have no container engine at all" to "I am up and running with podman", all in restrictive environments. We actually are going to have to handle this in our same automation instead but were hoping the podman project would just simply handle it which would be substantially better (and same for all of the things we had to do for getting podman to work in restrictive environments). Also please note that this "automation" is a fancy set of a bunch of script modules all orchestrated by a top-level single entry point script.

baude · 2022-08-27T16:21:07Z

@cpolizzi Ok, let's think this one through and see if we can find a compromise for your use case. When podman does an init without a custom URL, it goes online and parses some JSON that is generated by the FCOS release process. The parsing is done with a library provided to us by the FCOS people.

The image-path command line option was not designed with your use case in mind. It was developed for a one-off boot of an image; in fact, it is largely just handy for us to test an unpublished image. If you are using custom FCOS builds and using their tools like cosa, then you could easily also produce this same kind of JSON.

So my solution would be as follows. If you want to mimic what we do with init, we could add a containers-common variable in the machine stanza for where Podman would look for a JSON file. From there, it will behave like normal by using sha's to determine if an image is available or the same. We basically took exactly this approach when Podman 4 had been released but Fedora 36 had not; I made a temporary download location that had the JSON and hardwired that location in Podman's code. This makes me reasonably confident it would work.

One minor issue is that for a new user who has not run Podman yet, that conf file does not exist. So it would need to be provided in your case; or we would have to expose a command-line option; the latter of which I would prefer to avoid.

github-actions · 2022-09-30T00:15:19Z

A friendly reminder that this issue had no activity for 30 days.

cpolizzi · 2022-09-30T18:36:05Z

@baude My apologies for my delay on this. Currently, for our customer, we have resolved this through our existing automation and also did a slight course correction for future planned work. That course correction is that we import our desired FCOS image from the project for both Intel and Apple silicon into a private storage solution. We augmented our automation to perform the SHA validation of what the user has locally versus what we provision into that centralized private storage solution and react accordingly. The next part of this course correction with our customer is that we intend to customize the FCOS image from the project via cosa to add in all the "bits" that are needed. However, I am not - yet - familiar enough with cosa so I am not certain regarding the feasibility of this. But this is our current plan at this time. I am curious as to why you would prefer to avoid exposing a command line option however as part of the init.

vratinov · 2023-02-13T19:42:25Z

Is there a way to make --image-path part of some config, so that when the end user runs "podman machine init" the setting is read from their respective config file (or a global config file) and an alternative path is used for grabbing the respective CoreOS image?

rhatdan · 2023-02-15T09:23:00Z

in containers.conf

[machine]
# Default image URI when creating a new VM using `podman machine init`.
# Options: On Linux/Mac, `testing`, `stable`, `next`. On Windows, the major
# version of the OS (e.g `36`) for Fedora 36. For all platforms you can
# alternatively specify a custom download URL to an image. Container engines
# translate URIs $OS and $ARCH to the native OS and ARCH. URI
# "https://example.com/$OS/$ARCH/foobar.ami" becomes
# "https://example.com/linux/amd64/foobar.ami" on a Linux AMD machine.
# The default value is `testing`.
#
# image = "testing"

rhatdan · 2023-02-15T09:24:11Z

https://github.com/containers/common/blob/main/docs/containers.conf.5.md

openshift-ci bot added the kind/bug Categorizes issue or PR as related to a bug. label May 26, 2022

nobodyman1 mentioned this issue Jun 24, 2022

Podman Init wont Resolve DNS Windows WSL #14495

Closed

github-actions bot added the stale-issue label Jul 1, 2022

rhatdan removed the stale-issue label Jul 1, 2022

github-actions bot added the stale-issue label Aug 21, 2022

rhatdan removed the stale-issue label Aug 22, 2022

github-actions bot added the stale-issue label Sep 30, 2022

rhatdan removed the stale-issue label Sep 30, 2022

rhatdan closed this as completed Feb 15, 2023

github-actions bot added the locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. label Sep 1, 2023

github-actions bot locked as resolved and limited conversation to collaborators Sep 1, 2023
