Native overlay storage has significant impact on rootless containers creation time #1749

Open
travier opened this issue Jun 14, 2024 · 25 comments · Fixed by containers/common#2206

Comments

@travier
Member

travier commented Jun 14, 2024

Native overlay storage has significant impact on container creation time and podman command responsiveness (podman images).

Switching back to fuse-overlayfs mitigates this issue, trading it off for some runtime cost. Note that for most rootless containers such as toolboxes, the performance overhead of fuse-overlayfs is likely negligible as the home directory is bind mounted and thus not going through fuse. Containers that also store their data in volumes are not penalized either.

# Record / backup your containers first as the command below will remove them all
$ podman system reset

# Create the storage config file with the content below
$ cat ~/.config/containers/storage.conf 
[storage]
driver = "overlay"
[storage.options.overlay]
mount_program = "/usr/bin/fuse-overlayfs"

# Pull your containers again
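
The config step above can be sketched as a small helper. This is only a minimal sketch: the function name `write_fuse_overlayfs_conf` is hypothetical, and refusing to overwrite an existing file is an assumption about what is safe, not something the steps above require. The path and file contents are the ones shown above.

```shell
# Hypothetical helper: write a storage.conf that selects fuse-overlayfs.
# $1 is the target path; it defaults to the rootless per-user config file.
write_fuse_overlayfs_conf() {
    local conf="${1:-$HOME/.config/containers/storage.conf}"

    # Assumption: never clobber a config file that already exists.
    if [ -e "$conf" ]; then
        echo "refusing to overwrite $conf" >&2
        return 1
    fi

    mkdir -p "$(dirname "$conf")"
    cat > "$conf" <<'EOF'
[storage]
driver = "overlay"
[storage.options.overlay]
mount_program = "/usr/bin/fuse-overlayfs"
EOF
}
```

Calling `write_fuse_overlayfs_conf` with no argument targets `~/.config/containers/storage.conf`. The storage reset still has to happen first, and it still removes all existing containers and images.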

See:


Original issue: Dependency on fuse-overlayfs in containers-common moved to Suggests

fuse-overlayfs has been moved from a Recommends to a Suggests in Fedora 40 and later:

I manually added it back in Fedora Atomic Desktops for F40: https://pagure.io/workstation-ostree-config/pull-request/526

In Fedora CoreOS we have it explicitly listed in https://github.com/coreos/fedora-coreos-config/blob/testing-devel/manifests/fedora-coreos-base.yaml#L106.

We should reach out to the podman folks to figure out if we should also remove it with the update to F41.

@travier travier added the F41 label Jun 14, 2024
@travier travier changed the title Dependency on fuse-overlayfs in containers-common moved to Suggests Dependency on fuse-overlayfs in containers-common moved to Suggests Jun 14, 2024
@debarshiray

debarshiray commented Oct 16, 2024

The Podman folks want to move off fuse-overlayfs and encourage people to use the Linux kernel's overlay file system. I think the question for Fedora CoreOS is: how important is backwards compatibility for old containers?

Backwards compatibility is important for Toolbx. So, I am considering adding at least a Recommends on fuse-overlayfs to the toolbox RPM, until we can figure out a better way to migrate users than some cryptic string hidden in some logs.

@debarshiray

I tried to upstream the changes to unbreak Fedora 40:
containers/common#2203

@travier
Member Author

travier commented Oct 16, 2024

We should look into this again as we're nearing the F41 release/rebase so that would be a good time to remove it. Otherwise, we might have to carry it until F42.

@dustymabe
Member

@travier @debarshiray - trying to tease out some details for the meeting discussion today:

  • does having fuse-overlayfs installed cause it to be preferred?

ideally if it's installed it's just there for backwards compat, but newly created machines (or even possibly new containers on upgrading machines) would use the preferred in kernel overlayfs.

@jlebon
Member

jlebon commented Oct 16, 2024

@travier @debarshiray - trying to tease out some details for the meeting discussion today:

* does having `fuse-overlayfs` installed cause it to be preferred?

At least on my f40 Silverblue, it's there (which I guess is thanks to @travier) but my backend is still overlay:

podman info | grep -i graphDriver
  graphDriverName: overlay

I think this is one of those things that only gets updated if you do e.g. podman system reset to reset storage defaults.
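
A driver name of `overlay` does not by itself say whether fuse-overlayfs is in use, because fuse-overlayfs is configured as a `mount_program` of the overlay driver. A rough sketch for telling the two apart from `podman info` output follows. It assumes the YAML output lists an `overlay.mount_program` key under `graphOptions` when a mount program is configured; the function name `describe_storage_backend` is hypothetical.

```shell
# Reads `podman info` YAML on stdin and prints a one-line summary of the
# storage backend. Assumption: a configured mount program shows up as an
# `overlay.mount_program` key in the output.
describe_storage_backend() {
    awk '
        $1 == "graphDriverName:"        { driver = $2 }
        $1 ~ /^overlay\.mount_program/  { mount_program = 1 }
        END {
            if (driver == "") { print "unknown"; exit }
            if (driver == "overlay" && mount_program)
                print "overlay (via mount_program)"
            else if (driver == "overlay")
                print "overlay (native)"
            else
                print driver
        }
    '
}

# Usage (not run here): podman info | describe_storage_backend
```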

I'd be surprised if a lot of people are still using it, so the impact should be minimal. Though OTOH we also didn't mention it in our f41 communications and it feels a bit late for a potentially breaking change.

@c4rt0 c4rt0 removed the meeting topics for meetings label Oct 16, 2024
@c4rt0
Member

c4rt0 commented Oct 16, 2024

This was discussed today in #meeting-1:fedoraproject.org,

AGREED: We won't remove the package for F41 but will work with the podman team to determine the appropriate time to do the deprecation. This may be on a Fedora major boundary with an associated Fedora change request.

@eriksjolund

Maybe not so important but good to know:
fuse-overlayfs is faster than native overlay
at creating a new container when these conditions are all met:

  • rootless Podman is used
  • a modified UID/GID mapping is used
  • no container has yet been created with the specified container image and UID/GID mapping

Reference: https://github.com/containers/podman/blob/b65f3b19a59e4af17c1649461e9366f39f97e51b/docs/tutorials/performance.md?plain=1#L68-L84
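
As a rough illustration of the second condition, the check below flags a `podman` argument list that asks for a modified UID/GID mapping. It is a heuristic sketch only: the function name `uses_custom_id_mapping` is hypothetical, the option list (`--userns`, `--uidmap`, `--gidmap`) is an assumption and not exhaustive, and it cannot see mappings set through configuration files.

```shell
# Returns 0 if any argument looks like a user-namespace mapping option.
# Assumption: --userns, --uidmap and --gidmap are the options of interest;
# values such as --userns=host are not special-cased.
uses_custom_id_mapping() {
    local arg
    for arg in "$@"; do
        case "$arg" in
            --userns|--userns=*|--uidmap|--uidmap=*|--gidmap|--gidmap=*)
                return 0
                ;;
        esac
    done
    return 1
}

# Usage:
#   if uses_custom_id_mapping run --rm --userns=auto fedora; then
#       echo "first container creation may be slow on native overlay"
#   fi
```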

@dustymabe
Member

Nice @eriksjolund. @giuseppe I assume containers/podman#16541 (comment) is still the state?

@giuseppe

Is there any particular reason to drop it? It is a quite small binary (~112kb on Fedora 40) and there are still cases where it is needed. For example, running on NFS. That is not possible for rootless using native overlay.

@eriksjolund

I did a benchmark in Fedora CoreOS 41.20241006.1.1 VM running on a Macbook Pro M1

type            elapsed time
fuse-overlayfs   4.894 s
native overlay  81.382 s

In the test I ran this command 10 times in a for loop:

 podman run --rm --userns=auto --pull=never -d registry.fedoraproject.org/fedora sleep inf

native overlay

test54@fcos-next5:~$ time for i in {0..9}; do
  podman run --rm --userns=auto --pull=never -d registry.fedoraproject.org/fedora sleep inf
done
db22ddcc2d5f0ad39c3dabd6b71b66e74e6b00ebcb1be4cdcdcf7d5461c33554
629648712c37c2e9e1a538d6b1a8b1a5903f84b3a7c35deb4c79f819bfd53969
c11bbeabd424d46379cc2852eee91c6feade29f456aaafda27fc2621a1fce41d
f94f0daa3e640b5c78278d5d92c387dbfed3dfec7e839ac94b54faeb17d716ff
63f175d78dbc70e87a76310feb28a1fcb0e9f101d6c15d8dfea7d22326634e56
05da9ec6abb22c7420ce5f6db49e65fef14490af8e39544a6ee3b65b58b4b6ad
f55ccef1746054c1504aef30a37a4505284d85ca607213210fed4bc396693bb0
c80e1c05422fc323f2df85d5ee831b6c3495f87f74c0129247ba96df4b1647ff
65f9d285dc8ddf73982ab5d11d18d52d4c93e611f6b2216ab3be472a5d2c6025
7450b8c983dfb97eada5569fdeb257a2a5dc4b085b4aea41142052af800c8808

real	1m21.382s
user	0m6.171s
sys	0m58.626s
test54@fcos-next5:~$ 

fuse-overlayfs

test55@fcos-next5:~$ time for i in {0..9}; do
  podman run --rm --userns=auto --pull=never -d registry.fedoraproject.org/fedora sleep inf
done
632c9edcabb0c031736e9b247fb65d26312d73c2831cb526ab08f14000be0049
8a540aa90441fbf992c43cb91577aa500e788a24a929e26866e5cb331dba36d5
cf949b7cd4133d5d5f7f2a96ebda273255ddd4cd66b6616e07ad8122257cce30
aaf66310d37df6b7ddaa0ab8208a956ac0f99b4f53c769976c2867b792ff2b2e
e24fd07ff2c432b2418d0564ec9da5a7c0fc76816cec34e0b125f03318c0dfa5
7f30a4ee3ca73672aeb98b6a40618993fc04cdf8a1a2f0a34de4c3613d3c27ad
a83851405da49bc9ccc55b30393f9dc6461c341cdc7c99eeba08b407bd1bb376
c4d512fa7a3d6b80f0bc5ebe0f6e852a18483fb0f1a1e2ede80298688e3718b2
10778b8360e1c5519e300731c73a8e64198b5f66c4b59d44322201e2045ec7ba
d9d7fede5c348100dc8224db0f053c09a0d3d6a7ad5a8bec057e2dc6f7fa5b92

real	0m4.894s
user	0m1.265s
sys	0m1.283s
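
To put the two `real` values side by side, here is a small sketch that converts the `time` output format to seconds and computes the ratio. The helper names `real_to_seconds` and `slowdown` are hypothetical; the numbers are the ones measured above.

```shell
# Convert a `time`-style value such as "1m21.382s" to seconds.
real_to_seconds() {
    printf '%s\n' "$1" | awk -F'[ms]' '{ printf "%.3f\n", $1 * 60 + $2 }'
}

# Ratio of the slower run to the faster run, to one decimal place.
slowdown() {
    awk -v slow="$1" -v fast="$2" 'BEGIN { printf "%.1f\n", slow / fast }'
}

native="$(real_to_seconds 1m21.382s)"   # 81.382
fuse="$(real_to_seconds 0m4.894s)"      # 4.894
slowdown "$native" "$fuse"              # prints 16.6
```

In this particular run, native overlay took about 16.6 times as long, or roughly 8.1 s per container against roughly 0.5 s.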

@jlebon
Member

jlebon commented Oct 16, 2024

Is there any particular reason to drop it? It is a quite small binary (~112kb on Fedora 40) and there are still cases where it is needed. For example, running on NFS. That is not possible for rootless using native overlay.

To clarify, we're not purposely dropping fuse-overlayfs. This is just the combination of (1) podman moving it from a Requires to a Suggests and (2) FCOS not picking up weak deps by default. We're purposely keeping it in f41, but longer term if there are reasons to keep it by default, then it seems more appropriate to revert the change to a weak dep. Otherwise, users will always be free to layer it back if they need it.

@debarshiray

Is there any particular reason to drop it? It is a quite small binary (~112kb on Fedora 40) and there are still cases where it is needed. For example, running on NFS. That is not possible for rootless using native overlay.

To clarify, we're not purposely dropping fuse-overlayfs. This is just the combination of (1) podman moving it from a Requires to a Suggests and (2) FCOS not picking up weak deps by default.

Just to nitpick on some details. :)

It was already a weak dependency in containers-common when Fedora 40 was released, for anything that's not Fedora Server:

Recommends: fuse-overlayfs
Requires: (fuse-overlayfs if fedora-release-identity-server)

I am guessing it was being explicitly pulled in by Fedora CoreOS because it generally skips weak dependencies.

Right after Fedora 40 came out, it was demoted to a Suggests, which is ignored by default in Fedora, so it's as good as not mentioned at all.

That's how it fell off the Fedora Silverblue images, until it was rescued by @travier
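
To see which kind of dependency an installed package currently declares, something like the sketch below could be used. It relies on the `rpm -q --requires`, `--recommends` and `--suggests` queries; the function name `dependency_kind` is hypothetical, and matching on a whitespace-separated token is a simplification that ignores version constraints.

```shell
# Prints how package $1 depends on $2: requires, recommends, suggests or none.
# Rich dependencies such as "(foo if bar)" are matched by stripping the
# parentheses and comparing each token, which is a simplification.
dependency_kind() {
    local pkg="$1" dep="$2" kind
    for kind in requires recommends suggests; do
        if rpm -q "--$kind" "$pkg" 2>/dev/null |
            awk -v dep="$dep" '
                { gsub(/[()]/, " "); for (i = 1; i <= NF; i++) if ($i == dep) found = 1 }
                END { exit !found }'
        then
            echo "$kind"
            return 0
        fi
    done
    echo "none"
}

# Usage (not run here): dependency_kind containers-common fuse-overlayfs
```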

We're purposely keeping it in f41, but longer term if there are reasons to keep it by default, then it seems more appropriate to revert the change to a weak dep. Otherwise, users will always be free to layer it back if they need it.

For what it's worth, just yesterday, I filed some pull requests to revert the change for Fedora 40 and older.

@debarshiray

Not having fuse-overlayfs has consequences for at least the Fedora Workstation live media. The live media already uses the Linux kernel's overlay file system. So, if fuse-overlayfs is absent then it's not possible to use another overlayfs on top of the existing overlayfs when creating containers.

liveuser@localhost-live:~$ podman version
Error: configure storage: 'overlay' is not supported over overlayfs, a mount_program is required: backing file system is unsupported for this graph driver

I ended up putting Recommends: fuse-overlayfs in the toolbox RPM for Fedora 41 onwards to fend off the immediate backwards compatibility question for old containers created with fuse-overlayfs.

I don't want to break old containers without offering users a migration strategy. Ideally, I want to detect that a container is still using fuse-overlayfs, and warn the user about its deprecation for a few months or releases before dropping it.

@debarshiray

@jlebon @dustymabe I haven't yet been able to figure out the exact conditions under which fuse-overlayfs is used and the exact steps to trigger the backwards compatibility breakage. I am still working on it intermittently between other things, unless @giuseppe beats me to it. :)

Note that containers-storage.conf(5) at /usr/share/containers/storage.conf no longer mentions a mount_program, which means it's suggesting the Linux kernel's overlay file system.

My plan is to do some Git archaeology to find out exactly when the switch away from fuse-overlayfs happened, create a container in that scenario, and try to break it by removing fuse-overlayfs.

One thing is certain, though. Just having fuse-overlayfs installed on a currently supported Fedora doesn't mean that a newly created container is actually using it. I tested this on freshly installed Fedora Workstation 39 and 40 virtual machines by creating a container with fuse-overlayfs present, rebooted and removed fuse-overlayfs, rebooted and tried to use the containers. They worked.

I have no idea how many users have containers like this, but we did get some reports when it disappeared from the Fedora Silverblue 40 image:
https://discussion.fedoraproject.org/t/rpm-ostree-update-breaks-toolbox-fedora-40
containers/toolbox#1512

@travier
Member Author

travier commented Oct 17, 2024

At least on my f40 Silverblue, it's there (which I guess is thanks to @travier) but my backend is still overlay:

I added it back for Fedora 40 Atomic Desktops but it's going away in F41 for now: https://pagure.io/workstation-ostree-config/pull-request/526

This is also very similar to what happened for fedora-silverblue/issue-tracker#547 & fedora-silverblue/issue-tracker#246.

I think this is one of those things that only gets updated if you do e.g. podman system reset to reset storage defaults.

That's my understanding as well.

One thing I've realized is that we just had people do a podman system reset for the podman v5 update: #1629 (comment)

Thus it's very likely that their systems are already using the new graphDriver? At least my FCOS nodes that I had to manually system reset for the podman 5 migration have it.

So maybe this is a non-issue for us and we can just drop it?

@travier
Member Author

travier commented Oct 17, 2024

Catching up and reading #1749 (comment) & #1749 (comment) & #1749 (comment), it feels like we should keep it and even recommend it for rootless containers until containers/podman#16541 (comment) is fixed. I'm going to add it back to the Atomic Desktops.

@giuseppe

it feels like we should keep it and even recommend it for rootless containers until containers/podman#16541 (comment) is fixed

this is unlikely to be fixed soon. You'll hit this issue only with nested user namespaces (i.e. rootless users using --userns or --uidmapping), so for the generic case native overlay is faster. But I agree it is better to keep fuse-overlayfs installed and available when needed, its cost is minimal in terms of disk space

@dustymabe
Member

But I agree it is better to keep fuse-overlayfs installed and available when needed, its cost is minimal in terms of disk space

my question is this..

In the case fuse-overlayfs would be much faster for creating a container, is it automatically used if available? or would a user have to know about it and configure things to use it?

If it has to be configured I imagine no one is using it TBH. A helpful log message that got printed to stderr if a container create took more than 10s mentioning it as an alternative would be useful here.

@giuseppe

it is automatically used when native overlay is not usable. It can be for several reasons, like running on a network file system, too old kernel, or already running on top of overlay

@dustymabe
Member

@giuseppe that is great information! I wonder if we can get https://github.com/containers/podman/blob/993ecd5a05a42269cf5b4547d3bf92cae9efffed/docs/tutorials/performance.md#performance-considerations updated to include that information?

To my former question:

OK, so it will only be used when native overlay can't, meaning there are cases where it will be used but not performant (like @eriksjolund demonstrated above). Would detecting this scenario and printing a helpful message be useful?

@giuseppe

@giuseppe that is great information! I wonder if we can get https://github.com/containers/podman/blob/993ecd5a05a42269cf5b4547d3bf92cae9efffed/docs/tutorials/performance.md#performance-considerations updated to include that information?

there is already a note about fuse-overlayfs vs native overlay in a user namespace (https://github.com/containers/podman/blob/993ecd5a05a42269cf5b4547d3bf92cae9efffed/docs/tutorials/performance.md#choosing-a-storage-driver). What do you think should be added?

OK, so it will only be used when native overlay can't, meaning there are cases where it will be used but not performant (like @eriksjolund demonstrated above). Would detecting this scenario and printing a helpful message be useful?

switching to fuse-overlayfs will affect the whole storage, once you start using fuse-overlayfs you can't go back unless you run podman system reset -f. So we want to minimize the amount of users that are stuck in a fuse-overlayfs world and use native overlay, which is faster at runtime once the container started

@dustymabe
Member

@giuseppe that is great information! I wonder if we can get https://github.com/containers/podman/blob/993ecd5a05a42269cf5b4547d3bf92cae9efffed/docs/tutorials/performance.md#performance-considerations updated to include that information?

there is already a note about fuse-overlayfs vs native overlay in a user namespace (https://github.com/containers/podman/blob/993ecd5a05a42269cf5b4547d3bf92cae9efffed/docs/tutorials/performance.md#choosing-a-storage-driver). What do you think should be added?

yes. from what I can tell that section mentions the performance tradeoffs of each when either could be used, but it doesn't mention at all when fuse-overlayfs will be used because native overlay isn't possible. From your comment above that would be when:

  • running on a network file system
  • too old kernel
  • already running on top of overlay

In other words, there are reasons to keep fuse-overlayfs around because there are some use cases that native overlay doesn't support, but I don't think that is in the docs anywhere?
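
Two of the three cases in that list depend on the file system backing the storage directory, so they can be roughly checked from its type. The sketch below is only an approximation: the function name `native_overlay_unlikely` is hypothetical, the type names `overlayfs` and `nfs` are what GNU `stat -f -c %T` is assumed to report, and the "too old kernel" case is not covered at all.

```shell
# Returns 0 when the given file system type is one of the cases listed
# above where native overlay is not expected to work for rootless storage.
# Assumption: the names match what GNU `stat -f -c %T` prints.
native_overlay_unlikely() {
    case "$1" in
        overlayfs|nfs) return 0 ;;
        *)             return 1 ;;
    esac
}

# Usage (not run here), using the default rootless storage location:
#   fstype="$(stat -f -c %T "$HOME/.local/share/containers/storage")"
#   if native_overlay_unlikely "$fstype"; then
#       echo "storage is on $fstype; fuse-overlayfs is probably needed"
#   fi
```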

OK, so it will only be used when native overlay can't, meaning there are cases where it will be used but not performant (like @eriksjolund demonstrated above). Would detecting this scenario and printing a helpful message be useful?

switching to fuse-overlayfs will affect the whole storage, once you start using fuse-overlayfs you can't go back unless you run podman system reset -f. So we want to minimize the amount of users that are stuck in a fuse-overlayfs world and use native overlay, which is faster at runtime once the container started

I wish they were interchangeable and could be applied dynamically based on the conditions, but I guess that's not really in the realm of possibility otherwise it would have been implemented that way.

debarshiray added a commit to debarshiray/containers-common that referenced this issue Oct 17, 2024
Commit 5ad221d ("Revert \"Move fuse-overlayfs to suggests\" for
Fedora 40 and older") restored the dependency on fuse-overlayfs for
Fedora 40 and older to not disrupt stable Fedora releases by breaking
backwards compatibility with existing containers.

Fedora completely ignores Suggests by default [1].  So, listing anything
as Suggests is as good as not mentioning it at all, unless it's intended
for tools and users who specifically respect it.

It turns out that there are important use-cases where the Linux kernel's
overlay file system doesn't work, and one could really benefit from
having fuse-overlayfs(1).  A container cannot use an overlayfs when the
underlying file system is also an overlayfs, such as on the Fedora
Workstation live media, or a network file system.

Therefore, it's worth restoring the dependency on all Fedora releases to
cover these use-cases.

As suggested by Giuseppe Scrivano.

This reverts Fedora commit 447945e59a01cb6715ed2a21877d45bf0b91ef67 for
all Fedora releases.

[1] https://docs.fedoraproject.org/en-US/packaging-guidelines/WeakDependencies/

Fixes: coreos/fedora-coreos-tracker#1749
Fixes: containers/toolbox#1512
Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=2299284

Signed-off-by: Debarshi Ray <[email protected]>
@debarshiray

Sent a pull request to restore the fuse-overlayfs dependency in the Podman stack for all Fedora releases: containers/common#2206

I didn't touch any Fedora derivatives, because I assume that it needs further discussion.

@travier
Member Author

travier commented Nov 28, 2024

I've switched my systems back to fuse-overlayfs for rootless containers and that solved the issues I had with creation time for my toolbox containers and made the podman images command responsive again.

I wrote an issue for the Atomic Desktops: https://gitlab.com/fedora/ostree/sig/-/issues/57

@travier travier changed the title Dependency on fuse-overlayfs in containers-common moved to Suggests fuse-overlayfs storage has significant impact on rootless containers creation time Nov 28, 2024
@travier travier changed the title fuse-overlayfs storage has significant impact on rootless containers creation time Native overlay storage has significant impact on rootless containers creation time Nov 28, 2024
@dustymabe
Member

dustymabe commented Dec 2, 2024

I kind of agree that maybe fuse-overlayfs should be the default for rootless containers for now (i.e. in the case fuse-overlayfs is available and the system doesn't already have storage configured). The performance hit at creation time is significant (especially for large images) and most users don't actually understand what is going on and just think their system is frozen.

Since this won't be fixed anytime soon in native Overlay perhaps switching the default for rootless podman would be advisable.

martinpitt pushed a commit to martinpitt/ostree-pitti-workstation that referenced this issue Dec 18, 2024
podman now defaults to native overlayfs for the storage driver and
removed the dependency on fuse-overlayfs in containers-common.

We had manually kept it in Fedora 40 as it was released with
fuse-overlayfs so we had to keep it for compatibility during the major
release lifetime.

From discussions, it appears that this driver is still useful in
rootless container use cases and has better performance there, thus
let's keep it in the Atomic Desktops for now as this is a common use
case.

See: https://src.fedoraproject.org/rpms/containers-common/c/447945e59a01cb6715ed2a21877d45bf0b91ef67
See: coreos/fedora-coreos-tracker#1749