Add support for full overlayfs for /
#3113
Comments
I worry about this in general, because it breaks a basic container-like workload: you can't (safely) rebase to a new version while keeping the existing overlay upper dir. Any change to a file will always shadow the new version of that file from the image. This works in containers because you start new containers whenever you rebase, which means throwing away the upper layer. Do we intend to do the same here?
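To make the shadowing concern concrete, here is a toy model of how an overlay lookup resolves paths. It is only a sketch: the dictionaries stand in for the lower (image) and upper (writable) directories, and all paths and contents are made up for illustration.

```python
def merged_view(lower: dict[str, str], upper: dict[str, str]) -> dict[str, str]:
    """Return what a reader of the overlay mount sees: upper entries win."""
    return {**lower, **upper}


# Image v1 ships a file; a local write copies it up into the upper dir.
lower_v1 = {"/opt/app/bin/tool": "tool v1"}
upper = {"/opt/app/bin/tool": "tool v1 (locally modified)"}

# Rebase to image v2 while keeping the same upper dir.
lower_v2 = {"/opt/app/bin/tool": "tool v2", "/opt/app/lib/new.so": "added in v2"}

view = merged_view(lower_v2, upper)
print(view["/opt/app/bin/tool"])    # still the stale copy from the upper dir
print(view["/opt/app/lib/new.so"])  # files nobody touched do come through
```

The updated `tool` never becomes visible as long as the old upper dir is kept, which is the failure mode described above.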
I see you mention the upgrade issue. However, if we do that, is this really a useful feature that users want?
I believe this will perfectly solve the issue we have with some of the nvidia rpms. I think applications like CrowdStrike will struggle, as their client performs a type of "activation" and maintains state with the controller, all under the /opt/crowdstrike/ directory. But if I understand this correctly, the software should still work; they'd just have to re-establish that activation with each OS update. I think we can make this work, but this is an area where more feedback will hopefully help.
@alexlarsson On upgrade, indeed, the old overlayfs data will be gone. But that's how it has worked since Docker first came out.
I feel like this would be better solved by adding to the image a bind mount unit (or a symlink) from /opt/crowdstrike into /var instead, so that the activation can be persisted? Or am I missing something here?
I agree that would be much better. I think the challenge is mainly around users knowing details like this about their applications, and also knowing rpm-ostree/ostree. Docker makes them care more about where things are persisted; traditional rpm systems do not. Earlier today I think @cgwalters or @stefwalter mentioned using the VOLUME key in the Containerfile to help. Would it make sense to create bind mounts like this, or something similar, for volumes that users declare?
It would be cool if we could somehow mark a volume as needing to persist, and have bootc automatically set up the /var mapping for it.
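As a rough illustration of the bind mount idea, the sketch below renders a systemd `.mount` unit that maps a directory under `/var` onto a path under `/opt`. The state directory `/var/opt/crowdstrike` and the helper names are assumptions made for this example, not something bootc or ostree generates, and the name helper only handles simple paths without characters that need escaping.

```python
def mount_unit_name(where: str) -> str:
    """systemd names a .mount unit after its mount point (simple paths only here)."""
    return where.strip("/").replace("/", "-") + ".mount"


def bind_mount_unit(what: str, where: str) -> str:
    """Render a unit that bind-mounts persistent state (`what`) onto `where`."""
    return (
        "[Unit]\n"
        f"Description=Persist {where} in {what}\n"
        "\n"
        "[Mount]\n"
        f"What={what}\n"
        f"Where={where}\n"
        "Type=none\n"
        "Options=bind\n"
        "\n"
        "[Install]\n"
        "WantedBy=local-fs.target\n"
    )


# Hypothetical paths: the state location under /var is an assumption.
name = mount_unit_name("/opt/crowdstrike")
unit = bind_mount_unit("/var/opt/crowdstrike", "/opt/crowdstrike")
print(name)
print(unit)
```

A declared-volume mechanism like the one suggested above could, in principle, emit units of this shape automatically.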
Closes: ostreedev#3113

It'd greatly improve compatibility with things like RPMs that install in `/opt` if we supported a full "original docker" style model where `/` is a transient overlayfs. We'd still keep our semantics for `/etc` and `/var` by default, but e.g. we'd stop recommending `/opt` ➡️ `/var/opt`, in this model, so `/opt` would be on the overlayfs.

Note this all aligns with composefs, where we'd actually be making `/` a *read-only* overlayfs by default; it'd be really nice of course to *implement* this by just making the composefs overlayfs writable, but I am not sure we can hard require composefs for this right now.

So this change adds support for `root.transient = true` in `/usr/lib/ostree/prepare-root.conf`.

The major downside is that people could be surprised if files they write to e.g. `/opt` don't persist across upgrades. But, that's already again how it works since Docker started.

Note as part of the implementation of this, we need to add a whole new "backing" directory distinct from the deployment directories.

(Tangentially related to this, it's tempting to switch to always using a *read-only* overlay mount by default.
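For readers who want to check whether an image opts in, here is a minimal sketch that reads the setting with Python's `configparser`. It assumes that `root.transient` maps to a `transient` key in a `[root]` group of the key file; the sample text and the helper name `root_is_transient` are illustrative only.

```python
import configparser

# Assumed layout: `root.transient = true` written as a key in a [root] group.
SAMPLE = """\
[root]
transient = true
"""


def root_is_transient(conf_text: str) -> bool:
    """Return True if the config text enables the transient root."""
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    # A missing group or key is treated as "not enabled".
    return parser.getboolean("root", "transient", fallback=False)


print(root_is_transient(SAMPLE))
print(root_is_transient(""))
```

In practice the text would come from `/usr/lib/ostree/prepare-root.conf` inside the image rather than from an inline string.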
This would be amazing and match the behavior of people starting to use docker with volumes. I would go as far as to say that this is something that they have to specify before a persistent volume (even /var) is created. Shouldn't we match the basic container behavior as much as we can?
This is a good debate to have. I do think some use cases would be happy with a transient
That latter one would break Anaconda kickstarts setting up ssh keys. The other "volume" ostree sets up by default is I like the conceptual purity of making things work exactly the same as when docker first came out, but the fallout of doing so is really large and will force the majority of users into configuring persistent volumes by default. A really pernicious problem will be that things will appear to work until you do an OS update...and that problem is why ostree has the strict model it does by default. |
Tangentially related...an interesting debate to have is whether the root should reset on reboots that aren't OS updates. The existing PR (unlike |
For those interested in trying this out I've hacked up Previously:
With updated test image, configured for this:
And then:
This is true, but it's still confusing UX given that all the other packages that keep state in
Hmm, keeping the upperdir while we rebase the lowerdir sounds very close to what we want IMO. If we constrain it to It seems like this should work well enough to mark coreos/rpm-ostree#233 as closed. Testing this idea, it seems to work fine at least for Puppet (doing the install part client-side, but it could be made to work just as well in a container layering flow of course):
Puppet won't start because it can't create under
(I don't know Puppet at all; Ben's example might be a better test for this given that losing state affects functionality.) Slap on an overlay:
Let's mock an upgrade (in this case a downgrade actually since 7.27.0 is the latest):
Reslap on the upperdir on reboot:
The nice thing about this is that we're still fitting the OSTree model; code in Another nice thing is that I think Since this doesn't break existing flows (e.g. via Ignition or the MCO), this approach could also more easily be rolled out to FCOS and OCP. Thoughts? I'm sure I'm probably missing something. |
This is definitely right; in the general case people will need to be aware of it. But the scope of necessary changes is much smaller than forcing them to entirely move where things are placed into
Overlayfs used to not support offline changes to the lower layer at all; it looks like nowadays it does, if the appropriate mount options are enabled. I don't have much experience with this. Hmm, turning off "metadata only copy up" would be a big performance hit on systems with reflinks. I think the strong argument against this though is basically: It's not how Docker worked when it came out. The semantic for
This however is not easy to explain or implement (especially in a transactional way)! You're arguing to treat all of
Except not in a container where we want this (which is fixable but more importantly) it's not what It feels like bigger picture the debate here is basically:
We already have the first; the goal is the second. |
But note we only expect state files in the upper layer, so a copyup shouldn't be that common.
OTOH just to highlight, the main argument for it is that it fits quite well in the OSTree model as it exists today. :) We could fix
No, only
Right, for the container flow, it would still be
Agreed. I'm likely missing context here, but ISTM the base problem we're trying to fix is If we fix the |
All that said, I won't harp on this too long. I see a huge opportunity here to possibly fix a longstanding issue (and I think the code paths touched are very similar to those touched for |
Note that most of that code is only necessary (as the comment says) because the Fedora If we didn't have that bug then the overrides in place for
Yes, you are right that it is inconsistent for If we do run into issues with code that expects writability to |
I mean |
This is a reasonable question. But flipping things around a bit...we will have different mounts set up by default. For example, we're not compromising on The I guess the way I am thinking of things, in this mode we keep the "ostree core model of /etc, /usr, /var" semantics - but things outside of it now have "transient overlay" semantics instead of just being disallowed. (One tangential thing here is systemd is very focused on "/usr-is-OS" and "/etc is empty", which is even stricter than ostree semantics, but they are compatible) |
A different tangential but notable thing: because overlayfs doesn't implement atime for lower in this mode we effectively start ignoring atime changes for all the OS state in |
Can you clarify here: Is "fully transient mode" here just having |
Yeah, exactly. But of course it still wouldn't be consistent, because programs in Another approach is to keep it as is, but not frame it as a "Docker style" knob because the bits that retain state far outweigh the bits that don't. I would still like to investigate the rebased upperdir approach though. I don't think it conceptually conflicts with |
While the term "docker" is definitely in this issue, it's not in the current docs (which are pretty small, and clearly need elaboration and probably graphics).
I'm not opposed in theory to supporting something like this as it'd just be a tweak on top of what overlayfs supports, but I'd like to dig deeper into exactly what it would fix, what the config option would look like, and really ideally have concrete examples of existing software that would be fixed. Are you thinking of cases like the one Ben mentioned with
? If so then yes I think your ("rootfs.merge") would fix cases like that automatically. But a notable tricky thing here is the case of "what happens if the app modifies a file that's in the base image". With RPM (and dpkg I think), unless something is explicitly marked as a config file, it will get replaced on update, even if it's modified (I'm not sure RPM even checks if it's changed). So for app binaries if they happen to be written to at runtime, we'll still reliably get updated binaries. I'm worried about corner cases like Python apps that end up recompiling the bytecode at runtime (just because it's writable and timestamps...) and hence touch the |
👍
Heh, I would posit the reverse: since most apps have state, I would expect some loss of functionality from not retaining it. Obviously the degree of functionality loss varies. E.g. even the Puppet example we used, which probably works OK in But the other thing I'll stress here is that the state overlay approach is compatible with existing systems which means we'd fix the
Yeah exactly. This is what I'm talking about in #3113 (comment) (sentence starting with "Even so"). I think we'd need to address it eventually, but we'd get quite far I think without worrying about it to start.
Yes good point. Something to investigate. |
Right, that's why we've had the strict model since forever.
I think your proposal here needs an explicit name ( I'm a bit confused though because on existing systems that have e.g. |
In the OSTree model, executables go in `/usr`, state in `/var` and configuration in `/etc`. Software that lives in `/opt` however messes this up because it often mixes code *and* state, making it harder to manage. More generally, it's sometimes useful to have the OSTree commit contain code under a certain path, but still allow that path to be writable by software and the sysadmin at runtime (`/usr/local` is another instance). Add the concept of state overlays. A state overlay is an overlayfs mount whose upper directory, which contains unmanaged state, is carried forward on top of a lower directory, containing OSTree-managed files. In the example of `/usr/local`, OSTree commits can ship content there, all while allowing users to e.g. add scripts in `/usr/local/bin` when booted into that commit. Some reconciliation logic is executed whenever the base is updated so that newer files in the base are never shadowed by a copied up version in the upper directory. This matches RPM semantics when upgrading packages whose files may have been modified. For ease of integration, this is exposed as a systemd template unit which any downstream distro/user can enable. The instance name is the mountpath in escaped systemd path notation (e.g. `[email protected]`). See discussions in ostreedev#3113 for more details.
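The reconciliation step can be pictured with the sketch below. It is not the OSTree implementation; it assumes the simplest possible rule, namely that any non-directory entry in the upper dir that also exists in the lower dir is dropped when the base changes, so the base version always wins. The function name and the demo layout are hypothetical.

```python
import os
import tempfile


def prune_shadowing_entries(lower: str, upper: str) -> list[str]:
    """Delete upper-dir files that would shadow a file shipped in the lower dir."""
    removed = []
    for dirpath, _dirnames, filenames in os.walk(upper):
        rel_dir = os.path.relpath(dirpath, upper)
        for name in filenames:
            rel = os.path.normpath(os.path.join(rel_dir, name))
            # lexists: a symlink in the base counts as present even if dangling.
            if os.path.lexists(os.path.join(lower, rel)):
                os.unlink(os.path.join(upper, rel))
                removed.append(rel)
    return sorted(removed)


with tempfile.TemporaryDirectory() as root:
    lower, upper = os.path.join(root, "lower"), os.path.join(root, "upper")
    os.makedirs(os.path.join(lower, "bin"))
    os.makedirs(os.path.join(upper, "bin"))
    os.makedirs(os.path.join(upper, "state"))
    for path, text in [
        (os.path.join(lower, "bin", "app"), "app v2 from the new base"),
        (os.path.join(upper, "bin", "app"), "copied-up app v1"),
        (os.path.join(upper, "state", "activation"), "unmanaged state"),
    ]:
        with open(path, "w") as f:
            f.write(text)

    removed = prune_shadowing_entries(lower, upper)
    state_kept = os.path.exists(os.path.join(upper, "state", "activation"))
    shadow_gone = not os.path.exists(os.path.join(upper, "bin", "app"))

print(removed, state_kept, shadow_gone)
```

Pure state files survive because nothing in the base has the same path, while the copied-up binary is discarded. Whiteouts and directory-level conflicts are ignored here and would need explicit handling in anything real.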
This solves the `/opt` problem by using the new state overlay concept in OSTree: an overlay filesystem is mounted on top of `/usr/lib/opt` and the upper dir is automatically "rebased" whenever new content comes in. Concretely, this means that app state is carried forward, all while allowing the (OSTree-managed) package contents to be updated. We also solve the `/usr/local` problem the same way. The app state issue isn't really present there, but `/usr/local` has traditionally been system state. We want to keep supporting dropping files there all while also supporting shipping OSTree-owned content. See also: ostreedev/ostree#3113 Fixes: coreos#233
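The instance name of the template unit is the mount path in escaped systemd path notation. The helper below is a from-memory sketch of the escaping that `systemd-escape --path` performs, written to show what the instance names for the paths in this thread would look like; `systemd_escape_path` is a hypothetical name and this is not the systemd implementation.

```python
import string

# Characters systemd leaves untouched in unit names (besides the '/' -> '-' mapping).
_SAFE = set(string.ascii_letters + string.digits + ":_.")


def systemd_escape_path(path: str) -> str:
    """Escape a filesystem path the way systemd path escaping is understood to work."""
    parts = [p for p in path.split("/") if p]  # drops leading/trailing/duplicate slashes
    if not parts:
        return "-"  # the root directory is special-cased
    trimmed = "/".join(parts)
    out = []
    for i, byte in enumerate(trimmed.encode("utf-8")):
        ch = chr(byte)
        if ch == "/":
            out.append("-")
        elif ch in _SAFE and not (i == 0 and ch == "."):
            out.append(ch)
        else:
            out.append(f"\\x{byte:02x}")  # e.g. a literal '-' becomes \x2d
    return "".join(out)


for p in ["/usr/local", "/usr/lib/opt", "/opt/my-app", "/"]:
    print(p, "->", systemd_escape_path(p))
```

On a real system, running `systemd-escape --path` on the mount path is the authoritative way to get the instance name.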
I'm using the term "state overlays" in code. Updated the comment for clarification.
Migrating existing nodes is certainly possible. At its core, I think we'd basically move I think the tricky thing is doing it in a resilient way that will work seamlessly across a rollback. For that, I think probably we'll have to ship a systemd service ahead of time that does the Anyway, for now, I think let's explore how well this option works before worrying about migration. I opened #3120 and coreos/rpm-ostree#4728. The other big piece missing of course is changing the container path to move |
It'd greatly improve compatibility with things like RPMs that install in `/opt` if we supported a full "original docker" style model where `/` is a transient overlayfs. We'd still keep our semantics for `/etc` and `/var` by default, but e.g. we'd stop recommending `/opt` ➡️ `/var/opt`, so `/opt` would be on the overlayfs.

Note this all aligns with composefs, where we'd actually be making `/` a read-only overlayfs by default; it'd be really nice of course to implement this by just making the composefs overlayfs writable, but I am not sure we can hard require composefs for this right now.

Something like `root.transient = true` in `/usr/lib/ostree/prepare-root.conf`.

Downsides

The major downside is that people could be surprised if files they write to e.g. `/opt` don't persist across upgrades. But that's already how it has worked since Docker started.