Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce bootc-owned container store, use for bound images #724

Merged
merged 1 commit into from
Jul 31, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions Makefile
Original file line number Diff line number Diff line change
Expand Up @@ -11,6 +11,7 @@ install:
install -D -m 0755 -t $(DESTDIR)$(prefix)/bin target/release/bootc
install -d -m 0755 $(DESTDIR)$(prefix)/lib/bootc/bound-images.d
install -d -m 0755 $(DESTDIR)$(prefix)/lib/bootc/kargs.d
ln -s /sysroot/ostree/bootc/storage $(DESTDIR)$(prefix)/lib/bootc/storage
install -d -m 0755 $(DESTDIR)$(prefix)/lib/systemd/system-generators/
ln -f $(DESTDIR)$(prefix)/bin/bootc $(DESTDIR)$(prefix)/lib/systemd/system-generators/bootc-systemd-generator
install -d $(DESTDIR)$(prefix)/lib/bootc/install
Expand Down
21 changes: 17 additions & 4 deletions docs/src/experimental-logically-bound-images.md
Original file line number Diff line number Diff line change
Expand Up @@ -14,7 +14,11 @@ This experimental feature enables an association of container "app" images to a

## Using logically bound images

Each image is defined in a [Podman Quadlet](https://docs.podman.io/en/latest/markdown/podman-systemd.unit.5.html) `.image` or `.container` file. An image is selected to be bound by creating a symlink in the `/usr/lib/bootc/bound-images.d` directory pointing to a `.image` or `.container` file. With these defined, during a `bootc upgrade` or `bootc switch` the bound images defined in the new bootc image will be automatically pulled via podman.
Each image is defined in a [Podman Quadlet](https://docs.podman.io/en/latest/markdown/podman-systemd.unit.5.html) `.image` or `.container` file. An image is selected to be bound by creating a symlink in the `/usr/lib/bootc/bound-images.d` directory pointing to a `.image` or `.container` file.

With these defined, during a `bootc upgrade` or `bootc switch` the bound images defined in the new bootc image will be automatically pulled into the bootc image storage, and are available to container runtimes such as podman by explicitly configuring them to point to the bootc storage as an "additional image store", via e.g.:

`podman --storage-opt=additionalimagestore=/usr/lib/bootc/storage run <image> ...`

An example Containerfile

Expand All @@ -28,8 +32,17 @@ RUN ln -s /usr/share/containers/systemd/my-app.image /usr/lib/bootc/bound-images
ln -s /usr/share/containers/systemd/my-app.image /usr/lib/bootc/bound-images.d/my-app.image
```

In the `.container` definition, you should use:

```
GlobalArgs=--storage-opt=additionalimagestore=/usr/lib/bootc/storage
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couldn't we globally configure the additional image store via containers-storage.conf? That would quite improve the UX.

Copy link
Collaborator Author

@cgwalters cgwalters Aug 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think it's a possibility. That said, changing the storage configuration without having containers/storage#1885 is...a bit ugly.

There are a few other things that have me slightly wary:

  • All of this code is basically a no-op unless bound images are in use, but if we added an additional store it would mean every podman invocation would see it...it'd be empty, but it would be a global behavioral change
  • There's been some lingering concerns I've seen in the past around what happens if...

Yeah ok, I think this a near showstopper for globally (i.e. in storage.conf)
setting additionalimagestores (hereafter: AIS) as currently defined today.

For the current bootc logically bound images (hereafter: LBI) design, we only opt in individual podman invocations to use the store, and every image there is readonly. It cannot change or break any non-LB images.

But when globally configuring an AIS, there's two actors:

  • bootc
  • podman

That operate independently of each other, but every podman invocation will see the bootc AIS.

Take the situation where there are two different "FROM " images, one is an LBI and another is free-floating (could be managed by kube/ansible/whatever). If the bootc AIS happens to have the layer(s) for <somebase>, the podman pull <derived2> for the non-LBI (floating) image will reuse the common base. Which...makes sense!

Except today the bootc bound images logic will prune images that are no longer referenced.

And yes, I just tested this, and things explode (as you'd expect) in the scenario where a lower read-only layer disappears for the case of an image/container that lives in /var/lib/containers.


(Note we don't have the inverse problem, because when bootc operates on its storage it uses --root and should not be looking at or touching /var/lib/containers)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess of course, bootc's GC logic could learn to take into account the /var/lib/containers store state...but we're starting to "cross the worlds" with stuff like that.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now, one might be thinking...isn't the real fix for this to have one storage by default, i.e. on bootc systems podman and bootc default to having the same storage for images (underneath /sysroot, though containers would probably still make sense under `/var).

And yes maybe, but the thing that terrified me about going down that path is that typing "podman ..." has the potential to break the operating system. There's lots of docs talking about podman system reset and the like.

Could we make it reliable? Yes, clearly. But it'd require I think a lot of careful thought, code auditing, cooperation and integration.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nice if podman had a way to recover from it. IE Pull down the missing image but I am not sure the missing image/layer would still exist or have a way to discover where it exist.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we had agreed to create a pinning image mechanism for you that was going to avoid the reset problem no? i thought that is the way we were going?

Copy link
Collaborator Author

@cgwalters cgwalters Aug 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not aware of any commitment from anyone to work on image pinning on the podman side.

That said again, I should be very clear that my thoughts on this design changed as we got into the implementation. My original thought was basically "bootc is just running podman pull, how hard can it be Michael? 1 week?"

But yeah...having bootc touch /var (/var/lib/containers) just blows up our story around state, messes with transactional updates, partitioning, etc. So that led me rapidly down this path of having a bootc-owned storage.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also don't we want the ability to roll back versions of images?

But bottom line we have the same issue if anyone sets up additional stores. If an image is used from an additional store and gets removed, then the container will no longer work. In RHIVOS we thought about this and came to the conclusion that we would know which images were being used so we would know what images to remove. Also containers are ephemeral, they did not survive reboot.

Copy link
Collaborator Author

@cgwalters cgwalters Aug 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also don't we want the ability to roll back versions of images?

bootc retains all LBIs that are referenced from any deployment, i.e. when you upgrade but then roll back, you are guaranteed that the LBIs that are referenced via the .image/.container files are still there.

(However, there's ill-defined semantics currently around how LBIs work with floating tags, but basically "don't do that")

```

## Pull secret

Images are fetched using the global bootc pull secret by default (`/etc/ostree/auth.json`). It is not yet supported to configure `PullSecret` in these image definitions.

## Limitations

- Currently, only the Image field of a `.image` or `.container` file is used to pull the image. Any other field is ignored.
- There is no cleanup during rollback.
- Images are subject to default garbage collection semantics; e.g. a background job pruning images without a running container may prune them. They can also be manually removed via e.g. podman rmi.
- Currently, only the Image field of a `.image` or `.container` file is used to pull the image; per above not even `PullSecret=` is supported.
- Images are not yet garbage collected
43 changes: 27 additions & 16 deletions lib/src/boundimage.rs
Original file line number Diff line number Diff line change
Expand Up @@ -5,15 +5,16 @@
//! pre-pulled (and in the future, pinned) before a new image root
//! is considered ready.
use crate::task::Task;
use anyhow::{Context, Result};
use camino::Utf8Path;
use cap_std_ext::cap_std::fs::Dir;
use cap_std_ext::dirext::CapStdExtDirExt;
use fn_error_context::context;
use ostree_ext::containers_image_proxy;
use ostree_ext::ostree::Deployment;
use ostree_ext::sysroot::SysrootLock;

use crate::imgstorage::PullMode;
use crate::store::Storage;

/// The path in a root for bound images; this directory should only contain
/// symbolic links to `.container` or `.image` files.
Expand All @@ -26,8 +27,8 @@ const BOUND_IMAGE_DIR: &str = "usr/lib/bootc/bound-images.d";
/// other pull options.
#[derive(Debug, PartialEq, Eq)]
pub(crate) struct BoundImage {
image: String,
auth_file: Option<String>,
pub(crate) image: String,
pub(crate) auth_file: Option<String>,
}

#[derive(Debug, PartialEq, Eq)]
Expand All @@ -37,10 +38,18 @@ pub(crate) struct ResolvedBoundImage {
}

/// Given a deployment, pull all container images it references.
pub(crate) fn pull_bound_images(sysroot: &SysrootLock, deployment: &Deployment) -> Result<()> {
pub(crate) async fn pull_bound_images(sysroot: &Storage, deployment: &Deployment) -> Result<()> {
let bound_images = query_bound_images_for_deployment(sysroot, deployment)?;
pull_images(sysroot, bound_images).await
}

#[context("Querying bound images")]
pub(crate) fn query_bound_images_for_deployment(
sysroot: &Storage,
deployment: &Deployment,
) -> Result<Vec<BoundImage>> {
let deployment_root = &crate::utils::deployment_fd(sysroot, deployment)?;
let bound_images = query_bound_images(deployment_root)?;
pull_images(deployment_root, bound_images)
query_bound_images(deployment_root)
}

#[context("Querying bound images")]
Expand Down Expand Up @@ -133,18 +142,20 @@ fn parse_container_file(file_contents: &tini::Ini) -> Result<BoundImage> {
Ok(bound_image)
}

#[context("pull bound images")]
pub(crate) fn pull_images(_deployment_root: &Dir, bound_images: Vec<BoundImage>) -> Result<()> {
#[context("Pulling bound images")]
pub(crate) async fn pull_images(sysroot: &Storage, bound_images: Vec<BoundImage>) -> Result<()> {
tracing::debug!("Pulling bound images: {}", bound_images.len());
//TODO: do this in parallel
for bound_image in bound_images {
let mut task = Task::new("Pulling bound image", "/usr/bin/podman")
.arg("pull")
.arg(&bound_image.image);
if let Some(auth_file) = &bound_image.auth_file {
task = task.arg("--authfile").arg(auth_file);
}
task.run()?;
let image = &bound_image.image;
let desc = format!("Updating bound image: {image}");
crate::utils::async_task_with_spinner(&desc, async move {
sysroot
.imgstore
.pull(&bound_image.image, PullMode::IfNotExists)
.await
})
.await?;
}

Ok(())
Expand Down
66 changes: 65 additions & 1 deletion lib/src/cli.rs
Original file line number Diff line number Diff line change
Expand Up @@ -199,6 +199,31 @@ pub(crate) enum ContainerOpts {
Lint,
}

/// Subcommands which operate on images.
#[derive(Debug, clap::Subcommand, PartialEq, Eq)]
pub(crate) enum ImageCmdOpts {
/// Wrapper for `podman image list` in bootc storage.
List {
#[clap(allow_hyphen_values = true)]
args: Vec<OsString>,
},
/// Wrapper for `podman image build` in bootc storage.
Build {
#[clap(allow_hyphen_values = true)]
args: Vec<OsString>,
},
/// Wrapper for `podman image pull` in bootc storage.
Pull {
#[clap(allow_hyphen_values = true)]
args: Vec<OsString>,
},
/// Wrapper for `podman image push` in bootc storage.
Push {
#[clap(allow_hyphen_values = true)]
args: Vec<OsString>,
},
}

/// Subcommands which operate on images.
#[derive(Debug, clap::Subcommand, PartialEq, Eq)]
pub(crate) enum ImageOpts {
Expand Down Expand Up @@ -232,6 +257,16 @@ pub(crate) enum ImageOpts {
/// this will make the image accessible via e.g. `podman run localhost/bootc` and for builds.
target: Option<String>,
},
/// Copy a container image from the default `containers-storage:` to the bootc-owned container storage.
PullFromDefaultStorage {
/// The image to pull
image: String,
},
/// List fetched images stored in the bootc storage.
///
/// Note that these are distinct from images stored via e.g. `podman`.
#[clap(subcommand)]
Cmd(ImageCmdOpts),
}

/// Hidden, internal only options
Expand All @@ -247,6 +282,8 @@ pub(crate) enum InternalsOpts {
FixupEtcFstab,
/// Should only be used by `make update-generated`
PrintJsonSchema,
/// Perform cleanup actions
Cleanup,
}

impl InternalsOpts {
Expand Down Expand Up @@ -430,10 +467,12 @@ pub(crate) async fn get_locked_sysroot() -> Result<ostree_ext::sysroot::SysrootL
Ok(sysroot)
}

/// Load global storage state, expecting that we're booted into a bootc system.
#[context("Initializing storage")]
pub(crate) async fn get_storage() -> Result<crate::store::Storage> {
let global_run = Dir::open_ambient_dir("/run", cap_std::ambient_authority())?;
let sysroot = get_locked_sysroot().await?;
crate::store::Storage::new(sysroot)
crate::store::Storage::new(sysroot, &global_run)
}

#[context("Querying root privilege")]
Expand Down Expand Up @@ -798,6 +837,27 @@ async fn run_from_opt(opt: Opt) -> Result<()> {
ImageOpts::CopyToStorage { source, target } => {
crate::image::push_entrypoint(source.as_deref(), target.as_deref()).await
}
ImageOpts::PullFromDefaultStorage { image } => {
let sysroot = get_storage().await?;
sysroot.imgstore.pull_from_host_storage(&image).await
}
ImageOpts::Cmd(opt) => {
let sysroot = get_storage().await?;
match opt {
ImageCmdOpts::List { args } => {
crate::image::imgcmd_entrypoint(&sysroot.imgstore, "list", &args).await
}
ImageCmdOpts::Build { args } => {
crate::image::imgcmd_entrypoint(&sysroot.imgstore, "build", &args).await
}
ImageCmdOpts::Pull { args } => {
crate::image::imgcmd_entrypoint(&sysroot.imgstore, "pull", &args).await
}
ImageCmdOpts::Push { args } => {
crate::image::imgcmd_entrypoint(&sysroot.imgstore, "push", &args).await
}
}
}
},
#[cfg(feature = "install")]
Opt::Install(opts) => match opts {
Expand Down Expand Up @@ -831,6 +891,10 @@ async fn run_from_opt(opt: Opt) -> Result<()> {
serde_json::to_writer_pretty(&mut stdout, &schema)?;
Ok(())
}
InternalsOpts::Cleanup => {
let sysroot = get_storage().await?;
crate::deploy::cleanup(&sysroot).await
}
},
#[cfg(feature = "docgen")]
Opt::Man(manopts) => crate::docgen::generate_manpages(&manopts.directory),
Expand Down
104 changes: 64 additions & 40 deletions lib/src/deploy.rs
Original file line number Diff line number Diff line change
Expand Up @@ -2,6 +2,7 @@
//!
//! Create a merged filesystem tree with the image and mounted configmaps.
use std::collections::HashSet;
use std::io::{BufRead, Write};

use anyhow::Ok;
Expand Down Expand Up @@ -268,53 +269,76 @@ pub(crate) async fn pull(
Ok(Box::new((*import).into()))
}

/// Gather all bound images in all deployments, then prune the image store,
/// using the gathered images as the roots (that will not be GC'd).
pub(crate) async fn prune_container_store(sysroot: &Storage) -> Result<()> {
let deployments = sysroot.deployments();
let mut all_bound_images = Vec::new();
for deployment in deployments {
let bound = crate::boundimage::query_bound_images_for_deployment(sysroot, &deployment)?;
all_bound_images.extend(bound.into_iter());
}
// Convert to a hashset of just the image names
let image_names = HashSet::from_iter(all_bound_images.iter().map(|img| img.image.as_str()));
let pruned = sysroot.imgstore.prune_except_roots(&image_names).await?;
tracing::debug!("Pruned images: {}", pruned.len());
Ok(())
}

pub(crate) async fn cleanup(sysroot: &Storage) -> Result<()> {
let bound_prune = prune_container_store(sysroot);

// We create clones (just atomic reference bumps) here to move to the thread.
let repo = sysroot.repo();
let sysroot = sysroot.sysroot.clone();
ostree_ext::tokio_util::spawn_blocking_cancellable_flatten(move |cancellable| {
let locked_sysroot = &SysrootLock::from_assumed_locked(&sysroot);
let cancellable = Some(cancellable);
let repo = &repo;
let txn = repo.auto_transaction(cancellable)?;
let repo = txn.repo();

// Regenerate our base references. First, we delete the ones that exist
for ref_entry in repo
.list_refs_ext(
Some(BASE_IMAGE_PREFIX),
ostree::RepoListRefsExtFlags::NONE,
cancellable,
)
.context("Listing refs")?
.keys()
{
repo.transaction_set_refspec(ref_entry, None);
}
let repo_prune =
ostree_ext::tokio_util::spawn_blocking_cancellable_flatten(move |cancellable| {
let locked_sysroot = &SysrootLock::from_assumed_locked(&sysroot);
let cancellable = Some(cancellable);
let repo = &repo;
let txn = repo.auto_transaction(cancellable)?;
let repo = txn.repo();

// Regenerate our base references. First, we delete the ones that exist
for ref_entry in repo
.list_refs_ext(
Some(BASE_IMAGE_PREFIX),
ostree::RepoListRefsExtFlags::NONE,
cancellable,
)
.context("Listing refs")?
.keys()
{
repo.transaction_set_refspec(ref_entry, None);
}

// Then, for each deployment which is derived (e.g. has configmaps) we synthesize
// a base ref to ensure that it's not GC'd.
for (i, deployment) in sysroot.deployments().into_iter().enumerate() {
let commit = deployment.csum();
if let Some(base) = get_base_commit(repo, &commit)? {
repo.transaction_set_refspec(&format!("{BASE_IMAGE_PREFIX}/{i}"), Some(&base));
}
}

// Then, for each deployment which is derived (e.g. has configmaps) we synthesize
// a base ref to ensure that it's not GC'd.
for (i, deployment) in sysroot.deployments().into_iter().enumerate() {
let commit = deployment.csum();
if let Some(base) = get_base_commit(repo, &commit)? {
repo.transaction_set_refspec(&format!("{BASE_IMAGE_PREFIX}/{i}"), Some(&base));
let pruned =
ostree_container::deploy::prune(locked_sysroot).context("Pruning images")?;
if !pruned.is_empty() {
let size = glib::format_size(pruned.objsize);
println!(
"Pruned images: {} (layers: {}, objsize: {})",
pruned.n_images, pruned.n_layers, size
);
} else {
tracing::debug!("Nothing to prune");
}
}

let pruned = ostree_container::deploy::prune(locked_sysroot).context("Pruning images")?;
if !pruned.is_empty() {
let size = glib::format_size(pruned.objsize);
println!(
"Pruned images: {} (layers: {}, objsize: {})",
pruned.n_images, pruned.n_layers, size
);
} else {
tracing::debug!("Nothing to prune");
}
Ok(())
});

Ok(())
})
.await
// We run these in parallel mostly because we can.
tokio::try_join!(repo_prune, bound_prune)?;
Ok(())
}

/// If commit is a bootc-derived commit (e.g. has configmaps), return its base.
Expand Down Expand Up @@ -399,7 +423,7 @@ pub(crate) async fn stage(
)
.await?;

crate::boundimage::pull_bound_images(sysroot, &deployment)?;
crate::boundimage::pull_bound_images(sysroot, &deployment).await?;

crate::deploy::cleanup(sysroot).await?;
println!("Queued for next boot: {:#}", spec.image);
Expand Down
Loading