-
Notifications
You must be signed in to change notification settings - Fork 305
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add ostree=aboot for signed Android Boot Images #2844
Conversation
Hi @ericcurtin. Thanks for your PR. I'm waiting for a ostreedev member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
Related to conversations in #2753 ... Boots fine
Although I want to do some more testing with some greenboot changes. |
bdc955c
to
eea13f2
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for starting this!
d1b620e
to
1d845c7
Compare
More info of what a system booted like this looks like below. Just to point out, aboot bootloaders do not parse BLS entries at all. But we can still use that data, that might be useful, we can just parse BLS in the initramfs to decide exactly where we need to boot from, once we've figured which is the right BLS entry file to use. BLS cmdline does not match the /proc/cmdline when booted in this way, but I think that's ok. A person debugging this should know aboot does not parse BLS, it's a pretty simple booloader it boots the boot image in slot A or slot B, but the data there can still help keep the aboot technique consistent with a BLS bootloader like grub.
|
So the current logic simply picks the latest, as defined by modification timestamp, file from I think that will lead to unexpected results at best, and unexpected vulnerabilities at worst, and I guess that's why this is a Draft currently. Basing the decision on the modified timestamp could have the very slim chance of actually booting an older, but later modified, deployment, I guess that situation would be rare. It would also mean, that you can boot a (signed) kernel with a different checkout than it was deployed with, which could open up new vulnerabilities, e.g. new deployment relies on kernel feature / initramfs changes that aren't present. But the Rollbacks / multiple deployments, seem to be impossible with this logic, since even with an older image the latest checkout is used. In any case you're putting your trust in nobody tampering with the actual filesystem, which is quite a far shot by today's standards ... |
src/switchroot/ostree-mount-util.h
Outdated
static inline int | ||
append_latest_link(char* dir_str, int* len, int* cap) { | ||
char* append_from = dir_str + *len; | ||
int appended_len = 0; | ||
DIR __attribute__ ((cleanup(close_dir))) *dir = opendir(dir_str); | ||
if (!dir) | ||
{ | ||
fprintf(stderr, "'%s' %d", strerror(errno), __LINE__); | ||
return 1; | ||
} | ||
|
||
struct dirent *ent = 0; | ||
for (time_t latest = 0; (ent = readdir(dir)); ) | ||
{ | ||
if (ent->d_type != DT_LNK) | ||
continue; | ||
|
||
struct stat lsb; | ||
lsb.st_mtime = 0; | ||
char ent_str[PATH_MAX + sizeof (ent->d_name)]; | ||
snprintf(ent_str, sizeof (ent_str), "%s/%s", dir_str, ent->d_name); | ||
if (lstat(ent_str, &lsb) == -1) | ||
{ | ||
fprintf(stderr, "'%s' %d", strerror(errno), __LINE__); | ||
return 1; | ||
} | ||
|
||
if (latest <= lsb.st_mtime) | ||
{ | ||
appended_len = snprintf(append_from, *cap, "/%s", ent->d_name); | ||
latest = lsb.st_mtime; | ||
} | ||
} | ||
|
||
*len += appended_len; | ||
*cap = PATH_MAX - *len; | ||
|
||
return 0; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, as I take it, this method searches $prefix/ostree/
and appends the latest symlink.
Maybe it would be good to only look at $prefix/ostree/boot*
(perhaps even limited to all possible combinations) ?
Otherwise creating a link in there (even if you shouldn't do so) seems to have potential to break any attempt to boot.
Sorting by modification timestamps seems a bit fragile to me, can't that be done by simply looking at the number in the link ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So the current logic simply picks the latest, as defined by modification timestamp, file from
/sysroot/ostree/boot.$x/$stateroot/$bootcsum/$checkoutnum
as checkout to use ?
Yes.
I think that will lead to unexpected results at best, and unexpected vulnerabilities at worst, and I guess that's why this is a Draft currently.
Yes again, I'm just trying things at the moment.
Basing the decision on the modified timestamp could have the very slim chance of actually booting an older, but later modified, deployment, I guess that situation would be rare.
Yes I am worried about this too, expect the implementation to change.
It would also mean, that you can boot a (signed) kernel with a different checkout than it was deployed with, which could open up new vulnerabilities, e.g. new deployment relies on kernel feature / initramfs changes that aren't present.
But the
$bootcsum
should be computable by the loader (or maybe even the kernel/initramfs itself), so while you can't embed the hash inside the images (which is the data being hashed), perhaps it could be passed by the UKI-stub or loader. Then you wouldn't need to "guess" it, and can ensure that a deployment would only ever be booted by a image it was build with. (Side-Note: That could even work if the bootcsum ends up being the hash of the UKI, but in that case it could be preferable to use the PECOFF hash that would be signed).
I am using a very limited bootloader that boots either an A or B boot partition, that contains an Android Boot Image. But there is very limited data available in the GPT partition table that I can be able to use in order to identify if I should use the latest or the latest-1 version. If that makes sense.
UKI's probably have much more options.
Rollbacks / multiple deployments, seem to be impossible with this logic, since even with an older image the latest checkout is used. I think the information of which deployment to use comes inherently from an unverifiable source and the best you can do is probably to limit the image+deployment combinations to ones where the image booted and the image of the deployment match (bootcsum). And I believe that is true even if you implement automatic health checks, since at some point you need to trust something, and realistically you want to give users the option of choosing to boot rollback image+deployment (e.g. by means of UEFI boot selection / sd-boot).
I only want Android style A/B rollbacks, nothing else.
In any case you're putting your trust in nobody tampering with the actual filesystem, which is quite a far shot by today's standards ...
Will need to be hardened.
I think you can take this PR with a pinch of salt at the minute, it will be drastically changed. But thanks for the early feedback.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am using a very limited bootloader that boots either an A or B boot partition, that contains an Android Boot Image. But there is very limited data available in the GPT partition table that I can be able to use in order to identify if I should use the latest or the latest-1 version. If that makes sense.
Ah, that makes things a bit easier 🤔
I think it might end up being a very similar trade-off like you have to do with UKI, since the information on what to boot (A or B) comes from an untrusted / unverified source.
In any case you're putting your trust in nobody tampering with the actual filesystem, which is quite a far shot by today's standards ...
Will need to be hardened.
It was more of a general comment, a lot of things currently depend on the filesystem not having been tampered with.
I think you can take this PR with a pinch of salt at the minute, it will be drastically changed. But thanks for the early feedback.
It is very nice to see some attempts at implementing it, even if it wouldn't fit my use case in the end 🚀
As always, an actual (incomplete) implementation is much easier to argue about than some abstract idea.
@cgwalters @jlebon @bauen1 @alexlarsson Just getting back to this, how would you guys feel about writing a very simple bls parser to achieve this, so we can behave like a bls-based bootloader but only in the initramfs rather than the bootloader? It would just do a rpmvercmp and then pull the "^options " line to get the ostree= arg from there... I probably wouldn't reuse |
The code seems tractable...it's the maintenance and testing that seems like the hard part here. Having two ways to do it permanently seems like a problem. |
I'll push what I have tomorrow and you can take a look, it's working for the upgrade path, not for the fallback path yet, I'd love to actually re-use stuff and not rewrite things to minimise the initramfs. But if there's an industry that's pushing hard for small initramfs's and fast boot times it's automotive 😅 |
b38f5f9
to
ffadf85
Compare
ffadf85
to
9ed1711
Compare
Ok pushed tonight instead had twenty minutes to spare |
d31f2d6
to
61353f3
Compare
61353f3
to
21623c7
Compare
Yup that flipping the bit is the switch. Although now we have two switches, because we use BLS also.
Yeah it's gonna be similar to this.
This sounds like a good idea, it does mean we would have to write to the Android Boot Partitions everytime, even for userspace upgrades, which is actually ok, not a big deal. Lets keep this in mind if we need it. It could simplify things quite a bit. |
Btw @cgwalters did you see this question. Which is the correct way to do a vercmp check in ostree, via filename or "version " field or are they the same? |
607ac1b
to
5ec1ae9
Compare
@@ -124,9 +124,13 @@ resolve_deploy_path (const char * root_mountpoint) | |||
{ | |||
char destpath[PATH_MAX]; | |||
struct stat stbuf; | |||
char *ostree_target, *deploy_path; | |||
char __attribute__ ((cleanup (free_char))) *ostree_target; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pretty sure this was leaking here and in src/switchroot/ostree-system-generator.c all along, but both are short lived processes so you wouldn't really notice.
5ec1ae9
to
1a2bba0
Compare
Some kernel images are delivered in a signed kernel + cmdline + initramfs + dtb blob. When this is added to the commit server side, only after this do you know what the cmdline is, this creates a recursion issue. To avoid this, in the case where we have ostree=aboot karg set, do the bls parsing in the initramfs instead, so we can take advantage of existing bls logic.
1a2bba0
to
03aa8f7
Compare
Added a uname check |
/ok-to-test |
I'm not sure I follow this. If the initrd knows from which partition it booted from, can't it pick the latest BLS in its partition? Or maybe that's what you're saying but in different words. It feels confusing to have both multiple boot partitions and multiple BLS entries per partition. IIUC, I think ideally we'd have ostree just emit a single BLS entry instead in that case, but that's clearly a more invasive patch. It seems to me though like we should just pretend the older BLS entries don't exist (this rules out any potential mismatching issues). And then, rather than reimplementing a BLS parser, we could have some bootloader option that writes out the deployment root path in some file (or maybe as a symlink) under the |
This is a possible technique, we do not have A/B system partitions (because it doesn't make as much sense when it's managed by ostree that can manage multiple versions). But both Android Boot Images (the are like UKIs) can be from booted from in either boot partition A or B.
We plan on having one set for BLS entries for both partitions.
Could do, but BLS is easy to parse, it's a nice simple format tbh, even if we started using a separate file, BLS file format would make sense as it's easy to parse in C. And the thing is, there is useful data there whether we create a separate file or not. |
Well, the right way to answer this question is to look at the consumers. For e.g. zipl, it's which is parsing the Reading the code for grub2 in https://src.fedoraproject.org/rpms/grub2/blob/rawhide/f/0021-blscfg-add-blscfg-module-to-parse-Boot-Loader-Specif.patch it looks like it's sorting by filename... |
Ack thanks, this helps a lot. So IIUC, the A/B boot partitions are purely for the UKI-like blobs, but otherwise are separate from So then my next question is re. this bit:
There's two critical events: the symlink update in the bootfs for pointing at the new set of BLS entries, and the slot switch. If we get interrupted after the symlink update but before the slot switch, isn't the system state inconsistent? I.e. won't the active slot (containing the booted kernel) try to boot the now latest deployment (containing the new userspace)? Sorry if this is obvious to everyone else. I'm just trying to understand how this all fits since we're bound to get more UKI-related things in the future. |
Yeah. The boot_a and boot_b partitions are "raw" data partitions with just the UKI blob. They have no filesystem on them at all. Typically in a aboot system, boot_a would be paired with a read-only dm-verity:ed userspace fs image in the system_a partition and boot_b would be paired with system_b partition. However, since our system partition is writable with the ostree repo on it we will only have one system partition.
Yeah, this is true. This is why I proposed adding a "randomly generated" build identifier to the initrd, which would be paired with the BLS file. Then whichever initrd gets booted it can finds its own correct BLS file in either of the boot directories. |
Going along with the idea of a "random generated" build identifier, we could actually just make this a checksum of the diff before we commit and store it as a non-random thing (almost like generating a commit, but not actually committing it), that way the id is reproducible in builds and it's more verifiable it's correct. We could use something like sha1sum, we want a checksumming algorithm that's really fast. This way we might be able to afford to get away with no fallback flag written by the firmware into the GPT partition table because you would know from this build identifier to keep rolling back commits until the build id matches. I'm setting up an environment at the moment to test all different kinds of fake upgrades etc. The above works with an "rpm-ostree install" kind of upgrade. But I want to try a normal userspace upgrade, a kernel upgrade etc. So whatever technique we use I can test it covers all bases. |
PR needs rebase. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
I think I have an idea on how to resolve the race condition here more systematically. The approach is essentially to never change the position of the booted deployment during an update. Let's say you're booted in the default deployment and you're ready to upgrade. Then you would:
If we're interrupted in between the two steps, we'll just boot right back in the old deployment. As soon as we do the slot switch, we're set to boot into the new deployment. In the next update, you'd do the opposite (deploy to the pending deployment position). Essentially, this makes the deployment index not really matter. 0 is always matched to slot A and 1 is always matched to slot B. Obviously if you do something like |
I am heading on a short break for a few days, but I will be back next week, just to give an update. I am working with the upstream maintainer of this tool: https://gitlab.com/sdm845-mainline/qbootctl/-/merge_requests to do slot switching on a real Qualcomm based solution, up to now I'd been testing in single slot mode. Expect there to be another re-spin of this, thanks for the feedback so far. |
This one could probably benefit from a realtime conversation. I'm looking at time slots for this now, tentatively thinking May 31 @ 9:30am EST at https://meet.gnome.org/col-hab-hej-e5i |
I am out next week 29 May-2 June, I spoke to @alexlarsson and we have something in mind that should require no changes to ostree-prepare-root, thanks to the design of composefs, it really simplified things. I should be able to complete in aboot-deploy abstraction requiring minimal to no further changes to ostree. I'm gonna close this PR, but would be more than happy to have the conversation to keep everyone sync'd up. |
Some kernel images are delivered in a signed kernel + cmdline +
initramfs + dtb blob. When this is added to the commit server side, only
after this do you know what the cmdline is, this creates a recursion
issue. To avoid this, in the case where we have ostree=aboot karg
set, do the bls parsing in the initramfs instead, so we can take
advantage of existing bls logic.