Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AWS boot testing for all supported image types #217

Merged
merged 18 commits into from
Oct 31, 2023

Conversation

achilleas-k
Copy link
Member

@achilleas-k achilleas-k commented Oct 20, 2023

Enables boot testing on AWS for all supported image types.

The following changes were made to image definitions and tests:

  • The edge-ami image uses the kernel options that are applied to all other EC2/AMI images.
  • The command line flags for the boot-aws command were modified.
  • The ec2-sap image for RHEL 8.4 is built with the lm_sensors.service disabled (in CI) because of a known issue.

@achilleas-k achilleas-k changed the title test: AWS boot testing for all supported image types AWS boot testing for all supported image types Oct 20, 2023
supakeen
supakeen previously approved these changes Oct 20, 2023
@supakeen supakeen enabled auto-merge October 20, 2023 16:57
@supakeen
Copy link
Member

Awesome :)

@achilleas-k
Copy link
Member Author

EC2 images are created compressed (xz) and should be decompressed before pushing to boot.

Copy link
Member

@thozza thozza left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes look good, but the failing EDGE-ami boot test is IMHO caused by the fact that EDGE images don't have cloud-init installed by default... 🤔 Thus the ssh key is never configured as authorized in the VM...

@achilleas-k
Copy link
Member Author

The changes look good, but the failing EDGE-ami boot test is IMHO caused by the fact that EDGE images don't have cloud-init installed by default... 🤔 Thus the ssh key is never configured as authorized in the VM...

I've been thinking about how to test these (while working on other stuff) and I guess the only way is to add a user to the image BP. I really wanted to have an empty baseline test for everything, but I guess there's no way to automate initial-setup (at least not the way we configure it?). We'll need to do the same with the installers as well anyway.

@achilleas-k achilleas-k force-pushed the boot-test/ec2 branch 2 times, most recently from 031121c to 0717f77 Compare October 24, 2023 21:20
@achilleas-k
Copy link
Member Author

@achilleas-k achilleas-k force-pushed the boot-test/ec2 branch 2 times, most recently from 5f7f868 to 624113f Compare October 25, 2023 19:40
@achilleas-k
Copy link
Member Author

achilleas-k commented Oct 25, 2023

Alright, we are booting and... failing on ARM for some reason:

● systemd-firstboot.service                                                                 loaded failed failed    First Boot Wizard

EDIT: this is good right? we're catching bugs 🐛

@cgwalters
Copy link
Contributor

FWIW in coreos and derivatives https://github.com/coreos/fedora-coreos-config/blob/6b1b56baab105ffac5cb290d5083f75e23437d2e/manifests/ignition-and-ostree.yaml#L25
(systemd-firstboot also makes little sense in a system that uses cloud-init)

@achilleas-k
Copy link
Member Author

FWIW in coreos and derivatives https://github.com/coreos/fedora-coreos-config/blob/6b1b56baab105ffac5cb290d5083f75e23437d2e/manifests/ignition-and-ostree.yaml#L25 (systemd-firstboot also makes little sense in a system that uses cloud-init)

We don't add cloud-init to the edge-ami. The image was added by the Edge team so I'm not sure if that was an oversight or deliberate. Maybe @7flying can shed some light?

It's also strange that it only fails on aarch64 (and consistently).
I haven't gotten around to checking the log for the failure yet.

@7flying
Copy link
Member

7flying commented Oct 26, 2023

FWIW in coreos and derivatives https://github.com/coreos/fedora-coreos-config/blob/6b1b56baab105ffac5cb290d5083f75e23437d2e/manifests/ignition-and-ostree.yaml#L25 (systemd-firstboot also makes little sense in a system that uses cloud-init)

We don't add cloud-init to the edge-ami. The image was added by the Edge team so I'm not sure if that was an oversight or deliberate. Maybe @7flying can shed some light?

It's also strange that it only fails on aarch64 (and consistently). I haven't gotten around to checking the log for the failure yet.

We deliberately did not add it since that would mean adding the RPM to every image using the same edge-commit and that did not make sense. Our docs recommend to add cloud-init in the blueprint of the edge-ami instead.

@achilleas-k
Copy link
Member Author

Our docs recommend to add cloud-init in the blueprint of the edge-ami instead.

But we can't add packages just to the ami. The build just deploys the commit and creates the image and doesn't support layering packages on top. Is that something we should consider?

There's also the (IMO better) alternative that I want to do eventually but might take some time: Build the image directly without pulling in an ostree commit and export both the image and the commit tarball. That way we could build an edge-ami and get out both an edge-commit.tar that's specific to the AMI and the raw image for EC2.

@achilleas-k
Copy link
Member Author

Also of course, adding it would require it being in the commit, I don't know why I thought it would be included in the ami. But for our test, we can build an ostree commit that has it to test against the edge-ami specifically.

@7flying do you know if adding cloud-init to an edge/iot-commit conflict will conflict with initial-setup?

@7flying
Copy link
Member

7flying commented Oct 27, 2023

Our docs recommend to add cloud-init in the blueprint of the edge-ami instead.

But we can't add packages just to the ami. The build just deploys the commit and creates the image and doesn't support layering packages on top. Is that something we should consider?

Good point, I meant to say, "to the commit that is just going to be used for the AMI". But I'll double check with docs to make that point clear.

Yes, ideally we would want a single edge commit to have all the packages that all the possible edge images would need, and then, depending on the image that you want to build some of them would be "disabled/removed", or we could use layers, but regardless, there shouldn't be more configuration burden for the user, if possible.

There's also the (IMO better) alternative that I want to do eventually but might take some time: Build the image directly without pulling in an ostree commit and export both the image and the commit tarball. That way we could build an edge-ami and get out both an edge-commit.tar that's specific to the AMI and the raw image for EC2.

Yes, having a single command-line spell that builds it all in one go would be the best solution.

@7flying
Copy link
Member

7flying commented Oct 27, 2023

Also of course, adding it would require it being in the commit, I don't know why I thought it would be included in the ami. But for our test, we can build an ostree commit that has it to test against the edge-ami specifically.

@7flying do you know if adding cloud-init to an edge/iot-commit conflict will conflict with initial-setup?

@achilleas-k no sorry, I did not test that

@achilleas-k
Copy link
Member Author

Also of course, adding it would require it being in the commit, I don't know why I thought it would be included in the ami. But for our test, we can build an ostree commit that has it to test against the edge-ami specifically.
@7flying do you know if adding cloud-init to an edge/iot-commit conflict will conflict with initial-setup?

@achilleas-k no sorry, I did not test that

I think I'll add this as part of our test configs: Build an edge commit with cloud-init installed and use it as the base image for an edge-ami and then boot test it with cloud-init.

But I'll leave that for a follow-up PR.
For now, here, I'll see if I can figure out what's making the firstboot service fail to start (could be nothing, but it could be a side effect of some bug) and take it from there.

Thanks for the input!

@achilleas-k
Copy link
Member Author

I modified the edge-ami image type to include all the kernel options that we add to other EC2/AMI image types.

@achilleas-k
Copy link
Member Author

@achilleas-k
Copy link
Member Author

achilleas-k commented Oct 30, 2023

One failure left: https://gitlab.com/redhat/services/products/image-builder/ci/images/-/pipelines/1055416885 (ec2-sap on RHEL 8.4).

Most likely it's an issue with just being slower to start:

$ sudo systemd-analyze
Startup finished in 2.601s (kernel) + 9.615s (initrd) + 1min 33.448s (userspace) = 1min 45.665s
multi-user.target reached after 38.479s in userspace

The keyscan, which we use to wait for the system, quits after 10 tries with 10 second intervals.

EDIT: Though it also failed to start lm_sensors.service, so the system is in any case degraded.

Enables boot testing on AWS for all supported image types.
If an image is an xz compressed file, unxz it before boot testing and
delete the uncompressed image when done to avoid pushing it to the build
cache.
jq . --sort-keys the whole file for cleaner edits.
The config isn't used by any image type currently.
The private SSH key is set in the GitLab CI variables.
Add two new configs for image types that pull an ostree commit and add a
user customization to the configuration.

Same key that was added to the user-customizations config.

The new configs are used for edge-ami so we can boot test with a
predefined user and SSH key.
Doesn't make sense for our boot testing binary.
If the CI_PRIV_SSH_KEY env var exists, use that instead of generating a
new key pair.  This is used for boot testing images that have a key
configured in the build config, for cases where user creation isn't
possible at first boot.
Update the test README to mention that:
- Images are boot tested when supported.
- Images are uploaded to the S3 bucket as well as the manifest, not just
  the build info.

A minor change is also made to both the test/README.md and the
HACKING.md about building the `cmd/build` binary without sudo and
putting it in `./bin/build` before running it with sudo.
Add instructions for how to use the boot-aws command.
In the boot test script, we don't need to use sudo to run
  systemctl is-system-running

For the edge-ami boot test, we can't set passwordless sudo right now, so
if it's not required, we can just run it as a regular user.
Environments don't yet support kernel arguments so doing this now would
make the current patch set much bigger than we want it to be.
Add a note to do it another time.
Add kernel options to the image type for edge images (raw, ami, and
vsphere) and handle them in the edgeRawImage() function.
Move the 'modprobe.blacklist=vc4' option, which was used to be set in
the image function, to the image types instead.
@achilleas-k
Copy link
Member Author

achilleas-k commented Oct 31, 2023

One failure left: https://gitlab.com/redhat/services/products/image-builder/ci/images/-/pipelines/1055416885 (ec2-sap on RHEL 8.4).

Most likely it's an issue with just being slower to start:

$ sudo systemd-analyze
Startup finished in 2.601s (kernel) + 9.615s (initrd) + 1min 33.448s (userspace) = 1min 45.665s
multi-user.target reached after 38.479s in userspace

The keyscan, which we use to wait for the system, quits after 10 tries with 10 second intervals.

EDIT: Though it also failed to start lm_sensors.service, so the system is in any case degraded.

So the timeout increase worked and the test script ran but like I saw in my local test, the lm_sensors service didn't start:
https://gitlab.com/redhat/services/products/image-builder/ci/images/-/jobs/5420205170

This is only an issue on RHEL 8.4 SAP.

EDIT: Known issue: https://issues.redhat.com/browse/RHEL-14096

- Create an empty config specifically for ec2-sap for all distros except
RHEL 8.4.
- Create a config for ec2-sap for RHEL 8.4 only that disables
  lm_sensors.service

Related: RHEL-14096
Copy link
Member

@thozza thozza left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@supakeen supakeen added this pull request to the merge queue Oct 31, 2023
@supakeen
Copy link
Member

Absolutely love it.

Merged via the queue into osbuild:main with commit b6216aa Oct 31, 2023
7 checks passed
@achilleas-k achilleas-k deleted the boot-test/ec2 branch October 31, 2023 19:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants