Arch Linux test runner #1438

Open
klausenbusk opened this issue Jan 21, 2024 · 18 comments
Labels
1. Bug Something isn't working

Comments

@klausenbusk

klausenbusk commented Jan 21, 2024

Hi

It has been brought to my attention that you would like some test runners for Arch Linux and Debian.

We (Arch Linux) might be able to help and as part of Arch Linux's DevOps team, I would like some more details on what exactly you need.

How powerful must the runner be? Would a Hetzner CX11 VM be sufficient and what kind of software must the VM run? Zuul?

@klausenbusk klausenbusk added the 1. Bug Something isn't working label Jan 21, 2024
@debarshiray
Member

First of all, let me thank you deeply for responding to the call for help! It's much appreciated.

I suspect that we don't need a particularly powerful runner.

We are currently using an instance of Zuul CI called Software Factory for Fedora and CentOS Stream 9 runners, and a GitHub Actions workflow for an Ubuntu 22.04 runner. Software Factory has CI runners (or nodes) with different hardware capabilities. We are using the ones called cloud-fedora-rawhide, cloud-fedora-39, etc., but I don't know their exact configuration offhand.

I suppose we have two ways to add Arch Linux to the Toolbx CI. We can either try to add Arch Linux images to Software Factory, which, I think, will then run on Software Factory's CI runners (or nodes), or we can try to add a runner for GitHub Actions workflows. Here are the existing image definitions known to Software Factory.

I am not an expert in either, so I am happy to leave it to your tastes and preferences. @danpawlik and @TristanCacqueray are my contacts for questions about Zuul CI and Software Factory. I hope they will correct me if I said anything wrong above.

@TristanCacqueray
Contributor

The Zuul CI system relies on cloud providers (like OpenStack/IBMCloud/AWS) to create ephemeral instances per build, so a dedicated VM would not be sufficient. We could either add a new cloud provider, or perhaps we have enough capacity to handle the new jobs. In any case, we need either an existing image-id/ami, or a recipe to build the image that will be uploaded to the cloud before the job runs. As you can see in the link shared by @debarshiray, the image definition takes an existing qcow and makes minor modifications to it for CI purposes.

Note that Zuul also supports cloud native providers (like Kubernetes), but that is not practical for running the toolbox tests, which would need nested containerization.

@debarshiray
Member

/cc @Foxboron

@debarshiray
Member

Ping @klausenbusk

@klausenbusk
Author

Sorry for not getting back to you @debarshiray. Providing access to our cloud provider (Hetzner), so you can create ephemeral instances, is not an option due to cost, labor and security concerns. It is vastly different from providing a few static runners :)

So the only realistic option is adding an Arch Linux image to the existing setup. We are already building "cloud images" and they can be downloaded from the mirrors (e.g. https://geo.mirror.pkgbuild.com/images/latest/).

Do you have the manpower to add the image to Software Factory? It is not exactly the same task as providing a server or two and I'm not sure how much work it entails.

@debarshiray
Member

> Sorry for not getting back to you @debarshiray. Providing access to our cloud provider (Hetzner), so you can create ephemeral instances, is not an option due to cost, labor and security concerns. It is vastly different from providing a few static runners :)

Ok, understood. :)

> So the only realistic option is adding an Arch Linux image to the existing setup. We are already building "cloud images" and they can be downloaded from the mirrors (e.g. https://geo.mirror.pkgbuild.com/images/latest/).

These look useful to me. @danpawlik @TristanCacqueray what do you think?

> Do you have the manpower to add the image to Software Factory? It is not exactly the same task as providing a server or two and I'm not sure how much work it entails.

I am not an expert in Zuul, but I can definitely make some time for this since I am the one who needs Arch Linux hosts for testing. I will probably end up asking a lot of questions as I find my way forward.

@debarshiray
Member

Ping @danpawlik @TristanCacqueray

@danpawlik
Contributor

@klausenbusk @debarshiray hi, I will talk with my team. So far, I don't see any issue with adding an Arch qcow2 image to our CI.

@danpawlik
Contributor

Added an image + label: https://softwarefactory-project.io/r/c/config/+/32385 . It needs to be merged before the label becomes available.
Let's see what others say about that idea. 🤞

@danpawlik
Contributor

So the change has been merged. One more is required, but you can test the arch label now.

debarshiray added a commit to debarshiray/toolbox that referenced this issue Nov 5, 2024
@debarshiray
Member

Thanks, @danpawlik. I am working on adding Arch Linux to our CI in #1588.

The first thing that I noticed is that the label for Arch Linux is arch-linux without the cloud- prefix that we have for CentOS Stream and Fedora. Did I read that correctly?

debarshiray added a commit to debarshiray/toolbox that referenced this issue Nov 6, 2024
@debarshiray
Member

> Thanks, @danpawlik. I am working on adding Arch Linux to our CI in #1588.
>
> The first thing that I noticed is that the label for Arch Linux is arch-linux without the cloud- prefix that we have for CentOS Stream and Fedora. Did I read that correctly?

It looks like I am doing something wrong. The Arch Linux hosts are never actually running the tests and I end up with RETRY_LIMIT.

@danpawlik
Contributor

I will take a look when I have a few minutes.

@danpawlik
Contributor

danpawlik commented Nov 7, 2024

Oops, I made a mistake. After merging this and this, nodes should spawn normally.

@debarshiray
Member

Thanks for taking a look, @danpawlik.

I see that you already restarted the CI in #1588 but the Arch Linux job is again hitting RETRY_LIMIT. Maybe I should wait another hour or so before trying so that all the changes have had time to propagate?

@danpawlik
Contributor

@debarshiray the current RETRY_LIMIT is not related to the image. If you go to the logs, you will see that Zuul spawned a VM with the arch image correctly, synchronized the repositories, and then executed the playbook.
In the end, it got an error:

2024-11-07 11:13:54.861599 | 
2024-11-07 11:13:54.861814 | TASK [Check versions of crucial packages]
2024-11-07 11:13:55.176020 | arch | error: package '*kernel*' was not found
2024-11-07 11:13:55.176140 | arch | error: package '*glibc*' was not found
2024-11-07 11:13:55.176181 | arch | error: package 'shadow-utils-subid-devel' was not found
2024-11-07 11:13:55.176208 | arch | error: package 'golang' was not found
2024-11-07 11:13:55.176805 | arch | error: package 'golang-github-cpuguy83-md2man' was not found
2024-11-07 11:13:55.176832 | arch | error: package 'containernetworking-plugins' was not found
2024-11-07 11:13:55.176845 | arch | error: package 'container-selinux' was not found
2024-11-07 11:13:55.176868 | arch | bash 5.2.037-1
2024-11-07 11:13:55.178110 | arch | bats 1.11.0-2
2024-11-07 11:13:55.178132 | arch | codespell 2.3.0-2
2024-11-07 11:13:55.178144 | arch | gcc 14.2.1+r134+gab884fffe3fc-1
2024-11-07 11:13:55.178156 | arch | shellcheck 0.10.0-18
2024-11-07 11:13:55.178168 | arch | podman 5.2.5-1
2024-11-07 11:13:55.178351 | arch | conmon 1:2.1.12-1
2024-11-07 11:13:55.178369 | arch | containers-common 1:0.60.4-2
2024-11-07 11:13:55.178376 | arch | crun 1.18.2-1
2024-11-07 11:13:55.178383 | arch | skopeo 1.16.1-1
2024-11-07 11:13:55.418723 | arch | ERROR
2024-11-07 11:13:55.418897 | arch | {
2024-11-07 11:13:55.418937 | arch |   "delta": "0:00:00.038660",
2024-11-07 11:13:55.418958 | arch |   "end": "2024-11-07 11:13:55.178862",
2024-11-07 11:13:55.418974 | arch |   "msg": "non-zero return code",
2024-11-07 11:13:55.418989 | arch |   "rc": 1,
2024-11-07 11:13:55.419044 | arch |   "start": "2024-11-07 11:13:55.140202"
2024-11-07 11:13:55.419079 | arch | }

If that task runs in the "base job", the failure can also be reported as RETRY_LIMIT, but it has nothing to do with the image.
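
For anyone reading the log above: the bash, bats, podman, etc. lines show that the package query itself works on the Arch node, and only some of the queried names are unknown to it. Several of those (golang, shadow-utils-subid-devel, container-selinux) look like Fedora-style package names. A minimal sketch, assuming the console log is available as plain text, of how the missing names could be pulled out of such a log; the function name `missing_packages` is hypothetical and not part of Zuul or Toolbx:

```python
import re

# Matches the "not found" lines exactly as they appear in the console log above.
NOT_FOUND = re.compile(r"error: package '([^']+)' was not found")


def missing_packages(log_text):
    """Return the package names reported as not found, in log order."""
    return NOT_FOUND.findall(log_text)


# Three lines copied from the log in this thread.
log = """\
2024-11-07 11:13:55.176020 | arch | error: package '*kernel*' was not found
2024-11-07 11:13:55.176208 | arch | error: package 'golang' was not found
2024-11-07 11:13:55.176868 | arch | bash 5.2.037-1
"""

print(missing_packages(log))
```

The list it prints is the set of names that would need Arch Linux equivalents (or would need to be dropped) in the task that checks package versions.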

@debarshiray
Member

> @debarshiray the current RETRY_LIMIT is not related to the image. If you go to the logs, you will see that Zuul spawned a VM with the arch image correctly, synchronized the repositories, and then executed the playbook. In the end, it got an error:

Oops! You are right. My bad.

I wrote too soon. I should have actually checked if there are logs or not.

@danpawlik
Contributor

No problem. Let me know how the image is working :)

debarshiray added a commit to debarshiray/toolbox that referenced this issue Nov 12, 2024
The VERSION_ID field in os-release(5) is optional [1].  It's absent on
Arch Linux, which follows a rolling-release model and uses the BUILD_ID
field instead:
  BUILD_ID=rolling

A subsequent commit will run the CI on Arch Linux.  Hence, the code to
get the default release from the host operating system can no longer
assume the presence of the VERSION_ID field in os-release(5).

Note that the arch-toolbox image is tagged with 'latest', in accordance
with OCI conventions, not 'rolling' [2,3], which is the os-release(5)
BUILD_ID.

A similar change was made to toolbox(1) in commits 2ee82af and
d14fd7b.

[1] https://www.freedesktop.org/software/systemd/man/os-release.html

[2] Commit 2568528
    containers@2568528cb7f52663
    containers#861

[3] Commit a4e5861
    containers@a4e5861ae5c93625
    containers#1308

containers#1438
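
The fallback described in this commit message can be sketched roughly as follows. This is an illustration only, not the code from the commit: the function names are hypothetical, and returning 'latest' when VERSION_ID is absent is an assumption taken from the note above about how the arch-toolbox image is tagged.

```python
def parse_os_release(text):
    """Parse os-release(5) KEY=VALUE lines into a dict, stripping optional quotes."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        fields[key] = value
    return fields


def default_release(text, fallback="latest"):
    """Return VERSION_ID if present, else the fallback (an assumption, see above)."""
    return parse_os_release(text).get("VERSION_ID") or fallback


# Abbreviated sample contents; a real /etc/os-release has more fields.
arch = 'NAME="Arch Linux"\nID=arch\nBUILD_ID=rolling\n'
fedora = 'NAME="Fedora Linux"\nID=fedora\nVERSION_ID=39\n'

print(default_release(arch), default_release(fedora))
```

The point is only that the lookup uses a default instead of failing when VERSION_ID is missing, which is the case on a rolling-release host.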
debarshiray pushed a commit to debarshiray/toolbox that referenced this issue Nov 15, 2024
This is a step towards running the CI on Arch Linux.

containers#1438
containers#1535
debarshiray added a commit to debarshiray/toolbox that referenced this issue Nov 18, 2024
Some Arch Linux hosts have /etc/resolv.conf as an absolute symbolic link
to /run/systemd/resolve/stub-resolv.conf, instead of being a relative
symbolic link to ../run/systemd/resolve/stub-resolv.conf or a regular
file.  eg., the images built by arch-boxes [1].

This changes the target that the Toolbx container's /etc/resolv.conf
points at and confuses the tests [2].

Ideally, these host operating systems should be fixed to use relative
symbolic links.  This is highlighted by skipping the tests, because
there's no point in failing them until that happens.

This is a step towards running the CI on Arch Linux.

[1] https://gitlab.archlinux.org/archlinux/arch-boxes
    https://geo.mirror.pkgbuild.com/images/latest/

[2] Commit 88a95b0
    containers@88a95b07af335be2
    containers#187

containers#1438
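
The three cases this commit message distinguishes (regular file, relative symbolic link, absolute symbolic link) can be told apart with a short check. The sketch below is an illustration with a hypothetical helper name, not the test code from the commit; it exercises the check on throwaway paths in a temporary directory instead of the real /etc/resolv.conf.

```python
import os
import tempfile


def classify_resolv_conf(path):
    """Return 'regular', 'relative-symlink' or 'absolute-symlink' for path."""
    if not os.path.islink(path):
        return "regular"
    target = os.readlink(path)  # the link's own target, not the resolved path
    return "absolute-symlink" if os.path.isabs(target) else "relative-symlink"


with tempfile.TemporaryDirectory() as root:
    etc = os.path.join(root, "etc")
    os.mkdir(etc)
    absolute = os.path.join(etc, "resolv-absolute.conf")
    relative = os.path.join(etc, "resolv-relative.conf")
    regular = os.path.join(etc, "resolv-regular.conf")
    # Dangling links are fine here; only the link target string is inspected.
    os.symlink("/run/systemd/resolve/stub-resolv.conf", absolute)
    os.symlink("../run/systemd/resolve/stub-resolv.conf", relative)
    with open(regular, "w") as handle:
        handle.write("nameserver 127.0.0.53\n")
    results = tuple(classify_resolv_conf(p) for p in (absolute, relative, regular))

print(results)
```

On a host, calling `classify_resolv_conf("/etc/resolv.conf")` would show which case applies, which is the condition the skipped tests depend on.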