Enable multi-arch pipeline #262

Closed
cgwalters opened this issue Aug 26, 2019 · 35 comments

Comments

@cgwalters
Member

See this comment:

The main reason I was working on this is so we can have alt-arch teams just spin up plain bare metal machines with jenkins workers running as non-root and connect to the FCOS Jenkins master in the normal Jenkins way.
Then if we change the pipeline to run podman in this way via plain shell instead of podTemplate, that gives us a fairly decent architecture that supports scheduling builds across architectures while ensuring consistency - (although I think we should still use the Jenkins Kubernetes plugin for other things).

Credit to @arithx for the suggestion.

@cgwalters
Member Author

Fedora has multi-arch hardware, although I think most of it is "owned" by Koji. We'd need to either liberate some or make some machines "dual role" as a Jenkins worker too. For aarch64 there's packet.net at least.

@cgwalters
Member Author

See this cosa issue for some previous work (and notes for further work): coreos/coreos-assembler#673

@dustymabe
Member

@bgilbert is there a project for things to do after stable? I don't think this should block stable but we need to track it

@bgilbert
Contributor

@dustymabe There isn't. Do we have a specific timeline in mind? If not, there's always the fact that it's in this tracker. 🙂

@dustymabe
Member

I was mostly looking for a way to categorize things as "important, but after stable" that could separate it from the rest of the open issues.

@jlebon
Member

jlebon commented Sep 19, 2019

OK, so there are three rough suggestions so far on ways to farm out work to arch-specific machines:

  • Run the pipeline in a multi-arch OpenShift cluster and use node selection.
  • Have machines connect to Jenkins master as regular workers.
  • Have the pipeline itself bring up machines in various clouds that support target arch and e.g. an Ignition config that starts a cosa build and pushes out to some spot in S3? (Or I guess, connects it back to the Jenkins master as a regular worker...)

Of those, I'd much prefer the first option. At least I hope we aim for the first option first before falling back on the other ones. I really don't want to go back to maintaining Jenkins workers.

@arithx
Contributor

arithx commented Sep 19, 2019

I'd definitely agree on preferring the first option.

@cgwalters
Member Author

One note: since the source of truth today is S3, and S3 doesn't have a defined way to do locking or even compare-and-exchange, we need application-level handling for this. In other words, one Jenkins instance "owns" builds.json. Separate workers would upload to S3 under their per-arch builddir, but would not try to write builds.json. Rather, the main Jenkins would get a message about the completed build and update builds.json itself.
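
A minimal sketch of the single-writer arrangement described above, assuming an in-memory dict in place of an S3 bucket. The key layout, the `BuildDone` message, and the `Coordinator` class are hypothetical names chosen for illustration, and the builds.json shape is simplified; none of this is taken from the actual pipeline.

```python
import json
from dataclasses import dataclass

# Stand-in for an S3 bucket: object key -> bytes. The key layout is hypothetical.
bucket: dict[str, bytes] = {}


@dataclass(frozen=True)
class BuildDone:
    """Message a worker sends to the coordinator after its upload finishes."""
    build_id: str
    arch: str


def worker_upload(build_id: str, arch: str, artifact: bytes) -> BuildDone:
    """Workers write only under their own per-arch builddir, never builds.json."""
    bucket[f"builds/{build_id}/{arch}/artifact"] = artifact
    return BuildDone(build_id, arch)


class Coordinator:
    """The single owner of builds.json; every index update goes through it."""

    KEY = "builds/builds.json"

    def handle(self, msg: BuildDone) -> None:
        # Read-modify-write is safe only because this is the sole writer.
        index = json.loads(bucket.get(self.KEY, b'{"builds": []}'))
        entry = next((b for b in index["builds"] if b["id"] == msg.build_id), None)
        if entry is None:
            entry = {"id": msg.build_id, "arches": []}
            index["builds"].insert(0, entry)  # newest build first
        if msg.arch not in entry["arches"]:
            entry["arches"].append(msg.arch)
        bucket[self.KEY] = json.dumps(index).encode()
```

Because only the coordinator performs the read-modify-write on the index, no locking or compare-and-exchange is needed from the storage layer; the workers' uploads never touch a shared key.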

@cgwalters
Member Author

> Run the pipeline in a multi-arch OpenShift cluster and use node selection.

That doesn't really exist as far as I know. I don't think we can depend on making that work. Maybe something like Federation would allow us to continue using the Jenkins Kubernetes plugin?

You're missing another whole class of option, which is to have a separate pipeline per architecture, but have the alt-arch pipelines submit "merge requests" to the main x86_64 one. A bit like koji-shadow perhaps.

@jlebon
Member

jlebon commented Nov 8, 2019

> That doesn't really exist as far as I know. I don't think we can depend on making that work. Maybe something like Federation would allow us to continue using the Jenkins Kubernetes plugin?

Yeah, I think there's an obvious bootstrapping issue of sorts I ignored: trying to run on an OpenShift multi-arch cluster to hack on software which will enable OpenShift multi-arch clusters... It's not clear to me how well supported OpenShift v3 is on non-x86_64 nodes (looks like ppc64le is at least, it was hard to find information about aarch64 and s390x).

So I guess to start with we'll have to hack around this (but definitely revisit once OCP/Origin v4 is in a better position).

> Fedora has multi-arch hardware, although I think most of it is "owned" by Koji. We'd need to either liberate some or make some machines "dual role" as a Jenkins worker too.

Yeah, it does seem to me like the simplest solution is to just reuse the multi-arch hardware from Koji/Brew. I filed https://pagure.io/fedora-infrastructure/issue/8370 to start a discussion around this. Feel free to add ideas there!

@andymcc

andymcc commented Nov 11, 2019

> Yeah, I think there's an obvious bootstrapping issue of sorts I ignored: trying to run on an OpenShift multi-arch cluster to hack on software which will enable OpenShift multi-arch clusters... It's not clear to me how well supported OpenShift v3 is on non-x86_64 nodes (looks like ppc64le is at least, it was hard to find information about aarch64 and s390x).

For ppc64le it would be the same as x86_64, for s390x/aarch64 we do build the containers and you can install it, but it does take some additional settings (e.g. pointing to the location of the container images and some additional repositories before running the openshift-ansible setup).

@cgwalters
Member Author

One thing related to this is AIUI the current plan for Fedora IoT is to continue using Pungi, maybe feeding the ostree commit into osbuild.

@jlebon
Member

jlebon commented May 11, 2020

@travier
Member

travier commented Jun 11, 2020

A workaround here would be to run a full aarch64 cosa container on x86_64 using qemu-user-static. This would be slower from a performance perspective but not as slow as using a full system emulation. podman is currently able to run a Fedora aarch64 container image using this workaround.
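
A rough sketch of how a wrapper might decide whether emulation is needed and build the corresponding podman command. It assumes the qemu-user-static binfmt handlers are already registered on the host and that the installed podman supports the `--arch` option; the helper names and the image used in the usage note are illustrative only.

```python
import platform
from typing import Optional

# Kernel machine names (as printed by `uname -m`) mapped to the
# architecture names used for container images.
CONTAINER_ARCH = {
    "x86_64": "amd64",
    "aarch64": "arm64",
    "ppc64le": "ppc64le",
    "s390x": "s390x",
}


def needs_emulation(target: str, host: Optional[str] = None) -> bool:
    """True when the target differs from the host, so user-mode emulation is required."""
    host = host or platform.machine()
    return CONTAINER_ARCH[target] != CONTAINER_ARCH[host]


def podman_run_args(image: str, target: str, command: list[str]) -> list[str]:
    """Build a podman argv that requests the image variant for the target architecture."""
    return ["podman", "run", "--rm", "--arch", CONTAINER_ARCH[target], image, *command]
```

For example, `podman_run_args("registry.fedoraproject.org/fedora", "aarch64", ["uname", "-m"])` yields an argv that could be passed to `subprocess.run` on an x86_64 host to check that the emulated container starts.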

@dustymabe
Member

A few of us met recently to discuss multi-arch + FCOS. Here is a summary of that meeting:

Notes: https://hackmd.io/PQa8QhItQwCL6_3M495jJQ

Summary:

- Regarding OpenShift clusters for other architectures:
    - The CPE team does not have the bandwidth to handle new clusters even if the underlying
      infrastructure is managed by someone else (i.e. just IaaS won't suffice).
    - The CPE team would most likely need a managed openshift offering to be provided to them
      and managed for them in order for this need to be satisfied.
    - Considering that no known cloud provider offers managed openshift for non-x86-64 (maybe
      IBM has something in the works?), we don't think it's something we should count on for th...
- Regarding supporting multi-arch for FCOS without OpenShift for other architectures:
    - It would take some rework of the pipeline, but we think we can get by if we are provided
      with some VMs for each architecture that we can then schedule jobs onto using Jenkins
    - Mohan (from Fedora Releng) has volunteered that we could pilot out this strategy with an
      aarch64 VM to see if it's feasible and how much work would be involved.
- Action items
    - dusty/jlebon/mohan/jakub work together to POC modifying our pipeline to support this and run
      it on an aarch64 VM provided by Fedora releng in the short term.

@dustymabe
Member

First steps should be tracked by #541

@Prashanth684

+1. interested to follow and help especially for s390x/ppc64le.

@cgwalters
Member Author

cgwalters commented Jun 16, 2020

Per https://github.com/projectatomic/rpmdistro-gitoverlay/blob/master/doc/reworking-fedora-releng.md#blend-upstream-testing-and-downstream-testing I think Koji should be reworked to run on Kubernetes/OpenShift, then we only have one cluster.

@LorbusChris
Contributor

> Per https://github.com/projectatomic/rpmdistro-gitoverlay/blob/master/doc/reworking-fedora-releng.md#blend-upstream-testing-and-downstream-testing I think Koji should be reworked to run on Kubernetes/OpenShift, then we only have one build system.

IIUC a Koji Operator is part of what's being worked on in https://github.com/fedora-infra/mbbox (thanks for the pointer to the repo @Conan-Kudo!)

@cgwalters
Member Author

> IIUC a Koji Operator is part of what's being worked on in https://github.com/fedora-infra/mbbox (thanks for the pointer to the repo @Conan-Kudo!)

That looks like it's just talking to Koji to schedule builds. I am more talking about doing the same architecture we have with the FCOS pipeline where our workers are just pods. This would really go towards dropping mock for example.

@cgwalters
Member Author

This all said, it probably wouldn't be really hard for us to support in the pipeline a model where we just get SSH access to a system and we run cosa via podman directly. That'd let us pretty easily spin up e.g. an aarch64 machine in AWS to run a build, wouldn't need Kubernetes or even FCOS.
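
As an illustration of the SSH-plus-podman model, here is a minimal sketch that builds the command line for running one cosa step on a remote machine. The image reference, the `/srv` mount point, and the function name are assumptions made for the example, not the pipeline's actual implementation; a real invocation would likely need further options.

```python
import shlex

# Assumed image reference; adjust for the real environment.
COSA_IMAGE = "quay.io/coreos-assembler/coreos-assembler:latest"


def remote_cosa_command(host: str, workdir: str, cosa_args: list[str]) -> list[str]:
    """Return an ssh argv that runs one cosa subcommand in a container on `host`."""
    podman = [
        "podman", "run", "--rm",
        "--device", "/dev/kvm",     # pass the KVM device into the container
        "-v", f"{workdir}:/srv",    # build directory on the remote machine
        "--workdir", "/srv",
        COSA_IMAGE, *cosa_args,
    ]
    # ssh concatenates its trailing arguments into one remote shell string,
    # so quote the podman argv here to keep each argument intact.
    return ["ssh", host, shlex.join(podman)]
```

Calling `remote_cosa_command("builder@aarch64.example.com", "/home/builder/fcos", ["build"])` (hypothetical host and path) returns a list suitable for `subprocess.run`, so the controlling pipeline needs nothing on the remote side beyond SSH access and podman.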

@dustymabe
Member

> This all said, it probably wouldn't be really hard for us to support in the pipeline a model where we just get SSH access to a system and we run cosa via podman directly. That'd let us pretty easily spin up e.g. an aarch64 machine in AWS to run a build, wouldn't need Kubernetes or even FCOS.

👍
This is exactly the POC I'm going to be working on next.

@darkmuggle
Contributor

@dustymabe The work I've been doing should help here a lot. Not sure what the interest is for multiarch, but the multiarch story is the one that Gangplank (formerly Entrypoint) specifically seeks to address.

@dghubble
Member

Poseidon is publishing some arm64 AMIs to a few regions for Typhoon Kubernetes experimental ARM64 support until there are official images, which I'd be excited to see! Hopefully greater access will spawn more interest and testing.

But the images may help in a pinch for those who can't build their own and need to bridge the gap (usual caveats about not trusting AMIs from strangers).

@odidev

odidev commented Dec 8, 2020

I am trying to build Fedora CoreOS for aarch64. The following is the configuration of my machine:

Architecture:                    aarch64
CPU op-mode(s):                  64-bit
Byte Order:                      Little Endian
CPU(s):                          48
On-line CPU(s) list:             0-47
Thread(s) per core:              1
Core(s) per socket:              48
Socket(s):                       1
NUMA node(s):                    1
Vendor ID:                       Cavium
Model:                           1
Model name:                      ThunderX 88XX
Stepping:                        0x1
BogoMIPS:                        200.00
NUMA node0 CPU(s):               0-47

I am following the steps from the link. The cosa init step runs successfully, but the cosa fetch step fails with the error "fatal: Missing /dev/kvm".

The output of kvm-ok is as follows:
INFO: /dev/kvm exists
KVM acceleration can be used

Can you please give me any pointers on resolving this issue? Can we skip the virtualization device and still build the image?

Thanks.

@dustymabe
Member

Are you sure --device /dev/kvm is part of your podman command that gets run? What does cosa shell ls -l /dev/kvm show?

@NickCao

NickCao commented Dec 9, 2020

Most likely due to insufficient privileges; try chmod 777 on /dev/kvm (for a quick test only). If you do want to disable KVM, you can add the environment variable COSA_NO_KVM=1 to the podman command. However, the performance will be unacceptable, and the build may even hang forever on specific platforms.
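
The two failure modes raised in this exchange (the device not being passed into the container versus the user lacking permission on it) can be told apart with a quick check. This is a hedged sketch: `kvm_status` and `cosa_env` are hypothetical helper names, and only the `/dev/kvm` path and the `COSA_NO_KVM=1` variable come from the discussion above.

```python
import os


def kvm_status(path: str = "/dev/kvm") -> str:
    """Classify why a KVM device node might be unusable for the current user."""
    if not os.path.exists(path):
        return "missing"        # e.g. --device /dev/kvm was not passed to podman
    if not os.access(path, os.R_OK | os.W_OK):
        return "no-permission"  # node exists but this user cannot read and write it
    return "ok"


def cosa_env(status: str) -> dict[str, str]:
    """Environment overrides for the container; falls back to COSA_NO_KVM=1."""
    return {} if status == "ok" else {"COSA_NO_KVM": "1"}
```

Running `kvm_status()` both on the host and inside the container (for instance via `cosa shell`) shows where access is lost; the `COSA_NO_KVM=1` fallback carries the performance caveat noted above.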

@jcajka
Contributor

jcajka commented Dec 9, 2020

@odidev I would look in the direction of what @dustymabe and @NickCao mentioned. I can say that I have had no issues building FCOS via cosa (in the same way as mentioned in the link) on the old x-gene, rk3399 (rockpro64; the small cores need to be off-lined for KVM/qemu to work), or rpi4, all running various Fedora versions (32 through rawhide).
Out of curiosity (I don't think it will help us diagnose the issue), what distribution and container runtime are you using?

@odidev

odidev commented Dec 9, 2020

@NickCao thanks for your suggestion. After granting sufficient privileges, I no longer hit the KVM issue.

@jcajka I built the coreos-assembler image using the Dockerfile. I was able to run all the steps mentioned in the link, and the container ran successfully. The following is the output of "uname -a":
Linux cosa-devsh 5.9.12-200.fc33.aarch64 #1 SMP Wed Dec 2 15:03:00 UTC 2020 aarch64 aarch64 aarch64 GNU/Linux

@cgwalters
Member Author

I am sure I mentioned this before but am not seeing it now. AIUI Fedora also runs an OSBS instance; conceptually our workload could run in the same OpenShift instance I believe.

@baude
Contributor

baude commented Mar 10, 2021

are there aarch64 builds today? if so, can someone steer me to them?

@buckaroogeek

buckaroogeek commented Mar 10, 2021 via email

@nullr0ute

> are there aarch64 builds today? if so, can someone steer me to them?

Yes, there are. They're used for flatpak/container pipeline builds.

@dustymabe
Member

Update: we now have an aarch64 node that we connect to with gangplank to execute builds (we are still working with Fedora infra to possibly get an aarch64 node within that DC).

We are now bumping the lockfiles for aarch64 as a result of the following PRs:

Next step is to enable an aarch64 pipeline that's triggered as part of the normal pipeline runs.

@dustymabe
Member

Since we are shipping aarch64 artifacts now we can mark this one as done.
