Future of Power CI under P10/PowerVM #2473
Comments
Thank you for writing this up @ravanelli.
Ouch. That really breaks our existing model and will force us to carry quite a delta just to add that architecture.
Here is a summary of the discussion we had with Renata on this topic. If I got something wrong, please let me know, as I am not knowledgeable about COSA/FCOS/RHCOS. The CI infrastructure controller you have today runs on an x86 environment. At some point in the process, this controller contacts a Power server to actually build the Power images and run basic build verification tests on them. There are two requirements on the Power server so that it can seamlessly integrate with your infrastructure:
In order to fulfill these requirements, you will have to run your build process on a POWER9 bare-metal machine, and you will need to find one that is available with a public IP address. Given that one is available, you should have no issues running the build process on that machine and spawning VMs with the built image to do basic verification of the build. The unavailability of a Power10 system with KVM support should not be an impediment here. The build process usually targets older processor versions for compatibility and support reasons. Just as an example, IIRC, RHEL 8 is built targeting POWER8 processors, as it needs to run on both POWER8 and POWER9 processors. So, for the foreseeable future, using a POWER9 bare-metal machine to build the FCOS image and test the build process with KVM should be enough. This environment will be supported for many years to come. Please let me know if you have any additional questions on this.
Thanks @laggarcia for all the discussion on this topic. Right now, we don't have any bare-metal Power server with public IP access that would allow us to continue with the FCOS improvements for Power. Unless we can find one, there is no other option but to wait.
@laggarcia my understanding has been that the FCOS CI/pipeline requires OpenStack/AWS/OCP-like cloud infrastructure (nowadays it should be just the first two) and is not really able to work with static VMs/hosts. @dustymabe please correct me if I'm wrong.
@jcajka How reliable is the support at Brno University? I tried to use the minicloud at Unicamp, but lack of support is really an issue there. I had to wait more than a month to get a firmware update.
You can also get an OpenStack environment from OSU: https://osuosl.org/services/powerdev/request_hosting/. I've only ever requested standalone VMs, but have had very good stability and support from them. I'm not suggesting it over Brno, but if we need another option, that's one to consider. I believe this project falls under the "Free and Open Source" restriction.
We can work with a single bare-metal machine and talk to it over SSH. That's what we're doing currently for
@dustymabe cool, good to know. I had assumed that being in AWS was essential for various reasons, mostly redeployment, etc.
Can we pick this conversation back up? We're getting a couple of new pings from customers about OKD.
So, I have built the Fedora CoreOS images for ppc64le on a Power10 Rainier with firmware 1060.10, running Fedora 40 with kernel 6.10.7-200.fc40.ppc64le. KVM has been enabled from the HMC and the kvm_hv module is loaded. I thought this issue would be the best place to post a status update on this effort, but I am also available on Slack to discuss next steps if that's easier. I followed the instructions from the docs… and ran build.sh. This machine is using Legacy Compatibility interrupt mode, which QEMU refers to as XICS. As a result, the following warning appears when running the tests:
Currently, KVM on an LPAR doesn't support native XIVE, so QEMU doesn't have kernel-irq support, which means the KVM interrupt controller is turned off. Suggested flags would be something like
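Before chasing the interrupt-controller warning, it can help to confirm the host-side precondition mentioned above, namely that kvm_hv is actually loaded. A minimal sketch, assuming a Linux host where /proc/modules lists one loaded module per line with the module name as the first field; module_loaded and kvm_hv_loaded are hypothetical helpers, not part of COSA or kola:

```python
def module_loaded(name: str, proc_modules_text: str) -> bool:
    """Return True if `name` appears as a loaded module in /proc/modules-style text."""
    # Each line of /proc/modules starts with the module name, followed by
    # size, reference count, dependents, state and load address.
    for line in proc_modules_text.splitlines():
        fields = line.split()
        if fields and fields[0] == name:
            return True
    return False


def kvm_hv_loaded(path: str = "/proc/modules") -> bool:
    """Hypothetical helper: report whether kvm_hv is loaded on this host."""
    try:
        with open(path) as f:
            return module_loaded("kvm_hv", f.read())
    except OSError:
        # Not Linux, or /proc is not available in this environment.
        return False


if __name__ == "__main__":
    print("kvm_hv loaded:", kvm_hv_loaded())
```

Note that /proc/modules only lists loadable modules, so a kernel with KVM-HV built in would report False here even though KVM works; checking for /dev/kvm covers that case.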
I ran the tests like this:
However, for the past couple of weeks I have not been able to get a complete test run. The tests stall and I'm not sure how to debug this further. In my build dir, the ./tmp/kola/reports dir is empty, and in test.tap I see:
Is there another output dir where the test results would be stored? Additionally, the Oregon State University Open Source Lab (OSU OSL) does have Power10 machines available that will have KVM enabled. I'm hoping to replicate this setup at OSU on an LPAR, which could provide us with a P10 KVM setup, without a VPN, to run tests long term. More info
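When a run stalls and the reports dir stays empty, comparing the TAP plan with the result lines written so far shows which test numbers never reported. A minimal sketch, assuming test.tap follows standard TAP conventions (a "1..N" plan line plus "ok N - name" / "not ok N - name" result lines); the exact contents kola writes are not shown in this thread, and summarize_tap is a hypothetical helper:

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional, Set

PLAN_RE = re.compile(r"^1\.\.(\d+)")
RESULT_RE = re.compile(r"^(not ok|ok)\b\s*(\d+)?\s*-?\s*(.*)$")


@dataclass
class TapSummary:
    planned: Optional[int] = None
    passed: List[str] = field(default_factory=list)
    failed: List[str] = field(default_factory=list)
    numbers_seen: Set[int] = field(default_factory=set)

    def missing(self) -> List[int]:
        """Test numbers in the plan that have no ok/not ok line yet."""
        if self.planned is None:
            return []
        return [n for n in range(1, self.planned + 1) if n not in self.numbers_seen]


def summarize_tap(text: str) -> TapSummary:
    """Collect the plan and per-test results from TAP text."""
    summary = TapSummary()
    for raw in text.splitlines():
        line = raw.strip()
        plan = PLAN_RE.match(line)
        if plan:
            summary.planned = int(plan.group(1))
            continue
        result = RESULT_RE.match(line)
        if not result:
            continue  # comments ("# ..."), diagnostics and blank lines
        status, number, desc = result.groups()
        if number is not None:
            summary.numbers_seen.add(int(number))
        # Directives such as "# SKIP" are not interpreted; "ok" counts as passed.
        (summary.passed if status == "ok" else summary.failed).append(desc)
    return summary
```

For example, reading test.tap from the build dir and calling summarize_tap(text).missing() lists the planned tests that have not reported yet, which narrows down where to look for the corresponding logs.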
Thank you for working on this!
Yeah, we've seen that warning for a while now and haven't dug into it. Feel free to submit a patch to choose the right set of arguments based on $factors.
Which tests stall? You should see log files under e.g.
Tests have been consistently passing on my Power10 machine with the following command:
@mtarsel Where are we on the reprovisioning tests? Have you been able to get those to pass now?
I now have a P10 KVM-enabled box at OSU OSL for development, and yesterday all tests passed consistently. I'm not sure what changed on this new box; my only guess at this point is that the previous box might not have had enough disk space.
Sweet, that's great to hear!
Yes, I think we can close this issue, since there is now a P10 environment where all tests are passing. I'm still investigating 2cf91c9, but I don't think that is directly related to the future of CI on P10.
I'm creating this issue so we have a common place to discuss the next steps for Power CI. That way, we can get more insights from the multiarch folks around and decide the best way to move forward.
With gangplank, we are improving our CI to create a more multi-arch world for FCOS/RHCOS/COSA, and also to eliminate some issues, such as duplicated CIs. arm64 was successfully added, and now we are looking for Power and s390x to be part of this beautiful world.
Unfortunately, there are some struggles with Power looking to the future. As we know, P10 dropped bare-metal support (PowerVM only), and RHEL 9 also dropped support for KVM on Power.
Our entire CI is based on QEMU/KVM. It would be really hard to change it just to accommodate Power.
Recently, I was trying to enable gangplank remote on Power, using a server provided by IBM in IBM Cloud. However, this server is a P9 using PowerVM, and this is where we start to feel the issues of working with PowerVM/KVM.
I reached out to folks at IBM to better understand the options we have here, and the feedback I have gotten so far is:
kvm_pr has not been supported for a long time, and Red Hat removed it from the RHEL tree a few months ago (this should show up in RHEL 8.4). There's no upstream support either.
As for TCG, pseries+TCG works on PowerVM without problems. The problem is that it is considerably slower than pseries+KVM. It is not officially supported by IBM/Red Hat; only upstream support is provided.
I was also able to build FCOS with a couple of TCG warnings. However, --basic-qemu-scenarios kept running for more than 1 hour with no results.
kvm_hv has never run on PowerVM. There may be plans to make this happen, but that depends on the PowerVM roadmap.
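Since pseries+TCG works on PowerVM but is much slower than KVM, a harness that has to run in both environments might pick the QEMU accelerator at runtime instead of assuming KVM. A minimal sketch, assuming that a /dev/kvm node this process can open read/write is a good enough signal that KVM is usable; kvm_usable and qemu_accel_args are hypothetical helpers, not something COSA provides, while "-accel kvm", "-accel tcg" and "-machine pseries" are standard QEMU options:

```python
import os
from typing import List


def kvm_usable(dev: str = "/dev/kvm") -> bool:
    """True if the KVM device node exists and this process can open it read/write."""
    return os.path.exists(dev) and os.access(dev, os.R_OK | os.W_OK)


def qemu_accel_args(kvm_available: bool) -> List[str]:
    """Hypothetical helper: choose a QEMU accelerator, falling back to TCG."""
    if kvm_available:
        return ["-accel", "kvm"]
    # Without KVM (e.g. a PowerVM LPAR without kvm_hv), QEMU can still emulate
    # the guest with TCG, but runs are considerably slower, so a caller may
    # want to raise its test timeouts accordingly.
    return ["-accel", "tcg"]


if __name__ == "__main__":
    cmd = ["qemu-system-ppc64", "-machine", "pseries"] + qemu_accel_args(kvm_usable())
    print(" ".join(cmd))
```

The fallback only keeps runs working; given the slowdown described above, a TCG-only environment may still be impractical for the full test suite.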
Looking at these scenarios, it seems we are not really able to run KVM under PowerVM.
More details:
https://bugzilla.redhat.com/show_bug.cgi?id=2008271