K3D on Arm64 server Verification #984

Closed
4 tasks done
zhlhahaha opened this issue Mar 23, 2023 · 23 comments

Comments

@zhlhahaha
Contributor

zhlhahaha commented Mar 23, 2023

  • k3d on a bare-metal server
  • KubeVirt on a k3d cluster
  • k3d in the bootstrap image
  • KubeVirt and k3d in the bootstrap image
@zhlhahaha
Contributor Author

Hi @oshoval, I have put the tasks here and will do a basic verification first.
As neither k3s nor k3d has been verified on an Arm64 server, I am not sure how much work is needed. Personally, I want to finish KubeVirt feature verification and e2e test enablement on Arm first; then I can put more effort into this.

@oshoval
Contributor

oshoval commented Mar 23, 2023

Hi @zhlhahaha
Sure no rush

Thanks

@oshoval
Contributor

oshoval commented Mar 29, 2023

You know there is k3d-1.25-sriov, right?
You can use it as a baseline with export DEPLOY_SRIOV=false.
Just making sure, so you won't need to duplicate work.
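
As a rough sketch of that suggestion: it assumes the usual kubevirtci workflow, in which KUBEVIRT_PROVIDER selects the provider and make cluster-up starts it. Neither of those is spelled out in this thread, so treat both as assumptions.

```shell
# Select the existing k3d provider but skip the SR-IOV-specific deployment.
# KUBEVIRT_PROVIDER is assumed to be the variable kubevirtci reads.
export KUBEVIRT_PROVIDER=k3d-1.25-sriov
export DEPLOY_SRIOV=false

# Assumed entry point, to be run from a kubevirtci checkout:
#   make cluster-up
echo "provider=${KUBEVIRT_PROVIDER} deploy_sriov=${DEPLOY_SRIOV}"
```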

@zhlhahaha
Contributor Author

You know there is k3d-1.25-sriov, right? You can use it as a baseline with export DEPLOY_SRIOV=false. Just making sure, so you won't need to duplicate work.

Yes, I gave it a try yesterday, and it failed to start on the Arm64 server because of the CNI. I am not sure why it does not work. I will take a look tomorrow.

@oshoval
Contributor

oshoval commented Mar 30, 2023

Thanks

Btw, why don't you use the VM-based providers instead?
What about adapting them to support Arm64? That might be better.
k3d / kind as part of kubevirtci are more experimental than the VM-based providers and are dedicated to SR-IOV / vGPU.
The VM-based providers are also more robust.
Atm we don't maintain kind-1.23 etc. I feel we should have one path that is e2e covered, whatever it is.

@brianmcarey
Member

Btw, why don't you use the VM-based providers instead?
What about adapting them to support Arm64? That might be better.

As far as I remember Arm64 doesn't support nested virtualization.

@oshoval
Contributor

oshoval commented Mar 30, 2023

As far as I remember Arm64 doesn't support nested virtualization.

https://lwn.net/Articles/921783/ wdyt?

@zhlhahaha
Contributor Author

As far as I remember Arm64 doesn't support nested virtualization.

https://lwn.net/Articles/921783/ wdyt?

It is a hardware limitation. Currently, most Arm64 CPUs on the market (including the Arm64 server for KubeVirt CI/CD) do not support nested virtualization.

@zhlhahaha
Contributor Author

Hi @oshoval, I have verified k3d both on an Arm64 server and in a nested container environment (the bootstrap image).
The good news is that k3d starts successfully in both environments; 81 E2E tests passed and 1 failed.
I made the following modifications:

  1. use the default CNI rather than Calico (is there any specific reason to use the Calico CNI?)
  2. run it as a one-node cluster (with only the k3d-k3d-server-0 node)

Here are some issues:

  1. vmi-killer does not seem to work in the cluster
  2. tests/reporter does not seem to work well
  3. E2E tests fail occasionally in a multi-node k3d cluster

I am still checking these issues. I also need to verify the stability of the k3d provider.

@oshoval
Contributor

oshoval commented Mar 30, 2023

Hi @oshoval, I have verified k3d both on an Arm64 server and in a nested container environment (the bootstrap image). The good news is that k3d starts successfully in both environments; 81 E2E tests passed and 1 failed. I made the following modifications:

  1. use the default CNI rather than Calico (is there any specific reason to use the Calico CNI?)

Thank you

We must install the CNI manually, be it Calico or Flannel, because without that Multus doesn't work.
Please see #972 (comment)
and the comments below it on the cons of it (it also mentions an unmerged commit that uses it).
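
For illustration only, here is a sketch of how a manually installed CNI is commonly wired into k3d: the bundled Flannel is switched off with a k3s argument and a CNI manifest is mounted into the k3s auto-deploy directory. The function name, the DRY_RUN switch, and the manifest path are hypothetical, and this is not necessarily what the kubevirtci provider does.

```shell
# Build a k3d invocation that turns off the bundled Flannel so that a CNI
# (Calico or Flannel) can be installed manually from a manifest.
# $1: absolute host path to the CNI manifest (hypothetical, e.g. /tmp/calico.yaml)
# Set DRY_RUN=1 to print the arguments, one per line, instead of running k3d.
create_cluster_manual_cni() {
  manifest="$1"
  set -- k3d cluster create k3d \
    --k3s-arg '--flannel-backend=none@server:*' \
    --volume "${manifest}:/var/lib/rancher/k3s/server/manifests/cni.yaml"
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$@"
  else
    "$@"
  fi
}
```

With DRY_RUN=1 create_cluster_manual_cni /tmp/calico.yaml the arguments are only printed, which makes it easy to review them before creating a cluster.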

  2. run it as a one-node cluster (with only the k3d-k3d-server-0 node)

We need at least 2 nodes (atm we use 3) because we test migration.

Due to 1 and 2,
let's have a different folder for k3d-1.25?
It seems SR-IOV needs a different config.
I prefer to keep the providers untangled, like all the others.

Note that once we move to creating the cluster from a manifest, it might be easier to maintain
two different configurations.
Atm there is a bug there with podman support, so it needs a workaround (the network field is broken).
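
A minimal sketch of what such a manifest could look like, assuming k3d's "Simple" config schema (k3d.io/v1alpha4 here; newer k3d releases use a later apiVersion). The helper name, cluster name, and node counts are illustrative, and the network field is left out on purpose because of the podman issue mentioned above.

```shell
# Write a minimal k3d "Simple" config (1 server + 2 agents) to the path in $1.
# The values are illustrative, not the kubevirtci provider's real settings.
write_k3d_config() {
  cat > "$1" <<'EOF'
apiVersion: k3d.io/v1alpha4
kind: Simple
metadata:
  name: k3d
servers: 1
agents: 2
EOF
}
```

The resulting file would then be passed to k3d with k3d cluster create --config <path>.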

Here are some issues:

  1. vmi-killer does not seem to work in the cluster
  2. tests/reporter does not seem to work well
  3. E2E tests fail occasionally in a multi-node k3d cluster

I am still checking these issues. I also need to verify the stability of the k3d provider.

Thank you

@zhlhahaha
Contributor Author

We must install the CNI manually, be it Calico or Flannel, because without that Multus doesn't work.

I see. I found out why Calico does not work on Arm64: it seems that the images from quay.io/calico/ are built only for x86_64. We need to pull multi-arch container images for Calico.
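
One way to check which architectures an image is published for, sketched under assumptions: docker manifest inspect prints a manifest list as JSON for multi-arch images, and the hypothetical helper below pulls the architecture values out of that JSON using only grep, sed, and sort.

```shell
# list_arches: read an image manifest list (JSON) on stdin and print each
# platform architecture once, one per line.
list_arches() {
  grep -o '"architecture"[[:space:]]*:[[:space:]]*"[^"]*"' \
    | sed 's/.*"\([^"]*\)"$/\1/' \
    | sort -u
}
```

Usage would be along the lines of docker manifest inspect quay.io/calico/<image>:<tag> | list_arches, where the image and tag are placeholders. If arm64 is missing from the output, or the output is empty because the image is not a manifest list, the image is probably not usable on an Arm64 server.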

Let's have a different folder for k3d-1.25?

OK. As k3d-1.25 is currently used only by the Arm64 CI/CD, I want to make it as simple as possible.

Atm there is a bug there with podman support, so it needs a workaround (the network field is broken).

Do you have more information on this?

Thanks, @oshoval

@oshoval
Contributor

oshoval commented Mar 30, 2023

Atm there is a bug there with podman support, so it needs a workaround (the network field is broken).

Do you have more information on this?

https://rancher-users.slack.com/archives/CHM1EB3A7/p1679999162750929?thread_ts=1678090269.551049&cid=CHM1EB3A7

You can look here (WIP)
oshoval@f0aa327
This is a hack that fixes it locally (not on CI, because on CI we can't create this podman network)
oshoval@4b281de
I think the solution will be to remove the network field (which has the bug)
and configure CI to have a default network named bridge when using podman.
But it will take time.
I might open an issue about it for k3d.

@oshoval
Contributor

oshoval commented Mar 30, 2023

Note that since we don't support podman on CI yet, it can actually be disregarded atm (we just support it locally),
so we can use manifests to get a robust provider more easily, assuming you are using docker.

@zhlhahaha
Contributor Author

Note that since we don't support podman on CI yet, it can actually be disregarded atm (we just support it locally), so we can use manifests to get a robust provider more easily, assuming you are using docker.

podman is now used in the bootstrap image, so this is a problem.

@oshoval
Contributor

oshoval commented Mar 30, 2023

Do you think you can adjust it to use a default network named "bridge", so that it works for you?
For us podman doesn't work at all because we have multiple nodes, and on CI with netavark it doesn't work atm.

Anyhow, no rush about it, we can discuss when time comes.

@zhlhahaha
Contributor Author

Hi @oshoval, I have almost finished the verification of k3d on Arm64.

I still have one uncertainty. If the e2e tests pass in a k3d cluster, does that mean they work well on a k8s cluster too? On x86_64 servers we have all-round tests that verify it works well on a standard k8s cluster, whereas on Arm64 we only have the E2E tests in a nested containerized environment. If we migrate the Arm64 E2E test pipeline from the kind provider to the k3d provider, are there potential risks or uncertainties?

cc: @rmohr @dhiller @qinqon @xpivarc @brianmcarey

@oshoval
Contributor

oshoval commented Apr 3, 2023

Hi @zhlhahaha
Thanks for the effort; please see the comments on #994.
I think we should wait for proper manifest usage; meanwhile, imo, you can duplicate the files you need under a different name.

Well, k3s is k8s compatible. It has a slightly different architecture than k8s, but it is considered compatible.
It is maintained under the CNCF as a sandbox-stage project.
For sig-network we are fine with it; I can't tell what is needed for ARM tbh.

@zhlhahaha
Contributor Author

For sig-network we are fine with it; I can't tell what is needed for ARM tbh.

@rmohr @dhiller @qinqon @xpivarc @brianmcarey
Do you have any suggestions?

@xpivarc
Member

xpivarc commented Apr 3, 2023

What is the reason to migrate to K3D? I did not check it out, is it certified?

@oshoval
Contributor

oshoval commented Apr 3, 2023

The reason:
kubernetes-sigs/kind#2999
Unless you don't need the CPU manager for ARM e2e testing.
The kind provider currently in use is out of date (1.23).

It is a CNCF-certified, sandbox-phase, k8s-compatible distribution.
The CNCF maintains both k8s and k3s.
https://k3s.io/

k3d is a wrapper around k3s that allows running k3s in containers.

@xpivarc
Member

xpivarc commented Apr 13, 2023

Just a note, the kubernetes-sigs/kind#2999 is now resolved.

@oshoval
Contributor

oshoval commented Apr 13, 2023

Just a note, the kubernetes-sigs/kind#2999 is now resolved.

Thanks
Please try the SR-IOV e2e (if you want and have a provider), since the /dev/null issue is not necessarily the same problem as open /dev/ptmx: operation not permitted: unknown (possibly it is; maybe you tried a POC already; anyhow, we need e2e).
Also, I am not sure that kind is better than k3d for us (sig-network specifically). It seems k3d is developing faster atm
and is pretty stable and lighter. If we need to, we can go back to kind; it is good to have an alternative.
Note that atm it isn't official yet and also needs work; personally I have other priorities.

Btw, I think we should consider a repo kubevirt - easy to start, where we can have both kind and k3d with the basic recommended configuration that allows running a simple KubeVirt machine.
But this is another story.

Of course, if Howard prefers to keep kind, he / we can update the non-SR-IOV kind provider. Note that we don't have e2e for ARM, so we can't maintain it (we as sig-network maintain only k3d-sriov).

@zhlhahaha
Contributor Author

Btw, I think we should consider a repo kubevirt - easy to start, where we can have both kind and k3d with the basic recommended configuration that allows running a simple KubeVirt machine. But this is another story.

It is a good idea.

Of course, if Howard prefers to keep kind, he / we can update the non-SR-IOV kind provider. Note that we don't have e2e for ARM, so we can't maintain it (we as sig-network maintain only k3d-sriov).

As kind is used in the Kubernetes CI/CD pipeline, I think it is more reliable to verify KubeVirt in the kind k8s environment on the Arm platform, so I prefer to keep the kind provider and use it in the E2E tests for Arm. I can maintain the provider.
