-
Notifications
You must be signed in to change notification settings - Fork 196
Vagrant: boot q35 machine and other improvements #3965
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changes look good, to the extent that I understand them, with the exception of a couple of nits I've commented on. Currently trying the changes out as well, I'll follow up when I have some results from that.
cdf0d10
to
3b286bf
Compare
Vagrantfile
Outdated
@@ -44,6 +44,23 @@ Vagrant.configure("2") do |config| | |||
if host_arch == "x86_64" | |||
lv.machine_type = "q35" | |||
end | |||
# The VM needs one additional virtio-net device and iommu enabled | |||
# for the vfio tests. | |||
if job == "VFIO" and host_arch == "x86_64" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there actually a compelling reason to do this only if the VFIO job is specified. That means that it's not possible to start up the vagrant box with the default options, then run the VFIO job inside manually (possibly after running other tests firsts). Since it takes a pretty long time for vagrant to initialize the box, having to redo it between different types of tests is pretty painful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The only reason is that for Ubuntu it will require a reboot, and reboots sometimes fail. Fedora will always reboot as it needs the cgroups tweak, so not a concern for Fedora.
Indeed it is time consuming and painful the switch between jobs. I will accept your comment and always pass those flags regardless whether VFIO or not. If it turns out a problem when I can implement another solution.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had to rollback that idea... in Ubuntu host vagrant fails to detect reboot. If I always set the variables for VFIO tests then the ubuntu guest will always reboot (so it won't work for any CI_JOB).
cat <<-EOF | ||
This script helps you to clean up the VMs and Vagrant's control | ||
directories. | ||
By default it will attempt to destroy all VMs. For additionally |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's kind of horrible that this isn't built into vagrant :(.
Currently the VM is configured with QEMU's i440FX machine type which has no PCIe support (among other restrictions). This changed the Vagrantfile so that the it boots a q35 machine instead. Fixes kata-containers#3752 Signed-off-by: Wainer dos Santos Moschetta <[email protected]>
I just updated this PR to address two comments from @dgibson :
@dgibson while this PR may not be perfect, it seems good enough (okay, I am suspect...). Can I have your approval? |
README.md
Outdated
then please [open an enhancement request](https://github.com/kata-containers/tests/issues/new/choose) | ||
to let us know. | ||
|
||
|Host \ Guest | fedora | ubuntu | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s!\!/!g
- I don't understand this table ;(
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @jodh-intel , I will rephrase to:
Currently it supports the creation of Fedora 32 and Ubuntu 20.04 VM, as shown on the table
below. The Vagrantfile
was tested on Fedora 33 and Ubuntu 20.04 hosts, and it is
known to fail the boot of Fedora VM on
Ubuntu host. If you have the need of testing on a different guest or it fails to work
on your host's distro then please open an issue
to let us know.
Host / Guest | Fedora 32 | Ubuntu 20.04 |
---|---|---|
Fedora 33 | Yes | Yes |
Ubuntu 20.04 | No | Yes |
Does it looks better?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah - I see what you were trying to do with the backslash now. Assuming I'm now understanding this, I'd make it clearer still, even if it does involve a bit of repetition:
Host | Fedora 32 guest | Ubuntu 20.04 guest |
---|---|---|
Fedora 33 | Yes | Yes |
Ubuntu 20.04 | No | Yes |
Maybe one day markdown will support cell spans natively, like reStructuredText has for years.
.ci/vagrant-cleaner.sh
Outdated
done | ||
|
||
if ! command -v vagrant &>/dev/null; then | ||
echo "ERROR: missing 'vagrant' command. Run $0 -h for help." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Error messages should be redirected to stderr
(here and elsewhere).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
.ci/vagrant-cleaner.sh
Outdated
|
||
vagrant_destroy "$vm" | ||
if [ $? -ne 0 ]; then | ||
echo "Attempt to clean up the domain on libvirt" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To the user, it's unclear if this is a warning or informational message. Maybe add a "WARNING" or "INFO" prefix to clarify?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. I will print as a warn message.
dfe3f2f
to
e1b3ae5
Compare
@jodh-intel @dgibson can we get this merged? I think it is good enough... I can work on follow up PRs if needed. |
/test |
.ci/vagrant-cleaner.sh
Outdated
cidir="$(dirname $(readlink -f "$0"))" | ||
|
||
# Print message to stderr. Add the 'ERROR:' prefix. | ||
perror() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Spot the C hacker! 😄
Should this call exit 1
though? It looks like the majority of calls immediately exit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right. I just updated the PR but I renamed perror
to die
:(
if vagrant destroy --force "$vm"; then | ||
echo "VM '$vm' destroyed: OK" | ||
else | ||
echo "VM '$vm' destroyed: FAILED" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/echo/perror/ ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here I just want to info the user.
.ci/vagrant-cleaner.sh
Outdated
done | ||
|
||
[ $ret -eq 0 ] || \ | ||
perror "Failed the removal of some VM." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should the script die at this point?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @wainersm.
/test |
/test-ubuntu |
/retest-clh-k8s-containerd |
/test-vfio |
Moved to wip to fix the job jenkins-ci-ubuntu-1804-containerd-k8s-e2e-minimal. This job is running on Ubuntu 18.04, which apparently hasn't the driverctl package (see commit a3d903a of this PR). |
The commit a043363 introduced improvements to the vfio functional test which now requires the driverctl tool. The driverctl package is installed in the host via .ci/vfio_jenkins_job_build.sh, but that script is never called by the .ci/setup.sh; as a result, the package isn't installed when I run the Vagrant-based job locally. This commit adds the driverctl (also pciutils) to the list of packages installed in the host Fedora and Ubuntu. Signed-off-by: Wainer dos Santos Moschetta <[email protected]>
The Vagrantfile was changed so that the VM will have IOMMU enabled and an additional virtio-net device in case of the VFIO job. Additional Kernel command-line arguments are required to enable IOMMU, therefore GRUB is reconfigured and the VM should reboot. Signed-off-by: Wainer dos Santos Moschetta <[email protected]>
The vagrant configuration sometimes can get into inconsistent state. For instance, the domain on libvirt was created by the framework that thinks the box is not initialized yet. Or eventually the developer wants to simply wipe out all vagrant files and created resources from his/her workstation. In this commit it was introduced the vagrant-cleaner.sh script which aims on help developers to clean up the vagrant stuffs. Fixes kata-containers#3757 Signed-off-by: Wainer dos Santos Moschetta <[email protected]>
Added on README.md a table with the combinations of tested host versus guest so developers are aware of what is (or not) expected to work. Fixes kata-containers#3942 Signed-off-by: Wainer dos Santos Moschetta <[email protected]>
By default Vagrant creates the domain in libvirt prefixed with the project's directory name, in our case it is "tests-". Let's use meaningful name instead. Below is shown the new domain names in libvirt. $ virsh list --all Id Name State --------------------------------------------- 1 kata_containers_test-ubuntu running 4 kata_containers_test-fedora running Signed-off-by: Wainer dos Santos Moschetta <[email protected]>
The grubby package is already installed in the pulled VM image. Signed-off-by: Wainer dos Santos Moschetta <[email protected]>
The installation of packages in the generic/fedora32 box is way faster than the currently used fedora/32-cloud-base. In my laptop it can save approximately 30min of the time to `vagrant up fedora`. So in this commit I am switching the box used. Signed-off-by: Wainer dos Santos Moschetta <[email protected]>
a865dd2
to
d8557ac
Compare
Updated the commit fe4bc7f so that driverctl package is only installed on Ubuntu 20.04 (or newer) because that isn't available on 18.04 thus some CI jobs failing. |
/test |
Hi @jodh-intel , updated the commit fe4bc7f , mind to review again? :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @wainersm.
lgtm
I started this string of commits by changing the Vagrantfile so that the VM boots with q35 machine type (fix #3752). As q35 is required to run the vfio tests (because it needs iommu and one extra virtio-net device) and the Vagrantfile had no support for such as test, I also changed it so that if the user exports
CI_JOB=VFIO
then extra arguments are passed to QEMU to allow testing Kata + VFIO.Other changes came after to address the issues #3757 and #3942.
I tested those changes on my laptop x86_64 (Intel) with Fedora 33. I booted the fedora and ubuntu guests for VFIO and non-VFIO tests (
CI_JOB=CRIO_K8S
) normally (although sometimes the SSH connection fails after reboot; usually I destroy/up an 2nd time and it works). I don't have a ubuntu bare-metal to test the VFIO job but I will give a try with nested VMs.Still about the VFIO arguments being passed to QEMU...now I realized that they are specific for Intel processors, I should be working on the AMD equivalents when/if time permits (if that is your case, i.e., you'd like to run VFIO tests on AMD, then please let me know).
Warn: if you have any vagrant boxes created from our old Vagrantfile then I recommend you to destroy them all before apply this pull request. The reason is because I changed the prefix name which is used to create the domain in libvirt; so it may leave orphan domains on your machine.