-
Notifications
You must be signed in to change notification settings - Fork 195
Allow to reproduce CI jobs locally on VMs - part 2 #3751
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm unclear why the first few patches still show up in this PR, even though they're merged already. Does this need a rebase?
Vagrantfile
Outdated
@@ -76,6 +76,7 @@ EOF | |||
# This switches back to cgroup v1. It requires a reboot. | |||
sudo dnf install -y grubby | |||
sudo grubby --update-kernel=ALL --args="systemd.unified_cgroup_hierarchy=0" | |||
sudo dnf install -y make |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not combine this with the install of grubby above?
@echo " vm-help - Help for the VM runner targets." | ||
@echo "Environment variables:" | ||
@echo " KATA_TESTS_JOB_RUNNER: - local or vm (default)" | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The new interface does look a lot nicer. A couple of overall concerns:
- Can the docs be updated to describe this new preferred way to run tests
- Adding a few more hundred lines of fairly complex bash to the thousands of existing lines of fragile bash code makes me kinda nervous. But maybe it's a necessary interim step.
|
||
KATA_TESTS_VM_ENGINE ?= vagrant | ||
supported_vm_engines:=$(shell ./job/vm/vm_runner.sh -e) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You have two layers of abstraction here: first local vs. vm, then vagrant vs. other possible VM engines. Is it worth the complexity? Could you just have "local" and "vagrant" as the top level options.
Vagrantfile
Outdated
@@ -93,7 +93,6 @@ EOF | |||
export osbuilder_distro="$distro" | |||
|
|||
cd ${kata_tests_repo_dir} | |||
sudo -E PATH=$PATH -H -u #{guest_user} bash -c '.ci/setup.sh' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems like it definitely needs a docs update, because if you create a Vagrant vm directly (which you might want to do for some more specific test not covered by the existing jobs) it will no longer be set up to do that as is.
# Export it. | ||
typeset -x CI_JOB | ||
bash "$setup_script" | ||
bash "$runner_script" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Enforcing running the setup and runner immediately after each other worries me a bit. Yes, that's currently the safe way to run the CI jobs, but
- The setup script takes rather a long time
- It means the setup and job output will be combined together, perpetuating Kata's existing problem of having huge logs where it's hard to figure out where things went wrong.
"cloud-hypervisor-k8s-crio" | ||
"cloud-hypervisor-k8s-containerd" | ||
"cloud-hypervisor-k8s-containerd-minimal" | ||
"Ccloud-hypervisor-k8s-containerd-fullL" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/s/Ccloud-hypervisor-k8s-containerd-fullL/cloud-hypervisor-k8s-containerd-full
"firecracker" | ||
"vfio" | ||
"virtiofs_experimental" | ||
"metrics" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mmm I think that these jobs will be hard to reproduce as they run in baremetal and even if somebody runs the tests locally...the results will not match
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As long as the host allows for nested virt, I think it should be workable.
|
||
# Some jobs cannot run inside a VM. For instance, those which need bare-metal | ||
# machine. So here it keeps a list of unsupported jobs. | ||
UNSUPPORTED_JOBS=( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what about metrics? also I believe vfio needs baremetal
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, surprisingly vfio does not run on baremetal right now: it explicitly looks for a second virtio-net device in its "host" to then pass through into the Kata guests.
The command `make` is needed for when the job is trigged inside the VM, so this change ensures it is installed. Signed-off-by: Wainer dos Santos Moschetta <[email protected]>
This implements an extensible way of executing a CI job on the developer workstation. With this initial implementation the job can be executed on localhost or inside a VM. Currently the only VM engine supported is Vagrant. It is added makefile targets and this should be primary interface with users. Fixes kata-containers#3642 Signed-off-by: Wainer dos Santos Moschetta <[email protected]>
45553e1
to
773793e
Compare
Updated this pull request just to fix conflicts with latest version of Vagrantfile. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @wainersm - a few comments.
# Script variables | ||
# | ||
|
||
ALL_AVAILABLE_JOBS=( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Random thought: I wonder if these should be specified a a configuration file instead?
typeset -u CI_JOB="$job_id" | ||
# Export it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Weird indentation (here and elsewhere - tabs vs. spaces?)
# i.e., if the function _do_run_$1 is defined then it returns 0, otherwise return 1. | ||
_runner_exists() { | ||
local runner="$1" | ||
LC_ALL=C type "_do_run_${runner}" &>/dev/null |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ooi, why is LC_TYPE
needed here? If it's essential, why not set it at the global level?
l) list_jobs_id;; | ||
j) run_job "$OPTARG" "${KATA_TESTS_JOB_RUNNER}";; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Not sorted alphebetically.
# TODO: need to find a way to check the libvirt provider was | ||
# installed and is working. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you raise a GH issue and put the URL here with a "FIXME"?
KATA_TESTS_VM_ENGINE: choose the VM engine. Defaults to 'vagrant' | ||
KATA_TESTS_VM_NAME: for actions which require a VM "fedora" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please use the variables here, not hard-coded strings.
local vm="$2" | ||
local cmd | ||
echo "Run job $job_id on $vm VM" | ||
# Assume $GOPATH is exported |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wouldn't ;) Can you add a check for that here or in a setup function "just in case".
# is_engine_available() | ||
# is_vm_running(name) | ||
# vm_start(name, force_destroy=true) | ||
# vm_destroy(name) | ||
# vm_run_cmd(name) | ||
# vm_shell() | ||
# list_vms() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd consider making the script check that these functions exist and if not, listing the functions that aren't defined to stderr so it's clear what's wrong. You can do the checking programmatically using something like this:
check_function()
{
local name="${1:-}"
type -t "$name" &>/dev/null || die "function '$name' does not exist"
}
check_and_run()
{
name="${1:-}"
shift
check_function "$name" && eval "$name" $@
}
# SPDX-License-Identifier: Apache-2.0 | ||
# | ||
# Defines the vm_runner interface. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You might want to add our boilerplate stuff:
set -o errexit
set -o nounset
set -o pipefail
# XXX: Bash-specific code. zsh doesn't support this option and that *does*
# matter if this script is run sourced... since it'll be using zsh! ;)
[ -n "$BASH_VERSION" ] && set -o errtrace
[ -n "${DEBUG:-}" ] && set -o xtrace
#KATA_TESTS_VM_NAME= The VM name | ||
#KATA_TESTS_VM_ENV_FILE= File that contain variables to be exported in the VM environment | ||
#KATA_TESTS_VM_ENGINE= The VM engine. Default to Vagrant |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need these lines?
Any update on this @wainersm? |
@wainersm - is this something you still plan to work on? |
@dgibson @GabyCT @jodh-intel I really appreciated your review on this PR however I am thinking in closing it. I explain: David's #3751 (comment) made me thoughtful whether this is the right time for adding another layer of complexity to the CI scripts. I really think the propose changes on this PR adds a more friendly interface to allow users running CI locally, but at this point I don't know if the Vagrant stuff has been useful to others than me. Therefore, first I will promote the use of Vagrant, fix whatever problems come up, and if I feel this extra interface is worth having in then I will re-open this PR. |
ok thanks @wainersm |
This is a continuation of #3547 ((Allow to reproduce CI jobs locally on VMs).
The goal here is make even easier to execute the CI job whether on localhost or inside VM, by introducing makefile targets and some scripts (called runners) that abstract the environment where the job will be executed.
Although I didn't think in every and each details, the initial design is meant to allow us to add new runners (e.g. containers) easily. And for the vm_runner, back-ends other than Vagrant (e.g. pure libvirt scripts) can also be implemented.
There are other improvements already on my mind, for instance, allow the user to login the VM to debug a problem. But I will leave those for a part 3.
A typical usage session would be: