Allow to reproduce CI jobs locally on VMs - part 2 #3751

wainersm · 2021-07-23T21:47:28Z

This is a continuation of #3547 (Allow to reproduce CI jobs locally on VMs).

The goal here is to make it even easier to execute a CI job, whether on localhost or inside a VM, by introducing makefile targets and some scripts (called runners) that abstract the environment where the job will be executed.

Although I haven't thought through every detail, the initial design is meant to let us add new runners (e.g. containers) easily. For the vm_runner, back-ends other than Vagrant (e.g. pure libvirt scripts) can also be implemented.

I already have other improvements in mind, for instance allowing the user to log in to the VM to debug a problem, but I will leave those for a part 3.

A typical usage session would be:

[wmoschet@wainer-laptop .ci]$ make help
Job related targets

  job-run-NAME             - Run the job NAME. Use KATA_TESTS_JOB_RUNNER to set the runner.
  job-list                 - List all available CI jobs
help: CI targets
  vm-help                  - Help for the VM runner targets.
Environment variables:
  KATA_TESTS_JOB_RUNNER:   - local or vm (default)
[wmoschet@wainer-laptop .ci]$ make vm-help
vm-help: Manage CI VMs with the following targets:
  vm-clean                 - Destroy all VMs
  vm-list                  - Print all supported VM names
Environment variables
  KATA_TESTS_VM_ENGINE: set the VM engine. Defaults to 'vagrant'.
                        Available engines: vagrant
  KATA_TESTS_VM_NAME:   the VM name.
[wmoschet@wainer-laptop .ci]$ make vm-list
The avaliable VMs:
fedora
ubuntu
[wmoschet@wainer-laptop .ci]$ make job-list
baremetal-pmem cri_containerd_k8s cri_containerd_k8s_initrd cri_containerd_k8s_complete cri_containerd_k8s_minimal crio_k8s crio_k8s_complete crio_k8s_minimal cloud-hypervisor-k8s-crio cloud-hypervisor-k8s-containerd cloud-hypervisor-k8s-containerd-minimal Ccloud-hypervisor-k8s-containerd-fullL firecracker vfio virtiofs_experimental metrics metrics_experimental
[wmoschet@wainer-laptop .ci]$ KATA_TESTS_VM_NAME=fedora make job-run-crio_k8s
[wmoschet@wainer-laptop .ci]$ make vm-clean

dgibson

I'm unclear why the first few patches still show up in this PR, even though they're merged already. Does this need a rebase?

dgibson · 2021-07-26T05:02:53Z

Vagrantfile

@@ -76,6 +76,7 @@ EOF
        # This switches back to cgroup v1. It requires a reboot.
        sudo dnf install -y grubby
        sudo grubby --update-kernel=ALL --args="systemd.unified_cgroup_hierarchy=0"
+        sudo dnf install -y make


Why not combine this with the install of grubby above?

dgibson · 2021-07-26T05:11:35Z

.ci/Makefile

+	@echo "  vm-help                  - Help for the VM runner targets."
+	@echo "Environment variables:"
+	@echo "  KATA_TESTS_JOB_RUNNER:   - local or vm (default)"
+


The new interface does look a lot nicer. A couple of overall concerns:

Can the docs be updated to describe this new preferred way to run tests?

Adding a few more hundred lines of fairly complex bash to the thousands of existing lines of fragile bash code makes me kinda nervous. But maybe it's a necessary interim step.

dgibson · 2021-07-26T05:12:26Z

.ci/job/vm/Makefile.include

+
+KATA_TESTS_VM_ENGINE ?= vagrant
+supported_vm_engines:=$(shell ./job/vm/vm_runner.sh -e)
+


You have two layers of abstraction here: first local vs. vm, then vagrant vs. other possible VM engines. Is it worth the complexity? Could you just have "local" and "vagrant" as the top-level options?

dgibson · 2021-07-26T05:16:06Z

Vagrantfile

@@ -93,7 +93,6 @@ EOF
    export osbuilder_distro="$distro"

    cd ${kata_tests_repo_dir}
-    sudo -E PATH=$PATH -H -u #{guest_user} bash -c '.ci/setup.sh'


This seems like it definitely needs a docs update, because if you create a Vagrant vm directly (which you might want to do for some more specific test not covered by the existing jobs) it will no longer be set up to do that as is.

dgibson · 2021-07-26T05:17:56Z

.ci/job/job_runner.sh

+	# Export it.
+	typeset -x CI_JOB
+	bash "$setup_script"
+	bash "$runner_script"


Forcing the setup and the runner to run immediately after each other worries me a bit. Yes, that's currently the safe way to run the CI jobs, but:

The setup script takes rather a long time.

It means the setup and job output will be combined together, perpetuating Kata's existing problem of having huge logs where it's hard to figure out where things went wrong.
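
One possible way to address both points, sketched under assumptions: run the two phases through a small helper that writes each phase to its own log file, and let the caller skip setup when the environment is already prepared. The names `run_phase`, `run_job_phases`, `KATA_TESTS_SKIP_SETUP` and `KATA_TESTS_LOG_DIR` are hypothetical and are not part of this PR.

```bash
# Hypothetical sketch: run setup and job as separate phases with separate logs.

# Run one phase script, sending stdout and stderr to its own log file.
run_phase() {
    local phase="$1"
    local script="$2"
    local log_dir="$3"
    local log_file="${log_dir}/${phase}.log"

    mkdir -p "$log_dir"
    echo "Running ${phase} phase (log: ${log_file})"
    bash "$script" >"$log_file" 2>&1
}

# Run setup (unless KATA_TESTS_SKIP_SETUP=true, an assumed variable) and
# then the job itself. Stops early if setup fails.
run_job_phases() {
    local setup_script="$1"
    local runner_script="$2"
    local log_dir="${KATA_TESTS_LOG_DIR:-./ci-logs}"

    if [ "${KATA_TESTS_SKIP_SETUP:-false}" != "true" ]; then
        run_phase "setup" "$setup_script" "$log_dir" || return 1
    fi
    run_phase "run" "$runner_script" "$log_dir"
}
```

With something like this, a failed job could be re-run without repeating the long setup, and a failure would point at either `setup.log` or `run.log` rather than one combined log.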

GabyCT · 2021-07-28T19:18:57Z

.ci/job/job_runner.sh

+	"cloud-hypervisor-k8s-crio"
+	"cloud-hypervisor-k8s-containerd"
+	"cloud-hypervisor-k8s-containerd-minimal"
+	"Ccloud-hypervisor-k8s-containerd-fullL"


s/Ccloud-hypervisor-k8s-containerd-fullL/cloud-hypervisor-k8s-containerd-full/

GabyCT · 2021-07-28T19:19:45Z

.ci/job/job_runner.sh

+	"firecracker"
+	"vfio"
+	"virtiofs_experimental"
+	"metrics"


Hmm, I think these jobs will be hard to reproduce because they run on bare metal; even if somebody runs the tests locally, the results will not match.

dgibson · 2021-07-29

As long as the host allows for nested virt, I think it should be workable.

GabyCT · 2021-07-28T19:21:53Z

.ci/job/vm/vm_runner.sh

+
+# Some jobs cannot run inside a VM. For instance, those which need bare-metal
+# machine. So here it keeps a list of unsupported jobs.
+UNSUPPORTED_JOBS=(


What about metrics? Also, I believe vfio needs bare metal.

dgibson · 2021-07-29

No, surprisingly vfio does not run on bare metal right now: it explicitly looks for a second virtio-net device in its "host" to then pass through into the Kata guests.
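
For illustration, a membership check against such a list could look like the sketch below. The real contents of `UNSUPPORTED_JOBS` are not shown in the diff, so the single entry here is only a placeholder, and `is_job_supported` is a hypothetical helper name.

```bash
# Hypothetical sketch: the entry below is a placeholder, not the list
# from the PR.
UNSUPPORTED_JOBS=(
    "baremetal-pmem"
)

# Return 0 if the job may run inside a VM, 1 if it is on the unsupported list.
is_job_supported() {
    local job_id="$1"
    local unsupported

    for unsupported in "${UNSUPPORTED_JOBS[@]}"; do
        [ "$job_id" = "$unsupported" ] && return 1
    done
    return 0
}
```

Whether jobs such as metrics belong on the list is exactly the open question in this thread; the sketch only shows the lookup, not the policy.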

This implements an extensible way of executing a CI job on the developer workstation. With this initial implementation the job can be executed on localhost or inside a VM. Currently the only VM engine supported is Vagrant. Makefile targets are added, and these should be the primary interface for users. Fixes kata-containers#3642 Signed-off-by: Wainer dos Santos Moschetta <[email protected]>

wainersm · 2021-09-20T17:47:37Z

Updated this pull request just to fix conflicts with the latest version of the Vagrantfile.

jodh-intel

Thanks @wainersm - a few comments.

jodh-intel · 2021-10-15T13:37:24Z

.ci/job/job_runner.sh

+# Script variables
+#
+
+ALL_AVAILABLE_JOBS=(


Random thought: I wonder if these should be specified in a configuration file instead?
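
A minimal sketch of that idea, assuming a plain-text file with one job name per line (the file format and the `load_jobs` name are hypothetical; only `ALL_AVAILABLE_JOBS` comes from the diff):

```bash
# Hypothetical sketch: fill ALL_AVAILABLE_JOBS from a file instead of a
# hard-coded array. Blank lines and lines starting with '#' are ignored.
# Requires bash 4 or later for mapfile.
load_jobs() {
    local jobs_file="$1"

    if [ ! -r "$jobs_file" ]; then
        echo "ERROR: cannot read jobs file: ${jobs_file}" >&2
        return 1
    fi
    mapfile -t ALL_AVAILABLE_JOBS < <(grep -Ev '^[[:space:]]*(#|$)' "$jobs_file")
}
```

A file-based list would also make a typo such as the one in the cloud-hypervisor job name easier to spot and to fix without touching the script.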

jodh-intel · 2021-10-15T13:37:47Z

.ci/job/job_runner.sh

+	typeset -u CI_JOB="$job_id"
+        # Export it.


Weird indentation (here and elsewhere - tabs vs. spaces?)

jodh-intel · 2021-10-15T13:39:54Z

.ci/job/job_runner.sh

+# i.e., if the function _do_run_$1 is defined then it returns 0, otherwise return 1.
+_runner_exists() {
+	local runner="$1"
+        LC_ALL=C type "_do_run_${runner}" &>/dev/null


Out of interest, why is LC_ALL=C needed here? If it's essential, why not set it at the global level?

jodh-intel · 2021-10-15T13:40:16Z

.ci/job/job_runner.sh

+			l) list_jobs_id;;
+			j) run_job "$OPTARG" "${KATA_TESTS_JOB_RUNNER}";;


Nit: Not sorted alphabetically.

jodh-intel · 2021-10-15T13:52:53Z

.ci/job/vm/impl_vagrant.sh

+    # TODO: need to find a way to check the libvirt provider was
+    # installed and is working.


Can you raise a GH issue and put the URL here with a "FIXME"?

jodh-intel · 2021-10-15T13:54:55Z

.ci/job/vm/vm_runner.sh

+	  KATA_TESTS_VM_ENGINE: choose the VM engine. Defaults to 'vagrant'
+	  KATA_TESTS_VM_NAME: for actions which require a VM "fedora"


Please use the variables here, not hard-coded strings.
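
One reading of this request, sketched under assumptions: keep the default and the list of engines in variables and interpolate them into the help text, so the text cannot drift from the values the script actually uses. `DEFAULT_VM_ENGINE` and `SUPPORTED_VM_ENGINES` are hypothetical names; only the default value `vagrant` comes from the diff.

```bash
# Hypothetical sketch: help text built from variables instead of
# hard-coded strings.
DEFAULT_VM_ENGINE="vagrant"
SUPPORTED_VM_ENGINES=("vagrant")

usage() {
    cat <<EOF
Environment variables:
  KATA_TESTS_VM_ENGINE: choose the VM engine. Defaults to '${DEFAULT_VM_ENGINE}'.
                        Available engines: ${SUPPORTED_VM_ENGINES[*]}
  KATA_TESTS_VM_NAME:   the VM name, for actions which require a VM.
EOF
}
```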

jodh-intel · 2021-10-15T13:55:39Z

.ci/job/vm/vm_runner.sh

+	local vm="$2"
+	local cmd
+	echo "Run job $job_id on $vm VM"
+	# Assume $GOPATH is exported


I wouldn't ;) Can you add a check for that here or in a setup function "just in case".
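
Such a check could be as small as the following sketch (`check_gopath` is a hypothetical name; where it gets called is left open):

```bash
# Hypothetical sketch: fail early with a clear message when GOPATH is unset
# or empty, instead of assuming it was exported.
check_gopath() {
    if [ -z "${GOPATH:-}" ]; then
        echo "ERROR: GOPATH is not set; export it before running a job" >&2
        return 1
    fi
}
```

The `${GOPATH:-}` form keeps the test safe even when the script runs with `set -o nounset`.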

jodh-intel · 2021-10-15T13:59:50Z

.ci/job/vm/vm_runner.sh

+#  is_engine_available()
+#  is_vm_running(name)
+#  vm_start(name, force_destroy=true)
+#  vm_destroy(name)
+#  vm_run_cmd(name)
+#  vm_shell()
+#  list_vms()


I'd consider making the script check that these functions exist and if not, listing the functions that aren't defined to stderr so it's clear what's wrong. You can do the checking programmatically using something like this:

    check_function() {
        local name="${1:-}"
        type -t "$name" &>/dev/null || die "function '$name' does not exist"
    }

    check_and_run() {
        name="${1:-}"
        shift
        check_function "$name" && eval "$name" $@
    }
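
Building on that suggestion, the sketch below checks the whole interface in one pass and reports every missing function at once. The function names are taken from the interface comment in the diff; `REQUIRED_ENGINE_FUNCTIONS` and `check_engine_interface` are hypothetical names.

```bash
# Functions every VM engine implementation is expected to define
# (names copied from the interface comment in vm_runner.sh).
REQUIRED_ENGINE_FUNCTIONS=(
    is_engine_available
    is_vm_running
    vm_start
    vm_destroy
    vm_run_cmd
    vm_shell
    list_vms
)

# Print all missing functions to stderr; return 1 if any is missing.
check_engine_interface() {
    local name
    local missing=()

    for name in "${REQUIRED_ENGINE_FUNCTIONS[@]}"; do
        # 'type -t' prints "function" only for shell functions, so an
        # external command with the same name does not count.
        [ "$(type -t "$name")" = "function" ] || missing+=("$name")
    done

    if [ "${#missing[@]}" -gt 0 ]; then
        echo "ERROR: engine does not define: ${missing[*]}" >&2
        return 1
    fi
}
```

Calling `check_engine_interface` right after sourcing an engine implementation would surface an incomplete back-end before any VM is started.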

jodh-intel · 2021-10-15T14:02:04Z

.ci/job/vm/vm_runner.sh

+# SPDX-License-Identifier: Apache-2.0
+#
+# Defines the vm_runner interface.
+


You might want to add our boilerplate stuff:

    set -o errexit
    set -o nounset
    set -o pipefail

    # XXX: Bash-specific code. zsh doesn't support this option and that *does*
    # matter if this script is run sourced... since it'll be using zsh! ;)
    [ -n "$BASH_VERSION" ] && set -o errtrace

    [ -n "${DEBUG:-}" ] && set -o xtrace

jodh-intel · 2021-10-15T14:02:23Z

.ci/job/vm/vm_runner.sh

+#KATA_TESTS_VM_NAME= The VM name
+#KATA_TESTS_VM_ENV_FILE= File that contain variables to be exported in the VM environment
+#KATA_TESTS_VM_ENGINE= The VM engine. Default to Vagrant


Do we need these lines?

jodh-intel · 2021-10-29T13:41:55Z

Any update on this @wainersm?

jodh-intel · 2021-11-05T14:38:19Z

@wainersm - is this something you still plan to work on?

wainersm · 2021-11-18T14:38:17Z

@dgibson @GabyCT @jodh-intel I really appreciate your reviews on this PR; however, I am thinking of closing it.

Let me explain: David's #3751 (comment) made me wonder whether this is the right time to add another layer of complexity to the CI scripts. I really think the changes proposed in this PR add a friendlier interface for running CI locally, but at this point I don't know whether the Vagrant stuff has been useful to anyone other than me. Therefore, I will first promote the use of Vagrant and fix whatever problems come up; if I then feel this extra interface is worth having, I will re-open this PR.

GabyCT · 2021-11-23T16:42:59Z

ok thanks @wainersm

wainersm added the labels rfc (Requires input from the team), wip (Work in Progress: PR incomplete, needs more work or rework) and area/ci Jul 23, 2021

wainersm requested review from fidencio, c3d, snir911 and jodh-intel July 23, 2021 21:47

wainersm requested a review from a team as a code owner July 23, 2021 21:47

dgibson reviewed Jul 26, 2021

GabyCT reviewed Jul 28, 2021

wainersm mentioned this pull request Sep 17, 2021

RFE | Re-run the CI before officially merging a PR #3974

Open

wainersm added 2 commits September 20, 2021 14:35

vagrantfile: Ensure make is installed in the vm

391907e

The command `make` is needed when the job is triggered inside the VM, so this change ensures it is installed. Signed-off-by: Wainer dos Santos Moschetta <[email protected]>

wainersm force-pushed the vagrant_ci-part2 branch from 45553e1 to 773793e Compare September 20, 2021 17:46

jodh-intel reviewed Oct 15, 2021

wainersm closed this Nov 25, 2021
