Skip to content
This repository has been archived by the owner on Jun 28, 2024. It is now read-only.

Allow to reproduce CI jobs locally on VMs - part 2 #3751

Closed
wants to merge 2 commits into from

Conversation

wainersm
Copy link
Contributor

This is a continuation of #3547 ((Allow to reproduce CI jobs locally on VMs).

The goal here is make even easier to execute the CI job whether on localhost or inside VM, by introducing makefile targets and some scripts (called runners) that abstract the environment where the job will be executed.

Although I didn't think in every and each details, the initial design is meant to allow us to add new runners (e.g. containers) easily. And for the vm_runner, back-ends other than Vagrant (e.g. pure libvirt scripts) can also be implemented.

There are other improvements already on my mind, for instance, allow the user to login the VM to debug a problem. But I will leave those for a part 3.

A typical usage session would be:

[wmoschet@wainer-laptop .ci]$ make help
Job related targets

  job-run-NAME             - Run the job NAME. Use KATA_TESTS_JOB_RUNNER to set the runner.
  job-list                 - List all available CI jobs
help: CI targets
  vm-help                  - Help for the VM runner targets.
Environment variables:
  KATA_TESTS_JOB_RUNNER:   - local or vm (default)
[wmoschet@wainer-laptop .ci]$ make vm-help
vm-help: Manage CI VMs with the following targets:
  vm-clean                 - Destroy all VMs
  vm-list                  - Print all supported VM names
Environment variables
  KATA_TESTS_VM_ENGINE: set the VM engine. Defaults to 'vagrant'.
                        Available engines: vagrant
  KATA_TESTS_VM_NAME:   the VM name.
[wmoschet@wainer-laptop .ci]$ make vm-list
The avaliable VMs:
fedora
ubuntu
[wmoschet@wainer-laptop .ci]$ make job-list
baremetal-pmem cri_containerd_k8s cri_containerd_k8s_initrd cri_containerd_k8s_complete cri_containerd_k8s_minimal crio_k8s crio_k8s_complete crio_k8s_minimal cloud-hypervisor-k8s-crio cloud-hypervisor-k8s-containerd cloud-hypervisor-k8s-containerd-minimal Ccloud-hypervisor-k8s-containerd-fullL firecracker vfio virtiofs_experimental metrics metrics_experimental
[wmoschet@wainer-laptop .ci]$ KATA_TESTS_VM_NAME=fedora make job-run-crio_k8s
[wmoschet@wainer-laptop .ci]$ make vm-clean

@wainersm wainersm added rfc Requires input from the team wip Work in Progress (PR incomplete - needs more work or rework) area/ci labels Jul 23, 2021
@wainersm wainersm requested a review from a team as a code owner July 23, 2021 21:47
Copy link
Contributor

@dgibson dgibson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm unclear why the first few patches still show up in this PR, even though they're merged already. Does this need a rebase?

Vagrantfile Outdated
@@ -76,6 +76,7 @@ EOF
# This switches back to cgroup v1. It requires a reboot.
sudo dnf install -y grubby
sudo grubby --update-kernel=ALL --args="systemd.unified_cgroup_hierarchy=0"
sudo dnf install -y make
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not combine this with the install of grubby above?

@echo " vm-help - Help for the VM runner targets."
@echo "Environment variables:"
@echo " KATA_TESTS_JOB_RUNNER: - local or vm (default)"

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new interface does look a lot nicer. A couple of overall concerns:

  • Can the docs be updated to describe this new preferred way to run tests
  • Adding a few more hundred lines of fairly complex bash to the thousands of existing lines of fragile bash code makes me kinda nervous. But maybe it's a necessary interim step.


KATA_TESTS_VM_ENGINE ?= vagrant
supported_vm_engines:=$(shell ./job/vm/vm_runner.sh -e)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You have two layers of abstraction here: first local vs. vm, then vagrant vs. other possible VM engines. Is it worth the complexity? Could you just have "local" and "vagrant" as the top level options.

Vagrantfile Outdated
@@ -93,7 +93,6 @@ EOF
export osbuilder_distro="$distro"

cd ${kata_tests_repo_dir}
sudo -E PATH=$PATH -H -u #{guest_user} bash -c '.ci/setup.sh'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like it definitely needs a docs update, because if you create a Vagrant vm directly (which you might want to do for some more specific test not covered by the existing jobs) it will no longer be set up to do that as is.

# Export it.
typeset -x CI_JOB
bash "$setup_script"
bash "$runner_script"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Enforcing running the setup and runner immediately after each other worries me a bit. Yes, that's currently the safe way to run the CI jobs, but

  • The setup script takes rather a long time
  • It means the setup and job output will be combined together, perpetuating Kata's existing problem of having huge logs where it's hard to figure out where things went wrong.

"cloud-hypervisor-k8s-crio"
"cloud-hypervisor-k8s-containerd"
"cloud-hypervisor-k8s-containerd-minimal"
"Ccloud-hypervisor-k8s-containerd-fullL"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/s/Ccloud-hypervisor-k8s-containerd-fullL/cloud-hypervisor-k8s-containerd-full

"firecracker"
"vfio"
"virtiofs_experimental"
"metrics"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mmm I think that these jobs will be hard to reproduce as they run in baremetal and even if somebody runs the tests locally...the results will not match

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As long as the host allows for nested virt, I think it should be workable.


# Some jobs cannot run inside a VM. For instance, those which need bare-metal
# machine. So here it keeps a list of unsupported jobs.
UNSUPPORTED_JOBS=(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what about metrics? also I believe vfio needs baremetal

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, surprisingly vfio does not run on baremetal right now: it explicitly looks for a second virtio-net device in its "host" to then pass through into the Kata guests.

The command `make` is needed for when the job is trigged inside the
VM, so this change ensures it is installed.

Signed-off-by: Wainer dos Santos Moschetta <[email protected]>
This implements an extensible way of executing a CI job on the
developer workstation.

With this initial implementation the job can be executed on localhost or
inside a VM. Currently the only VM engine supported is Vagrant.

It is added makefile targets and this should be primary interface with
users.

Fixes kata-containers#3642
Signed-off-by: Wainer dos Santos Moschetta <[email protected]>
@wainersm
Copy link
Contributor Author

Updated this pull request just to fix conflicts with latest version of Vagrantfile.

Copy link
Contributor

@jodh-intel jodh-intel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @wainersm - a few comments.

# Script variables
#

ALL_AVAILABLE_JOBS=(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Random thought: I wonder if these should be specified a a configuration file instead?

Comment on lines +85 to +86
typeset -u CI_JOB="$job_id"
# Export it.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Weird indentation (here and elsewhere - tabs vs. spaces?)

# i.e., if the function _do_run_$1 is defined then it returns 0, otherwise return 1.
_runner_exists() {
local runner="$1"
LC_ALL=C type "_do_run_${runner}" &>/dev/null
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ooi, why is LC_TYPE needed here? If it's essential, why not set it at the global level?

Comment on lines +124 to +125
l) list_jobs_id;;
j) run_job "$OPTARG" "${KATA_TESTS_JOB_RUNNER}";;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Not sorted alphebetically.

Comment on lines +15 to +16
# TODO: need to find a way to check the libvirt provider was
# installed and is working.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you raise a GH issue and put the URL here with a "FIXME"?

Comment on lines +76 to +77
KATA_TESTS_VM_ENGINE: choose the VM engine. Defaults to 'vagrant'
KATA_TESTS_VM_NAME: for actions which require a VM "fedora"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use the variables here, not hard-coded strings.

local vm="$2"
local cmd
echo "Run job $job_id on $vm VM"
# Assume $GOPATH is exported
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wouldn't ;) Can you add a check for that here or in a setup function "just in case".

Comment on lines +32 to +38
# is_engine_available()
# is_vm_running(name)
# vm_start(name, force_destroy=true)
# vm_destroy(name)
# vm_run_cmd(name)
# vm_shell()
# list_vms()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd consider making the script check that these functions exist and if not, listing the functions that aren't defined to stderr so it's clear what's wrong. You can do the checking programmatically using something like this:

check_function()
{
    local name="${1:-}"

    type -t "$name" &>/dev/null || die "function '$name' does not exist"
}

check_and_run()
{
    name="${1:-}"
    shift

    check_function "$name" && eval "$name" $@
}

# SPDX-License-Identifier: Apache-2.0
#
# Defines the vm_runner interface.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You might want to add our boilerplate stuff:

set -o errexit
set -o nounset
set -o pipefail

# XXX: Bash-specific code. zsh doesn't support this option and that *does*
# matter if this script is run sourced... since it'll be using zsh! ;)
[ -n "$BASH_VERSION" ] && set -o errtrace

[ -n "${DEBUG:-}" ] && set -o xtrace

Comment on lines +24 to +26
#KATA_TESTS_VM_NAME= The VM name
#KATA_TESTS_VM_ENV_FILE= File that contain variables to be exported in the VM environment
#KATA_TESTS_VM_ENGINE= The VM engine. Default to Vagrant
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need these lines?

@jodh-intel
Copy link
Contributor

Any update on this @wainersm?

@jodh-intel
Copy link
Contributor

@wainersm - is this something you still plan to work on?

@wainersm
Copy link
Contributor Author

@dgibson @GabyCT @jodh-intel I really appreciated your review on this PR however I am thinking in closing it.

I explain: David's #3751 (comment) made me thoughtful whether this is the right time for adding another layer of complexity to the CI scripts. I really think the propose changes on this PR adds a more friendly interface to allow users running CI locally, but at this point I don't know if the Vagrant stuff has been useful to others than me. Therefore, first I will promote the use of Vagrant, fix whatever problems come up, and if I feel this extra interface is worth having in then I will re-open this PR.

@GabyCT
Copy link
Contributor

GabyCT commented Nov 23, 2021

ok thanks @wainersm

@wainersm wainersm closed this Nov 25, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area/ci rfc Requires input from the team wip Work in Progress (PR incomplete - needs more work or rework)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants