Merge pull request #8749 from edsantiago/upgrade_test
podman upgrade tests
openshift-merge-robot authored Feb 26, 2021
2 parents 05410e8 + 79eaadd commit 397aae3
Showing 7 changed files with 450 additions and 1 deletion.
33 changes: 33 additions & 0 deletions .cirrus.yml
@@ -598,6 +598,38 @@ rootless_system_test_task:
    main_script: *main
    always: *logs_artifacts

# FIXME: we may want to consider running this from nightly cron instead of CI.
# The tests are actually pretty quick (less than a minute) but they do rely
# on pulling images from quay.io, which means we're subject to network flakes.
#
# FIXME: how does this env matrix work, anyway? Does it spin up multiple VMs?
# We might just want to encode the version matrix in runner.sh instead
upgrade_test_task:
    name: "Upgrade test: from $PODMAN_UPGRADE_FROM"
    alias: upgrade_test
    skip: *tags
    only_if: *not_docs
    depends_on:
      - local_system_test
    matrix:
      - env:
            PODMAN_UPGRADE_FROM: v1.9.0
      - env:
            PODMAN_UPGRADE_FROM: v2.0.6
      - env:
            PODMAN_UPGRADE_FROM: v2.1.1
    gce_instance: *standardvm
    env:
        TEST_FLAVOR: upgrade_test
        DISTRO_NV: ${FEDORA_NAME}
        VM_IMAGE_NAME: ${FEDORA_CACHE_IMAGE_NAME}
        # ID for re-use of build output
        _BUILD_CACHE_HANDLE: ${FEDORA_NAME}-build-${CIRRUS_BUILD_ID}
    clone_script: *noop
    gopath_cache: *ro_gopath_cache
    setup_script: *setup
    main_script: *main
    always: *logs_artifacts

# This task is critical. It updates the "last-used by" timestamp stored
# in metadata for all VM images. This mechanism functions in tandem with
@@ -654,6 +686,7 @@ success_task:
      - local_system_test
      - remote_system_test
      - rootless_system_test
      - upgrade_test
      - meta
    container: *smallcontainer
    env:
4 changes: 4 additions & 0 deletions contrib/cirrus/runner.sh
@@ -70,6 +70,10 @@ function _run_sys() {
    dotest system
}

function _run_upgrade_test() {
    bats test/upgrade |& logformatter
}

function _run_bindings() {
    # shellcheck disable=SC2155
    export PATH=$PATH:$GOSRC/hack
1 change: 1 addition & 0 deletions contrib/cirrus/setup_environment.sh
@@ -200,6 +200,7 @@ case "$TEST_FLAVOR" in
    compose) ;&
    int) ;&
    sys) ;&
    upgrade_test) ;&
    bindings) ;&
    endpoint)
        # Use existing host bits when testing is to happen inside a container
2 changes: 1 addition & 1 deletion test/system/helpers.bash
@@ -154,7 +154,7 @@ function run_podman() {
     echo "$_LOG_PROMPT $PODMAN $*"
     # BATS hangs if a subprocess remains and keeps FD 3 open; this happens
     # if podman crashes unexpectedly without cleaning up subprocesses.
-    run timeout --foreground -v --kill=10 $PODMAN_TIMEOUT $PODMAN "$@" 3>/dev/null
+    run timeout --foreground -v --kill=10 $PODMAN_TIMEOUT $PODMAN $_PODMAN_TEST_OPTS "$@" 3>/dev/null
     # without "quotes", multiple lines are glommed together into one
     if [ -n "$output" ]; then
         echo "$output"
87 changes: 87 additions & 0 deletions test/upgrade/README.md
@@ -0,0 +1,87 @@
Background
==========

For years we've been needing a way to test podman upgrades; this
became much more critical on December 7, 2020, when Matt disclosed
a bug he had found over the weekend
([#8613](https://github.com/containers/podman/issues/8613))
in which reuse of a previously-defined field name would
result in fatal JSON decode failures if current-podman were
to try reading containers created with podman <= 1.8 (FIXME: confirm).

Upgrade testing is a daunting problem; but in the December 12
Cabal meeting Dan suggested using podman-in-podman. This PR
is the result of fleshing out that idea.

Overview
========

The BATS script in this directory fetches and runs an old-podman
container image from quay.io/podman, uses it to create and run
a number of containers, then uses new-podman to interact with
those containers.

As of 2021-02-23 the available old-podman versions are:

```console
$ ./bin/podman search --list-tags quay.io/podman/stable | awk '$2 ~ /^v/ { print $2}' | sort | column -c 75
v1.4.2 v1.5.0 v1.6 v1.9.0 v2.0.2 v2.1.1
v1.4.4 v1.5.1 v1.6.2 v1.9.1 v2.0.6 v2.2.1
```

Test invocation is:
```console
$ sudo env PODMAN=bin/podman PODMAN_UPGRADE_FROM=v1.9.0 PODMAN_UPGRADE_TEST_DEBUG= bats test/upgrade
```
(Path assumes you're cd'ed to top-level podman repo). `PODMAN_UPGRADE_FROM`
can be any of the versions above. `PODMAN_UPGRADE_TEST_DEBUG` is empty
here, but listed so you can set it `=1` and leave the podman_parent
container running. Interacting with this container is left as an
exercise for the reader.

The script will pull the given podman image, invoke it with a scratch
root directory, and have it do a small set of podman stuff (pull an
image, create/run some containers). This podman process stays running
because if it exits, it kills containers running inside the container.

We then invoke the current (host-installed) podman, using the same
scratch root directory, and perform operations on those images and
containers. Most of those operations are done in individual @tests.
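
The two phases above can be sketched as a dry run. This is a rough illustration, not the actual test code: it only prints commands, and the `setup` script name, the `root`/`runroot` layout under the scratch directory, and the exact set of flags are all assumptions made for the example.

```shell
# Dry-run sketch of the two phases: it only prints commands, it runs nothing.
# Assumptions (not taken from the real test): the "setup" script name, the
# root/runroot layout under the scratch directory, and the exact flag set.
print_upgrade_plan() {
    scratch=$1
    podman=${PODMAN:-podman}
    old_image="quay.io/podman/stable:${PODMAN_UPGRADE_FROM:-v1.9.0}"

    # Phase 1: start old-podman in a long-lived parent container. The scratch
    # directory is bind-mounted so that whatever old-podman creates survives
    # on the host; the (hypothetical) setup script must keep running.
    echo "$podman run -d --name podman_parent --privileged --net=host" \
         "-v $scratch:$scratch $old_image $scratch/setup"

    # Phase 2: point new-podman at the same storage and inspect the results.
    echo "$podman --root $scratch/root --runroot $scratch/runroot ps -a"
}
```

Calling `print_upgrade_plan /tmp/scratch` with `PODMAN` and `PODMAN_UPGRADE_FROM` set as in the invocation above prints one command per phase, which may help when debugging by hand.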

The goal is to have this upgrade test run in CI, iterating over a
loop of known old versions. This list would need to be hand-maintained
and updated on new releases. There might also need to be extra
configuration defined, such as per-version commands (see below).
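
One possible shape for that loop, as a minimal sketch: the list below is hypothetical and simply mirrors the three versions in this commit's CI matrix, and `run_upgrade_matrix` is an illustrative name, not something that exists in the repo.

```shell
# Hypothetical hand-maintained list; it mirrors the CI matrix in this commit
# and would need to be updated by hand on new releases.
UPGRADE_FROM_VERSIONS="v1.9.0 v2.0.6 v2.1.1"

# run_upgrade_matrix CMD [ARGS...]
# Runs CMD once per version with PODMAN_UPGRADE_FROM exported.
# Every version is tried; the return status is nonzero if any run failed.
run_upgrade_matrix() {
    failed=0
    for version in $UPGRADE_FROM_VERSIONS; do
        echo "# upgrade test: from $version"
        PODMAN_UPGRADE_FROM=$version
        export PODMAN_UPGRADE_FROM
        "$@" || failed=1
    done
    return "$failed"
}
```

Usage would be something like `run_upgrade_matrix bats test/upgrade`. Running every version even after a failure is a deliberate choice here, so that one broken old version does not hide results for the others.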

Findings
========

Well, first, `v1.6.2` won't work on default f32/f33: the image
does not include `crun`, so it can't work at all:

    ERRO[0000] oci runtime "runc" does not support CGroups V2: use system migrate to mitigate

I realize that it's kind of stupid not to test 1.6, since that's
precisely the test that would've caught #8613 early, but I just
don't think it's worth the hassle of setting up cgroupsv1 VMs.
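
If such versions ever do get listed, a guard along these lines could skip them on cgroups v2 hosts. This is only a sketch: the function names are made up for illustration, and `v1.6.2` is the only version listed because it is the only one confirmed above. The check itself relies on `cgroup.controllers` existing at the root of a cgroups v2 unified hierarchy.

```shell
# Succeeds when ROOT (default: /sys/fs/cgroup) is a cgroups v2 unified
# hierarchy, recognized by the presence of the cgroup.controllers file.
is_cgroupsv2() {
    [ -e "${1:-/sys/fs/cgroup}/cgroup.controllers" ]
}

# can_upgrade_from VERSION [ROOT]: fails for old-podman images known to lack
# a cgroups v2 capable runtime when the host is on cgroups v2.
# Only v1.6.2 is listed because that is the only version the text names.
can_upgrade_from() {
    case "$1" in
        v1.6.2) ! is_cgroupsv2 "$2" ;;
        *)      return 0 ;;
    esac
}
```

A test could then call `can_upgrade_from "$PODMAN_UPGRADE_FROM" || skip "needs cgroupsv1"` before pulling anything.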

For posterity, in an earlier incantation of this script I tried
booting f32 into cgroupsv1 and ran into the following warnings
when running new-podman on old-containers:
```
ERRO[0000] error joining network namespace for container 322b66d94640e31b2e6921565445cf0dade4ec13cabc16ee5f29292bdc038341: error retrieving network namespace at /var/run/netns/cni-577e2289-2c05-2e28-3c3d-002a5596e7da: failed to Statfs "/var/run/netns/cni-577e2289
```

Where To Go From Here
=====================

* Tests are still (2021-02-23) incomplete, with several failing outright.
See FIXMEs in the code.

* Figuring out how/if to run rootless. I think this is possible, perhaps
even necessary, but will be tricky to get right because of home-directory
mounting.

* Figuring out how/if to run variations with different config files
(e.g. running OLD-PODMAN that creates a user libpod.conf, tweaking
that in the test, then running NEW-PODMAN upgrade tests).
11 changes: 11 additions & 0 deletions test/upgrade/helpers.bash
@@ -0,0 +1,11 @@
# -*- bash -*-

load "../system/helpers"

setup() {
    :
}

teardown() {
    :
}