Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix flake in upgrade tests #12260

Merged
merged 1 commit into from
Nov 11, 2021
Merged

Conversation

Luap99
Copy link
Member

@Luap99 Luap99 commented Nov 10, 2021

What this PR does / why we need it:

The cni plugins need access to /run/cni and the dnsname plugin needs
access to /run/containers.

The race condition was basically that a podman stop could either do the
cleanup itself or the spawned cleanup process would do the cleanup if it
was fast enough. The podman stop is executed on the host while the
podman cleanup process is executed in the "parent container". The parent
container contains older plugins than on the host. The dnsname plugin
before version 1.3 could error and this would prevent CNI from
doing a proper cleanup. The plugin errors because it could not find its
files in /run/containers. On my system the test always failed because
the cleanup process was always faster than the stop process. However in
the CI VMs the stop process was usually faster and so it failed only
sometimes.

How to verify it

Which issue(s) this PR fixes:

Fixes #11558

Special notes for your reviewer:

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 10, 2021
@rhatdan
Copy link
Member

rhatdan commented Nov 10, 2021

LGTM

@mheon
Copy link
Member

mheon commented Nov 10, 2021

/lgtm
/approve
/hold

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Nov 10, 2021
@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Nov 10, 2021
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Nov 10, 2021

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Luap99, mheon

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@edsantiago
Copy link
Member

LGTM but tests aren't happy:

Error: statfs /run/cni: no such file or directory

(And I see that someone just pressed Re-run on them all, but I don't think this is a flake)

@edsantiago
Copy link
Member

Root cause: on a fresh VM such as the ones used in CI, /run/cni does not exist. Best place to add it is probably somewhere around here:

# Not entirely a NOP! This is just so we get /run/crun created on a CI VM
$PODMAN run --rm $OLD_PODMAN true

One could do a simple mkdir /run/cni but I would prefer to have it created the proper way. Unfortunately, I can't seem to figure out a magic podman command that will do so. (I tried various forms of network create and run -p 8080:8080). I'm sure that's a no-brainer for you so I leave it in your hands.

The cni plugins need access to /run/cni and the dnsname plugin needs
access to /run/containers.

The race condition was basically that a `podman stop` could either do the
cleanup itself or the spawned cleanup process would do the cleanup if it
was fast enough. The `podman stop` is executed on the host while the
podman cleanup process is executed in the "parent container". The parent
container contains older plugins than on the host. The dnsname plugin
before version 1.3 could error and this would prevent CNI from
doing a proper cleanup. The plugin errors because it could not find its
files in /run/containers. On my system the test always failed because
the cleanup process was always faster than the stop process. However in
the CI VMs the stop process was usually faster and so it failed only
sometimes.

Fixes containers#11558

Signed-off-by: Paul Holzinger <[email protected]>
@openshift-ci openshift-ci bot removed the lgtm Indicates that a PR is ready to be merged. label Nov 11, 2021
@Luap99
Copy link
Member Author

Luap99 commented Nov 11, 2021

@edsantiago Should be fixed now: --mac-address ... is needed to create /run/cni and a custom network is needed to create /run/containers

@edsantiago
Copy link
Member

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Nov 11, 2021
@edsantiago edsantiago removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Nov 11, 2021
@openshift-merge-robot openshift-merge-robot merged commit 8fd31c6 into containers:main Nov 11, 2021
@Luap99 Luap99 deleted the upgrade-flake branch November 11, 2021 14:39
@github-actions github-actions bot added the locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. label Sep 22, 2023
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 22, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

network upgrade tests: error adding pod: failed to allocate: duplicate allocation is not allowed
5 participants