Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

podman kube generate - add actual tests #15365

Merged

Conversation

edsantiago
Copy link
Member

@edsantiago edsantiago commented Aug 17, 2022

This exposed a nasty bug in our system-test setup: Ubuntu (runc)
was writing a scratch containers.conf file, and setting CONTAINERS_CONF
to point to it. This was well-intentionedly introduced in #10199 as
part of our long sad history of not testing runc. What I did not
understand at that time is that CONTAINERS_CONF is dangerous:
it does not mean "I will read standard containers.conf and then
override", it means "I will IGNORE standard containers.conf
and use only the settings in this file"! So on Ubuntu we were
losing all the default settings: capabilities, sysctls, all.

Yes, this is documented in containers.conf(5) but it is such
a huge violation of POLA that I need to repeat it.

In #14972, as yet another attempt to fix our runc crisis, I
introduced a new runc-override mechanism: create a custom
/etc/containers/containers.conf when OCI_RUNTIME=runc.
Unlike the CONTAINERS_CONF envariable, the /etc file
actually means what you think it means: "read the default
file first, then override with the /etc file contents".
I.e., we get the desired defaults. But I didn't remember
this helpers.bash workaround, so our runc testing has
actually been flawed: we have not been testing with
the system containers.conf. This commit removes the
no-longer-needed and never-actually-wanted workaround,
and by virtue of testing the cap-drops in kube generate,
we add a regression test to make sure this never happens
again.

It's a little scary that we haven't been testing capabilities.

Also scary: this PR requires python, for converting yaml to json.
I think that should be safe: python3 'import yaml' and 'json'
works fine on a RHEL8.7 VM from 1minutetip.

Signed-off-by: Ed Santiago [email protected]

System tests for podman kube generate

@openshift-ci openshift-ci bot added release-note approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Aug 17, 2022
@mheon
Copy link
Member

mheon commented Aug 17, 2022

LGTM. The fact that we'll have to update this every time we change generate kube scares me a bit, but CI will ensure the two stay in sync

@edsantiago
Copy link
Member Author

have to update this every time we change generate kube

I was thinking of that as a feature...? kube generate doesn't seem like the sort of thing that should change often, so I was thinking value(detecting accidental changes) > annoyance(having to duplicate). But I really don't know enough here.

@rhatdan
Copy link
Member

rhatdan commented Aug 17, 2022

LGTM
As soon as you get tests to pass.

@edsantiago
Copy link
Member Author

What - I have to get tests to pass, too??

OK srsly this has taken me into all sorts of ratholes. Filed #15367 and am finding more instances of it to report.

Then, sigh, looks like the container order in podman kube generate is nondeterministic. And, on Ubuntu, of course, capabilities are nil. I'll need to figure out how/if to deal with all those.

But this one has me stumped: the org.opencontainers.image.base.name/XXX: localhost/interim-image:latest string shows up only sometimes?? Some f36 passes, some fails? I'm gonna need help here.

@rhatdan
Copy link
Member

rhatdan commented Aug 17, 2022

@umohnani8 PTAL

@edsantiago edsantiago force-pushed the test_kube_generate branch 2 times, most recently from 643055f to 5cfaf92 Compare August 17, 2022 21:59
@edsantiago
Copy link
Member Author

Okay, can someone tell me why ubuntu local (root, rootless) has no capabilities section, but ubuntu remote does?

[+0873s] # #|     FAIL: .spec.containers[0].securityContext.capabilities
[+0873s] # #| expected: = '{}'
[+0873s] # #|   actual:   '{"drop":["CAP_MKNOD","CAP_NET_RAW","CAP_AUDIT_WRITE"]}'

That is: on all f36, I get the drop. On Ubuntu remote, same. On Ubuntu local, nothing.

(It's very very easy for me to just remove that test completely, but I hate doing so without an explanation)

@mheon
Copy link
Member

mheon commented Aug 17, 2022

Any chance there's a containers.conf floating around in the ubuntu VM from another test with dropped caps?

@edsantiago
Copy link
Member Author

Ohhh.... interesting. I think that's it:

# Some CI systems set this to runc, overriding the default crun.
if [[ -n $OCI_RUNTIME ]]; then
if [[ -z $CONTAINERS_CONF ]]; then
# FIXME: BATS provides no mechanism for end-of-run cleanup[1]; how
# can we avoid leaving this file behind when we finish?
# [1] https://github.com/bats-core/bats-core/issues/39
export CONTAINERS_CONF=$(mktemp --tmpdir=${BATS_TMPDIR:-/tmp} podman-bats-XXXXXXX.containers.conf)
cat >$CONTAINERS_CONF <<EOF
[engine]
runtime="$OCI_RUNTIME"
EOF
fi
fi

@edsantiago edsantiago marked this pull request as draft August 18, 2022 00:58
@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Aug 18, 2022
@edsantiago
Copy link
Member Author

Interesting. If I recreate the capabilities section of containers.conf, ubuntu can't ping:

 # podman run --rm --pod mypod quay.io/libpod/testimage:20220615 ping -c1 127.0.0.1
 PING 127.0.0.1 (127.0.0.1): 56 data bytes
 ping: permission denied (are you root?)
 [ rc=1 (** EXPECTED 0 **) ]

root nor rootless ... but it works perfectly fine with remote??

This exposed a nasty bug in our system-test setup: Ubuntu (runc)
was writing a scratch containers.conf file, and setting CONTAINERS_CONF
to point to it. This was well-intentionedly introduced in containers#10199 as
part of our long sad history of not testing runc. What I did not
understand at that time is that CONTAINERS_CONF is **dangerous**:
it does not mean "I will read standard containers.conf and then
override", it means "I will **IGNORE** standard containers.conf
and use only the settings in this file"! So on Ubuntu we were
losing all the default settings: capabilities, sysctls, all.

Yes, this is documented in containers.conf(5) but it is such
a huge violation of POLA that I need to repeat it.

In containers#14972, as yet another attempt to fix our runc crisis, I
introduced a new runc-override mechanism: create a custom
/etc/containers/containers.conf when OCI_RUNTIME=runc.
Unlike the CONTAINERS_CONF envariable, the /etc file
actually means what you think it means: "read the default
file first, then override with the /etc file contents".
I.e., we get the desired defaults. But I didn't remember
this helpers.bash workaround, so our runc testing has
actually been flawed: we have not been testing with
the system containers.conf. This commit removes the
no-longer-needed and never-actually-wanted workaround,
and by virtue of testing the cap-drops in kube generate,
we add a regression test to make sure this never happens
again.

It's a little scary that we haven't been testing capabilities.

Also scary: this PR requires python, for converting yaml to json.
I think that should be safe: python3 'import yaml' and 'json'
works fine on a RHEL8.7 VM from 1minutetip.

Signed-off-by: Ed Santiago <[email protected]>
@edsantiago edsantiago marked this pull request as ready for review August 18, 2022 16:53
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Aug 18, 2022
@edsantiago
Copy link
Member Author

@containers/podman-debbuild-maintainers PTAL. This has been drastically changed since my first attempt. I've updated the commit message and github description accordingly. My first iteration did a line-by-line comparison of the YAML: this is impossible for two reasons:

  1. generate kube seems to output containers in unpredictable order.
  2. On some systems the image has an org.opencontainers.image.base.name annotation, on some it doesn't.

So the new approach converts yaml to json (using python), then uses jq to validate the results. I really hate this but can't think of a better way.

Removing the workaround from helpers.bash seems to be safe.

Thanks @mheon for the CONTAINERS_CONF insight. That was the key.

@mheon
Copy link
Member

mheon commented Aug 18, 2022

LGTM. YAML->JSON sounds horrifying but it doesn't look at that bad in the test itself

Copy link
Member

@giuseppe giuseppe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Aug 18, 2022
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Aug 18, 2022

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: edsantiago, giuseppe

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [edsantiago,giuseppe]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-merge-robot openshift-merge-robot merged commit 31bb53f into containers:main Aug 18, 2022
@edsantiago edsantiago deleted the test_kube_generate branch August 18, 2022 20:59
@github-actions github-actions bot added the locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. label Sep 20, 2023
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 20, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. release-note
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants