Final v2.0.5 backports #7402

mheon · 2020-08-21T17:47:34Z

These are really the last ones. Definitely. For sure this time.

openshift-ci-robot · 2020-08-21T17:47:39Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: mheon

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [mheon]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

baude · 2020-08-21T17:47:54Z

LGTM

jwhonce · 2020-08-21T17:51:44Z

/hold
/lgtm

mheon · 2020-08-21T17:51:46Z

@rhatdan Hold/LGTM this please

mheon · 2020-08-21T17:51:57Z

Oh, TYTY @jwhonce

TomSweeneyRedHat · 2020-08-21T18:17:02Z

LGTM
I sense a nice cold beverage in @mheon's near future....

mheon · 2020-08-21T20:47:54Z

CI is failing, and we are also waiting on one more c/common revendor from @baude to fix Podman remote on Windows and OS X.

mheon · 2020-08-24T13:38:18Z

@edsantiago Poke - could I ask for a moment of your time to take a look at these tests?

edsantiago · 2020-08-24T13:48:54Z

@mheon I can reproduce, it's a real error: looking at logs gives me the supremely unhelpful:

{
  "cause": "some containers failed",
  "message": "error starting some containers: some containers failed",
  "response": 500
}

mheon · 2020-08-24T13:53:49Z

At this point I'm wondering if it depends on some other part of the test that wasn't backported. Somewhat tempted to disable it.

edsantiago · 2020-08-24T13:54:06Z

Server logs with --log-level=debug does not give me any clues either, posting in case it helps you:

time="2020-08-24T07:50:49-06:00" level=debug msg="APIHandler -- Method: POST URL: /v1.40/libpod/pods/bar/start"
time="2020-08-24T07:50:49-06:00" level=debug msg="Strongconnecting node 6af392e59fc0dc894c533504ba775d8fe4a23120a9628f8eb0d7c23e2b01f559"
time="2020-08-24T07:50:49-06:00" level=debug msg="Pushed 6af392e59fc0dc894c533504ba775d8fe4a23120a9628f8eb0d7c23e2b01f559 onto stack"
time="2020-08-24T07:50:49-06:00" level=debug msg="Recursing to successor node 7cb597e875dac131e9020f458ac889b8d24ff4e2afe2943db1ff60efc34ab6d2"
time="2020-08-24T07:50:49-06:00" level=debug msg="Strongconnecting node 7cb597e875dac131e9020f458ac889b8d24ff4e2afe2943db1ff60efc34ab6d2"
time="2020-08-24T07:50:49-06:00" level=debug msg="Pushed 7cb597e875dac131e9020f458ac889b8d24ff4e2afe2943db1ff60efc34ab6d2 onto stack"
time="2020-08-24T07:50:49-06:00" level=debug msg="Finishing node 7cb597e875dac131e9020f458ac889b8d24ff4e2afe2943db1ff60efc34ab6d2. Popped 7cb597e875dac131e9020f458ac889b8d24ff4e2afe2943db1ff60efc34ab6d2 off stack"
time="2020-08-24T07:50:49-06:00" level=debug msg="Finishing node 6af392e59fc0dc894c533504ba775d8fe4a23120a9628f8eb0d7c23e2b01f559. Popped 6af392e59fc0dc894c533504ba775d8fe4a23120a9628f8eb0d7c23e2b01f559 off stack"
time="2020-08-24T07:50:49-06:00" level=debug msg="mounted container \"7cb597e875dac131e9020f458ac889b8d24ff4e2afe2943db1ff60efc34ab6d2\" at \"/dev/shm/test-apiv2.tmp.KfdsUS/overlay/e58e0f3de08a6b1be4818e3f4c5b1d83333ad697196863a0684a7159e539c9df/merged\""
time="2020-08-24T07:50:49-06:00" level=debug msg="Created root filesystem for container 7cb597e875dac131e9020f458ac889b8d24ff4e2afe2943db1ff60efc34ab6d2 at /dev/shm/test-apiv2.tmp.KfdsUS/overlay/e58e0f3de08a6b1be4818e3f4c5b1d83333ad697196863a0684a7159e539c9df/merged"
time="2020-08-24T07:50:49-06:00" level=debug msg="Made network namespace at /run/user/14904/netns/cni-6decf55c-1c60-166a-3ad0-065d2c3a696a for container 7cb597e875dac131e9020f458ac889b8d24ff4e2afe2943db1ff60efc34ab6d2"
time="2020-08-24T07:50:49-06:00" level=debug msg="slirp4netns command: /usr/bin/slirp4netns --disable-host-loopback --mtu 65520 --enable-sandbox --enable-seccomp -c -e 3 -r 4 --netns-type=path /run/user/14904/netns/cni-6decf55c-1c60-166a-3ad0-065d2c3a696a tap0"
time="2020-08-24T07:50:49-06:00" level=debug msg="/etc/system-fips does not exist on host, not mounting FIPS mode secret"
time="2020-08-24T07:50:49-06:00" level=warning msg="User mount overriding libpod mount at \"/dev/shm\""
time="2020-08-24T07:50:49-06:00" level=debug msg="Setting CGroups for container 7cb597e875dac131e9020f458ac889b8d24ff4e2afe2943db1ff60efc34ab6d2 to user-libpod_pod_58022376464d800a078afba95f76124c805933a5e6f5c935d87c7b955f93a616.slice:libpod:7cb597e875dac131e9020f458ac889b8d24ff4e2afe2943db1ff60efc34ab6d2"
time="2020-08-24T07:50:49-06:00" level=debug msg="reading hooks from /usr/share/containers/oci/hooks.d"
time="2020-08-24T07:50:49-06:00" level=debug msg="Created OCI spec for container 7cb597e875dac131e9020f458ac889b8d24ff4e2afe2943db1ff60efc34ab6d2 at /dev/shm/test-apiv2.tmp.KfdsUS/overlay-containers/7cb597e875dac131e9020f458ac889b8d24ff4e2afe2943db1ff60efc34ab6d2/userdata/config.json"
time="2020-08-24T07:50:49-06:00" level=debug msg="/usr/bin/conmon messages will be logged to syslog"
time="2020-08-24T07:50:49-06:00" level=debug msg="running conmon: /usr/bin/conmon" args="[--api-version 1 -c 7cb597e875dac131e9020f458ac889b8d24ff4e2afe2943db1ff60efc34ab6d2 -u 7cb597e875dac131e9020f458ac889b8d24ff4e2afe2943db1ff60efc34ab6d2 -r /usr/bin/crun -b /dev/shm/test-apiv2.tmp.KfdsUS/overlay-containers/7cb597e875dac131e9020f458ac889b8d24ff4e2afe2943db1ff60efc34ab6d2/userdata -p /run/user/14904/containers/overlay-containers/7cb597e875dac131e9020f458ac889b8d24ff4e2afe2943db1ff60efc34ab6d2/userdata/pidfile -n 58022376464d-infra --exit-dir /run/user/14904/libpod/tmp/exits --socket-dir-path /run/user/14904/libpod/tmp/socket -s -l k8s-file:/dev/shm/test-apiv2.tmp.KfdsUS/overlay-containers/7cb597e875dac131e9020f458ac889b8d24ff4e2afe2943db1ff60efc34ab6d2/userdata/ctr.log --log-level debug --syslog --conmon-pidfile /run/user/14904/containers/overlay-containers/7cb597e875dac131e9020f458ac889b8d24ff4e2afe2943db1ff60efc34ab6d2/userdata/conmon.pid --exit-command /home/esm/src/atomic/2018-02.podman/libpod/bin/podman --exit-command-arg --root --exit-command-arg /dev/shm/test-apiv2.tmp.KfdsUS --exit-command-arg --runroot --exit-command-arg /run/user/14904/containers --exit-command-arg --log-level --exit-command-arg debug --exit-command-arg --cgroup-manager --exit-command-arg systemd --exit-command-arg --tmpdir --exit-command-arg /run/user/14904/libpod/tmp --exit-command-arg --runtime --exit-command-arg /usr/bin/crun --exit-command-arg --storage-driver --exit-command-arg overlay --exit-command-arg --storage-opt --exit-command-arg overlay.mount_program=/usr/bin/fuse-overlayfs --exit-command-arg --events-backend --exit-command-arg file --exit-command-arg --syslog --exit-command-arg true --exit-command-arg container --exit-command-arg cleanup --exit-command-arg 7cb597e875dac131e9020f458ac889b8d24ff4e2afe2943db1ff60efc34ab6d2]"
[conmon:d]: failed to write to /proc/self/oom_score_adj: Permission denied

time="2020-08-24T07:50:49-06:00" level=debug msg="Received: -1"
time="2020-08-24T07:50:49-06:00" level=debug msg="Cleaning up container 7cb597e875dac131e9020f458ac889b8d24ff4e2afe2943db1ff60efc34ab6d2"
time="2020-08-24T07:50:49-06:00" level=debug msg="Tearing down network namespace at /run/user/14904/netns/cni-6decf55c-1c60-166a-3ad0-065d2c3a696a for container 7cb597e875dac131e9020f458ac889b8d24ff4e2afe2943db1ff60efc34ab6d2"
time="2020-08-24T07:50:49-06:00" level=debug msg="unmounted container \"7cb597e875dac131e9020f458ac889b8d24ff4e2afe2943db1ff60efc34ab6d2\""
time="2020-08-24T07:50:49-06:00" level=info msg="Request Failed(Internal Server Error): error starting some containers: some containers failed"

edsantiago · 2020-08-24T13:57:41Z

well... except that AFAICT the test itself has not changed, only the pod code. This implies that, before your PR, that test was passing.

mheon · 2020-08-24T14:15:26Z

Damn, you're right.

edsantiago · 2020-08-24T14:15:46Z

881af314d introduces the error. I can't seem to reproduce using podman-remote commands

mheon · 2020-08-24T14:17:36Z

That landed in a separate PR that passed tests with flying colors...

mheon · 2020-08-24T14:20:55Z

@edsantiago Anything from Conmon in syslog? It seems to be doing the thing where it prints the actual error to the logs and doesn't give it to us.

edsantiago · 2020-08-24T14:24:23Z

I see nothing useful via journalctl -f nor tail -f /var/log/messages.

You can very easily reproduce this:

$ ./test/apiv2/test-apiv2 40

edsantiago · 2020-08-24T14:33:37Z

Through selective commenting-out of individual tests, I see that what's causing the error is a restart:

#t POST  "libpod/pods/bar/restart (restart on nonexistent pod)" '' 404
t POST libpod/pods/create name=bar 201 .Id~[0-9a-f]\\{64\\}
pod_bar_id=$(jq -r .Id <<<"$output")

#t POST  libpod/pods/bar/restart '' 200 \
#  .Errs=null \
#  .Id=$pod_bar_id

#t GET  libpod/pods/bar/json        200 \
#  .State=Running

t POST  libpod/pods/bar/restart '' 200 \
  .Errs=null \
  .Id=$pod_bar_id

#t POST "libpod/pods/bar/stop?t=invalid" '' 400 \
#  .cause="schema: error converting value for \"t\"" \
#  .message~"Failed to parse parameters for"

podman run -d --pod bar busybox sleep 999

t POST libpod/pods/bar/stop?t=1 '' 200 \
  .Errs=null \
  .Id=$pod_bar_id

t POST libpod/pods/bar/start '' 200 sdf

If I comment out that "restart", the error goes away.

Once again, I cannot reproduce this via command line:

$ ./bin/podman-remote pod create --name=foo;./bin/podman-remote pod restart foo;./bin/podman-remote run -d --pod foo busybox sleep 999;./bin/podman-remote pod stop -t 1 foo;./bin/podman-remote pod start foo
4c75633ad2e1e78ccd9d873e5e45c5f336af13699691b40d5cec0ec769cf9606
4c75633ad2e1e78ccd9d873e5e45c5f336af13699691b40d5cec0ec769cf9606
e00074fa049046a7c9467b09efa237b7fe1a71c794fd13087356fce034bfbcee
4c75633ad2e1e78ccd9d873e5e45c5f336af13699691b40d5cec0ec769cf9606
4c75633ad2e1e78ccd9d873e5e45c5f336af13699691b40d5cec0ec769cf9606

mheon · 2020-08-24T14:42:15Z

Hmmmm. Investigating.

edsantiago · 2020-08-24T15:24:56Z

Got it! This is the smallest reproducer I can come up with. The key seems to be to start an already-started container:

$ ./bin/podman-remote pod create --name=foo;./bin/podman-remote pod start foo;./bin/podman-remote pod create --name=bar;./bin/podman-remote run -d --pod bar busybox sleep 999;./bin/podman-remote pod stop -t 1 bar;./bin/podman-remote pod start bar
71e850427778688a4464685cc82874ad2418d1c338a17ce8ff509f13a2e8260a
71e850427778688a4464685cc82874ad2418d1c338a17ce8ff509f13a2e8260a
7449c1cfc2f14c40f476312e73eaab06f919169cd5b2b0c073c5f2d8ee107742
3418123acafdf565cd7bb88c96e60218a3ef26edfa54d5a14d440ebc1c824cd3
7449c1cfc2f14c40f476312e73eaab06f919169cd5b2b0c073c5f2d8ee107742
Error: error starting some containers: some containers failed

There may be a smaller or different reproducer, but I think this should get you on the right track.

because a pod's network information is dictated by the infra container at creation, a container cannot be created with network attributes. this has been difficult for users to understand. we now return an error when a container is being created inside a pod and passes any of the following attributes: * static IP (v4 and v6) * static mac * ports -p (i.e. -p 8080:80) * exposed ports (i.e. 222-225) * publish ports from image -P Signed-off-by: Brent Baude <[email protected]> <MH: Fixed cherry pick conflicts and compile> Signed-off-by: Matthew Heon <[email protected]>

Most Libpod containers are made via `pkg/specgen/generate` which includes code to generate an appropriate exit command which will handle unmounting the container's storage, cleaning up the container's network, etc. There is one notable exception: pod infra containers, which are made entirely within Libpod and do not touch pkg/specgen. As such, no cleanup process, network never cleaned up, bad things can happen. There is good news, though - it's not that difficult to add this, and it's done in this PR. Generally speaking, we don't allow passing options directly to the infra container at create time, but we do (optionally) proxy a pre-approved set of options into it when we create it. Add ExitCommand to these options, and set it at time of pod creation using the same code we use to generate exit commands for normal containers. Fixes containers#7103 Signed-off-by: Matthew Heon <[email protected]> <MH: Fixed cherry-pick conflicts> Signed-off-by: Matthew Heon <[email protected]>

This should help alleviate races where the pod is not fully cleaned up before subsequent API calls happen. Signed-off-by: Matthew Heon <[email protected]>

Really. I promise. No more after this. Signed-off-by: Matthew Heon <[email protected]>

edsantiago · 2020-08-24T16:16:10Z

Update: the following will also reproduce it:

$ ./bin/podman pod create --name=foo;./bin/podman pod start foo;./bin/podman pod create --name=bar;./bin/podman-remote run -d --pod bar busybox sleep 999;./bin/podman-remote pod stop -t 1 bar;./bin/podman-remote pod start bar
cc89c70513e8c93b5f57cba57fd2c869a04e89e3e68cac899bf4c11bbfe1e1a1
cc89c70513e8c93b5f57cba57fd2c869a04e89e3e68cac899bf4c11bbfe1e1a1
84c3ccadb39e23d0dd70384fed896d60bb23d68e33acf7ffe43393a83fce427b
827e76d45de187f17ec67d28508a4e2d2b4758e5d4ef3a54faf7af39e9fd9d71
84c3ccadb39e23d0dd70384fed896d60bb23d68e33acf7ffe43393a83fce427b
Error: error starting some containers: some containers failed

(difference is: the first two commands, the ones with foo, are podman only, not podman-remote).

There is something about podman pod start already-running-pod that is leaving bad state behind somewhere.

mheon · 2020-08-24T18:58:06Z

Needs containers/storage#698 to fix CI

We need this release out by end of day, so we don't have time to do this right. Disable the vendor task and manually add c/storage PR containers#698 to the vendored copy of c/storage to make the tests pass. Once containers#698 merges into c/storage, we need to remove this commit and backport it to the v1.20 stable branch, then cut a release there. Signed-off-by: Matthew Heon <[email protected]>

Signed-off-by: Matthew Heon <[email protected]>

rhatdan · 2020-08-24T19:27:43Z

Released containers/storage https://github.com/containers/storage/releases/tag/v1.23.2

baude · 2020-08-24T19:27:46Z

/lgtm

edsantiago · 2020-08-24T20:18:43Z

Cirrus shows "Failed to stop: Remote host terminated the handshake" on a red background. Probably a flake, although not one that I'm accustomed to. Restarting.

mheon · 2020-08-24T20:55:11Z

[+0256s] # error opening file /sys/fs/cgroup//system.slice/buildah-buildah651017202-477311.scope/cgroup.freeze: No such file or directory

Lovely. Restarting again.

edsantiago · 2020-08-24T21:01:46Z

Oh yeah, that's the elusive #7148

mheon · 2020-08-24T21:34:46Z

It's happened 3 times in a row now. I think it's escalated from a flake.

I'm going to restart this one more time. If it still fails, we need to chase this down tomorrow morning.

mheon · 2020-08-24T22:31:15Z

/hold cancel

openshift-ci-robot requested review from jwhonce and umohnani8 August 21, 2020 17:47

openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Aug 21, 2020

openshift-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Aug 21, 2020

openshift-ci-robot assigned jwhonce Aug 21, 2020

openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Aug 21, 2020

openshift-ci-robot removed the lgtm Indicates that a PR is ready to be merged. label Aug 24, 2020

baude and others added 3 commits August 24, 2020 11:31

Clean up pods before returning from Pod Stop API call

c572378

This should help alleviate races where the pod is not fully cleaned up before subsequent API calls happen. Signed-off-by: Matthew Heon <[email protected]>

Final release notes update for v2.0.5.

884355c

Really. I promise. No more after this. Signed-off-by: Matthew Heon <[email protected]>

mheon force-pushed the last_pr_before_205_really_this_time branch from 5bad9c3 to 884355c Compare August 24, 2020 15:36

mheon force-pushed the last_pr_before_205_really_this_time branch from d6c2e92 to ae2ee65 Compare August 24, 2020 19:14

mheon added 2 commits August 24, 2020 15:18

Bump to v2.0.5

776abc5

Signed-off-by: Matthew Heon <[email protected]>

Bump to v2.0.6-dev

13d5b2d

Signed-off-by: Matthew Heon <[email protected]>

openshift-ci-robot assigned baude Aug 24, 2020

openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Aug 24, 2020

openshift-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Aug 24, 2020

openshift-merge-robot merged commit 024f470 into containers:v2.0 Aug 24, 2020

github-actions bot added the locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. label Sep 24, 2023

github-actions bot locked as resolved and limited conversation to collaborators Sep 24, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Final v2.0.5 backports #7402

Final v2.0.5 backports #7402

mheon commented Aug 21, 2020

openshift-ci-robot commented Aug 21, 2020

baude commented Aug 21, 2020

jwhonce commented Aug 21, 2020

mheon commented Aug 21, 2020

mheon commented Aug 21, 2020

TomSweeneyRedHat commented Aug 21, 2020

mheon commented Aug 21, 2020

mheon commented Aug 24, 2020

edsantiago commented Aug 24, 2020

mheon commented Aug 24, 2020

edsantiago commented Aug 24, 2020

edsantiago commented Aug 24, 2020

mheon commented Aug 24, 2020

edsantiago commented Aug 24, 2020

mheon commented Aug 24, 2020

mheon commented Aug 24, 2020

edsantiago commented Aug 24, 2020

edsantiago commented Aug 24, 2020

mheon commented Aug 24, 2020

edsantiago commented Aug 24, 2020

edsantiago commented Aug 24, 2020

mheon commented Aug 24, 2020

rhatdan commented Aug 24, 2020

baude commented Aug 24, 2020

edsantiago commented Aug 24, 2020

mheon commented Aug 24, 2020

edsantiago commented Aug 24, 2020

mheon commented Aug 24, 2020

mheon commented Aug 24, 2020

Final v2.0.5 backports #7402

Final v2.0.5 backports #7402

Conversation

mheon commented Aug 21, 2020

openshift-ci-robot commented Aug 21, 2020

baude commented Aug 21, 2020

jwhonce commented Aug 21, 2020

mheon commented Aug 21, 2020

mheon commented Aug 21, 2020

TomSweeneyRedHat commented Aug 21, 2020

mheon commented Aug 21, 2020

mheon commented Aug 24, 2020

edsantiago commented Aug 24, 2020

mheon commented Aug 24, 2020

edsantiago commented Aug 24, 2020

edsantiago commented Aug 24, 2020

mheon commented Aug 24, 2020

edsantiago commented Aug 24, 2020

mheon commented Aug 24, 2020

mheon commented Aug 24, 2020

edsantiago commented Aug 24, 2020

edsantiago commented Aug 24, 2020

mheon commented Aug 24, 2020

edsantiago commented Aug 24, 2020

edsantiago commented Aug 24, 2020

mheon commented Aug 24, 2020

rhatdan commented Aug 24, 2020

baude commented Aug 24, 2020

edsantiago commented Aug 24, 2020

mheon commented Aug 24, 2020

edsantiago commented Aug 24, 2020

mheon commented Aug 24, 2020

mheon commented Aug 24, 2020