Final v2.0.5 backports #7402
Conversation
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: mheon. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
LGTM
/hold
@rhatdan Hold/LGTM this please
Oh, TYTY @jwhonce
LGTM
CI is failing, and we are also waiting on one more c/common revendor from @baude to fix Podman remote on Windows and OS X.
@edsantiago Poke - could I ask for a moment of your time to take a look at these tests?
@mheon I can reproduce, it's a real error. Looking at the logs gives me the supremely unhelpful:

```
{
    "cause": "some containers failed",
    "message": "error starting some containers: some containers failed",
    "response": 500
}
```
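For anyone who wants to poke the failing endpoint by hand, here is a minimal sketch; the socket path and the v1.0.0 API prefix are assumptions for a default Podman v2 service setup, so adjust them for your system:

```
# Run the REST API service in one shell (--time=0 disables the inactivity timeout):
podman system service --time=0 unix:///tmp/podman.sock

# From another shell, hit the libpod pod-start endpoint directly:
curl -s --unix-socket /tmp/podman.sock \
    -X POST http://d/v1.0.0/libpod/pods/bar/start | jq .
```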
At this point I'm wondering if it depends on some other part of the test that wasn't backported. Somewhat tempted to disable it.
Server logs with […]
well... except that AFAICT the test itself has not changed, only the pod code. This implies that, before your PR, that test was passing.
Damn, you're right.
881af314d introduces the error. I can't seem to reproduce using the command line.
That landed in a separate PR that passed tests with flying colors...
@edsantiago Anything from Conmon in syslog? It seems to be doing the thing where it prints the actual error to the logs and doesn't give it to us.
I see nothing useful via syslog. You can very easily reproduce this:

```
$ ./test/apiv2/test-apiv2 40
```
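For readers following along, the `t` lines in the snippet below come from the test-apiv2 harness; the shape sketched here is inferred from usage, not from the harness source, so treat the exact option handling as an assumption:

```
# Rough shape of the t() helper as used below (inferred; not the actual code):
#   t METHOD PATH [POST-PARAMS] EXPECTED-STATUS [JQ-ASSERTIONS...]
#
# For example:
#   t POST libpod/pods/create name=bar 201 .Id~[0-9a-f]\\{64\\}
# issues POST /libpod/pods/create with name=bar, expects HTTP 201, and
# asserts that .Id in the JSON response matches a 64-character hex string.
```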
Through selective commenting-out of individual tests, I see that what's causing the error is this sequence:

```
#t POST "libpod/pods/bar/restart (restart on nonexistent pod)" '' 404
t POST libpod/pods/create name=bar 201 .Id~[0-9a-f]\\{64\\}
pod_bar_id=$(jq -r .Id <<<"$output")
#t POST libpod/pods/bar/restart '' 200 \
#    .Errs=null \
#    .Id=$pod_bar_id
#t GET libpod/pods/bar/json 200 \
#    .State=Running
t POST libpod/pods/bar/restart '' 200 \
    .Errs=null \
    .Id=$pod_bar_id
#t POST "libpod/pods/bar/stop?t=invalid" '' 400 \
#    .cause="schema: error converting value for \"t\"" \
#    .message~"Failed to parse parameters for"
podman run -d --pod bar busybox sleep 999
t POST libpod/pods/bar/stop?t=1 '' 200 \
    .Errs=null \
    .Id=$pod_bar_id
t POST libpod/pods/bar/start '' 200 sdf
```

If I comment out that "restart", the error goes away. Once again, I cannot reproduce this via the command line:

```
$ ./bin/podman-remote pod create --name=foo;./bin/podman-remote pod restart foo;./bin/podman-remote run -d --pod foo busybox sleep 999;./bin/podman-remote pod stop -t 1 foo;./bin/podman-remote pod start foo
4c75633ad2e1e78ccd9d873e5e45c5f336af13699691b40d5cec0ec769cf9606
4c75633ad2e1e78ccd9d873e5e45c5f336af13699691b40d5cec0ec769cf9606
e00074fa049046a7c9467b09efa237b7fe1a71c794fd13087356fce034bfbcee
4c75633ad2e1e78ccd9d873e5e45c5f336af13699691b40d5cec0ec769cf9606
4c75633ad2e1e78ccd9d873e5e45c5f336af13699691b40d5cec0ec769cf9606
```
Hmmmm. Investigating.
Got it! This is the smallest reproducer I can come up with. The key seems to be to start another pod first:

```
$ ./bin/podman-remote pod create --name=foo;./bin/podman-remote pod start foo;./bin/podman-remote pod create --name=bar;./bin/podman-remote run -d --pod bar busybox sleep 999;./bin/podman-remote pod stop -t 1 bar;./bin/podman-remote pod start bar
71e850427778688a4464685cc82874ad2418d1c338a17ce8ff509f13a2e8260a
71e850427778688a4464685cc82874ad2418d1c338a17ce8ff509f13a2e8260a
7449c1cfc2f14c40f476312e73eaab06f919169cd5b2b0c073c5f2d8ee107742
3418123acafdf565cd7bb88c96e60218a3ef26edfa54d5a14d440ebc1c824cd3
7449c1cfc2f14c40f476312e73eaab06f919169cd5b2b0c073c5f2d8ee107742
Error: error starting some containers: some containers failed
```

There may be a smaller or different reproducer, but I think this should get you on the right track.
Because a pod's network information is dictated by the infra container at creation, a container cannot be created with network attributes. This has been difficult for users to understand. We now return an error when a container is being created inside a pod and passes any of the following attributes:

* static IP (v4 and v6)
* static MAC
* ports -p (i.e. -p 8080:80)
* exposed ports (i.e. 222-225)
* publish ports from image -P

Signed-off-by: Brent Baude <[email protected]>

<MH: Fixed cherry pick conflicts and compile>

Signed-off-by: Matthew Heon <[email protected]>
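In practice this means port publishing moves to pod creation time. A quick sketch of the before/after; the pod and image names here are purely illustrative:

```
# Rejected after this change: network attributes on a container joining a pod
podman run -d --pod web -p 8080:80 nginx     # now returns an error

# Instead, set ports on the pod (i.e., on its infra container) at creation:
podman pod create --name web -p 8080:80
podman run -d --pod web nginx
```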
Most Libpod containers are made via `pkg/specgen/generate`, which includes code to generate an appropriate exit command that handles unmounting the container's storage, cleaning up the container's network, etc. There is one notable exception: pod infra containers, which are made entirely within Libpod and do not touch pkg/specgen. As such, they get no cleanup process, their network is never cleaned up, and bad things can happen.

The good news is that it's not difficult to add this, and it's done in this PR. Generally speaking, we don't allow passing options directly to the infra container at create time, but we do (optionally) proxy a pre-approved set of options into it when we create it. Add ExitCommand to these options, and set it at the time of pod creation using the same code we use to generate exit commands for normal containers.

Fixes containers#7103

Signed-off-by: Matthew Heon <[email protected]>

<MH: Fixed cherry-pick conflicts>

Signed-off-by: Matthew Heon <[email protected]>
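For context, the exit command boils down to a `podman container cleanup` invocation. A sketch of its rough shape; the paths and global flags shown are illustrative, since the real command is assembled from the running binary's own configuration:

```
# Roughly what Libpod generates as a container's exit command; with this
# change the pod's infra container gets one too:
#
#   /usr/bin/podman --root /var/lib/containers/storage \
#       --runroot /var/run/containers/storage \
#       container cleanup <container-id>
#
# "podman container cleanup" unmounts the container's storage and tears
# down its network once the container exits.
```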
This should help alleviate races where the pod is not fully cleaned up before subsequent API calls happen. Signed-off-by: Matthew Heon <[email protected]>
Really. I promise. No more after this. Signed-off-by: Matthew Heon <[email protected]>
Force-pushed from 5bad9c3 to 884355c
Update: the following will also reproduce it:

```
$ ./bin/podman pod create --name=foo;./bin/podman pod start foo;./bin/podman pod create --name=bar;./bin/podman-remote run -d --pod bar busybox sleep 999;./bin/podman-remote pod stop -t 1 bar;./bin/podman-remote pod start bar
cc89c70513e8c93b5f57cba57fd2c869a04e89e3e68cac899bf4c11bbfe1e1a1
cc89c70513e8c93b5f57cba57fd2c869a04e89e3e68cac899bf4c11bbfe1e1a1
84c3ccadb39e23d0dd70384fed896d60bb23d68e33acf7ffe43393a83fce427b
827e76d45de187f17ec67d28508a4e2d2b4758e5d4ef3a54faf7af39e9fd9d71
84c3ccadb39e23d0dd70384fed896d60bb23d68e33acf7ffe43393a83fce427b
Error: error starting some containers: some containers failed
```

(Difference is: the first commands, the ones with foo, are run with ./bin/podman instead of ./bin/podman-remote.) There is something about […]
Needs containers/storage#698 to fix CI
We need this release out by end of day, so we don't have time to do this right. Disable the vendor task and manually add c/storage PR containers#698 to the vendored copy of c/storage to make the tests pass. Once containers#698 merges into c/storage, we need to remove this commit and backport it to the v1.20 stable branch, then cut a release there. Signed-off-by: Matthew Heon <[email protected]>
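One way to hand-apply an unmerged upstream PR to a vendored copy looks roughly like this; a sketch only, since both the `.patch` URL trick and the `-p1` strip level are assumptions to verify against the actual PR:

```
# Fetch the PR as a patch and apply it to the vendored copy of c/storage:
curl -sL https://github.com/containers/storage/pull/698.patch |
    patch -p1 -d vendor/github.com/containers/storage
```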
Force-pushed from d6c2e92 to ae2ee65
Signed-off-by: Matthew Heon <[email protected]>
Signed-off-by: Matthew Heon <[email protected]>
Released containers/storage https://github.com/containers/storage/releases/tag/v1.23.2
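With the tag published, the proper revendor becomes possible. A sketch of the follow-up using standard Go modules commands, assuming the repo vendors its dependencies as described above:

```
# Pick up the released c/storage and refresh vendor/:
go get github.com/containers/storage@v1.23.2
go mod tidy
go mod vendor
```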
/lgtm
Cirrus shows "Failed to stop: Remote host terminated the handshake" on a red background. Probably a flake, although not one that I'm accustomed to. Restarting.
Lovely. Restarting again.
Oh yeah, that's the elusive #7148
It's happened 3 times in a row now. I think it's escalated beyond a flake. I'm going to restart this one more time. If it still fails, we need to chase this down tomorrow morning.
/hold cancel |
These are really the last ones. Definitely. For sure this time.