
preserve-fds: file descriptor 4 is not available #15943

Closed
edsantiago opened this issue Sep 26, 2022 · 16 comments
Labels
flakes (Flakes from Continuous Integration)
locked - please file new issue/PR (Assist humans wanting to comment on an old issue or PR with locked comments)

Comments

@edsantiago
Member

Been seeing this flake for the last week:

$ podman [options] run --preserve-fds 2 quay.io/libpod/alpine:latest
Error: file descriptor 4 is not available - the preserve-fds option requires that file descriptors must be passed

Podman run [It] podman run --preserve-fds invalid fd

Seems interesting that it's only Ubuntu rootless.
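
For context, here is a minimal sketch of what the flaking check boils down to, written against os/exec rather than the actual Ginkgo e2e test (the image name and expected error string are taken from the output above; everything else is illustrative):

package main

import (
	"fmt"
	"os/exec"
	"strings"
)

func main() {
	// Run podman with --preserve-fds 2 while fds 3 and 4 are not open;
	// the test expects the error to complain about fd 3.
	out, _ := exec.Command("podman", "run", "--preserve-fds", "2",
		"quay.io/libpod/alpine:latest").CombinedOutput()
	if strings.Contains(string(out), "file descriptor 3 is not available") {
		fmt.Println("PASS: got the expected error")
	} else {
		fmt.Printf("FAIL: unexpected output: %s\n", out)
	}
}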

@edsantiago added the flakes (Flakes from Continuous Integration) label Sep 26, 2022
@edsantiago
Member Author

@containers/podman-maintainers please help. This one is starting to trigger often.

ISTM that something somewhere is now doing an open() without a corresponding close(), leaving fd 3 stuck open forever.

The first instance of the flake was seen in PR #15833 (903f551). That doesn't mean this PR is at fault (I see nothing in it that leaks fds), it just means this is the upper bound of commits to search. Given how frequent this flake is, I'm guessing the lower bound is at most two weeks before that, so, let's say 54873c1 (Aug 31). That's impossible for me to review, even if I limit it to git diff 54873c1f5 903f55 -- test/e2e. (But I tried anyway).

What I need is for someone to think back to early September and say "oh, yeah, I added this-or-that code and might have forgotten to close somewhere". Pretty please, can you do this? The longer we wait, the harder it will be to remember.

If we don't get this resolved, I'm just going to remove the preserve-fds test entirely.
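
As a diagnostic aid for the leaked-fd hypothesis, here is a minimal Linux-only sketch (not part of the e2e suite) that lists the descriptors open in the current process via /proc/self/fd; anything beyond 0/1/2 that lingers across tests would be a suspect:

package main

import (
	"fmt"
	"os"
)

func main() {
	entries, err := os.ReadDir("/proc/self/fd")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}
	for _, e := range entries {
		// Each entry is a symlink to whatever the fd refers to.
		// Note: the fd used by ReadDir itself will also show up here.
		target, _ := os.Readlink("/proc/self/fd/" + e.Name())
		fmt.Printf("fd %s -> %s\n", e.Name(), target)
	}
}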

@mheon
Member

mheon commented Sep 27, 2022

@edsantiago What's leading you towards that conclusion? From the error message it seems like we have too few FDs open, not too many?

@edsantiago
Member Author

My belief came from the test failure message:

Expected
   <string>: Error: file descriptor 4 is not available - the preserve-fds option requires that file descriptors must be passed
to contain substring
   <string>: file descriptor 3 is not available

...but I may be reading it completely wrong.

@mheon
Member

mheon commented Sep 27, 2022

Ack, that sounds accurate. An extra FD is occasionally getting in, and that is altering the error message.

No chance it's leaking in from CI itself passing an extra FD in?

@edsantiago
Member Author

This suggests that my hypothesis is correct, though:

$  bin/podman run --preserve-fds 2 quay.io/libpod/testimage:20220615 date
Error: file descriptor 3 is not available - the preserve-fds option requires that file descriptors must be passed
$ bin/podman run --preserve-fds 2 quay.io/libpod/testimage:20220615 date 3>/dev/null
Error: file descriptor 4 is not available - the preserve-fds option requires that file descriptors must be passed

I have no idea what CI might be doing. I did mean to add that this happens only on Ubuntu, and (best I can tell) Ubuntu VMs were not touched in the 2 weeks before the flake appeared.
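
The reproduction above fits a check that walks fds 3 through 3+N-1 and reports the first one that is not open, so a stray open fd 3 shifts the reported number to 4. A hedged sketch of such a check (an illustration of the behavior, not necessarily podman's actual implementation):

package main

import (
	"fmt"

	"golang.org/x/sys/unix"
)

// checkPreserveFDs verifies that fds 3..3+n-1 are open, mimicking the
// error text seen above.
func checkPreserveFDs(n uint) error {
	for fd := 3; fd < 3+int(n); fd++ {
		// F_GETFD fails with EBADF if the descriptor is not open.
		if _, err := unix.FcntlInt(uintptr(fd), unix.F_GETFD, 0); err != nil {
			return fmt.Errorf("file descriptor %d is not available - the preserve-fds option requires that file descriptors must be passed", fd)
		}
	}
	return nil
}

func main() {
	if err := checkPreserveFDs(2); err != nil {
		fmt.Println(err)
	}
}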

@mheon
Member

mheon commented Sep 27, 2022

Sorry if I wasn't clear, but I concur with your assessment. It's an extra FD, not a missing FD as I originally assumed.

@mheon
Member

mheon commented Sep 27, 2022

I am, however, a bit doubtful that we missed a defer file.Close() somewhere - I seem to recall us having a linter that catches such things. Will verify that after lunch.

@edsantiago
Member Author

I found one through tedious skimming:

statsDirectory, err := os.Open(c.bundlePath())

(Two, actually: there are two instances in that same file.) But that's podman, not the e2e tests, so I can't imagine why that would affect fds in the e2e test process.
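
For reference, a sketch of the pattern being flagged and its fix, simplified around the quoted line (illustrative only, not the actual podman code):

package stats // hypothetical package name, for illustration

import "os"

func readStatsDir(bundlePath string) ([]os.DirEntry, error) {
	statsDirectory, err := os.Open(bundlePath)
	if err != nil {
		return nil, err
	}
	// Without this Close, the directory fd stays open until the GC
	// finalizer on *os.File eventually reclaims it.
	defer statsDirectory.Close()

	return statsDirectory.ReadDir(-1)
}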

@Luap99
Member

Luap99 commented Sep 27, 2022

I am, however, a bit doubtful that we missed a defer file.Close() somewhere - I seem to recall us having a linter that catches such things. Will verify that after lunch.

I don't think there is a linter for that.

We use os/exec to start podman in the tests; it should not matter how many fds we leak in the test process, since os/exec will always unset all fds.
https://cs.opensource.google/go/go/+/refs/tags/go1.19.1:src/os/exec/exec.go;l=506-517
That makes me think the leak or bug must be somewhere in podman itself.

Also, the Go garbage collector will automatically close files once the variable is out of scope; since the timing changes with every run, that would explain why it flakes only sometimes.
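
A minimal sketch of both points (the file path is just an arbitrary example): os/exec hands the child only stdin/stdout/stderr plus whatever is listed in ExtraFiles (remapped to fds 3, 4, ...), and an os.File that is merely dropped is closed later by a GC finalizer at a nondeterministic time:

package main

import (
	"fmt"
	"os"
	"os/exec"
)

func main() {
	// Deliberately "leak" an fd in the parent: no Close; the runtime
	// finalizer on *os.File will close it whenever the GC runs.
	leaked, err := os.Open("/etc/hostname")
	if err != nil {
		panic(err)
	}
	_ = leaked

	// The child sees only stdin/stdout/stderr (plus ExtraFiles, unused
	// here), so the leaked fd above does not propagate to it.
	cmd := exec.Command("ls", "-l", "/proc/self/fd")
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr
	if err := cmd.Run(); err != nil {
		fmt.Fprintln(os.Stderr, "run failed:", err)
	}
}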

@edsantiago
Member Author

Hmmmm. Here's a new flake I haven't seen before (it isn't even in my flake logs):

Systemd activate 
           stop podman.service
   ....
         Execing /var/tmp/go/src/github.com/containers/podman/bin/podman-remote --url tcp://127.0.0.1:40595 create --tty --name top_XVlBzgba --entrypoint top quay.io/libpod/alpine_labels:latest
         
         Listening on 127.0.0.1:40595 as 3.
         Communication attempt on fd 3.     <<<<<<<===================
         Execing /var/tmp/go/src/github.com/containers/podman/bin/podman (/var/tmp/go/src/github.com/containers/podman/bin/podman --root=/tmp/podman_test1509476081/server_root system service --time=0)
         Cannot connect to Podman. Please verify your connection to the Linux system using `podman system connection list`, or try `podman machine init` and `podman machine start` to manage a new Linux VM
         Error: unable to connect to Podman socket: Get "http://d/v4.3.0/libpod/_ping": dial tcp 127.0.0.1:40595: connect: connection refused
...
blah blah test fails

Triple failure, consistent with what we're seeing in this issue. Maybe a red herring, but I'll let y'all decide.

@edsantiago
Member Author

For posterity, because it looks like I'm abandoning this again: thinking that perhaps the root cause was some test that ran before the flaking one, I wrote a script to analyze my logs, looking for tests that precede the `invalid fd` one in all cases. No luck: there are 73 such matches. Too many to really evaluate.

Among those, in 11 of the 16 log files, podman run with restart-policy always restarts containers is the test run immediately before the flake. I looked at that test, and it doesn't strike me as having anything to do with file handles.

Moving on.

@eriksjolund
Contributor

eriksjolund commented Oct 15, 2022

Just an idea:

I found a bug in rootless_linux.c; this PR seems to fix #15927. Maybe #15943 is related to that bug as well?

I don't know how to reproduce this bug, so I don't know how to test it.

@edsantiago
Member Author

I've noticed that some of these flake logs -- not all, but an unexpected number -- are correlated with this one-time flake:

stop podman.service
...
Execing podman-remote --url tcp://127.0.0.1:40357 create --tty --name top_nJObCsNV --entrypoint top quay.io/libpod/alpine_labels:latest
Listening on 127.0.0.1:40357 as 3.
Communication attempt on fd 3.
Execing podman (podman --root=/tmp/podman_test926004581/server_root system service --time=0)
Error: unexpected fd received from systemd: cannot listen on it

Example. It's only a one-time flake, always (AFAICT) passing on the first ginkgo retry.

I don't think that's the cause, because in all the cases I see, the stop podman.service flake happens after all three preserve-fds failures. It's just odd.

Here's the latest list of Cirrus flakes.

Podman run [It] podman run --preserve-fds invalid fd
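
For reference, the log above follows the systemd socket-activation convention: activated sockets are passed starting at fd 3 (hence "Listening on ... as 3") and advertised through the LISTEN_FDS and LISTEN_PID environment variables. A minimal sketch of reading that convention (this shows the protocol itself, not podman's implementation):

package main

import (
	"fmt"
	"os"
	"strconv"
)

const listenFdsStart = 3 // SD_LISTEN_FDS_START

// activatedFds returns the fd numbers systemd passed to this process,
// or an error if the process was not socket-activated.
func activatedFds() ([]int, error) {
	pid, err := strconv.Atoi(os.Getenv("LISTEN_PID"))
	if err != nil || pid != os.Getpid() {
		return nil, fmt.Errorf("not socket-activated for this process")
	}
	n, err := strconv.Atoi(os.Getenv("LISTEN_FDS"))
	if err != nil {
		return nil, fmt.Errorf("bad LISTEN_FDS: %w", err)
	}
	fds := make([]int, 0, n)
	for i := 0; i < n; i++ {
		fds = append(fds, listenFdsStart+i)
	}
	return fds, nil
}

func main() {
	fds, err := activatedFds()
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}
	fmt.Println("activated fds:", fds)
}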

@github-actions

A friendly reminder that this issue had no activity for 30 days.

@github-actions

A friendly reminder that this issue had no activity for 30 days.

@edsantiago
Member Author

We no longer test Ubuntu. If we ever do so again, and this resurfaces, then I'll reopen.

github-actions bot added the locked - please file new issue/PR label Sep 5, 2023
github-actions bot locked as resolved and limited conversation to collaborators Sep 5, 2023