Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DO NOT MERGE: try to figure out the preserve-fds bug #16051

Closed
wants to merge 1 commit into from

Conversation

edsantiago
Copy link
Member

Add "ls -l /proc/self/fd" before running the test.
Might be useful to also run "lsof -p $$", what's the
right Go incantation for getting my own pid?

Signed-off-by: Ed Santiago [email protected]

none

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Oct 4, 2022

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: edsantiago

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 4, 2022
@edsantiago edsantiago marked this pull request as draft October 4, 2022 20:25
@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Oct 4, 2022
@mheon
Copy link
Member

mheon commented Oct 4, 2022

@edsantiago https://pkg.go.dev/os#Getpid

@edsantiago
Copy link
Member Author

Yay, the flake triggered! Boo, there are no extra open fds. Unless exec.Command() closes fds??? That would be weird. I'll continue playing with this tomorrow.

@edsantiago edsantiago force-pushed the who_is_hogging_fd3 branch 3 times, most recently from cbac6e5 to 91b250a Compare October 4, 2022 23:38
@edsantiago
Copy link
Member Author

Sigh. Failed, with both ls and lsof results... but I'm darned if I can figure out a useful difference between the failed ubuntu log and the passing one on fedora.

@Luap99
Copy link
Member

Luap99 commented Oct 5, 2022

@edsantiago Yes exec.Command closes all fds, see my comment on the issue:

We use os/exec to start podman in the tests, it should not matter how many fds we leak in the test since os/exec will always unset all fds.
https://cs.opensource.google/go/go/+/refs/tags/go1.19.1:src/os/exec/exec.go;l=506-517
That makes me think the leak or bug must be somewhere in podman itself.

#15943 (comment)

It does not matter how many open fds the test has, in fact we cannot control this at all since they run in parallel.
So either the golang lib has a bug and does not close the fds in all cases or we have a bug in podman.

@edsantiago
Copy link
Member Author

Ugh, you're right. I guess closing fds is user-friendly in some respects, it's just counterintuitive for someone who grew up on UNIX and C. Thanks for the reminder.

Do you have any suggestions on how else to track this down? Since this is consistently ubuntu-only, could it be something that ubuntu changed in the Go libraries, as a distro patch? Or a too-old version of Go?

Again, this is easy to fix in the test, I just don't want to hide a real bug.

@Luap99
Copy link
Member

Luap99 commented Oct 5, 2022

I don't really know, all I see is that we have some custom c code which checks the fds:

/* Store how many FDs were open before the Go runtime kicked in. */
d = opendir ("/proc/self/fd");
if (d)
{
struct dirent *ent;
size_t size = 0;
for (ent = readdir (d); ent; ent = readdir (d))
{
int fd;
if (ent->d_name[0] == '.')
continue;
fd = atoi (ent->d_name);
if (fd == dirfd (d))
continue;
if (fd >= size * FD_SETSIZE)
{
int i;
size_t new_size;
new_size = (fd / FD_SETSIZE) + 1;
open_files_set = realloc (open_files_set, new_size * sizeof (fd_set));
if (open_files_set == NULL)
_exit (EXIT_FAILURE);
for (i = size; i < new_size; i++)
FD_ZERO (&(open_files_set[i]));
size = new_size;
}
if (fd > open_files_max_fd)
open_files_max_fd = fd;
FD_SET (fd % FD_SETSIZE, &(open_files_set[fd / FD_SETSIZE]));
}
}

int
is_fd_inherited(int fd)
{
if (open_files_set == NULL || fd > open_files_max_fd || fd < 0)
return 0;
return FD_ISSET(fd % FD_SETSIZE, &(open_files_set[fd / FD_SETSIZE])) ? 1 : 0;
}

@edsantiago
Copy link
Member Author

I notice there's no matching closedir() for the opendir() on line 300. Should there be?

Add "ls -l /proc/self/fd" before running the test.
Might be useful to also run "lsof -p $$", what's the
right Go incantation for getting my own pid?

Signed-off-by: Ed Santiago <[email protected]>
@edsantiago
Copy link
Member Author

Unable to reproduce the flake if I use -focus on the ginkgo tests.

@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Oct 30, 2022
@openshift-merge-robot
Copy link
Collaborator

@edsantiago: PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@github-actions
Copy link

A friendly reminder that this PR had no activity for 30 days.

@rhatdan
Copy link
Member

rhatdan commented Dec 1, 2022

@edsantiago is this still valid, needs a rebase.

@github-actions github-actions bot removed the stale-pr label Dec 2, 2022
@edsantiago
Copy link
Member Author

This is an Ubuntu-only flake. We haven't run CI on Ubuntu in a month, so I'll just close this and reopen if we ever get Ubuntu back.

@edsantiago edsantiago closed this Dec 2, 2022
@github-actions github-actions bot added the locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. label Sep 19, 2023
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 19, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. release-note-none
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants