
v3.2.1 (on Gentoo): Error: error configuring CNI network plugin: failed to create new watcher too many open files #10686

Closed
edsantiago opened this issue Jun 15, 2021 · 14 comments · Fixed by #10741
Assignees: Luap99
Labels: locked - please file new issue/PR · network · rootless

Comments

@edsantiago (Member)

rootless podman v3.2.1 fails on my Gentoo laptop, no matter what I run:

$ podman info
Error: error configuring CNI network plugin: failed to create new watcher too many open files

It fails very early: neither podman info nor podman system reset gets past this point. strace shows a smoking gun here:

newfstatat(AT_FDCWD, "/home/esm/.config/cni/net.d", {st_mode=S_IFDIR|0755, st_size=4096, ...}, 0) = 0
inotify_init1(IN_CLOEXEC)               = -1 EMFILE (Too many open files)
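
For context, each process that creates an inotify watcher counts against fs.inotify.max_user_instances, and EMFILE from inotify_init1 means that per-user limit has been exhausted. A quick way to compare current usage against the limit (illustrative commands, not part of the original report; inotify file descriptors show up as symlinks to anon_inode:inotify under /proc):

$ # Count open inotify instances visible to this user
$ find /proc/*/fd -lname 'anon_inode:inotify' 2>/dev/null | wc -l
$ # The limit they are counted against
$ cat /proc/sys/fs/inotify/max_user_instances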

Per watercooler discussion 2021-06-15, the source of the error is in the CNI plugins. The root cause is almost certainly something on my end, possibly my use of syncthing (which also periodically gripes with inotify-related errors). I'm not asking the podman team to spend much time on this; my goal is to file a placeholder to help others who run into this. (For now, the only solution is to revert to 3.1.2.)

@Luap99 (Member) commented Jun 15, 2021

Does it work rootful?

@edsantiago (Member Author)

Yes - I tagged it with the rootless label in hopes of making that clear, but next time I will be more explicit.

@Luap99 (Member) commented Jun 15, 2021

What is the value in /proc/sys/fs/inotify/max_user_instances?

@edsantiago (Member Author)

fs.inotify.max_user_instances = 128
fs.inotify.max_user_watches = 204800
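
For reference, this instance limit is just a sysctl and can be raised as a stopgap (a possible workaround, not something discussed in this thread; the value below is arbitrary):

$ # Raise the per-user inotify instance limit for the running kernel
$ sudo sysctl -w fs.inotify.max_user_instances=256
$ # Persist it across reboots via a drop-in under /etc/sysctl.d/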

@Luap99 (Member) commented Jun 16, 2021

Maybe the best short-term fix is to ignore this error. Only cni needs this; it should only fail when you try to use cni, e.g. podman run --network mynet..., and not for each podman command.
In the long run we should get rid of the inotify requirement.

@mheon (Member) commented Jun 16, 2021

EMFILE here means, per the manpage, that the maximum number of user inotify instances has been reached. I'm assuming you don't have 128 processes using inotify running, so this seems somewhat suspicious to me.

@edsantiago (Member Author)

> I'm assuming you don't have 128 processes using inotify running

Not necessarily a safe assumption. I've had problems with this system for years, something to do with dbus (which I still don't really understand, apart from it being something to irritate us oldtimers). There were 391 "dbus-daemon" processes running, and lsof on a sampling of those showed inotify active. Killing them fixes the EMFILE problem, allowing me to run podman images, but then I run into the "image not known" bug (#10682), so I've reverted yet again to 3.1.2.
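
For anyone hitting the same thing, one way to see which processes are holding inotify instances (illustrative, not the exact commands used here) is to group the anon_inode:inotify file descriptors under /proc by PID:

$ # Count inotify fds per PID, biggest consumers first
$ find /proc/*/fd -lname 'anon_inode:inotify' 2>/dev/null | cut -d/ -f3 | sort | uniq -c | sort -rn | head
$ # Map a PID of interest to its command name
$ ps -o comm= -p <PID>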

@Luap99 (Member) commented Jun 16, 2021

I think this might be a bigger issue. The problem is that you cannot use more than 128 podman processes concurrently without failing. I don't know if there are any users who would reach this, but I think 128 is not much. It is also potentially dangerous when podman container cleanup processes start failing.

@mheon (Member) commented Jun 16, 2021

I think we need to add a config knob to CNI so we can turn this off everywhere that is not podman system service.

@Luap99 (Member) commented Jun 16, 2021

> I think we need to add a config knob to CNI so we can turn this off everywhere that is not podman system service.

Actually we never need inotify, even for the service. I remember changing OCICNI to always reload the networks from disk when you use one that was not found in memory. This was needed because the podman CI flaked too often, since inotify was slower than podman-remote network create ... && podman-remote run --network ...
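
In other words, the sequence that used to flake is handled by that reload-on-miss path rather than by inotify; roughly (network name below is illustrative):

$ # The service reloads CNI configs from disk when the requested network
$ # is not in its in-memory cache, so this works without a watcher.
$ podman-remote network create demo-net
$ podman-remote run --rm --network demo-net quay.io/libpod/testimage:20210610 true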

@mheon (Member) commented Jun 16, 2021

Nice. Love it when things are simple.

Luap99 self-assigned this Jun 16, 2021
Luap99 added a commit to Luap99/ocicni that referenced this issue Jun 16, 2021
Add a new InitCNINoInotify function to allow the use of OCICNI without
the use of inotify.

For some workloads it is not required to watch the cni config directory.
With podman v3.2 we started using OCICNI for rootless users as well.
However the use of inotify is restricted by sysctl values
(fs.inotify.max_user_instances and fs.inotify.max_user_watches).
By default only 128 processes can use inotify.

Since this limit is easy to reach and inotify is not required for our
use case it would be great to have this option to disable it.

see containers/podman#10686

Signed-off-by: Paul Holzinger <[email protected]>
Luap99 added a commit to Luap99/ocicni that referenced this issue Jun 16, 2021
@edsantiago (Member Author)

> you cannot use more than 128 podman processes concurrently

I got curious, and tested:

$ for i in {1..130};do echo -n $i..;./bin/podman run -d --rm quay.io/libpod/testimage:20210610 sleep 120 >/dev/null;done;echo
1..2..3..4..5..6..7..8..9..10..11..12..13..14..15..16..17..18..19..20..21..22..23..24..25..26..27..28..29..30..31..32..33..34..35..36..37..38..39..40..41..42..43..44..45..46..47..48..49..50..51..52..53..54..55..56..57..58..59..60..61..62..63..64..65..66..67..68..69..70..71..72..73..74..75..76..77..78..79..80..81..82..83..84..85..86..87..88..89..90..91..92..93..94..95..96..97..98..99..100..101..102..103..104..105..106..107..108..109..110..111..112..113..114..115..116..117..118..119..120..121..122..123..124..125..126..127..128..129..130..
$ ./bin/podman ps |wc -l
131
$ sysctl fs.inotify.max_user_instances
fs.inotify.max_user_instances = 128

To me, that shows 130 concurrent user processes, none of them failing? What did I do wrong?

@Luap99 (Member) commented Jun 16, 2021

You ran the containers detached, so each podman process has already exited. If you run them attached, it fails.
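
A variation of the earlier loop that keeps each podman process attached (illustrative, not the exact command) should reproduce it on 3.2.x, since every attached process holds its own inotify instance:

$ # Once the per-user instance limit is exhausted, the remaining runs fail
$ # with the "too many open files" error.
$ for i in {1..130}; do ./bin/podman run --rm quay.io/libpod/testimage:20210610 sleep 120 & done
$ wait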

@edsantiago (Member Author)

Oh - yeah, wow, that sure reproduces it! Thank you!

Luap99 added a commit to Luap99/ocicni that referenced this issue Jun 18, 2021
Luap99 added the network label Jun 21, 2021
Luap99 added a commit to Luap99/libpod that referenced this issue Jun 22, 2021
Podman does not need to watch the cni config directory. If a network is
not found in the cache, OCICNI will reload the networks anyway and thus
even podman system service should work as expected.
Also include a change to not mount a "new" /var by default in the
rootless cni ns; instead, try to use /var/lib/cni first and then the
parent dir. This allows users to store cni configs under /var/..., which
is the case for the CI compose test.

[NO TESTS NEEDED]

Fixes containers#10686

Signed-off-by: Paul Holzinger <[email protected]>
mheon pushed a commit to mheon/libpod that referenced this issue Jun 24, 2021
github-actions bot added the locked - please file new issue/PR label Sep 21, 2023
github-actions bot locked as resolved and limited conversation to collaborators Sep 21, 2023