Error: opening file io.bfq.weight for writing: Permission denied #8737

Closed
cevich opened this issue Dec 15, 2020 · 16 comments
Labels
kind/bug Categorizes issue or PR as related to a bug. locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments.

Comments

@cevich
Member

cevich commented Dec 15, 2020

/kind bug

Description

On Ubuntu 20.10 w/ CgroupsV2 & crun (see #8312) the podman run blkio-weight test fails.

Steps to reproduce the issue:

  1. Boot Ubuntu 20.10 host with kernel option systemd.unified_cgroup_hierarchy=1

  2. Run make localintegration
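
Before running the suite, it can help to confirm that the boot option from step 1 actually took effect. The following is a minimal sketch, not part of the podman test tooling; the function name `cmdline_has` is hypothetical, and the optional file argument exists only so the check can be exercised against a saved copy of `/proc/cmdline`.

```shell
# cmdline_has PARAM [FILE]
# Succeeds if PARAM appears as a whole word on the kernel command line.
# FILE defaults to /proc/cmdline (hypothetical helper, for illustration).
cmdline_has() {
    case " $(cat "${2:-/proc/cmdline}") " in
        *" $1 "*) return 0 ;;
        *) return 1 ;;
    esac
}
```

For example, `cmdline_has systemd.unified_cgroup_hierarchy=1 || echo "cgroups v2 was not requested at boot"`.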

Describe the results you received:

(Typical failure example)

[+0498s] Podman run 
[+0498s]   podman run blkio-weight test
[+0498s]   /var/tmp/go/src/github.com/containers/podman/test/e2e/run_test.go:481
[+0498s] 
[+0498s] [BeforeEach] Podman run
[+0498s]   /var/tmp/go/src/github.com/containers/podman/test/e2e/run_test.go:28
[+0498s] [It] podman run blkio-weight test
[+0498s]   /var/tmp/go/src/github.com/containers/podman/test/e2e/run_test.go:481
[+0498s] Running: podman ... --blkio-weight=15 ...
[+0498s] Error: opening file `io.bfq.weight` for writing: Permission denied: OCI permission denied

Describe the results you expected:

This test should skip on hosts with kernels lacking the BFQ scheduler (like Ubuntu <= 20.10)
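
One way a skip condition like this could be probed from the shell is to look at a block device's scheduler list. This is only a sketch under assumptions: the function name `scheduler_offered` is hypothetical, the file is expected to be a `queue/scheduler` file under `/sys/block/<device>/`, and a scheduler built as a module that has not been loaded yet may not be listed there.

```shell
# scheduler_offered NAME FILE
# Succeeds if NAME is listed in a /sys/block/<device>/queue/scheduler file.
# The active scheduler is shown in [brackets], so strip those first.
scheduler_offered() {
    name=$1
    file=$2
    [ -r "$file" ] || return 1
    for s in $(sed 's/[][]//g' "$file"); do
        [ "$s" = "$name" ] && return 0
    done
    return 1
}
```

For example, `scheduler_offered bfq /sys/block/sda/queue/scheduler || echo "skip: no BFQ"`, where `sda` stands in for whatever device the host actually has.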

@giuseppe
Member

Is the io controller available? What is the output of cat /sys/fs/cgroup/cgroup.controllers?

If it is not available, it might be necessary to add something like cgroup_enable=io on the kernel cmdline
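
The controller check can also be scripted. A minimal sketch, assuming the unified hierarchy is mounted at `/sys/fs/cgroup`; the function name `has_controller` is hypothetical, and the optional file argument is there only so the parsing can be tried against a saved copy of the file.

```shell
# has_controller NAME [FILE]
# Succeeds if NAME is listed in a cgroup v2 cgroup.controllers file.
# FILE defaults to the root of the unified hierarchy.
has_controller() {
    name=$1
    file=${2:-/sys/fs/cgroup/cgroup.controllers}
    [ -r "$file" ] || return 1
    # The file holds one space-separated line, e.g. "cpuset cpu io memory".
    for c in $(cat "$file"); do
        [ "$c" = "$name" ] && return 0
    done
    return 1
}
```

For example, `has_controller io && echo "io controller is available"`.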

@cevich
Member Author

cevich commented Jan 5, 2021

Possibly not available. cgroups v2 is not the default on Ubuntu; I'm enabling it via the kernel command line. It's easily possible that other elements also default to "off". Here's what I get:

root@cevich-ubuntu-c6524344056676352:/# cat /sys/fs/cgroup/cgroup.controllers
cpuset cpu io memory hugetlb pids rdma
root@cevich-ubuntu-c6524344056676352:/# cat /etc/default/grub
...cut...
GRUB_CMDLINE_LINUX=" cgroup_enable=memory swapaccount=1 systemd.unified_cgroup_hierarchy=1"
...cut...
root@cevich-ubuntu-c6524344056676352:/# cat /proc/cmdline
BOOT_IMAGE=/boot/vmlinuz-5.8.0-1008-gcp root=PARTUUID=506f6274-5923-4632-96d5-b96445cc673c ro cgroup_enable=memory swapaccount=1 systemd.unified_cgroup_hierarchy=1 console=ttyS0 panic=-1

@cevich
Member Author

cevich commented Jan 5, 2021

Poking around on the web, there seems to be a blkio controller. Maybe that is different from io?

@cevich
Member Author

cevich commented Jan 5, 2021

Oh, nvm, CGV2 is io and v1 is blkio. Hmm.

@dqminh

dqminh commented Feb 2, 2021

@giuseppe @cevich I've also seen this today. This is most likely because BFQ is compiled as a module (Ubuntu does this), and hence is not loaded at the time.

If you do modprobe bfq, the file should appear.
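
To see whether the module is already loaded before reaching for modprobe, `/proc/modules` can be scanned. This is a sketch only; the function name `module_loaded` is hypothetical, and the optional file argument exists so the parsing can be tested against sample data. A scheduler compiled directly into the kernel does not show up in `/proc/modules`, so a "not loaded" answer does not by itself mean BFQ is absent.

```shell
# module_loaded NAME [FILE]
# Succeeds if NAME is the first field of some line in /proc/modules.
module_loaded() {
    name=$1
    file=${2:-/proc/modules}
    [ -r "$file" ] || return 1
    # Each line starts with the module name, followed by size, use count, etc.
    while read -r mod _rest; do
        [ "$mod" = "$name" ] && return 0
    done < "$file"
    return 1
}
```

For example, `module_loaded bfq || modprobe bfq` (as root).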

@cevich
Member Author

cevich commented Feb 3, 2021

> If you do modprobe bfq, the file should appear.

I'm not an Ubuntu expert. Is there a way to force that to happen automatically at boot time?

@rhatdan
Member

rhatdan commented Feb 3, 2021

I would figure there is something like /etc/modules-load.d

@rhatdan
Member

rhatdan commented Feb 3, 2021

https://www.cyberciti.biz/faq/linux-how-to-load-a-kernel-module-automatically-at-boot-time/
indicates I am correct

"
Please note that if you are using Debian Linux or Ubuntu Linux, use the file /etc/modules instead of /etc/modules.conf (which works on an older version of Red Hat/Fedora/CentOS Linux). These days it is better to use the directory /etc/modules-load.d/ on all Linux distros.
"

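A file in /etc/modules-load.d/ simply lists one module name per line. A minimal sketch of creating such a file follows; the function name `write_modules_load_conf` and the `<module>.conf` file naming are illustrative choices, and the directory argument is optional so the helper can be pointed at a scratch directory.

```shell
# write_modules_load_conf MODULE [DIR]
# Writes DIR/MODULE.conf containing the module name on a single line.
# DIR defaults to /etc/modules-load.d (writing there requires root).
write_modules_load_conf() {
    module=$1
    dir=${2:-/etc/modules-load.d}
    [ -d "$dir" ] || return 1
    printf '%s\n' "$module" > "$dir/$module.conf"
}
```

For example, `write_modules_load_conf bfq` as root would create /etc/modules-load.d/bfq.conf. This only helps if the running kernel actually ships the module.
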
@cevich
Member Author

cevich commented Feb 3, 2021

Huh, really? Okay, I would have bet money it wouldn't be that easy...nope, of course it's not. Double-bad-news:

root@cevich-ubuntu-c6524344056676352:~# modprobe bfq
modprobe: FATAL: Module bfq not found in directory /lib/modules/5.8.0-1008-gcp
root@cevich-ubuntu-c6524344056676352:~#

I searched all through that directory, up, down, and sideways. There are no elevator modules of any kind, anywhere, not even deadline. What we get is compiled in 😢
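
Rather than searching the module tree by hand, the kernel build configuration can show whether a scheduler was built in, built as a module, or left out. This is a sketch under the assumption that the distro ships its config at /boot/config-$(uname -r), as Ubuntu commonly does; the function name `kconfig_state` is hypothetical.

```shell
# kconfig_state OPTION [FILE]
# Prints y (built in), m (module), or n (not set or absent) for OPTION.
# FILE defaults to /boot/config-<running kernel release> (an assumption).
kconfig_state() {
    option=$1
    file=${2:-/boot/config-$(uname -r)}
    [ -r "$file" ] || return 1
    while IFS= read -r line; do
        case $line in
            "$option=y") echo y; return 0 ;;
            "$option=m") echo m; return 0 ;;
        esac
    done < "$file"
    # Disabled options appear as "# OPTION is not set" or not at all.
    echo n
}
```

For example, `kconfig_state CONFIG_IOSCHED_BFQ` prints `n` on a kernel built without BFQ.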

Worse, it seems this is a Google-customized kernel, so we can't jam a module in or update from a package somewhere. We would need to wholesale revert the VM back to a standard distro kernel. Then, who knows what would break (obviously, Google thought they needed a custom kernel for some important reason) 😭

On the "plus" side...I wonder if this custom Ubuntu kernel is responsible for the ~10% testing slowness that @baude, @edsantiago, and I have been head-scratching over for years 😞

@edsantiago
Member

"Ten percent"? /me wishes it was ten percent. Typical recent result:

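| type | distro      | user     | local | remote   | container |
|------|-------------|----------|-------|----------|-----------|
| int  | fedora-32   | root     | 29:17 | 43:15    | 30:19     |
| int  | fedora-33   | root     | 29:32 | 41:01    | 30:19     |
| int  | ubuntu-2004 | root     | 38:58 | 01:04:58 |           |
| int  | ubuntu-2010 | root     | 35:50 | 48:54    |           |
| int  | fedora-32   | rootless | 30:35 |          |           |
| int  | fedora-33   | rootless | 32:25 |          |           |
| sys  | fedora-32   | root     | 14:22 | 16:23    |           |
| sys  | fedora-33   | root     | 14:41 | 09:33    |           |
| sys  | ubuntu-2004 | root     | 17:23 | 25:24    |           |
| sys  | ubuntu-2010 | root     | 17:23 | 14:47    |           |
| sys  | fedora-32   | rootless | 13:55 |          |           |
| sys  | fedora-33   | rootless | 14:42 |          |           |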

@cevich
Member Author

cevich commented Feb 3, 2021

Oof! Well it was only 10%...must be that y'all added a few more tests into the mix 😉

OMG, WTF ubuntu-2004 remote, geeze!

Maybe I should try bringing in an upstream Ubuntu cloud image (if there is one) instead of trying to use the GCP "optimized" image. If some kind of GCP "enhancement" is getting in the way, that would eliminate it along with other side effects, like a missing BFQ elevator.

@cevich
Member Author

cevich commented Feb 3, 2021

Note/Clarification: Rejigging our Ubuntu image build workflow is not going to be a fast solution to any issue. For this BFQ/permission denied thing...a more test-local fix/workaround will be much faster (if maybe less desirable).

@cevich
Member Author

cevich commented Feb 4, 2021

Ugh...what a complex spiderweb of a problem 😕

@dqminh FWIW, podman CI has also been on the butt-end (twice) of an ugly (load-triggered) BFQ kernel-panic, so we now use 'deadline' everywhere. Perhaps the io.weight conversion (mentioned in runc pr) would help avoid similar bugs (if they exist) in userspace?
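
For readers unfamiliar with the idea of an io.weight conversion, here is one plausible shape for it: a linear mapping from the cgroup v1 blkio weight range (10 to 1000) onto the cgroup v2 io.weight range (1 to 10000). This is an assumption for illustration, not a copy of the runc or crun patch, which may round or clamp differently; the function name `blkio_to_io_weight` is hypothetical.

```shell
# blkio_to_io_weight WEIGHT
# Maps a blkio weight in 10..1000 linearly onto io.weight's 1..10000,
# using integer arithmetic (the division truncates).
blkio_to_io_weight() {
    w=$1
    if [ "$w" -lt 10 ] || [ "$w" -gt 1000 ]; then
        echo "weight out of range: $w" >&2
        return 1
    fi
    echo $(( 1 + (w - 10) * 9999 / 990 ))
}
```

Under this mapping, the `--blkio-weight=15` used by the failing test would become an io.weight of 51, and the endpoints 10 and 1000 map to 1 and 10000.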

@rhatdan
Member

rhatdan commented Feb 4, 2021

Since runc and crun have patches for this, I am going to close.

@rhatdan rhatdan closed this as completed Feb 4, 2021
@cevich
Member Author

cevich commented Feb 4, 2021

N.B.: The podman run blkio-weight test is currently disabled on Ubuntu because of this issue. Maybe we should keep this issue open until runc/crun updates are available so we can confirm, and remember to remove the Skip()?

@rhatdan
Member

rhatdan commented Feb 5, 2021

Let's just open a PR to remove the skip.

cevich added a commit to cevich/podman that referenced this issue Feb 10, 2021
@github-actions github-actions bot added the locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. label Sep 22, 2023
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 22, 2023