
podman --cgroup-manager cgroupfs is taking extraordinary time to complete #3612

Closed

baude opened this issue Jul 21, 2019 · 10 comments

Labels: locked - please file new issue/PR

Comments

baude (Member) commented Jul 21, 2019

While working on #3562, all tests are passing except on Ubuntu. I recreated the VM on our CI infra to rerun the failing tests and to test manually. It looks like the changes in #3562 have impacted something cgroupfs-related, as the commands do not complete.

I will try to attach the debugger and see what is going on, if the environment will let me. Perhaps the debugger can shed some light on why we are "waiting".

Any theories or pointers from the team are appreciated... @mheon specifically.

Here is an example where I tried to manually add a simple container to an existing pod. The container does get created but simply hangs on the CLI.


# bin/podman --log-level=debug --cgroup-manager cgroupfs run -d --pod foobar docker.io/library/alpine:latest top
DEBU[0000] Initializing boltdb state at /var/lib/containers/storage/libpod/bolt_state.db 
DEBU[0000] Using graph driver overlay                   
DEBU[0000] Using graph root /var/lib/containers/storage 
DEBU[0000] Using run root /var/run/containers/storage   
DEBU[0000] Using static dir /var/lib/containers/storage/libpod 
DEBU[0000] Using tmp dir /var/run/libpod                
DEBU[0000] Using volume path /var/lib/containers/storage/volumes 
DEBU[0000] Set libpod namespace to ""                   
DEBU[0000] [graphdriver] trying provided driver "overlay" 
DEBU[0000] cached value indicated that overlay is supported 
DEBU[0000] cached value indicated that metacopy is not being used 
DEBU[0000] cached value indicated that native-diff is usable 
DEBU[0000] backingFs=extfs, projectQuotaSupported=false, useNativeDiff=true, usingMetacopy=false 
DEBU[0000] Initializing event backend journald          
INFO[0000] Found CNI network podman (type=bridge) at /etc/cni/net.d/87-podman-bridge.conflist 
DEBU[0000] parsed reference into "[overlay@/var/lib/containers/storage+/var/run/containers/storage]docker.io/library/alpine:latest" 
DEBU[0000] parsed reference into "[overlay@/var/lib/containers/storage+/var/run/containers/storage]@b7b28af77ffec6054d13378df4fdf02725830086c7444d9c278af25312aa39b9" 
DEBU[0000] exporting opaque data as blob "sha256:b7b28af77ffec6054d13378df4fdf02725830086c7444d9c278af25312aa39b9" 
DEBU[0000] parsed reference into "[overlay@/var/lib/containers/storage+/var/run/containers/storage]@b7b28af77ffec6054d13378df4fdf02725830086c7444d9c278af25312aa39b9" 
DEBU[0000] exporting opaque data as blob "sha256:b7b28af77ffec6054d13378df4fdf02725830086c7444d9c278af25312aa39b9" 
DEBU[0000] parsed reference into "[overlay@/var/lib/containers/storage+/var/run/containers/storage]@b7b28af77ffec6054d13378df4fdf02725830086c7444d9c278af25312aa39b9" 
DEBU[0000] Got mounts: []                               
DEBU[0000] Got volumes: []                              
DEBU[0000] Using container netmode                      
DEBU[0000] Using container ipcmode                      
DEBU[0000] adding container to pod foobar               
DEBU[0000] created OCI spec and options for new container 
DEBU[0000] Allocated lock 4 for container d3986defc4e2e47f07a4613ac5eae6cc1563d0f127d683c23b857006346b1185 
DEBU[0000] parsed reference into "[overlay@/var/lib/containers/storage+/var/run/containers/storage]@b7b28af77ffec6054d13378df4fdf02725830086c7444d9c278af25312aa39b9" 
DEBU[0000] exporting opaque data as blob "sha256:b7b28af77ffec6054d13378df4fdf02725830086c7444d9c278af25312aa39b9" 
DEBU[0000] created container "d3986defc4e2e47f07a4613ac5eae6cc1563d0f127d683c23b857006346b1185" 
DEBU[0000] container "d3986defc4e2e47f07a4613ac5eae6cc1563d0f127d683c23b857006346b1185" has work directory "/var/lib/containers/storage/overlay-containers/d3986defc4e2e47f07a4613ac5eae6cc1563d0f127d683c23b857006346b1185/userdata" 
DEBU[0000] container "d3986defc4e2e47f07a4613ac5eae6cc1563d0f127d683c23b857006346b1185" has run directory "/var/run/containers/storage/overlay-containers/d3986defc4e2e47f07a4613ac5eae6cc1563d0f127d683c23b857006346b1185/userdata" 
DEBU[0000] New container created "d3986defc4e2e47f07a4613ac5eae6cc1563d0f127d683c23b857006346b1185" 
DEBU[0000] container "d3986defc4e2e47f07a4613ac5eae6cc1563d0f127d683c23b857006346b1185" has CgroupParent "/libpod_parent/f1303a56197c69bd511ff3508563388437aa9bbbbf92b1ccbc2d98a329cbe042/libpod-d3986defc4e2e47f07a4613ac5eae6cc1563d0f127d683c23b857006346b1185" 
DEBU[0000] Strongconnecting node 6870acf591989ceb46088892ef6e540c7b3bdaa29c4f7a747152c22e6ea625fd 
DEBU[0000] Pushed 6870acf591989ceb46088892ef6e540c7b3bdaa29c4f7a747152c22e6ea625fd onto stack 
DEBU[0000] Finishing node 6870acf591989ceb46088892ef6e540c7b3bdaa29c4f7a747152c22e6ea625fd. Popped 6870acf591989ceb46088892ef6e540c7b3bdaa29c4f7a747152c22e6ea625fd off stack 
DEBU[0000] Created root filesystem for container 6870acf591989ceb46088892ef6e540c7b3bdaa29c4f7a747152c22e6ea625fd at /var/lib/containers/storage/overlay/32254281e1a4fd2139bdc0c83b5ab351e1442f60dc19bc6087f440341f0ffc75/merged 
baude (Member, Author) commented Jul 21, 2019

Nothing obvious in journalctl. And BTW, I suspect something cgroupfs-related because I think Ubuntu is our only test bed for cgroupfs? I must have introduced an error or changed the logic of something.

mheon (Member) commented Jul 21, 2019

Ubuntu is also the only set of tests that runs without systemd - could this be events-related? I've been running into major events-related pains in #3551.

mheon (Member) commented Jul 21, 2019

(Well, podman-in-podman also technically lacks systemd, but the failure mode there is entirely different - it instantly fails, as opposed to Ubuntu, which seems to attempt to log but fail?)

baude (Member, Author) commented Jul 21, 2019

Here is where we are hanging:

 0  0x00000000004c19ba in syscall.Syscall6
    at /usr/lib/go-1.12/src/syscall/asm_linux_amd64.s:53
 1  0x00000000004e6b28 in os.(*Process).blockUntilWaitable
    at /usr/lib/go-1.12/src/os/wait_waitid.go:31
 2  0x00000000004de329 in os.(*Process).wait
    at /usr/lib/go-1.12/src/os/exec_unix.go:22
 3  0x0000000000786841 in os.(*Process).Wait
    at /usr/lib/go-1.12/src/os/exec.go:125
 4  0x0000000000786841 in os/exec.(*Cmd).Wait
    at /usr/lib/go-1.12/src/os/exec/exec.go:474
 5  0x00000000011b97a7 in github.com/containers/libpod/pkg/apparmor.InstallDefault
    at pkg/apparmor/apparmor_linux.go:110
 6  0x00000000011bab4c in github.com/containers/libpod/pkg/apparmor.CheckProfileAndLoadDefault
    at pkg/apparmor/apparmor_linux.go:276
 7  0x000000000128927d in github.com/containers/libpod/libpod.(*Container).generateSpec
    at libpod/container_internal_linux.go:200
 8  0x000000000127e1c7 in github.com/containers/libpod/libpod.(*Container).init
    at libpod/container_internal.go:842
 9  0x000000000127ca08 in github.com/containers/libpod/libpod.(*Container).prepareToStart
    at libpod/container_internal.go:676
10  0x0000000001265a57 in github.com/containers/libpod/libpod.(*Container).StartAndAttach
    at libpod/container_api.go:115
11  0x0000000001397a2f in github.com/containers/libpod/pkg/adapter.StartAttachCtr
    at pkg/adapter/terminal_linux.go:80
12  0x0000000001385dab in github.com/containers/libpod/pkg/adapter.(*LocalRuntime).Run
    at pkg/adapter/containers.go:398
13  0x000000000147bd75 in main.runCmd
    at cmd/podman/run.go:59
14  0x000000000149a106 in main.glob..func31
    at cmd/podman/run.go:23
15  0x00000000005f7425 in github.com/containers/libpod/vendor/github.com/spf13/cobra.(*Command).execute
    at vendor/github.com/spf13/cobra/command.go:826
16  0x00000000005f7ecc in github.com/containers/libpod/vendor/github.com/spf13/cobra.(*Command).ExecuteC
    at vendor/github.com/spf13/cobra/command.go:914
17  0x0000000001462a18 in github.com/containers/libpod/vendor/github.com/spf13/cobra.(*Command).Execute
    at vendor/github.com/spf13/cobra/command.go:864
18  0x0000000001462a18 in main.main
    at cmd/podman/main.go:147
19  0x00000000004455ac in runtime.main
    at /usr/lib/go-1.12/src/runtime/proc.go:200
20  0x0000000000471381 in runtime.goexit
    at /usr/lib/go-1.12/src/runtime/asm_amd64.s:1337
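
The trace bottoms out in apparmor.InstallDefault, which shells out to apparmor_parser and feeds it the generated profile over stdin before waiting for it to exit. A minimal, hypothetical Go sketch of that pattern (the command flags, names, and error handling here are assumptions for illustration, not the actual libpod code) shows how the Wait() at the top of the stack can block forever if the stdin pipe is never closed:

// Hypothetical sketch: pipe a profile into apparmor_parser over stdin,
// then wait for it to exit. Not the actual libpod implementation.
package main

import (
    "fmt"
    "os/exec"
)

func loadProfile(profile string) error {
    cmd := exec.Command("apparmor_parser", "-Kr") // assumed invocation
    pipe, err := cmd.StdinPipe()
    if err != nil {
        return err
    }
    if err := cmd.Start(); err != nil {
        pipe.Close()
        return err
    }
    if _, err := pipe.Write([]byte(profile)); err != nil {
        pipe.Close()
        cmd.Wait()
        return err
    }
    // If this Close is lost (say, to an early return introduced while
    // reworking error handling), apparmor_parser keeps reading stdin and
    // never exits, so cmd.Wait() blocks in waitid -- the stack seen above.
    if err := pipe.Close(); err != nil {
        return err
    }
    return cmd.Wait()
}

func main() {
    if err := loadProfile("profile sketch-example {}"); err != nil {
        fmt.Println("load failed:", err)
    }
}

That would also fit the symptoms: presumably the Ubuntu VM is the only test environment with AppArmor enabled, so the other CI images would never reach this code path.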

mheon (Member) commented Jul 21, 2019

AppArmor. @vrothberg Are we applying the profile in libpod? I suppose I can see how that might break us...

baude (Member, Author) commented Jul 21, 2019

But remember that this worked until I linted everything, so something I did probably broke it.

baude (Member, Author) commented Jul 21, 2019

Also interesting: if I kill the hung process and run the same podman command again, it seems to work. Also, remind me, is this benign?

WARN[0000] Failed to add conmon to cgroupfs sandbox cgroup: write /sys/fs/cgroup/unified/libpod_parent/conmon/tasks: open /sys/fs/cgroup/unified/libpod_parent/conmon/tasks: permission denied 
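
For reference on that warning: judging by the message, it comes from the step where podman tries to move conmon's PID into its own cgroup by writing it into the hierarchy. One detail worth noting is that the unified (cgroup v2) hierarchy has no tasks file at all, only cgroup.procs, so a write to .../unified/.../tasks looks suspicious independent of the permission error. A rough, hypothetical Go sketch of the operation (paths and layout here are assumptions, not the actual libpod cgroups code):

// Hypothetical sketch: move a PID into a cgroup by writing it to the
// hierarchy. Paths are illustrative only.
package main

import (
    "fmt"
    "os"
    "strconv"
)

func addPidToCgroup(cgroupDir string, pid int) error {
    // cgroup v1 uses a "tasks" file; the unified v2 hierarchy only has
    // "cgroup.procs", so writing "tasks" under /sys/fs/cgroup/unified
    // cannot work regardless of permissions.
    path := cgroupDir + "/cgroup.procs"
    f, err := os.OpenFile(path, os.O_WRONLY, 0)
    if err != nil {
        return err
    }
    defer f.Close()
    _, err = f.WriteString(strconv.Itoa(pid))
    return err
}

func main() {
    err := addPidToCgroup("/sys/fs/cgroup/unified/libpod_parent/conmon", os.Getpid())
    if err != nil {
        fmt.Println("WARN: failed to add conmon to sandbox cgroup:", err)
    }
}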

mheon (Member) commented Jul 21, 2019

Never seen that one before, but it doesn't look healthy. @giuseppe Potential issue with the new cgroups code and Ubuntu?

baude (Member, Author) commented Jul 21, 2019

I think I have a lead on this one... unfortunately my bandwidth here is so slow that it takes about an hour to turn each run around. 33k/sec!

baude (Member, Author) commented Jul 21, 2019

I have it pinned down.

baude closed this as completed Jul 21, 2019
github-actions bot added the locked - please file new issue/PR label Sep 24, 2023
github-actions bot locked as resolved and limited conversation to collaborators Sep 24, 2023