
cgroup quota shows up as -1 #8427

Closed
bparees opened this issue Apr 8, 2016 · 50 comments
Assignees
Labels
component/containers, kind/bug, kind/test-flake, lifecycle/rotten, priority/P1

Comments

@bparees
Contributor

bparees commented Apr 8, 2016

https://ci.openshift.redhat.com/jenkins/job/test_pull_requests_origin/14128/testReport/(root)/Extended/_builds__Conformance__s2i_build_with_a_quota_Building_from_a_template_should_create_an_s2i_build_with_a_quota_and_run_it/

The build knows the CPU quota limit is 6000:

 I0407 00:33:34.039640       1 builder.go:145] Running build with cgroup limits: api.CGroupLimits{MemoryLimitBytes:209715200, CPUShares:61, CPUPeriod:100000, CPUQuota:6000, MemorySwap:209715200}

The build creates a container with a CPU quota of 6000:

 Creating container with options {Name:"" Config:&{Hostname: Domainname: User: Memory:0 MemorySwap:0 MemoryReservation:0 KernelMemory:0 CPUShares:0 CPUSet: AttachStdin:false AttachStdout:true AttachStderr:false PortSpecs:[] ExposedPorts:map[] StopSignal: Tty:false OpenStdin:true StdinOnce:true Env:[OPENSHIFT_BUILD_NAME=s2i-build-quota-1 OPENSHIFT_BUILD_NAMESPACE=extended-test-s2i-build-quota-73hkx-7vwau] Cmd:[/bin/sh -c tar -C /tmp -xf - && /tmp/scripts/assemble] DNS:[] Image:centos/ruby-22-centos7:latest Volumes:map[] VolumeDriver: VolumesFrom: WorkingDir: MacAddress: Entrypoint:[] NetworkDisabled:false SecurityOpts:[] OnBuild:[] Mounts:[] Labels:map[]} HostConfig:&{Binds:[] CapAdd:[] CapDrop:[KILL MKNOD SETGID SETUID SYS_CHROOT] GroupAdd:[] ContainerIDFile: LxcConf:[] Privileged:false PortBindings:map[] Links:[] PublishAllPorts:false DNS:[] DNSOptions:[] DNSSearch:[] ExtraHosts:[] VolumesFrom:[] NetworkMode:container:b03447dab75ae8295f5acc07474232038a65f0b95de206775138835ba0699138 IpcMode: PidMode: UTSMode: RestartPolicy:{Name: MaximumRetryCount:0} Devices:[] LogConfig:{Type: Config:map[]} ReadonlyRootfs:false SecurityOpt:[] CgroupParent: Memory:209715200 MemorySwap:209715200 MemorySwappiness:0 OOMKillDisable:false CPUShares:61 CPUSet: CPUSetCPUs: CPUSetMEMs: CPUQuota:6000 CPUPeriod:100000 BlkioWeight:0 Ulimits:[] VolumeDriver: OomScoreAdj:0}}

The container dumps the cgroup filesystem values and sees -1 for the quota:

    I0407 00:33:36.030147       1 sti.go:581] MEMORY=209715200
    I0407 00:33:36.031367       1 sti.go:581] MEMORYSWAP=209715200
    I0407 00:33:36.032448       1 sti.go:581] QUOTA=-1
    I0407 00:33:36.033398       1 sti.go:581] SHARES=61
    I0407 00:33:36.034333       1 sti.go:581] PERIOD=100000
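
For reference, a minimal sketch of how a process inside the build container could produce a dump like the one above by reading the cgroup v1 control files. The paths are assumptions based on the standard cgroup v1 layout, not the actual sti.go code; on this host the cpu controller is co-mounted with cpuacct, so the directory may be named cpu,cpuacct (or cpuacct,cpu) rather than cpu.

    package main

    import (
        "fmt"
        "os"
        "strings"
    )

    // readCgroupValue returns the trimmed contents of a cgroup v1 control file,
    // or an error string if the file cannot be read.
    func readCgroupValue(path string) string {
        data, err := os.ReadFile(path)
        if err != nil {
            return fmt.Sprintf("error: %v", err)
        }
        return strings.TrimSpace(string(data))
    }

    func main() {
        // A value of -1 in cpu.cfs_quota_us means "no CPU quota enforced",
        // which is what the failing build observed despite being created
        // with CPUQuota:6000.
        fmt.Println("MEMORY=" + readCgroupValue("/sys/fs/cgroup/memory/memory.limit_in_bytes"))
        fmt.Println("MEMORYSWAP=" + readCgroupValue("/sys/fs/cgroup/memory/memory.memsw.limit_in_bytes"))
        fmt.Println("QUOTA=" + readCgroupValue("/sys/fs/cgroup/cpu/cpu.cfs_quota_us"))
        fmt.Println("SHARES=" + readCgroupValue("/sys/fs/cgroup/cpu/cpu.shares"))
        fmt.Println("PERIOD=" + readCgroupValue("/sys/fs/cgroup/cpu/cpu.cfs_period_us"))
    }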
@bparees bparees added kind/bug, priority/P2 labels Apr 8, 2016
@bparees
Contributor Author

bparees commented Apr 8, 2016

@smarterclayton fyi. I made it a P2 since the exposure is that some builds might be able to exceed an enforced CPU quota. Move it up if you think it should be higher.

@smarterclayton
Contributor

Is that not a blocker for online?

@bparees
Contributor Author

bparees commented Apr 8, 2016

I don't know. It could be; I'd say it depends on how frequently it occurs and whether that makes it an avenue for abuse. If 1 in 100 builds runs without a CPU quota, I'm not sure that's a problem; if it's frequent enough that someone could explicitly take advantage of it, maybe it is. Hence my putting the question to you :) I wouldn't argue against P1, but I wasn't going to make it one either.

@mrunalp
Member

mrunalp commented Apr 8, 2016

We have added #8406 to gather additional debugging information.

@mrunalp
Member

mrunalp commented Apr 8, 2016

@bparees I am not sure if that is available in the link that you posted in this bug, as the link doesn't open right now.

@bparees
Contributor Author

bparees commented Apr 8, 2016

@mrunalp Looks like Jenkins is down at the moment, and of course the build itself may get cleared at some point, but I put the relevant details from the failure in the description.

@mrunalp
Member

mrunalp commented Apr 8, 2016

@bparees Were you able to capture the output of /proc/self/cgroup that we added?

@bparees
Contributor Author

bparees commented Apr 8, 2016

@mrunalp Sorry if I confused you; the job I linked was the original case where @smarterclayton hit it, not a new incident with your debugging added.

@mrunalp
Member

mrunalp commented Apr 8, 2016

@bparees ack

@mrunalp
Member

mrunalp commented Apr 13, 2016

@bparees Thanks! I am looking into it.

@mrunalp
Member

mrunalp commented Apr 19, 2016

Looking into systemd as a possible cause, per the debug output.

@mrunalp
Member

mrunalp commented Apr 26, 2016

@bparees Is the image used in the test Fedora-based? Does it have the findmnt command in it?
I would like to add it to gather more debug information.

@mrunalp
Member

mrunalp commented Apr 26, 2016

Created #8635 to collect additional debug information.

@mrunalp
Member

mrunalp commented May 2, 2016

From the debug output in #8707, I see that the cgroup mounts are correct, ruling out a race there. More likely the value is getting overwritten. I will check with the kernel team.

        I0502 12:51:10.645812       1 docker.go:623] Attaching to container "0699667a8663265ded1e64476b50c471dde27d485d7a18499b33fe3ec75f9a0a" ...
        I0502 12:51:10.646351       1 docker.go:632] Starting container "0699667a8663265ded1e64476b50c471dde27d485d7a18499b33fe3ec75f9a0a" ...
        I0502 12:51:11.168104       1 docker.go:656] Waiting for container "0699667a8663265ded1e64476b50c471dde27d485d7a18499b33fe3ec75f9a0a" to stop ...
        I0502 12:51:11.372743       1 sti.go:583] MEMORY=209715200
        I0502 12:51:11.373543       1 sti.go:583] MEMORYSWAP=209715200
        I0502 12:51:11.374877       1 sti.go:583] QUOTA=-1
        I0502 12:51:11.375617       1 sti.go:583] SHARES=61
        I0502 12:51:11.376372       1 sti.go:583] PERIOD=100000
        I0502 12:51:11.377177       1 sti.go:583] 10:blkio:/system.slice/docker-0699667a8663265ded1e64476b50c471dde27d485d7a18499b33fe3ec75f9a0a.scope
        I0502 12:51:11.377190       1 sti.go:583] 9:devices:/system.slice/docker-0699667a8663265ded1e64476b50c471dde27d485d7a18499b33fe3ec75f9a0a.scope
        I0502 12:51:11.377196       1 sti.go:583] 8:perf_event:/system.slice/docker-0699667a8663265ded1e64476b50c471dde27d485d7a18499b33fe3ec75f9a0a.scope
        I0502 12:51:11.377202       1 sti.go:583] 7:cpuacct,cpu:/system.slice/docker-0699667a8663265ded1e64476b50c471dde27d485d7a18499b33fe3ec75f9a0a.scope
        I0502 12:51:11.377208       1 sti.go:583] 6:hugetlb:/system.slice/docker-0699667a8663265ded1e64476b50c471dde27d485d7a18499b33fe3ec75f9a0a.scope
        I0502 12:51:11.377214       1 sti.go:583] 5:net_cls:/system.slice/docker-0699667a8663265ded1e64476b50c471dde27d485d7a18499b33fe3ec75f9a0a.scope
        I0502 12:51:11.377219       1 sti.go:583] 4:memory:/system.slice/docker-0699667a8663265ded1e64476b50c471dde27d485d7a18499b33fe3ec75f9a0a.scope
        I0502 12:51:11.377225       1 sti.go:583] 3:freezer:/system.slice/docker-0699667a8663265ded1e64476b50c471dde27d485d7a18499b33fe3ec75f9a0a.scope
        I0502 12:51:11.377232       1 sti.go:583] 2:cpuset:/system.slice/docker-0699667a8663265ded1e64476b50c471dde27d485d7a18499b33fe3ec75f9a0a.scope
        I0502 12:51:11.377238       1 sti.go:583] 1:name=systemd:/system.slice/docker-0699667a8663265ded1e64476b50c471dde27d485d7a18499b33fe3ec75f9a0a.scope
        I0502 12:51:11.378020       1 sti.go:583] 317 98 253:12 /rootfs / rw,relatime - xfs /dev/mapper/docker-202:2-134674389-0699667a8663265ded1e64476b50c471dde27d485d7a18499b33fe3ec75f9a0a rw,context="system_u:object_r:svirt_sandbox_file_t:s0:c232,c839",nouuid,attr2,inode64,logbsize=64k,sunit=128,swidth=128,noquota
        I0502 12:51:11.378038       1 sti.go:583] 318 317 0:76 / /proc rw,nosuid,nodev,noexec,relatime - proc proc rw
        I0502 12:51:11.378047       1 sti.go:583] 319 317 0:77 / /dev rw,nosuid - tmpfs tmpfs rw,context="system_u:object_r:svirt_sandbox_file_t:s0:c232,c839",mode=755
        I0502 12:51:11.378054       1 sti.go:583] 320 319 0:78 / /dev/pts rw,nosuid,noexec,relatime - devpts devpts rw,context="system_u:object_r:svirt_sandbox_file_t:s0:c232,c839",gid=5,mode=620,ptmxmode=666
        I0502 12:51:11.378061       1 sti.go:583] 321 317 0:66 / /sys ro,nosuid,nodev,noexec,relatime - sysfs sysfs ro,seclabel
        I0502 12:51:11.378066       1 sti.go:583] 322 321 0:83 / /sys/fs/cgroup ro,nosuid,nodev,noexec,relatime - tmpfs tmpfs ro,context="system_u:object_r:svirt_sandbox_file_t:s0:c232,c839",mode=755
        I0502 12:51:11.378072       1 sti.go:583] 323 322 0:20 /system.slice/docker-0699667a8663265ded1e64476b50c471dde27d485d7a18499b33fe3ec75f9a0a.scope /sys/fs/cgroup/systemd ro,nosuid,nodev,noexec,relatime master:9 - cgroup cgroup rw,xattr,release_agent=/usr/lib/systemd/systemd-cgroups-agent,name=systemd
        I0502 12:51:11.378078       1 sti.go:583] 324 322 0:22 /system.slice/docker-0699667a8663265ded1e64476b50c471dde27d485d7a18499b33fe3ec75f9a0a.scope /sys/fs/cgroup/cpuset ro,nosuid,nodev,noexec,relatime master:10 - cgroup cgroup rw,cpuset
        I0502 12:51:11.378086       1 sti.go:583] 325 322 0:23 /system.slice/docker-0699667a8663265ded1e64476b50c471dde27d485d7a18499b33fe3ec75f9a0a.scope /sys/fs/cgroup/freezer ro,nosuid,nodev,noexec,relatime master:11 - cgroup cgroup rw,freezer
        I0502 12:51:11.378092       1 sti.go:583] 326 322 0:24 /system.slice/docker-0699667a8663265ded1e64476b50c471dde27d485d7a18499b33fe3ec75f9a0a.scope /sys/fs/cgroup/memory ro,nosuid,nodev,noexec,relatime master:12 - cgroup cgroup rw,memory
        I0502 12:51:11.378099       1 sti.go:583] 327 322 0:25 /system.slice/docker-0699667a8663265ded1e64476b50c471dde27d485d7a18499b33fe3ec75f9a0a.scope /sys/fs/cgroup/net_cls ro,nosuid,nodev,noexec,relatime master:13 - cgroup cgroup rw,net_cls
        I0502 12:51:11.378105       1 sti.go:583] 328 322 0:26 /system.slice/docker-0699667a8663265ded1e64476b50c471dde27d485d7a18499b33fe3ec75f9a0a.scope /sys/fs/cgroup/hugetlb ro,nosuid,nodev,noexec,relatime master:14 - cgroup cgroup rw,hugetlb
        I0502 12:51:11.378111       1 sti.go:583] 330 322 0:27 /system.slice/docker-0699667a8663265ded1e64476b50c471dde27d485d7a18499b33fe3ec75f9a0a.scope /sys/fs/cgroup/cpuacct,cpu ro,nosuid,nodev,noexec,relatime master:15 - cgroup cgroup rw,cpuacct,cpu
        I0502 12:51:11.378117       1 sti.go:583] 331 322 0:28 /system.slice/docker-0699667a8663265ded1e64476b50c471dde27d485d7a18499b33fe3ec75f9a0a.scope /sys/fs/cgroup/perf_event ro,nosuid,nodev,noexec,relatime master:16 - cgroup cgroup rw,perf_event
        I0502 12:51:11.378123       1 sti.go:583] 332 322 0:29 /system.slice/docker-0699667a8663265ded1e64476b50c471dde27d485d7a18499b33fe3ec75f9a0a.scope /sys/fs/cgroup/devices ro,nosuid,nodev,noexec,relatime master:17 - cgroup cgroup rw,devices
        I0502 12:51:11.378129       1 sti.go:583] 333 322 0:30 /system.slice/docker-0699667a8663265ded1e64476b50c471dde27d485d7a18499b33fe3ec75f9a0a.scope /sys/fs/cgroup/blkio ro,nosuid,nodev,noexec,relatime master:18 - cgroup cgroup rw,blkio
        I0502 12:51:11.378136       1 sti.go:583] 337 317 0:37 / /run/secrets rw,nosuid,nodev,noexec,relatime - tmpfs tmpfs rw,context="system_u:object_r:svirt_sandbox_file_t:s0:c232,c839"
        I0502 12:51:11.378142       1 sti.go:583] 338 317 202:2 /var/lib/docker/containers/7ee8d2d25f19e827a2dc66d59605895e9b2198e2316f8632690945e61e94112e/resolv.conf /etc/resolv.conf rw,relatime master:1 - xfs /dev/xvda2 rw,seclabel,attr2,inode64,noquota
        I0502 12:51:11.378148       1 sti.go:583] 339 317 202:2 /var/lib/docker/containers/7ee8d2d25f19e827a2dc66d59605895e9b2198e2316f8632690945e61e94112e/hostname /etc/hostname rw,relatime master:1 - xfs /dev/xvda2 rw,seclabel,attr2,inode64,noquota
        I0502 12:51:11.378155       1 sti.go:583] 340 317 253:2 /pods/739222bd-1064-11e6-b81f-0e832540ffef/etc-hosts /etc/hosts rw,relatime master:27 - xfs /dev/mapper/docker--vg-openshift--xfs--vol--dir rw,seclabel,attr2,inode64,grpquota
        I0502 12:51:11.378161       1 sti.go:583] 341 319 0:38 / /dev/shm rw,nosuid,nodev,noexec,relatime - tmpfs shm rw,context="system_u:object_r:svirt_sandbox_file_t:s0:c232,c839",size=65536k
        I0502 12:51:11.378167       1 sti.go:583] 342 319 0:13 / /dev/mqueue rw,nosuid,nodev,noexec,relatime - mqueue mqueue rw,seclabel
        I0502 12:51:11.378172       1 sti.go:583] 162 318 0:76 /bus /proc/bus ro,nosuid,nodev,noexec,relatime - proc proc rw
        I0502 12:51:11.378178       1 sti.go:583] 170 318 0:76 /fs /proc/fs ro,nosuid,nodev,noexec,relatime - proc proc rw
        I0502 12:51:11.378184       1 sti.go:583] 186 318 0:76 /irq /proc/irq ro,nosuid,nodev,noexec,relatime - proc proc rw
        I0502 12:51:11.378192       1 sti.go:583] 187 318 0:76 /sys /proc/sys ro,nosuid,nodev,noexec,relatime - proc proc rw
        I0502 12:51:11.378198       1 sti.go:583] 199 318 0:76 /sysrq-trigger /proc/sysrq-trigger ro,nosuid,nodev,noexec,relatime - proc proc rw
        I0502 12:51:11.378204       1 sti.go:583] 200 318 0:77 /null /proc/kcore rw,nosuid - tmpfs tmpfs rw,context="system_u:object_r:svirt_sandbox_file_t:s0:c232,c839",mode=755
        I0502 12:51:11.378210       1 sti.go:583] 201 318 0:77 /null /proc/timer_stats rw,nosuid - tmpfs tmpfs rw,context="system_u:object_r:svirt_sandbox_file_t:s0:c232,c839",mode=755
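
If the value really is being overwritten after container start, one way to observe it from the host would be to poll the container scope's cpu.cfs_quota_us right after the container starts and log any change (for example from 6000 to -1). This is a hedged sketch only; the hypothetical watch-quota tool, the cpu,cpuacct directory name, and the systemd scope path are assumptions based on the mount output above, and it is not part of the debug change in #8707.

    package main

    import (
        "fmt"
        "os"
        "strings"
        "time"
    )

    func main() {
        // Usage: watch-quota <full-container-id>
        // Polls the per-container CPU quota file on the host and prints every
        // time the value changes, to catch a post-start overwrite.
        if len(os.Args) != 2 {
            fmt.Fprintln(os.Stderr, "usage: watch-quota <container-id>")
            os.Exit(1)
        }
        // Path assumes the systemd cgroup driver layout seen in the logs above.
        path := fmt.Sprintf(
            "/sys/fs/cgroup/cpu,cpuacct/system.slice/docker-%s.scope/cpu.cfs_quota_us",
            os.Args[1])

        last := ""
        for i := 0; i < 40; i++ {
            data, err := os.ReadFile(path)
            if err != nil {
                fmt.Println("read error:", err)
            } else if cur := strings.TrimSpace(string(data)); cur != last {
                fmt.Printf("%s quota=%s\n", time.Now().Format(time.RFC3339Nano), cur)
                last = cur
            }
            time.Sleep(250 * time.Millisecond)
        }
    }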

@smarterclayton
Contributor

Back up to P1 since we're out of the 1.2 triage window.

@mrunalp
Member

mrunalp commented May 11, 2016

I have opened a kernel bug and will update here once I hear back more on it.

@bparees
Contributor Author

bparees commented Jun 22, 2016

@bparees
Contributor Author

bparees commented Jun 30, 2016

@mrunalp
Member

mrunalp commented Jul 6, 2016

I just heard back on the kernel bug. There was in fact a race in the kernel around writing these values. Can you check what kernel version we have right now?

@mrunalp
Member

mrunalp commented Aug 11, 2016

This is being backported right now. Will update once new packages are available with the fixes.

@0xmichalis
Contributor

Tagging as a flake so I can link it to PRs with failed merge jobs and keep the bot happy.

@ncdc
Contributor

ncdc commented Nov 7, 2016

This was fixed in the 7.3 kernel, right?

@mrunalp
Member

mrunalp commented Nov 7, 2016

Yes, the fix is in the 7.3 kernel. If we saw the latest instance on 7.3, I will follow up with the kernel team.

@ncdc
Contributor

ncdc commented Nov 7, 2016

I was more asking in hopes of being able to close this now :-)

@mrunalp
Member

mrunalp commented Nov 7, 2016

Oh okay :) maybe close now and we could reopen if necessary?

@ncdc
Contributor

ncdc commented Nov 7, 2016

SGTM.

@ncdc ncdc closed this as completed Nov 7, 2016
@mrunalp
Member

mrunalp commented Apr 13, 2017 via email

@openshift-bot
Contributor

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci-robot openshift-ci-robot added the lifecycle/stale label Feb 10, 2018
@openshift-bot
Contributor

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

@openshift-ci-robot openshift-ci-robot added lifecycle/rotten and removed lifecycle/stale labels Mar 14, 2018
@openshift-bot
Contributor

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close
