cgroup quota shows up as -1 #8427
Comments
@smarterclayton fyi. i made it a p2 since the exposure is that some builds might be able to exceed an enforced cpu quota. move up if you think it should be higher. |
Is that not a blocker for online? |
i don't know. it could be, i'd say it depends on how frequently it occurs and whether that makes it an avenue for abuse. if 1/100 builds run w/o cpu quota i'm not sure that's a problem? if it's frequent enough that someone could explicitly take advantage of it, maybe it is. hence my putting the question to you :) i wouldn't argue against p1, but i wasn't going to make it one either. |
We have added #8406 to gather additional debugging information. |
@bparees I am not sure if that is available in the link that you posted in this bug as the link doesn't open right now. |
@mrunalp looks like jenkins is down at the moment. and of course the build itself may get cleared at some point. but i put the relevant details from the failure in the description. |
@bparees Were you able to capture the output of /proc/self/cgroup that we added? |
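For readers following the debugging, here is a minimal sketch of how captured `/proc/self/cgroup` output could be parsed to see which cgroup path the `cpu` controller is attached to. It assumes the cgroup v1 line format `hierarchy-ID:controller-list:path`. The helper name `parse_proc_cgroup` and the sample text are hypothetical, not taken from the debug output discussed in this issue.

```python
def parse_proc_cgroup(text):
    """Map each cgroup v1 controller name to its cgroup path.

    Each line has the form hierarchy-ID:controller-list:cgroup-path.
    """
    paths = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split into at most three fields; the path itself may contain ':'.
        _hier_id, controllers, path = line.split(":", 2)
        for controller in controllers.split(","):
            if controller:  # an empty controller list (cgroup v2 entry) is skipped
                paths[controller] = path
    return paths


# Illustrative input only, not output captured from the failing build.
SAMPLE = """\
4:cpuacct,cpu:/system.slice/docker-abc123.scope
3:memory:/system.slice/docker-abc123.scope
1:name=systemd:/system.slice/docker-abc123.scope
"""

if __name__ == "__main__":
    print(parse_proc_cgroup(SAMPLE)["cpu"])
```

If the path reported for `cpu` differs from the one the container runtime wrote the quota to, that would point at the process being moved between cgroups rather than at the value itself changing.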
@mrunalp sorry if i confused you, the job i linked was the original case where @smarterclayton hit it, it is not a new incident w/ your debugging added. |
@bparees ack |
@bparees Thanks! I am looking into it. |
Looking into systemd as possible cause per the debug output. |
@bparees Is the image used in the test fedora based? Does it have the |
Created #8635 to collect additional debug information. |
From debug output in #8707, I see that the cgroup mounts are correct, ruling out a race there. More likely the value is getting overwritten. I will check with the kernel team.
|
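As a rough illustration of what "the cgroup mounts are correct" could mean in practice, the sketch below scans `/proc/mounts`-style text for the cgroup v1 mount that carries a given controller. The function name `find_controller_mount` and the sample lines are assumptions for illustration, not the actual check or output from #8707.

```python
def find_controller_mount(mounts_text, controller):
    """Return the mount point of the cgroup v1 hierarchy carrying `controller`.

    Expects /proc/mounts-style lines:
    device mount-point fstype options dump pass
    """
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4 or fields[2] != "cgroup":
            continue  # not a cgroup v1 mount
        if controller in fields[3].split(","):
            return fields[1]
    return None


# Illustrative input only, not captured from the failing node.
SAMPLE_MOUNTS = """\
tmpfs /sys/fs/cgroup tmpfs ro,nosuid,nodev,noexec,mode=755 0 0
cgroup /sys/fs/cgroup/cpu,cpuacct cgroup rw,nosuid,nodev,noexec,relatime,cpuacct,cpu 0 0
cgroup /sys/fs/cgroup/memory cgroup rw,nosuid,nodev,noexec,relatime,memory 0 0
"""

if __name__ == "__main__":
    print(find_controller_mount(SAMPLE_MOUNTS, "cpu"))
```

A `None` result, or a mount point other than the one the runtime uses, would suggest a mount problem; a correct mount with a wrong value is consistent with the value being overwritten afterwards.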
Back up to p1 since we're out of the 1.2 triage window. |
I have opened a kernel bug and will update here once I hear back more on it. |
I just heard back on the kernel bug. There was in fact a race in the kernel around writing these values. Can you check what kernel version we have right now? |
This is being backported right now. Will update once new packages are available with the fixes. |
Tagging as a flake so I can link it to PRs with failed merge jobs and have the bot happy. |
This was fixed in the 7.3 kernel, right? |
Yes, the fix is in the 7.3 kernel. If we saw the latest instance on 7.3, then I will follow up with the kernel team.
|
I was more asking in hopes of being able to close this now :-) |
Oh okay :) maybe close now and we could reopen if necessary?
|
SGTM. |
Will check if the kernel is missing the patch they added earlier.
On Apr 13, 2017, at 1:33 PM, Clayton Coleman wrote:
It's baaaaack: https://ci.openshift.redhat.com/jenkins/job/test_pull_request_origin/739/testReport/junit/(root)/Extended/_builds__Conformance__s2i_build_with_a_quota_Building_from_a_template_should_create_an_s2i_build_with_a_quota_and_run_it/
|
Issues go stale after 90d of inactivity. Mark the issue as fresh by commenting If this issue is safe to close now please do so with /lifecycle stale |
Stale issues rot after 30d of inactivity. Mark the issue as fresh by commenting If this issue is safe to close now please do so with /lifecycle rotten |
Rotten issues close after 30d of inactivity. Reopen the issue by commenting /close |
https://ci.openshift.redhat.com/jenkins/job/test_pull_requests_origin/14128/testReport/(root)/Extended/_builds__Conformance__s2i_build_with_a_quota_Building_from_a_template_should_create_an_s2i_build_with_a_quota_and_run_it/
build knows the quota limit is 6000:
build creates a container with a quota value of 6000:
the container dumps the cgroup filesystem value and sees -1:
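The mismatch described above can be expressed as a small check. This is only a sketch, assuming the value being dumped is the cgroup v1 `cpu.cfs_quota_us` file, where `-1` means no CPU bandwidth limit is applied. The helper names `parse_cfs_quota` and `quota_matches` are hypothetical and are not part of the test in question.

```python
def parse_cfs_quota(text):
    """Parse the contents of cpu.cfs_quota_us.

    Returns the quota in microseconds, or None when the file holds a
    negative value (-1), which means no quota is enforced.
    """
    value = int(text.strip())
    return None if value < 0 else value


def quota_matches(text, expected_us):
    """True only if a quota is enforced and equals the expected value."""
    return parse_cfs_quota(text) == expected_us


if __name__ == "__main__":
    # The build expected 6000; the container observed -1.
    print(quota_matches("6000\n", 6000))
    print(quota_matches("-1\n", 6000))
```

Treating `-1` as "no quota" rather than as a number keeps the failure mode explicit: the container is not running with a wrong limit, it is running with no limit at all.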