Add support for resource limits to play kube #7853
Conversation
Last commit is missing a signoff. Overall, LGTM. Does …
Could you squash your commits together and sign them?
Good question. Not according to my quick test. I'll file an issue.
Sure, but I don't think the test is passing yet, at least not in my local runs. I'll let the CI run it and see what happens.
LGTM, thanks @xordspar0
Hm, it's failing in CI too; the errors are attached. I'll keep working on it.
Woo, I figured it out! I fixed the error with the help of this thread in the Kubernetes project. The minimum allowed CPU quota is 1 ms. The Kubernetes API gives us an int64 for the CPU quota in millicores, but internally Podman represents CPU quotas in microseconds, which is what the kernel uses. If I just take 10 millicores and hand that number to the kernel, it interprets it as 10 microseconds, which is too small. I added code to do the conversion from millicores to microseconds.
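To make the conversion concrete, here is a minimal sketch of the millicores-to-microseconds math described above; the helper name and constants are illustrative and not necessarily the exact code added in this PR:

```go
package main

import "fmt"

const (
	// Default CFS period used by the kernel, in microseconds.
	defaultPeriodMicroseconds = 100000
	// The kernel rejects CPU quotas shorter than 1 ms (1000 µs).
	minQuotaMicroseconds = 1000
)

// milliCPUToQuota converts a Kubernetes CPU limit expressed in millicores
// into a CFS quota in microseconds, clamping to the kernel's minimum.
// (Illustrative sketch; the helper in the PR may differ in detail.)
func milliCPUToQuota(milliCPU int64) int64 {
	quota := milliCPU * defaultPeriodMicroseconds / 1000
	if quota < minQuotaMicroseconds {
		quota = minQuotaMicroseconds
	}
	return quota
}

func main() {
	// 200m (0.2 CPU) becomes a 20000 µs quota per 100000 µs period.
	fmt.Println(milliCPUToQuota(200))
	// 10m converts to exactly 1000 µs, the kernel minimum, rather than
	// being misread as a 10 µs quota.
	fmt.Println(milliCPUToQuota(10))
}
```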
Is there a way to only run the test in CI environments that have cgroups v2? |
I think …
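For reference, a rough sketch of what such a conditional skip could look like in the e2e suite; `SkipIfCgroupV1` is a hypothetical name, and it assumes the containers/common cgroups package that the rest of the code base already uses:

```go
package integration

import (
	"github.com/containers/common/pkg/cgroups"
	. "github.com/onsi/ginkgo"
	. "github.com/onsi/gomega"
)

// SkipIfCgroupV1 skips the current spec unless the host is running the
// unified (cgroups v2) hierarchy. Hypothetical helper; the suite's real
// skip helpers may be named or structured differently.
func SkipIfCgroupV1(reason string) {
	cgroup2, err := cgroups.IsCgroup2UnifiedMode()
	Expect(err).To(BeNil())
	if !cgroup2 {
		Skip("cgroups v1 detected: " + reason)
	}
}
```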
test/e2e/play_kube_test.go
It("podman play kube allows setting resource limits", func() { | ||
SkipIfContainerized("Resource limits require a running systemd") | ||
SkipIfNotFedora() // Requires cgroups v2 |
SkipIfRootlessCgroupsV1()
Will this test work in a CgroupV1 rootful environment?
You're right. I have seen the error "invalid configuration, cannot specify resource limits without cgroups v2 and --cgroup-manager=systemd" before, but when I check the code I see that it only applies in rootless mode.
Lines 420 to 438 in 0b7b222
```go
if rootless.IsRootless() {
	cgroup2, err := cgroups.IsCgroup2UnifiedMode()
	if err != nil {
		return nil, err
	}
	if !addedResources {
		configSpec.Linux.Resources = &spec.LinuxResources{}
	}
	canUseResources := cgroup2 && runtimeConfig != nil && (runtimeConfig.Engine.CgroupManager == cconfig.SystemdCgroupsManager)
	if addedResources && !canUseResources {
		return nil, errors.New("invalid configuration, cannot specify resource limits without cgroups v2 and --cgroup-manager=systemd")
	}
	if !canUseResources {
		// Force the resources block to be empty instead of having default values.
		configSpec.Linux.Resources = &spec.LinuxResources{}
	}
}
```
Hm, I don't understand what happened. When I had only `SkipIfContainerized`, there were two failures: the rootless cgroups v1 tests. Now, with `SkipIfContainerized` and `SkipIfRootlessCgroupsV1`, there are 10 CI builds that fail. 😕
This doesn't seem to work with Cgroups v1, even with root. All the Cgroups v1 CI builds are hitting this error:
Error: container_linux.go:370: starting container process caused: process_linux.go:338: getting the final child's pid from pipe caused: EOF: OCI runtime error
I'm not sure if that's normal for podman and resource limits.
Limits seem to require root and Cgroups v2. When I test with `podman run` on my local machine I get this:

```console
$ bin/podman run -it --rm --cpu-quota 10000 alpine sh
Error: OCI runtime error: container_linux.go:370: starting container process caused: process_linux.go:459: container init caused: process_linux.go:422: setting cgroup config for procHooks process caused: cannot set cpu limit: container could not join or create cgroup
$ sudo bin/podman run -it --rm --cpu-quota 10000 alpine sh
[sudo] password for jordan:
/ #
```
LGTM once this goes green.
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: mheon, xordspar0. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing …
All kinds of test unhappiness, @xordspar0
I'm making slow progress on this. The following diff makes the tests pass on my Fedora 33 VM. Both values should be valid, so there's still a bug hiding in my code somewhere...

```diff
diff --git a/test/e2e/play_kube_test.go b/test/e2e/play_kube_test.go
index 01ea53da4..356c00c3e 100644
--- a/test/e2e/play_kube_test.go
+++ b/test/e2e/play_kube_test.go
@@ -1413,8 +1413,8 @@ spec:
 	numReplicas           int32  = 3
 	expectedCpuRequest    string = "100m"
 	expectedCpuLimit      string = "200m"
-	expectedMemoryRequest string = "10000"
-	expectedMemoryLimit   string = "20000"
+	expectedMemoryRequest string = "10000Ki"
+	expectedMemoryLimit   string = "20000Ki"
 )
 expectedCpuQuota := milliCPUToQuota(expectedCpuLimit)
```
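For illustration, here is how the two memory formats differ when parsed as Kubernetes quantities; this is a standalone sketch using the k8s.io/apimachinery resource package, not code from this PR:

```go
package main

import (
	"fmt"

	"k8s.io/apimachinery/pkg/api/resource"
)

func main() {
	// A bare number is parsed as bytes: a 10000-byte memory limit is
	// far too small for a real container.
	plain := resource.MustParse("10000")
	// With the Ki suffix the same number means kibibytes.
	kibi := resource.MustParse("10000Ki")

	fmt.Println(plain.Value()) // 10000
	fmt.Println(kibi.Value())  // 10240000
}
```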
Actually, this isn't a bug in my code. What I learned:
Hence the timeouts in some tests. The only thing that was confusing for me was this error:
Maybe crun could provide a better error message? Or maybe there's nothing to be done. I don't know enough about what went wrong, but here's the line where that error message was generated: https://github.com/containers/crun/blob/22c34e34ab5fcdc64b70ea940d1ee5b00b2227a6/src/libcrun/container.c#L360
Closer! Now there are only failures on rootless Fedora 31 and 32. I've been testing with rootless Fedora 33. I'll grab an older ISO and continue working there.
The only remaining test failure is a reported issue: #7959
Signed-off-by: Jordan Christiansen <[email protected]>
LGTM
Woo, the tests are all passing! This took a lot longer than expected, but it was worth it. Engineering is often like that: you peel back a layer of failures only to discover another layer underneath. Thanks for your patience.
Thanks for putting in the effort to get this passing, @xordspar0 - merging away /lgtm
🎉