OCPBUGS-35911: E2E: Add test to verify runc process excludes the cpus used by pod. #1088
base: master
Conversation
@SargunNarula: This pull request explicitly references no jira issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.
@SargunNarula: This pull request references Jira Issue OCPBUGS-35911, which is valid. The bug has been moved to the POST state. 3 validation(s) were run on this bug
No GitHub users were found matching the public email listed for the QA contact in Jira ([email protected]), skipping review request. The bug has been updated to refer to the pull request using the external bug tracker. In response to this:
Force-pushed from 06dcbf0 to 33fd5ac
Thanks for this. I added a few initial comments below.
Force-pushed from 1fa3baa to e983f94
Force-pushed from 83fa93a to c6d89e5
Looks good to me from my side.
Force-pushed from c6d89e5 to 4007668
Force-pushed from 02e4f19 to 73a257b
The testing logic could perhaps be simplified, but no major objections, it seems. Questions and possible improvements inside.
Force-pushed from 083325f to 10d3c35
I'm not sure this test checks the correct thing. We do check that a BE pod has no overlap with CPUs exclusively assigned to a Guaranteed pod, but the problem here is not what happens at runtime, but what happened at pod creation time. Once the pod is running, runc is terminated, and there is no trace of where it ran.
@ffromani The original issue was that, when launching a guaranteed pod running a cyclic test, the runc container-creation process was observed running on isolated CPUs, inadvertently consuming the CPUs allocated to the cyclic test. The fix was to ensure that the cpuset.cpus configuration is passed during container creation.

Additionally, since runc follows a two-step creation process, the initialization process (executed as /usr/bin/pod, a symlink to /usr/bin/runc) is itself started within a container, and that container is assigned the cpuset.cpus values. This can be confirmed by examining the config.json of the initialization container and verifying that the appropriate CPU allocation is applied: reserved CPUs in the case of a guaranteed pod, or all available CPUs in the case of a Best-Effort (BE) pod. Reference:

Based on these observations, the current patch may not effectively validate this scenario. I will work on a revised patch to accurately verify the CPUs being used.
Force-pushed from 10d3c35 to aa8eccc
Force-pushed from 6d925ba to 4e72640
/retest
var guaranteedPodCpus, guaranteedInitPodCpus cpuset.CPUSet
var bestEffortPodCpus, bestEffortInitPodCpus cpuset.CPUSet
do we ever use or need init containers?
Init containers here refer to the runc container used to initialize and deploy the main pod container.
return &config, nil
}

func getConfigJsonInfo(pod *corev1.Pod, containerName string, workerRTNode *corev1.Node) []*ContainerConfig {
I think this is the best we can do, but at the same time, AFAIK this is not sufficient to ensure that runc (or crun) never actually runs on any isolated CPU. If, as IIRC, the goal is to test that crun (or runc) never runs on an isolated CPU, then this test helps but is not sufficient.
One approach to testing is to verify whether runc/crun isolates itself from the isolated CPUs by following the method defined in the OCI specification, which involves generating a config.json and adhering to the specified CPUs based on the applied profile. Alternatively, a tracing tool like bpftrace can be used to monitor all runc calls and inspect their CPU affinity. However, this method is quite invasive and may be challenging to implement, IMO.
> One approach to testing is to verify whether runc/crun isolates itself from the isolated CPUs by following the method defined in the OCI specification, which involves generating a config.json and adhering to the specified CPUs based on the applied profile.

Well, does the process which writes config.json run only on reserved CPUs? And does crun (or runc) honor the settings in config.json when setting its own affinity? IOW, who decides and who enforces that runc (or crun) only runs on reserved CPUs? Your test seems fine for the workload, but there are open questions for the infra proper.

> Alternatively, a tracing tool like bpftrace can be used to monitor all runc calls and inspect their CPU affinity. However, this method is quite invasive and may be challenging to implement, IMO.

I agree, but the problem here is first and foremost testing the right thing.
@ffromani Do you have any other way in mind?
To make the check more robust, I have added another Expect to verify whether it honors the CPUs provided through the profile. The latest commit addresses your concerns.
Force-pushed from 4e72640 to 0b8046c
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull request has been approved by: SargunNarula. The full list of commands accepted by this bot can be found here. Needs approval from an approver in each of these files. Approvers can indicate their approval by writing
Adding a test to verify that runc does not use CPUs assigned to guaranteed pods. Signed-off-by: Sargun Narula <[email protected]>

Updated the CPU check to be per-container: runc provides a config.json for each type of pod, but runc's own container always uses reserved CPUs. Signed-off-by: Sargun Narula <[email protected]>
Force-pushed from 0b8046c to 143f831
@SargunNarula: The following test failed.
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
Adding a test to verify that runc does not use CPUs assigned to guaranteed pods.
Original bug link - https://bugzilla.redhat.com/show_bug.cgi?id=1910386