Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix: display online_cpus in compat REST API #15867

Merged
merged 1 commit into from
Jun 1, 2023

Conversation

boaz0
Copy link
Collaborator

@boaz0 boaz0 commented Sep 20, 2022

closes #15754

Does this PR introduce a user-facing change?

None

@github-actions github-actions bot added the kind/api-change Change to remote API; merits scrutiny label Sep 20, 2022
@zhangguanzhang
Copy link
Collaborator

PR does not include changes in the 'tests' directory

Copy link
Member

@vrothberg vrothberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Please add a test and a description of what the change fixes.

@baude
Copy link
Member

baude commented Sep 27, 2022

@boaz0 could you please add a test for your PR? It is failing because no test was added.

@baude
Copy link
Member

baude commented Sep 27, 2022

and also a proper commit message

@boaz0
Copy link
Collaborator Author

boaz0 commented Sep 27, 2022

Sure, I'll be working on it today and tomorrow. Sorry for the late response.

@boaz0
Copy link
Collaborator Author

boaz0 commented Sep 29, 2022

@edsantiago can you review the test I added.

  1. I wasn't sure if to create a new bats file only for this use case
  2. Maybe there are better ways to test this.

@edsantiago
Copy link
Member

@boaz0 this looks like it belongs in test/apiv2; see the README.md file there, and other *.at files for examples.

There's a big red kind/api-change button on this PR.

@edsantiago
Copy link
Member

@containers/podman-maintainers CI is green. Test LGTM (fails on main, passes with this PR). I will remind everyone that this has a big red api-change label.

Copy link
Member

@ashley-cui ashley-cui left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like online_cpus are now displayed! I don't know how to confirm that the number is correct though, since it looks like it always displays 0?

@benoitf does this look correct? Do we have an example where we expect online_cpus to be non-zero?

curl --unix-socket /run/user/1000/podman/podman.sock http:/v1.41/containers/$myContainer/stats?stream=false | jq .cpu_stats
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  1105    0  1105    0     0  73829      0 --:--:-- --:--:-- --:--:-- 78928
{
  "cpu_usage": {
    "total_usage": 67664000,
    "usage_in_kernelmode": 50612,
    "usage_in_usermode": 67613388
  },
  "system_cpu_usage": 53967013582000,
  "online_cpus": 0,
  "cpu": 0,
  "throttling_data": {
    "periods": 0,
    "throttled_periods": 0,
    "throttled_time": 0
  }
}

@rhatdan
Copy link
Member

rhatdan commented Oct 6, 2022

@Luap99 @mheon @ashley-cui Does this break API compatibility?

@benoitf
Copy link
Contributor

benoitf commented Oct 7, 2022

@ashley-cui it should return the number of CPUs of the machine for example

Here is the output with docker if the 'docker vm' has 3 CPUs

curl --silent --unix-socket /var/run/docker.sock "http:/v1.41/containers/$myContainer/stats?stream=false"  | jq .cpu_stats
{
  "cpu_usage": {
    "total_usage": 126467000,
    "usage_in_kernelmode": 72702000,
    "usage_in_usermode": 53764000
  },
  "system_cpu_usage": 90890360000000,
  "online_cpus": 3,
  "throttling_data": {
    "periods": 0,
    "throttled_periods": 0,
    "throttled_time": 0
  }
}

"online_cpus": 3,

by default, podman machine have 2 cpus assigned so it should return 2 instead of 0

@ashley-cui
Copy link
Member

@rhatdan It shouldn't, as this just shows online_cpus: 0 rather than hiding it if the CPUs is zero.

Though, if I understand @benoitf 's comment correctly, it looks like the CPU's should be non-zero, and we're still displaying 0 CPUs so I'm not sure if this PR correctly addresses the problem.

@boaz0
Copy link
Collaborator Author

boaz0 commented Oct 8, 2022

@benoitf @ashley-cui the number of CPUs is read from the cgroup file inside the container. I can look more and see what went wrong.

@ashley-cui
Copy link
Member

Thanks @boaz0!

@ashley-cui
Copy link
Member

@giuseppe Would you happen to know why the CPUs is always displayed as 0?

@flouthoc
Copy link
Collaborator

I think this might need cpuacct enabled but to calculate number of available CPU should we just parse cpu.max for nested cgroup container and do $MAX/$PERIOD where $MAX and $PERIOD are parsed from cpu.max. But @giuseppe can confirm this better.

@lukasmrtvy
Copy link

lukasmrtvy commented Oct 18, 2022

@flouthoc cpuacct controller isnt available for cgroups v2

@flouthoc
Copy link
Collaborator

@flouthoc cpuacct controller isnt available for cgroups v2

@lukasmrtvy Yes my suggestion for v2 and where cpuacct is not available was to compute from cpu.max i.e $MAX/$PERIOD but I could be wrong with this assumption so I'd wait for others to confirm.

@rhatdan
Copy link
Member

rhatdan commented Oct 18, 2022

@giuseppe PTAL ^^

@giuseppe
Copy link
Member

Could we use sched_getaffinity(2) to calculate the number of online CPUs?

Something like:

package main

import (
	"fmt"
	"golang.org/x/sys/unix"
)

func getNumOnlineCPUs() (int, error) {
	var cpuSet unix.CPUSet
	if err := unix.SchedGetaffinity(0, &cpuSet); err != nil {
		return -1, err
	}
	return cpuSet.Count(), nil
}

func main(){
	n, err := getNumOnlineCPUs()
	if err != nil {
		panic(err)
	}
	fmt.Printf("Online CPUs: %d\n", n)
}

with the code above I have:

$ ./online_cpus 
Online CPUs: 8

$ taskset -c 1,2 ./online_cpus 
Online CPUs: 2

$ taskset -c 1 ./online_cpus 
Online CPUs: 1

It doesn't require any privilege to run

@boaz0
Copy link
Collaborator Author

boaz0 commented Oct 21, 2022

@giuseppe I am probably wrong but

./bin/podman run --cpus 2 busybox sh -c 'while true;do sleep 1s; echo "hello";done'

tells me I have 8 online_cpus, I was hoping to have 2.
I also tried

./bin/podman run --cpuset-cpus 0,3 busybox sh -c 'while true;do sleep 1s; echo "hello";done'

but got the same results.

This is the code:

func StatsContainer(w http.ResponseWriter, r *http.Request) {
...
		onlineCPUs, err := libpod.GetOnlineCPUs(0)
		if err != nil {
			logrus.Errorf("Unable to get online cpus: %v", err)
		}
}

func GetOnlineCPUs(pid uint64) (int, error) {
	var cpuSet unix.CPUSet
	if err := unix.SchedGetaffinity(int(pid), &cpuSet); err != nil {
		return -1, err
	}
	return cpuSet.Count(), nil
}
``

@giuseppe
Copy link
Member

you'll need to replace the PID here: onlineCPUs, err := libpod.GetOnlineCPUs(0) from 0 to the PID of the container process

@boaz0
Copy link
Collaborator Author

boaz0 commented May 16, 2023

@rhatdan @vrothberg can you run CI? I did some changes, I hope it will make CI green again.
Thanks.

@vrothberg
Copy link
Member

Tests aren't happy yet

@boaz0
Copy link
Collaborator Author

boaz0 commented May 16, 2023

@giuseppe I am trying to set --cpuset-cpus=0 for testing the API but it gives me this error message:

Error: OCI runtime error: crun: the requested cgroup controller `cpuset` is not available

Do you have an idea what I am doing wrong?

@edsantiago
Copy link
Member

--cpuset requires root. See examples in 20-containers.at for running tests only when root.

@boaz0
Copy link
Collaborator Author

boaz0 commented May 18, 2023

Thanks @edsantiago

@rhatdan
Copy link
Member

rhatdan commented May 18, 2023

@edsantiago
Copy link
Member

LGTM but I'm not comfortable approving it. I did confirm that the new test fails on main (i.e. that it tests the new functionality).

@mheon
Copy link
Member

mheon commented May 18, 2023

One comment from me, otherwise LGTM

@boaz0
Copy link
Collaborator Author

boaz0 commented May 30, 2023

Thanks @mheon. Updated the code.

@edsantiago
Copy link
Member

@boaz0 please squash your commits as a courtesy to future maintainers, thank you

Copy link
Member

@Luap99 Luap99 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And as @edsantiago said please squash the commits.

@@ -80,6 +80,10 @@ func StatsContainer(w http.ResponseWriter, r *http.Request) {
ThrottlingData: docker.ThrottlingData{},
}
}
onlineCPUs, err := libpod.GetOnlineCPUs(ctnr)
if err != nil {
utils.InternalServerError(w, err)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this needs a return after the InternalServerError call otherwise we just continue which will result in incorrect REST API responses.

Comment on lines +147 to +138
if err != nil {
return -1, fmt.Errorf("failed to obtain Container %s PID: %w", container.Name(), err)
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the err != nil branch should come before pid == 0

#

if root; then
podman pull $IMAGE &>/dev/null
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the pull is not necessary, podman run will pull if required.

@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 30, 2023
@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 30, 2023
Copy link
Member

@Luap99 Luap99 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@openshift-ci
Copy link
Contributor

openshift-ci bot commented May 31, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: boaz0, Luap99

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 31, 2023
@TomSweeneyRedHat
Copy link
Member

LGTM

@mheon
Copy link
Member

mheon commented Jun 1, 2023

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Jun 1, 2023
@mheon
Copy link
Member

mheon commented Jun 1, 2023

Thanks for the contribution @boaz0

@openshift-merge-robot openshift-merge-robot merged commit e91f6f1 into containers:main Jun 1, 2023
@github-actions github-actions bot added the locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. label Aug 31, 2023
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Aug 31, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. kind/api-change Change to remote API; merits scrutiny lgtm Indicates that a PR is ready to be merged. locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. release-note-none
Projects
None yet
Development

Successfully merging this pull request may close these issues.

missing online CPU information w/ compat REST API