Reducing CPU period fails for subsystems if existing parent has quota>0 with systemd driver #3084

hk-vmg · 2021-07-09T17:39:24Z

When a lower CFS CPU period is set, creating a runc container in Kubernetes with the systemd cgroup driver fails with:
write /sys/fs/cgroup/cpu,cpuacct/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod<HASH>.slice/cri-containerd-<HASH>.scope/cpu.cfs_period_us: invalid argument: unknown

cgroupfs driver works fine.

It appears that order matters when both the period and quota files are changed. I believe the root cause is that updating the child files one at a time causes the parent limit to be exceeded temporarily. This shows up in Kubernetes: if all containers in a pod have limits, the parent "pod" slice gets a quota, and the child "container" settings can't be updated.
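To make the ordering problem concrete, here is a minimal Go sketch of the constraint as I understand it. It is a simplified model, not kernel or runc code: the `limit` type, the `exceeds` helper, and all numbers are hypothetical, and it only assumes that a child's quota/period ratio may not exceed the parent's.

```go
package main

import "fmt"

// limit is a CFS bandwidth setting; both fields are in microseconds.
// A quota <= 0 means "no limit". (Hypothetical type, for illustration only.)
type limit struct {
	quota  int64
	period int64
}

// exceeds reports whether child would be allowed a larger CPU share than
// parent. Ratios are compared by cross-multiplication to avoid floats.
// This is a simplified model of the check, not the kernel's actual code.
func exceeds(child, parent limit) bool {
	if parent.quota <= 0 || child.quota <= 0 {
		return false // an unlimited side never trips the check in this model
	}
	return child.quota*parent.period > parent.quota*child.period
}

func main() {
	parent := limit{quota: 50000, period: 100000} // pod: 0.5 CPU (assumed)
	current := limit{quota: 30000, period: 100000} // container: 0.3 CPU

	// Goal: quota=3000, period=10000 (still 0.3 CPU).
	// Writing the period first leaves old quota / new period = 3 CPUs.
	periodFirst := limit{quota: current.quota, period: 10000}
	fmt.Println("period first exceeds parent:", exceeds(periodFirst, parent))

	// Writing the quota first leaves new quota / old period = 0.03 CPU.
	quotaFirst := limit{quota: 3000, period: current.period}
	fmt.Println("quota first exceeds parent:", exceeds(quotaFirst, parent))
}
```

With these assumed numbers, only the period-first order passes through a state that is over the parent limit, even though the start and end states are both within it.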

https://github.com/opencontainers/runc/blob/master/libcontainer/cgroups/fs/cpu.go#L74

Tested with latest master branch runc.

systemd is version 219 but I believe I have all the required OS/systemd patches:

$ rpm -q systemd --changelog | grep -i cpu
core: add CPUQuotaPeriodSec= (#1770379)
core: downgrade CPUQuotaPeriodSec= clamping logs to debug (#1770379)

@sureshvis


kolyshkin · 2021-07-14T20:59:44Z

The error actually comes from the kernel.
The systemd driver sets the properties/limits on the systemd unit (those that systemd is able to represent, see https://github.com/opencontainers/runc/blob/master/docs/systemd.md#resource-limits), and then uses the fs driver as a backup.

The translation of OCI runtime spec properties to systemd properties for CPU quota and period happens here:

runc/libcontainer/cgroups/systemd/common.go

Lines 445 to 477 in fceadf2

    
    func addCpuQuota(cm *dbusConnManager, properties *[]systemdDbus.Property, quota int64, period uint64) {
    	if period != 0 {
    		// systemd only supports CPUQuotaPeriodUSec since v242
    		sdVer := systemdVersion(cm)
    		if sdVer >= 242 {
    			*properties = append(*properties,
    				newProp("CPUQuotaPeriodUSec", period))
    		} else {
    			logrus.Debugf("systemd v%d is too old to support CPUQuotaPeriodSec "+
    				" (setting will still be applied to cgroupfs)", sdVer)
    		}
    	}
    	if quota != 0 || period != 0 {
    		// corresponds to USEC_INFINITY in systemd
    		cpuQuotaPerSecUSec := uint64(math.MaxUint64)
    		if quota > 0 {
    			if period == 0 {
    				// assume the default
    				period = defCPUQuotaPeriod
    			}
    			// systemd converts CPUQuotaPerSecUSec (microseconds per CPU second) to CPUQuota
    			// (integer percentage of CPU) internally.  This means that if a fractional percent of
    			// CPU is indicated by Resources.CpuQuota, we need to round up to the nearest
    			// 10ms (1% of a second) such that child cgroups can set the cpu.cfs_quota_us they expect.
    			cpuQuotaPerSecUSec = uint64(quota*1000000) / period
    			if cpuQuotaPerSecUSec%10000 != 0 {
    				cpuQuotaPerSecUSec = ((cpuQuotaPerSecUSec / 10000) + 1) * 10000
    			}
    		}
    		*properties = append(*properties,
    			newProp("CPUQuotaPerSecUSec", cpuQuotaPerSecUSec))
    	}
    }

Note that systemd >= 242 is required to set systemd CPUQuotaPeriod, meaning that in your case runc only sets the quota and does not set the period for systemd. This is probably why setting the period fails later.

I see a few ways to fix this:

1. For systemd, if the period is not the default, either set the quota and period together, or do not set them at all (they will be set anyway by the fs driver). This should be an easy fix.
2. Instead of relying on the systemd version to figure out whether to set CPUQuotaPeriod, query the current value -- if the parameter exists, we can set it. This will slow things down a bit.
3. In the fs driver, if setting a parameter fails with EINVAL (like in your case), retry one more time after setting the other parameter.
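The third option could look roughly like the Go sketch below. It is an illustration under stated assumptions, not runc's code: `fakeCgroup` is a hypothetical in-memory stand-in for a cgroup v1 cpu directory that rejects writes with EINVAL whenever the child's quota/period ratio would exceed the parent's, and `setCPU` is a hypothetical helper name.

```go
package main

import (
	"errors"
	"fmt"
	"syscall"
)

// fakeCgroup is a hypothetical stand-in for a child cgroup under a parent
// with a quota. All values are in microseconds.
type fakeCgroup struct {
	quota, period             int64
	parentQuota, parentPeriod int64
}

// check models the rejection: the child's ratio may not exceed the parent's.
func (c *fakeCgroup) check(quota, period int64) error {
	if c.parentQuota > 0 && quota > 0 &&
		quota*c.parentPeriod > c.parentQuota*period {
		return syscall.EINVAL
	}
	return nil
}

func (c *fakeCgroup) writePeriod(p int64) error {
	if err := c.check(c.quota, p); err != nil {
		return err
	}
	c.period = p
	return nil
}

func (c *fakeCgroup) writeQuota(q int64) error {
	if err := c.check(q, c.period); err != nil {
		return err
	}
	c.quota = q
	return nil
}

// setCPU writes the period first; if that fails with EINVAL and a quota is
// also being set, it writes the quota and then retries the period once.
func setCPU(c *fakeCgroup, period, quota int64) error {
	retryPeriod := false
	if period != 0 {
		if err := c.writePeriod(period); err != nil {
			if !errors.Is(err, syscall.EINVAL) || quota == 0 {
				return err
			}
			retryPeriod = true
		}
	}
	if quota != 0 {
		if err := c.writeQuota(quota); err != nil {
			return err
		}
		if retryPeriod {
			return c.writePeriod(period)
		}
	}
	return nil
}

func main() {
	// Assumed numbers: pod limit 0.5 CPU, container currently at 0.3 CPU.
	c := &fakeCgroup{quota: 30000, period: 100000, parentQuota: 50000, parentPeriod: 100000}
	fmt.Println("period alone:", c.writePeriod(10000))
	fmt.Println("with retry:", setCPU(c, 10000, 3000), c.period, c.quota)
}
```

The retry keeps the common case (period accepted on the first write) unchanged and only adds one extra write on the failure path.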

kolyshkin · 2021-07-14T21:10:26Z

> systemd is version 219

Do you mean 239?


kolyshkin · 2021-07-15T01:18:58Z

I think systemd is relevant, but for different reasons than those in items 1 and 2 of my earlier comment. It seems that if the systemd driver is used, systemd sets the period to 1000000 (i.e. 10x the default period of 100000), and this may play a role in the failure.

In any case, items 1 and 2 are not entirely correct: with systemd we do not set "quota" and "period" separately, but a combined value, CPUQuotaPerSecUSec (which results in setting both quota and period), so setting or not setting CPUQuotaPeriod does not really matter.

Item 3 gives a good description of what is happening. A fix is on the way (it took me a long time to code the test case).


hk-vmg · 2021-07-15T11:42:47Z

Thanks so much! This is Red Hat Enterprise Linux, which means the systemd version is lower but contains a lot of cherry-picked, backported fixes. I see the same behavior on RHEL 7 (systemd 219) and RHEL 8 (systemd 239).

Maybe that is the root cause: the `if sdVer >= 242 {` check from the code snippet above is false, so it falls back to the cgroup behavior. If I update the libcontainer/cgroups/fs/cpu.go#L74 logic to write the files in a different order depending on whether the period is going up or down, this works for me with the systemd driver. It sounds like your retry fix will resolve that.

Unfortunately, it doesn't look like there's an easy, version-independent way to detect whether CPUQuotaPerSec support is available. I suppose an "isRhel" check would work, but it may be a bit hacky.

$ busctl get-property org.freedesktop.systemd1 /org/freedesktop/systemd1 org.freedesktop.systemd1.Manager Version
s "239 (239-XX.el8_X.X)"
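For illustration, this is roughly how a numeric version gate sees such a vendor string. The helper name `majorVersion` is hypothetical and this is only a sketch of the idea, not runc's actual parser: the leading number is all that survives, so backported features are invisible to a `>= 242` comparison.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// majorVersion extracts the leading decimal number from a systemd Version
// property value such as "239 (239-XX.el8_X.X)". (Hypothetical helper.)
func majorVersion(s string) (int, error) {
	end := strings.IndexFunc(s, func(r rune) bool { return r < '0' || r > '9' })
	if end == -1 {
		end = len(s) // the whole string is digits
	}
	return strconv.Atoi(s[:end])
}

func main() {
	v, err := majorVersion("239 (239-XX.el8_X.X)")
	if err != nil {
		fmt.Println("cannot parse version:", err)
		return
	}
	// The vendor suffix is dropped, so the gate only sees the upstream number.
	fmt.Println(v, v >= 242)
}
```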


kolyshkin · 2021-07-16T00:24:24Z

In fact, it's not absolutely required to set the period via systemd, as we set it via the fallback fs driver anyway, so I am dropping the idea of using more sophisticated methods to figure out whether systemd supports the properties we want to set.

I have the fix almost ready, the only problem is the test case, in particular #3090 (comment). Perhaps you can help with that @hk-vmg ?

hk-vmg · 2021-07-21T22:13:10Z

Sorry for delay - thank you so much for the progress. I'll see if I can add anything, but I see you've likely got most/all of the way there.

Sometimes setting CPU quota period fails when a new period is lower, and a parent cgroup has CPU quota limit set.

This happens as in cgroup v1 the quota and the period can not be set together (this is fixed in v2), and since the period is being set first, new_limit = old_quota/new_period may be higher than the parent cgroup limit.

The fix is to retry setting the period after the quota, to cover all possible scenarios.

Tested via runc integration tests. Before the commit, it fails:

    root@ubu2004:~/git/runc# RUNC=`pwd`/../crun/crun.before bats -f "pod cgroup" -t tests/integration/update.bats
    1..1
    not ok 1 update cpu period in a pod cgroup with pod limit set
    # (in test file tests/integration/update.bats, line 424)
    #   `[ "$status" -eq 0 ]' failed
    # crun.before spec (status=0):
    #
    # crun.before run -d --console-socket /tmp/bats-run-30428-dYkMDC/runc.4FdCtn/tty/sock test_update (status=0):
    #
    # crun.before update --cpu-quota 600000 test_update (status=1):
    # writing file `cpu.cfs_quota_us`: Invalid argument
    # crun.before update --cpu-period 10000 --cpu-quota 3000 test_update (status=1):
    # writing file `cpu.cfs_period_us`: Invalid argument

With the fix, the test passes.

Originally reported for runc in opencontainers/runc#3084

Signed-off-by: Kir Kolyshkin <[email protected]>

kolyshkin added a commit to kolyshkin/runc that referenced this issue Jul 15, 2021

tests/int: test case for opencontainers#3084

0763c54

Testing that it fails without the fix. Signed-off-by: Kir Kolyshkin <[email protected]>

kolyshkin added a commit to kolyshkin/runc that referenced this issue Jul 15, 2021

tests/int: test case for opencontainers#3084

4a951b1

Testing that it fails without the fix. Signed-off-by: Kir Kolyshkin <[email protected]>

kolyshkin added a commit to kolyshkin/runc that referenced this issue Jul 15, 2021

tests/int: test case for opencontainers#3084

d595ee9

Testing that it fails without the fix. Signed-off-by: Kir Kolyshkin <[email protected]>

kolyshkin mentioned this issue Jul 15, 2021

libct/cg/v1: work around CPU quota period set failure #3090

Merged

kolyshkin added a commit to kolyshkin/runc that referenced this issue Jul 15, 2021

tests/int: test case for opencontainers#3084

4d77f91

Testing that it fails without the fix. Signed-off-by: Kir Kolyshkin <[email protected]>

kolyshkin mentioned this issue Jul 28, 2021

[1.0] libct/cg/v1: work around CPU quota period set failure #3115

Merged

AkihiroSuda closed this as completed in #3090 Aug 30, 2021

kolyshkin mentioned this issue Apr 6, 2023

cgroup: workaround cpu quota/period issue with v1 containers/crun#1188

Merged

kolyshkin mentioned this issue Apr 25, 2024

libct/cg/fs: fix setting rt_period vs rt_runtime #4258

Merged

kolyshkin mentioned this issue May 21, 2024

[1.1] libct/cg/fs: fix setting rt_period vs rt_runtime #4284

Merged
