libcontainer/intelrdt: add basic "MON" groups support.
Paweł Szulik committed Dec 1, 2022
1 parent 2da0194 commit e5ce002
Showing 12 changed files with 338 additions and 72 deletions.
82 changes: 61 additions & 21 deletions libcontainer/SPEC.md
@@ -158,32 +158,38 @@ init process will block waiting for the parent to finish setup.
### IntelRdt

Intel platforms with new Xeon CPU support Resource Director Technology (RDT).
Cache Allocation Technology (CAT) and Memory Bandwidth Allocation (MBA) are
two sub-features of RDT.
Cache Allocation Technology (CAT), Cache Monitoring Technology (CMT),
Memory Bandwidth Allocation (MBA) and Memory Bandwidth Monitoring (MBM) are
four sub-features of RDT.

Cache Allocation Technology (CAT) provides a way for the software to restrict
cache allocation to a defined 'subset' of L3 cache which may be overlapping
with other 'subsets'. The different subsets are identified by class of
service (CLOS) and each CLOS has a capacity bitmask (CBM).

Cache Monitoring Technology (CMT) supports monitoring of the last-level cache (LLC) occupancy
for each running thread simultaneously.

Memory Bandwidth Allocation (MBA) provides indirect and approximate throttle
over memory bandwidth for the software. A user controls the resource by
indicating the percentage of maximum memory bandwidth or memory bandwidth limit
in MBps unit if MBA Software Controller is enabled.
indicating the percentage of maximum memory bandwidth or memory bandwidth
limit in MBps unit if MBA Software Controller is enabled.

Memory Bandwidth Monitoring (MBM) supports monitoring of total and local memory bandwidth
for each running thread simultaneously.

It can be used to handle L3 cache and memory bandwidth resources allocation
for containers if hardware and kernel support Intel RDT CAT and MBA features.
More details about Intel RDT CAT and MBA can be found in the section 17.18 and 17.19, Volume 3
of Intel Software Developer Manual:
https://software.intel.com/en-us/articles/intel-sdm

In Linux 4.10 kernel or newer, the interface is defined and exposed via
About Intel RDT kernel interface:
In Linux 4.14 kernel or newer, the interface is defined and exposed via
"resource control" filesystem, which is a "cgroup-like" interface.

Comparing with cgroups, it has similar process management lifecycle and
interfaces in a container. But unlike cgroups' hierarchy, it has single level
filesystem layout.

CAT and MBA features are introduced in Linux 4.10 and 4.12 kernel via
"resource control" filesystem.

Intel RDT "resource control" filesystem hierarchy:
```
mount -t resctrl resctrl /sys/fs/resctrl
@@ -194,25 +200,46 @@ tree /sys/fs/resctrl
| | |-- cbm_mask
| | |-- min_cbm_bits
| | |-- num_closids
| |-- L3_MON
| | |-- max_threshold_occupancy
| | |-- mon_features
| | |-- num_rmids
| |-- MB
| |-- bandwidth_gran
| |-- delay_linear
| |-- min_bandwidth
| |-- num_closids
|-- ...
|-- mon_groups
|-- <rmid>
|-- ...
|-- mon_data
|-- mon_L3_00
|-- llc_occupancy
|-- mbm_local_bytes
|-- mbm_total_bytes
|-- ...
|-- tasks
|-- schemata
|-- tasks
|-- <container_id>
|-- <clos>
|-- ...
|-- schemata
|-- mon_data
|-- mon_L3_00
|-- llc_occupancy
|-- mbm_local_bytes
|-- mbm_total_bytes
|-- ...
|-- tasks
|-- schemata
|-- ...
```

For runc, we can make use of `tasks` and `schemata` configuration for L3
cache and memory bandwidth resources constraints.
cache and memory bandwidth resources constraints, `mon_data` directory for
CMT and MBM statistics.
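
As a rough illustration of how the `mon_data` statistics could be consumed, here is a minimal Go sketch. It is not runc's implementation: `parseCounter` and `readMonData` are hypothetical helpers, and the sketch assumes each counter file holds a single decimal number, with the directory and file names taken from the tree above.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strconv"
	"strings"
)

// parseCounter converts the contents of a counter file (assumed to be a
// decimal number followed by a newline) into a uint64. Non-numeric content
// is returned as an error.
func parseCounter(raw string) (uint64, error) {
	return strconv.ParseUint(strings.TrimSpace(raw), 10, 64)
}

// readMonData is a hypothetical helper. For every mon_L3_* directory under
// <groupDir>/mon_data it reads the counter files listed in the tree above
// and returns them keyed as "mon_L3_00/llc_occupancy" and so on.
func readMonData(groupDir string) (map[string]uint64, error) {
	domains, err := filepath.Glob(filepath.Join(groupDir, "mon_data", "mon_L3_*"))
	if err != nil {
		return nil, err
	}

	stats := make(map[string]uint64)
	for _, domain := range domains {
		for _, name := range []string{"llc_occupancy", "mbm_local_bytes", "mbm_total_bytes"} {
			raw, err := os.ReadFile(filepath.Join(domain, name))
			if err != nil {
				if os.IsNotExist(err) {
					continue // this counter is not exposed on this system
				}
				return nil, err
			}
			value, err := parseCounter(string(raw))
			if err != nil {
				return nil, fmt.Errorf("%s/%s: %w", filepath.Base(domain), name, err)
			}
			stats[filepath.Base(domain)+"/"+name] = value
		}
	}
	return stats, nil
}

func main() {
	// Reads the root group; pass a <clos> or mon_groups/<rmid> directory instead
	// to read the counters of a specific group.
	stats, err := readMonData("/sys/fs/resctrl")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	for key, value := range stats {
		fmt.Println(key, value)
	}
}
```

On a machine without a mounted resctrl filesystem the glob matches nothing and the program prints no counters.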

The file `tasks` has a list of tasks that belongs to this group (e.g.,
<container_id>" group). Tasks can be added to a group by writing the task ID
"<clos>" group). Tasks can be added to a group by writing the task ID
to the "tasks" file (which will automatically remove them from the previous
group to which they belonged). New tasks created by fork(2) and clone(2) are
added to the same group as their parent.
@@ -224,7 +251,7 @@ L3 cache schema:
It has allocation bitmasks/values for L3 cache on each socket, which
contains L3 cache id and capacity bitmask (CBM).
```
Format: "L3:<cache_id0>=<cbm0>;<cache_id1>=<cbm1>;..."
Format: "L3:<cache_id0>=<cbm0>;<cache_id1>=<cbm1>;..."
```
For example, on a two-socket machine, the schema line could be "L3:0=ff;1=c0"
which means L3 cache id 0's CBM is 0xff, and L3 cache id 1's CBM is 0xc0.
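
To make the schema format concrete, the following Go sketch parses an L3 schema line into a map from cache id to CBM. `parseL3Schema` is a hypothetical helper written for this illustration, not a function from runc, and it assumes the CBM is written in hexadecimal without a `0x` prefix, as in the example above.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseL3Schema parses a line such as "L3:0=ff;1=c0" into a map from
// L3 cache id to capacity bitmask (CBM). Hypothetical helper for illustration.
func parseL3Schema(line string) (map[int]uint64, error) {
	line = strings.TrimSpace(line)
	if !strings.HasPrefix(line, "L3:") {
		return nil, fmt.Errorf("schema %q does not start with \"L3:\"", line)
	}

	masks := make(map[int]uint64)
	for _, entry := range strings.Split(strings.TrimPrefix(line, "L3:"), ";") {
		parts := strings.SplitN(entry, "=", 2)
		if len(parts) != 2 {
			return nil, fmt.Errorf("malformed entry %q", entry)
		}
		id, err := strconv.Atoi(strings.TrimSpace(parts[0]))
		if err != nil {
			return nil, fmt.Errorf("invalid cache id in %q: %w", entry, err)
		}
		// Assumption: the CBM is hexadecimal, as in "ff" or "c0".
		cbm, err := strconv.ParseUint(strings.TrimSpace(parts[1]), 16, 64)
		if err != nil {
			return nil, fmt.Errorf("invalid CBM in %q: %w", entry, err)
		}
		masks[id] = cbm
	}
	return masks, nil
}

func main() {
	masks, err := parseL3Schema("L3:0=ff;1=c0")
	if err != nil {
		panic(err)
	}
	fmt.Printf("cache 0: %#x, cache 1: %#x\n", masks[0], masks[1])
}
```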
@@ -240,7 +267,7 @@ Memory bandwidth schema:
It has allocation values for memory bandwidth on each socket, which contains
L3 cache id and memory bandwidth.
```
Format: "MB:<cache_id0>=bandwidth0;<cache_id1>=bandwidth1;..."
Format: "MB:<cache_id0>=bandwidth0;<cache_id1>=bandwidth1;..."
```
For example, on a two-socket machine, the schema line could be "MB:0=20;1=70"

@@ -251,8 +278,10 @@ that is allocated is also dependent on the CPU model and can be looked up at
min_bw + N * bw_gran. Intermediate values are rounded to the next control
step available on the hardware.
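
The control-step arithmetic can be sketched as below. This is only one reading of the sentence above: `roundToControlStep` is a hypothetical helper, and it assumes that "next" means rounding up and that percentage values are capped at 100. The actual behavior is determined by the kernel and the hardware.

```go
package main

import "fmt"

// roundToControlStep returns the smallest value of the form
// minBw + N*bwGran (N >= 0) that is not below requested, capped at 100.
// Assumption: rounding goes upward; this helper is illustrative only.
func roundToControlStep(requested, minBw, bwGran uint) uint {
	if bwGran == 0 {
		return requested // no granularity information, leave the value untouched
	}
	if requested <= minBw {
		return minBw
	}
	steps := (requested - minBw + bwGran - 1) / bwGran // ceiling division
	value := minBw + steps*bwGran
	if value > 100 {
		value = 100
	}
	return value
}

func main() {
	// With min_bandwidth = 10 and bandwidth_gran = 10:
	for _, requested := range []uint{5, 20, 25, 99} {
		fmt.Printf("requested %d%% -> %d%%\n", requested, roundToControlStep(requested, 10, 10))
	}
}
```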

If MBA Software Controller is enabled through mount option "-o mba_MBps"
If MBA Software Controller is enabled through mount option "-o mba_MBps":
```
mount -t resctrl resctrl -o mba_MBps /sys/fs/resctrl
```
We could specify memory bandwidth in "MBps" (Mega Bytes per second) unit
instead of "percentages". The kernel underneath would use a software feedback
mechanism or a "Software Controller" which reads the actual bandwidth using
@@ -263,11 +292,12 @@ For example, on a two-socket machine, the schema line could be
"MB:0=5000;1=7000" which means 5000 MBps memory bandwidth limit on socket 0
and 7000 MBps memory bandwidth limit on socket 1.

For more information about Intel RDT kernel interface:
For more information about Intel RDT kernel interface:
https://www.kernel.org/doc/Documentation/x86/intel_rdt_ui.txt

```

An example for runc:
```
Consider a two-socket machine with two L3 caches where the default CBM is
0x7ff and the max CBM length is 11 bits, and minimum memory bandwidth of 10%
with a memory bandwidth granularity of 10%.
@@ -281,7 +311,17 @@ maximum memory bandwidth of 20% on socket 0 and 70% on socket 1.
"closID": "guaranteed_group",
"l3CacheSchema": "L3:0=7f0;1=1f",
"memBwSchema": "MB:0=20;1=70"
}
}
}
```
Another example:
```
We only want to monitor memory bandwidth and llc occupancy.
"linux": {
"intelRdt": {
"enableMBM": true,
"enableCMT": true
}
}
```

2 changes: 1 addition & 1 deletion libcontainer/configs/config.go
@@ -197,7 +197,7 @@ type Config struct {
NoNewKeyring bool `json:"no_new_keyring"`

// IntelRdt specifies settings for Intel RDT group that the container is placed into
// to limit the resources (e.g., L3 cache, memory bandwidth) the container has available
// to limit the resources (e.g., L3 cache, memory bandwidth) the container has available.
IntelRdt *IntelRdt `json:"intel_rdt,omitempty"`

// RootlessEUID is set when the runc was launched with non-zero EUID.
8 changes: 8 additions & 0 deletions libcontainer/configs/intelrdt.go
@@ -13,4 +13,12 @@ type IntelRdt struct {
	// The unit of memory bandwidth is specified in "percentages" by
	// default, and in "MBps" if MBA Software Controller is enabled.
	MemBwSchema string `json:"memBwSchema,omitempty"`

	// The flag to indicate if Intel RDT CMT is enabled. CMT (Cache Monitoring Technology) supports monitoring of
	// the last-level cache (LLC) occupancy for the container.
	EnableCMT bool `json:"enableCMT,omitempty"`

	// The flag to indicate if Intel RDT MBM is enabled. MBM (Memory Bandwidth Monitoring) supports monitoring of
	// total and local memory bandwidth for the container.
	EnableMBM bool `json:"enableMBM,omitempty"`
}
42 changes: 42 additions & 0 deletions libcontainer/configs/intelrdt_test.go
@@ -0,0 +1,42 @@
package configs_test

import (
	"encoding/json"
	"reflect"
	"testing"

	"github.com/opencontainers/runc/libcontainer/configs"
)

func TestUnmarshalIntelRDT(t *testing.T) {
	testCases := []struct {
		JSON     string
		Expected configs.IntelRdt
	}{
		{
			"{\"enableMBM\": true}",
			configs.IntelRdt{EnableMBM: true, EnableCMT: false},
		},
		{
			"{\"enableMBM\": true,\"enableCMT\": false}",
			configs.IntelRdt{EnableMBM: true, EnableCMT: false},
		},
		{
			"{\"enableMBM\": false,\"enableCMT\": true}",
			configs.IntelRdt{EnableMBM: false, EnableCMT: true},
		},
	}

	for _, tc := range testCases {
		got := configs.IntelRdt{}

		err := json.Unmarshal([]byte(tc.JSON), &got)
		if err != nil {
			t.Fatal(err)
		}

		if !reflect.DeepEqual(tc.Expected, got) {
			t.Errorf("expected unmarshalled IntelRDT config %+v, got %+v", tc.Expected, got)
		}
	}
}
10 changes: 8 additions & 2 deletions libcontainer/configs/validate/validator.go
@@ -219,12 +219,18 @@ func intelrdtCheck(config *configs.Config) error {
return fmt.Errorf("invalid intelRdt.ClosID %q", config.IntelRdt.ClosID)
}

if !intelrdt.IsCATEnabled() && config.IntelRdt.L3CacheSchema != "" {
if config.IntelRdt.L3CacheSchema != "" && !intelrdt.IsCATEnabled() {
return errors.New("intelRdt.l3CacheSchema is specified in config, but Intel RDT/CAT is not enabled")
}
if !intelrdt.IsMBAEnabled() && config.IntelRdt.MemBwSchema != "" {
if config.IntelRdt.MemBwSchema != "" && !intelrdt.IsMBAEnabled() {
return errors.New("intelRdt.memBwSchema is specified in config, but Intel RDT/MBA is not enabled")
}
if config.IntelRdt.EnableCMT && !intelrdt.IsCMTEnabled() {
return errors.New("intelRdt.enableCMT is specified in config, but Intel RDT/CMT is not enabled")
}
if config.IntelRdt.EnableMBM && !intelrdt.IsMBMEnabled() {
return errors.New("intelRdt.enableMBM is specified in config, but Intel RDT/MBM is not enabled")
}
}

return nil
1 change: 1 addition & 0 deletions libcontainer/container_linux.go
@@ -2012,6 +2012,7 @@ func (c *Container) currentState() (*State, error) {
if c.intelRdtManager != nil {
intelRdtPath = c.intelRdtManager.GetPath()
}

state := &State{
BaseState: BaseState{
ID: c.ID(),