[RFC] cgroup stats #1103

daviesalex · 2016-04-27T13:19:48Z

We propose working on a plugin that will capture specific statistics per cgroup. We are looking for comments from others who may have an interest; we have a very specific use of cgroups but are keen to implement this in a way that is generically useful for others (or at least could be extended in future to cover other use cases).

In our environment, we are only using cgroups to protect memory, but many others will be using them to protect CPU and other resources.

We have a root cgroup, and then a container cgroup, and then very many children cgroups. For example:

[root@hostname MYGROUP]# cat /cgroup/memory/memory.usage_in_bytes
25539964928
[root@hostname MYGROUP]# cat /cgroup/memory/MYGROUP/memory.usage_in_bytes
15305998336
[root@hostname MYGROUP]# cat /cgroup/memory/MYGROUP/5041915.13/memory.usage_in_bytes
1301893120

There are many (perhaps tens per host) groups like "5041915.13". These groups may exist for anywhere from seconds to hours, and it is not yet clear whether we need to capture metrics from inside them.

In our kernel (RHEL7.1), there are 26 parameters inside memory per cgroup. This number changes hugely between kernels, and we are keen to avoid hard coding specific metric names into the plugin. To complicate matters further, most are a single value, but some have other formats, for example:

[root@hostname memory]# cat memory.usage_in_bytes
34596487168

vs

[root@hostname memory]# cat memory.stat
cache 3759149056
rss 2654208
rss_huge 0
mapped_file 75599872
swap 0
pgpgin 216730982
pgpgout 215812573
pgfault 1045296909
pgmajfault 471
inactive_anon 766427136
active_anon 1895514112
inactive_file 193646592
active_file 906231808
unevictable 0
hierarchical_memory_limit 9223372036854775807
hierarchical_memsw_limit 9223372036854775807
total_cache 14899679232
total_rss 12077625344
total_rss_huge 0
total_mapped_file 151244800
total_swap 0
total_pgpgin 2713681015
total_pgpgout 2707094758
total_pgfault 10890941706
total_pgmajfault 85686
total_inactive_anon 5938786304
total_active_anon 14048608256
total_inactive_file 560480256
total_active_file 2134081536
total_unevictable 4294967296

A generic way to handle this is not clear - suggestions welcome. For now, we intend to only capture the simple values that are single numbers in a file.

To prevent capturing a truly incredible number of metrics, we intend to list the ones we want to capture in the config file:

[[inputs.cgroup]]
  [inputs.cgroup.memory]
    # List of fields that we read in for each cgroup
    # Note - the memory. needs to be here, because some cgroups (e.g. cpuacct) have multiple prefixes
    fields=["memory.max_usage_in_bytes", "memory.limit_in_bytes"]

    # This would capture the root cgroup, plus the group MYGROUP.
    # TBC - allow a wildcard to capture *all* cgroups (would require the plugin to detect these via "ls" in the cgroup directories)
    child_cgroups=["MYGROUP"]

While we would not use it, this maps nicely onto other cgroups. For example, we could do this:

[[inputs.cgroup]]
  [inputs.cgroup.cpuacct]
    fields=["cpuacct.usage", "cpu.cfs_period_us", "cpu.cfs_quota_us"]
    child_cgroups=["MYGROUP"]

This is an early set of thoughts - comments welcome from those in the community at the same time as we review this internally.

sparrc · 2016-04-27T15:42:26Z

Couple of thoughts:

If it makes sense, one option could be to require users to specify glob paths to the metrics that they want, like this:

[[inputs.cgroup]]
  paths = ["/cgroup/memory/memory.*", "/cgroup/memory/*/memory.*"]

If the file name always has the format memory.something, then it would seem that it should be mapped to measurement_name.field_name.
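A minimal Go sketch of that mapping, assuming every cgroup file name has the controller.stat form seen in this thread. splitMetricName is a hypothetical helper name, not something from an existing plugin.

```go
package main

import (
	"path/filepath"
	"strings"
)

// splitMetricName is a hypothetical helper: it maps a cgroup file path such as
// "/cgroup/memory/MYGROUP/memory.usage_in_bytes" to a measurement name
// ("memory") and a field name ("usage_in_bytes").
func splitMetricName(path string) (measurement, field string, ok bool) {
	base := filepath.Base(path)
	// Split on the first dot only, so "blkio.throttle.io_serviced" keeps
	// "throttle.io_serviced" as its field name.
	parts := strings.SplitN(base, ".", 2)
	if len(parts) != 2 || parts[0] == "" || parts[1] == "" {
		return "", "", false
	}
	return parts[0], parts[1], true
}
```

Files without a dot in their name (for example "tasks") are reported as not matching, so the caller can skip them.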

We may want to handle the files that contain many stats. There is a bytes.Fields function in Go that I think would make this fairly easy (https://golang.org/pkg/bytes/#Fields). You could basically do something like this:

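fileBlob, err := ioutil.ReadFile("/cgroup/memory/memory.stat")
if err != nil {
  return err
}
// Split on whitespace: one token is a single value, more are key-value pairs.
fields := bytes.Fields(fileBlob)
if len(fields) == 1 {
  processSingleStat(fields)
} else if len(fields) > 1 {
  processKeyValuePairs(fields)
}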

where the key-value pairs would be mapped into fields and the filename would be the measurement.
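One way the two branches could be filled in is sketched below. parseStatFile and the "value" key used for single-value files are assumptions made for illustration; only bytes.Fields comes from the suggestion above.

```go
package main

import (
	"bytes"
	"fmt"
	"strconv"
)

// parseStatFile is a hypothetical helper. A single token is stored under the
// key "value"; an even number of tokens is treated as key-value pairs.
func parseStatFile(blob []byte) (map[string]int64, error) {
	tokens := bytes.Fields(blob)
	out := make(map[string]int64)
	switch {
	case len(tokens) == 1:
		v, err := strconv.ParseInt(string(tokens[0]), 10, 64)
		if err != nil {
			return nil, err
		}
		out["value"] = v
	case len(tokens) > 1 && len(tokens)%2 == 0:
		for i := 0; i < len(tokens); i += 2 {
			v, err := strconv.ParseInt(string(tokens[i+1]), 10, 64)
			if err != nil {
				return nil, err
			}
			out[string(tokens[i])] = v
		}
	default:
		return nil, fmt.Errorf("unrecognised format: %d tokens", len(tokens))
	}
	return out, nil
}
```

This only covers the two formats discussed so far. A file holding an even count of bare numbers would be misread as pairs, so a real parser would need a stricter check.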

The filepath could be a tag, but we would need to be careful that cardinality doesn't get too high with the short-lived cgroups. I'm not sure of the best way to handle those.

vlasad · 2016-05-11T15:58:05Z

There are other formats as well:

# cat cpuacct.usage_percpu
20498078689 21227086712 16288952852

# cat blkio.throttle.io_serviced
11:0 Read 0
11:0 Write 0
11:0 Sync 0
11:0 Async 0
11:0 Total 0
8:0 Read 17720
8:0 Write 9113
8:0 Sync 7590
8:0 Async 19243
8:0 Total 26833
1:4 Read 3
1:4 Write 0
1:4 Sync 0
1:4 Async 3
1:4 Total 3
1:3 Read 3
1:3 Write 0
1:3 Sync 0
1:3 Async 3
1:3 Total 3
1:2 Read 3
1:2 Write 0
1:2 Sync 0
1:2 Async 3
1:2 Total 3
1:1 Read 3
1:1 Write 0
1:1 Sync 0
1:1 Async 3
1:1 Total 3
1:0 Read 3
1:0 Write 0
1:0 Sync 0
1:0 Async 3
1:0 Total 3
Total 26881

Maybe there are other ones.
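The four shapes seen so far in this thread (single value, one line of numbers, key-value lines, and device-prefixed lines) could be told apart by counting tokens per line. The following Go sketch is only an assumption about how such detection might work; detectFormat and its return strings are hypothetical.

```go
package main

import (
	"bufio"
	"strconv"
	"strings"
)

// detectFormat is a hypothetical classifier for the file shapes shown in this
// thread. It looks only at the line count and the first non-empty line.
func detectFormat(content string) string {
	var lines [][]string
	sc := bufio.NewScanner(strings.NewReader(content))
	for sc.Scan() {
		if f := strings.Fields(sc.Text()); len(f) > 0 {
			lines = append(lines, f)
		}
	}
	if len(lines) == 0 {
		return "empty"
	}
	first := lines[0]
	_, numErr := strconv.ParseUint(first[0], 10, 64)
	switch {
	case len(lines) == 1 && len(first) == 1:
		return "single" // e.g. memory.usage_in_bytes
	case len(lines) == 1 && numErr == nil:
		return "list" // e.g. cpuacct.usage_percpu
	case len(first) == 2:
		return "keyvalue" // e.g. memory.stat
	case len(first) == 3:
		return "device" // e.g. blkio.throttle.io_serviced
	}
	return "unknown"
}
```

Anything that does not fit one of these shapes is reported as "unknown" rather than guessed at, which leaves room for the other formats that may exist.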

vlasad · 2016-05-11T16:50:37Z

Thoughts about config file:

In this version it is difficult to manage nested cgroups.
Also, there is no way to exclude the root cgroup from capturing if, for example, I want to capture only the children.

[[inputs.cgroup]]
  [inputs.cgroup.cpuacct]
    fields=["cpuacct.usage", "cpu.cfs_period_us", "cpu.cfs_quota_us"]
    child_cgroups=["MYGROUP"]

In this version it is very verbose to manage the fields and cgroups for capturing.

[[inputs.cgroup]]
  paths = ["/cgroup/memory/memory.*", "/cgroup/memory/*/memory.*"]

For example, I want to capture only memory.usage_in_bytes, memory.limit_in_bytes for container cgroups: group1, group2, group3:

[[inputs.cgroup]]
  paths = [
      "/cgroup/memory/group1/memory.usage_in_bytes",
      "/cgroup/memory/group2/memory.usage_in_bytes",
      "/cgroup/memory/group3/memory.usage_in_bytes",
      "/cgroup/memory/group1/memory.limit_in_bytes",
      "/cgroup/memory/group2/memory.limit_in_bytes",
      "/cgroup/memory/group3/memory.limit_in_bytes",
]

I propose the following version of the config file.
What do you think about it?
In my opinion, it is flexible and compact.

[[inputs.cgroup]]
  # To avoid duplicating the full path, a prefix can be defined.
  # This prefix is global for all rules.
  prefix = "/cgroup/"    # optional
  # If the global prefix is not defined, the full path to each cgroup must be specified. For example:
  [[inputs.cgroup.rules]]
    paths = [
      "/cgroup/memory",   # root cgroup
      "/cgroup/memory/child1",    # container cgroup
      "/cgroup/memory/child2/*",     # all children cgroups under child2, but not child2 itself
    ]
    fields = ["memory.max_usage_in_bytes", "memory.limit_in_bytes"]
  # If a prefix is defined, only the path relative to it needs to be specified. For example:
  [[inputs.cgroup.rules]]
    # A prefix can also be defined per rule; it is used for that rule instead of the global prefix.
    prefix = "/cgroup/cpu/"     # optional
    paths = [
      "/",   # root cgroup
      "child1",   # container cgroup
      "*",    # all container cgroups
      "child2/*",    # all children cgroups under child2, but not child2 itself
      "*/*",    # all children cgroups under each container cgroup
    ]
    fields = ["cpuacct.usage", "cpu.cfs_period_us", "cpu.cfs_quota_us"]
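A rough Go sketch of how the prefix rules in this proposal might be resolved into full glob patterns. The rule type and resolvePatterns are hypothetical names; the sketch assumes the per-rule prefix overrides the global one, as the config comments describe.

```go
package main

import "path/filepath"

// rule mirrors one [[inputs.cgroup.rules]] table from the proposal above.
type rule struct {
	Prefix string   // optional per-rule prefix
	Paths  []string // cgroup paths or glob patterns
}

// resolvePatterns joins each path with the effective prefix: the per-rule
// prefix if set, otherwise the global one (which may itself be empty).
func resolvePatterns(globalPrefix string, r rule) []string {
	prefix := globalPrefix
	if r.Prefix != "" {
		prefix = r.Prefix
	}
	patterns := make([]string, 0, len(r.Paths))
	for _, p := range r.Paths {
		// filepath.Join cleans the result, so "/cgroup/cpu/" + "/" becomes
		// "/cgroup/cpu" and an empty prefix leaves absolute paths untouched.
		patterns = append(patterns, filepath.Join(prefix, p))
	}
	return patterns
}
```

The resulting patterns could then be passed to filepath.Glob to expand entries such as "child2/*" into the cgroup directories that currently exist.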

daviesalex · 2016-05-11T21:21:18Z

@vlasad thanks for thinking about this. I think it's worth spending time on, because a config syntax that is staggeringly verbose or makes sensible usage impossible is going to be a long-term pain.

I have two questions that are very open:

1. How would you capture metrics from a specific set of cgroups under your proposal? Say I wanted to capture just the root cgroup and (using the example at the top of this ticket) MYGROUP. While I'm obviously interested in our use case, I think that's likely to be a common one. Would we create a list of .rules with a different prefix for each cgroup?
2. How do we handle all these weird output formats? Do we write a parser for each type in the plugin and then specify the type (or auto-detect the type)?

I've asked some of my colleagues to review your thoughts and post their comments, to get the most eyes/ideas on this!

vlasad · 2016-05-12T09:06:05Z

Answer 1:

[[inputs.cgroup]]
  [[inputs.cgroup.rules]]
    prefix = "/cgroup/memory"
    paths = ["/", "MYGROUP"]
    fields = ["memory.max_usage_in_bytes", "memory.limit_in_bytes"]

or

[[inputs.cgroup]]
  [[inputs.cgroup.rules]]
    paths = ["/cgroup/memory", "/cgroup/memory/MYGROUP"]
    fields = ["memory.max_usage_in_bytes", "memory.limit_in_bytes"]

or

[[inputs.cgroup]]
  prefix = "/cgroup/"  
  [[inputs.cgroup.rules]]
    paths = ["memory", "memory/MYGROUP"]
    fields = ["memory.max_usage_in_bytes", "memory.limit_in_bytes"]
  [[inputs.cgroup.rules]]
    paths = ["cpuacct", "cpuacct/MYGROUP"]
    fields = ["cpuacct.usage", "cpu.cfs_period_us", "cpu.cfs_quota_us"]

Answer 2:
Yes, we will write a parser for each type. The type will be auto-detected.

These files can be parsed into the following form, for example:

For cpuacct.usage_percpu:

{
    "fields":{
        "value_1":20498078689,
        "value_2":21227086712,
        "value_3":16288952852
    },
    "name":"cpuacct.usage_percpu",
    "tags":{
        "cgroup_path":"/cgroup/cpuacct"
    }
}
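A small Go sketch of how a space-separated file such as cpuacct.usage_percpu might be turned into the value_1, value_2, ... fields shown above. parsePerCPU is a hypothetical name, and the 1-based numbering simply follows the example.

```go
package main

import (
	"strconv"
	"strings"
)

// parsePerCPU is a hypothetical helper: it maps each whitespace-separated
// number to a field named value_N, counting from 1 as in the example above.
func parsePerCPU(content string) (map[string]uint64, error) {
	out := make(map[string]uint64)
	for i, tok := range strings.Fields(content) {
		v, err := strconv.ParseUint(tok, 10, 64)
		if err != nil {
			return nil, err
		}
		out["value_"+strconv.Itoa(i+1)] = v
	}
	return out, nil
}
```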

For blkio.throttle.io_serviced:

{
    "fields":{
        "Read":0,
        "Write":0,
        "Sync":0,
        "Async":0,
        "Total":0
    },
    "name":"blkio.throttle.io_serviced",
    "tags":{
        "cgroup_path":"/cgroup/blkio/MYGROUP",
        "major":11,
        "minor":0
    }
}
{
    "fields":{
        "Read":17720,
        "Write":9113,
        "Sync":7590,
        "Async":19243,
        "Total":26833
    },
    "name":"blkio.throttle.io_serviced",
    "tags":{
        "cgroup_path":"/cgroup/blkio/MYGROUP",
        "major":8,
        "minor":0
    }
}

...

{
    "fields":{
        "Read":3,
        "Write":0,
        "Sync":0,
        "Async":3,
        "Total":3
    },
    "name":"blkio.throttle.io_serviced",
    "tags":{
        "cgroup_path":"/cgroup/blkio/MYGROUP",
        "major":1,
        "minor":0
    }
}
{
    "fields":{
        "Total":26881
    },
    "name":"blkio.throttle.io_serviced",
    "tags":{
        "cgroup_path":"/cgroup/blkio/MYGROUP"
    }
}
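The grouping by device shown above could be produced along these lines. This Go sketch is an assumption, not the plugin's parser: parseBlkio and deviceStats are hypothetical names, and the device is kept as a single "major:minor" key rather than split into two tags.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// deviceStats holds the counters for one "major:minor" device.
type deviceStats map[string]uint64

// parseBlkio groups lines such as "8:0 Read 17720" by device. The final
// "Total N" line has no device and is stored under the empty key "".
func parseBlkio(content string) (map[string]deviceStats, error) {
	out := make(map[string]deviceStats)
	for _, line := range strings.Split(content, "\n") {
		tok := strings.Fields(line)
		var dev, key, raw string
		switch len(tok) {
		case 0:
			continue // blank line
		case 2: // "Total 26881"
			key, raw = tok[0], tok[1]
		case 3: // "8:0 Read 17720"
			dev, key, raw = tok[0], tok[1], tok[2]
		default:
			return nil, fmt.Errorf("unexpected line %q", line)
		}
		v, err := strconv.ParseUint(raw, 10, 64)
		if err != nil {
			return nil, err
		}
		if out[dev] == nil {
			out[dev] = make(deviceStats)
		}
		out[dev][key] = v
	}
	return out, nil
}
```

Splitting each device key on ":" would give the separate major and minor tags used in the example records.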

All files with single value will be combined, for example:

{
    "fields":{
        "usage_in_bytes":2727654624,
        "swappiness":60,
        "limit_in_bytes":16265488952852
    },
    "name":"memory",
    "tags":{
        "cgroup_path":"/cgroup/memory"
    }
}

daviesalex · 2016-05-16T20:33:17Z

@vlasad I need to think more about the config syntax, but I suggest you start writing the parsers (which seem non-controversial to me).

I will ask for feedback from my colleagues on the best syntax and get as many suggestions as possible before we finalize anything.

vlasad · 2016-05-16T20:48:32Z

@daviesalex OK. I will start to write the parsers.

Will-Beninger · 2016-06-03T20:00:51Z

@vlasad @daviesalex

I was reviewing Telegraf issues when I noticed this mention of cgroups... What status is this in? (preliminary/conceptual/etc)

We were likely to begin working on a very similar feature plugin and are interested in seeing how far this effort has gone or whether we should continue to develop our own.

daviesalex · 2016-06-07T18:34:32Z

@Will-Beninger see #1345

Will-Beninger · 2016-06-13T19:37:58Z

@daviesalex @vlasad

Thanks for that. We've been testing the plugin over the last week in our environment, and one of the most basic pieces of functionality that we're looking for is pulling tags out of the path. I can't for the life of me find a good way to do this without dipping into the actual functionality of the plugin itself.

By default, the path is returned as one big string, and we would then need to do post-processing to pick out the parts of it that we want to keep. For example, we return memory.stat from the following location:

/cgroup/memory/$Scheduler/$Group/$JobID/$TaskID/$ChildID/memory.stat

It's quite deep in the directory tree, but each of those tags would be useful to us. Are you running into a similar issue? We are of course also looking at memory.stat (for example) at each layer above that, and are looking for similar tags along the way.
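To illustrate the kind of post-processing described here, a Go sketch that assigns path components to tag names by position. tagsFromPath and the tag names in the usage note are hypothetical; this is not how the plugin behaves today.

```go
package main

import (
	"path/filepath"
	"strings"
)

// tagsFromPath is a hypothetical helper. It strips prefix from the directory
// of path and assigns the remaining components, in order, to names.
// Components beyond len(names) are ignored; missing ones are left unset.
func tagsFromPath(prefix, path string, names []string) map[string]string {
	tags := make(map[string]string)
	rel, err := filepath.Rel(prefix, filepath.Dir(path))
	if err != nil || rel == "." || strings.HasPrefix(rel, "..") {
		return tags // file sits at the prefix itself, or outside it
	}
	parts := strings.Split(rel, string(filepath.Separator))
	for i, name := range names {
		if i >= len(parts) {
			break
		}
		tags[name] = parts[i]
	}
	return tags
}
```

Called with the prefix "/cgroup/memory" and names such as scheduler, group, job_id, task_id and child_id, a file five levels deep would get all five tags, while a memory.stat at a shallower layer would get only the leading ones.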

closes influxdata#1103 closes influxdata#1350

sparrc closed this as completed in d641c42 Jun 23, 2016

chebrolus pushed a commit to chebrolus/telegraf that referenced this issue Jun 24, 2016

cgroup: change fields -> files

f0e80ea

closes influxdata#1103 closes influxdata#1350
