[RFC] cgroup stats #1103
Couple of thoughts: If it makes sense, one option could be to require users to specify glob paths to the metrics that they want, like this:
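A hypothetical illustration of that idea (the option name and paths are assumptions for illustration, not the original example):

```toml
[[inputs.cgroup]]
  # Glob paths to the specific metric files to collect.
  paths = [
    "/cgroup/memory/*/memory.usage_in_bytes",
    "/cgroup/cpu/*/cpuacct.usage",
  ]
```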
We may want to handle the files that contain many stats, if they always have this format:

```go
fileBlob, _ := ioutil.ReadFile("/cgroup/memory/memory.stat") // requires "io/ioutil"
fields := bytes.Fields(fileBlob)                             // requires "bytes"
if len(fields) == 1 {
	processSingleStat(fields) // the file holds a single value
} else if len(fields) > 1 {
	processKeyValuePairs(fields) // the file holds key-value pairs
}
```

where the key-value pairs would be mapped into fields and the filename would be the measurement. The filepath could be a tag, but we would need to be careful that cardinality doesn't get too high with the short-lived cgroups; I'm not sure of the best way to handle those.
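A minimal sketch of what processKeyValuePairs could do with those tokens (the signature and all names here are assumptions built around the snippet above, not the plugin's actual code):

```go
package main

import (
	"fmt"
	"path/filepath"
	"strconv"
)

// processKeyValuePairs maps alternating key/value tokens into a fields map,
// using the file's base name as the measurement (a sketch, not the plugin's code).
func processKeyValuePairs(path string, tokens []string) (string, map[string]interface{}) {
	measurement := filepath.Base(path) // e.g. "memory.stat"
	fields := make(map[string]interface{})
	for i := 0; i+1 < len(tokens); i += 2 {
		if v, err := strconv.ParseInt(tokens[i+1], 10, 64); err == nil {
			fields[tokens[i]] = v
		}
	}
	return measurement, fields
}

func main() {
	m, f := processKeyValuePairs("/cgroup/memory/memory.stat",
		[]string{"cache", "1234", "rss", "5678"})
	fmt.Println(m, f) // memory.stat map[cache:1234 rss:5678]
}
```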
There are more formats than that:
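For illustration, typical cgroup v1 contents for two of the files mentioned later in this thread (the sample numbers are made up):

```
# cpuacct.usage_percpu: one value per CPU, all on a single line
236628424 301490204

# blkio.throttle.io_serviced: "major:minor operation value" lines plus totals
8:0 Read 4201
8:0 Write 835
8:0 Sync 2915
8:0 Async 2121
8:0 Total 5036
Total 5036
```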
Maybe there are others as well.
Thoughts about the config file: In this version it is difficult to manage nested cgroups.
In this version it is very verbose to manage the fields and cgroups to capture.
For example, I want to capture only memory.usage_in_bytes and memory.limit_in_bytes for the container cgroups group1, group2, and group3:
I propose this version of the config file:

```toml
[[inputs.cgroup]]
  # To avoid duplicating the full path, one can define a prefix.
  # This prefix is global for all rules.
  prefix = "/cgroup/" # optional

  # If a global prefix is not defined, the full path to the cgroup must be specified. For example:
  [[inputs.cgroup.rules]]
    paths = [
      "/cgroup/memory",          # root cgroup
      "/cgroup/memory/child1",   # container cgroup
      "/cgroup/memory/child2/*", # all children cgroups under child2, but not child2 itself
    ]
    fields = ["memory.max_usage_in_bytes", "memory.limit_in_bytes"]

  # If a prefix is defined, only the relative path to the cgroup needs to be specified. For example:
  [[inputs.cgroup.rules]]
    # It's also possible to define a prefix per rule; it is used for the rule instead of the global prefix.
    prefix = "/cgroup/cpu/" # optional
    paths = [
      "/",        # root cgroup
      "child1",   # container cgroup
      "*",        # all container cgroups
      "child2/*", # all children cgroups under child2, but not child2 itself
      "*/*",      # all children cgroups under each container cgroup
    ]
    fields = ["cpuacct.usage", "cpu.cfs_period_us", "cpu.cfs_quota_us"]
```
@vlasad thanks for thinking about this. I think it's worth spending time on, because a config syntax that is staggeringly verbose or makes sensible usage impossible is going to be a long-term pain. I have two questions that are very open:
I've asked some of my colleagues to review your thoughts and post their comments, to get the most eyes/ideas on this!
Answer 1:
or
or
Answer 2: These files can be parsed into the following view. For example, for cpuacct.usage_percpu:
For blkio.throttle.io_serviced:
All files with a single value will be combined, for example:
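A sketch of what that combined output could look like in InfluxDB line protocol (the measurement and tag names here are assumptions, since the final syntax was still undecided in this thread):

```
cgroup,path=/cgroup/memory/group1 memory.usage_in_bytes=1073741824i,memory.limit_in_bytes=2147483648i
```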
@vlasad I need to think more about the config syntax, but I suggest you start writing the parsers (which seem non-controversial to me). I will ask for feedback from my colleagues on the best syntax and get as many suggestions as possible before we finalize anything.
@daviesalex OK. I will start to write the parsers.
Was reviewing Telegraf issues when I noticed this mention of cgroups... What status is this in (preliminary/conceptual/etc.)? We were about to begin working on a very similar plugin and are interested in seeing how far this effort has gone, or whether we should continue to develop our own.
@Will-Beninger see #1345
Thanks for that. We've been testing the plugin over the last week in our environment, and one of the most basic pieces of functionality we're looking for is pulling tags out of the path. I can't for the life of me find a good way to do this without dipping into the actual functionality of the plugin itself. By default, it's returned as one big path, and then we would need to do post-processing to define which parts of the path we want to keep. For example, we return memory.stat from the following location:
It's quite deep in the directory tree, but each of those tags would be useful to us. Are you running into a similar issue? We are of course looking at memory.stat (for example) at each layer above that, and are looking for similar tags along the way.
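One way the path could be post-processed into tags (a minimal sketch; the "levelN" tag names and the function itself are assumptions for illustration, not part of the plugin):

```go
package main

import (
	"fmt"
	"strings"
)

// pathToTags splits a cgroup path into one tag per directory level,
// e.g. /cgroup/memory/group1/child1 -> level0=cgroup, level1=memory, ...
func pathToTags(path string) map[string]string {
	tags := make(map[string]string)
	for i, part := range strings.Split(strings.Trim(path, "/"), "/") {
		tags[fmt.Sprintf("level%d", i)] = part
	}
	return tags
}

func main() {
	fmt.Println(pathToTags("/cgroup/memory/group1/child1"))
	// map[level0:cgroup level1:memory level2:group1 level3:child1]
}
```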
We propose working on a plugin that will capture specific statistics per cgroup. We are looking for comments from others who may have an interest; we have a very specific use of cgroups but are keen to implement this in a way that is generically useful for others (or at least could be extended in future to cover other use cases).
In our environment, we are only using cgroups to protect memory, but many others will be using them to protect CPU and other resources.
We have a root cgroup, then a container cgroup, and then very many child cgroups. For example:
There are many (perhaps tens per host) groups like "5041915.13". These groups may exist for seconds to hours, and it is not yet clear whether we need to capture metrics from inside them.
In our kernel (RHEL7.1), there are 26 parameters inside memory per cgroup. This number changes hugely between kernels, and we are keen to avoid hard-coding specific metric names into the plugin. To complicate matters further, most are a single value, but some have other formats, for example:
vs
A generic way to handle this is not clear - suggestions welcome. For now, we intend to only capture the simple values that are single numbers in a file.
To prevent capturing a truly incredible number of metrics, we intend to list the ones we want to capture in the config file:
While we would not use it, this maps nicely onto another cgroup; for example, we could do this:
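A hypothetical sketch of what such config entries might look like (the option names and syntax are assumptions; the actual syntax was still under discussion at this point):

```toml
[[inputs.cgroup]]
  # Capture only the listed memory metrics for these cgroups.
  paths  = ["/cgroup/memory/group1", "/cgroup/memory/group2", "/cgroup/memory/group3"]
  fields = ["memory.usage_in_bytes", "memory.limit_in_bytes"]

[[inputs.cgroup]]
  # The same shape maps onto another cgroup subsystem, e.g. cpu.
  paths  = ["/cgroup/cpu/group1"]
  fields = ["cpuacct.usage"]
```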
This is an early set of thoughts - comments are welcome from those in the community at the same time as we review this internally.