Containerd metricbeat module #29247

MichaelKatsoulis · 2021-12-02T11:56:43Z

What does this PR do?

This PR creates the containerd Metricbeat module.
It adds the cpu, memory and blkio metricsets.

Why is it important?

Containerd is a container runtime that implements the Container Runtime Interface (CRI).
It is used as one of the Kubernetes runtimes now that k8s has deprecated docker as of v1.20.
If containerd is configured to expose metrics, it provides useful information about cpu, memory and blkio.

Checklist

My code follows the style guidelines of this project
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation
I have made corresponding change to the default configuration files
I have added tests that prove my fix is effective or that my feature works
I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

How to test this PR locally

Create a kind Kubernetes cluster with a version higher than 1.20:
kind create cluster --image 'kindest/node:v1.21.1' --config kind-cluster.yaml
docker exec into the kind node container that was created.
Edit /etc/containerd/config.toml and add:

[metrics]
        address = "127.0.0.1:1338"

Restart the containerd service: systemctl restart containerd
Before deploying Metricbeat, add the following data to the metricbeat-daemonset-modules ConfigMap:

containerd.yml: |-
    - module: containerd
      metricsets:
        - cpu
        - memory
        - blkio
      enabled: true
      period: 10s
      hosts: ["localhost:1338"]
      calcpct: true

Run Metricbeat and watch the containerd fields getting populated.

Use cases

Screenshots

mergify · 2021-12-02T11:56:51Z

This pull request does not have a backport label. Could you fix it @MichaelKatsoulis? 🙏
To fixup this pull request, you need to add the backport labels for the needed
branches, such as:

backport-v\d.\d.\d is the label to automatically backport to the 7.\d branch. \d is the digit

NOTE: backport-skip has been added to this pull request.

elasticmachine · 2021-12-02T12:11:24Z

💚 Build Succeeded

Build stats

Start Time: 2022-01-11T11:20:58.847+0000
Duration: 107 min 55 sec
Commit: 34badbe

Test stats 🧪

Test	Results
Failed	0
Passed	9718
Skipped	2528
Total	12246

💚 Flaky test report

Tests succeeded.

🤖 GitHub comments

To re-run your PR in the CI, just comment with:

/test : Re-trigger the build.
/package : Generate the packages and run the E2E tests.
/beats-tester : Run the installation tests with beats-tester.
run elasticsearch-ci/docs : Re-trigger the docs validation. (use unformatted text in the comment!)

ChrsMark · 2021-12-03T13:35:42Z

metricbeat/module/containerd/cpu/cpu.go

+			"process_cpu_seconds_total":        prometheus.Metric("system.total"),
+		},
+		Labels: map[string]prometheus.LabelMap{
+			"container_id": p.KeyLabel("id"),


You are using openmetrics library here, is this intentional? Why not prometheus.KeyLabel()?

ChrsMark · 2021-12-03T13:46:28Z

metricbeat/module/containerd/cpu/cpu.go

+
+// init registers the MetricSet with the central registry.
+// The New method will be called after the setup of the module and before starting to fetch data
+func init() {


This is a new module so I guess it would better fit under xpack.

ChrsMark · 2021-12-03T13:47:49Z

metricbeat/module/containerd/cpu/metricset.go

+		if err == nil {
+			cpuUsageTotalPct := calcCpuTotalUsagePct(cpuUsageTotal.(float64), systemUsageDelta,
+				float64(contCpus), cID, m.preContainerCpuTotalUsage)
+			m.Logger().Infof("cpuUsageTotalPct for %+v is %+v", cID, cpuUsageTotalPct)


Maybe that's too noisy?

Remember, it is a draft. I had no intention of keeping it.

ChrsMark · 2021-12-03T13:49:53Z

metricbeat/module/containerd/cpu/metricset.go

+	var systemTotalNs int64
+	perContainerCpus := make(map[string]int)
+	elToDel := -1
+	for i, event := range events {


The additional percentage calculations could be made configurable (enabled/disabled on demand), as in the docker module, to avoid overloading the system when there are too many containers/metrics and we are not interested in that much detail.

ChrsMark · 2021-12-03T13:59:16Z

metricbeat/module/containerd/cpu/metricset.go

+		systemUsageDelta := float64(systemTotalNs) - m.preSystemCpuUsage
+
+		// Calculate cpu total usage percentage
+		cpuUsageTotal, err := event.GetValue("usage.total.ns")


Can we put the rationale for these calculations into the PR's description and the docs, so that they are clearly documented? I foresee getting questions about how these are calculated, and we would have to dig into the code and struggle to understand them.

Yes of course!

MichaelKatsoulis · 2021-12-07T10:20:48Z

Regarding the calculation of the CPU usage percentage, I followed this approach:

Containerd provides the container_cpu_total_nanoseconds, container_cpu_user_nanoseconds and container_cpu_kernel_nanoseconds metrics per container id,
as well as container_per_cpu_nanoseconds for each container.

container_cpu_total_nanoseconds is the sum of container_per_cpu_nanoseconds over every cpu.

Containerd also provides process_cpu_seconds_total, which is the total user and system CPU time spent, in seconds.

So, to get the CPU usage percentage of each container, I followed the approach we have in the docker module.

For each container:
(container_cpu_total_nanoseconds - pre_container_cpu_total_nanoseconds) / (process_cpu_seconds_total - pre_process_cpu_seconds_total)

To maintain pre_container_cpu_total_nanoseconds and pre_process_cpu_seconds_total, every time new events are fetched those values are updated with the latest values received.
pre_container_cpu_total_nanoseconds is tracked per container id.

Basically, I take a point of reference and then look at the difference in the next batch of events. That way you can tell how much of the time was used by the container.

Also, since container_cpu_total_nanoseconds is the sum over all cpus, the percentage is normalised by dividing it by the number of cpus used by each container.
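
The approach described above can be sketched roughly as follows. This is only an illustration, not the code in this PR: the cpuTracker and sample types and the update method are hypothetical names, the result is a ratio (1.0 means 100%), and the sketch assumes process_cpu_seconds_total has to be converted from seconds to nanoseconds before it is compared with the container counters.

```go
package main

// sample is one container's reading from a single fetch (hypothetical type).
type sample struct {
	totalNs float64 // container_cpu_total_nanoseconds
	cpus    int     // number of cpus the container used
}

// cpuTracker remembers the previous readings so that cumulative counters
// can be turned into deltas on the next fetch (hypothetical type).
type cpuTracker struct {
	hasPrev        bool
	preSystemNs    float64            // previous process_cpu_seconds_total, in ns
	preContainerNs map[string]float64 // previous total per container id
}

func newCPUTracker() *cpuTracker {
	return &cpuTracker{preContainerNs: make(map[string]float64)}
}

// update returns, per container id, the container delta divided by the
// system delta and by the container's cpu count. Containers without a
// previous reading are skipped. The stored values are then replaced with
// the latest ones, as the comment above describes.
func (t *cpuTracker) update(systemSeconds float64, batch map[string]sample) map[string]float64 {
	systemNs := systemSeconds * 1e9 // assumption: seconds converted to ns
	systemDelta := systemNs - t.preSystemNs

	out := make(map[string]float64)
	for id, s := range batch {
		prev, seen := t.preContainerNs[id]
		if t.hasPrev && seen && systemDelta > 0 && s.cpus > 0 {
			out[id] = (s.totalNs - prev) / systemDelta / float64(s.cpus)
		}
		t.preContainerNs[id] = s.totalNs
	}
	t.preSystemNs = systemNs
	t.hasPrev = true
	return out
}
```

The first call only establishes the point of reference and returns nothing; percentages appear from the second fetch onwards.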

@fearful-symmetry as you have worked with docker module in similar calculations, what do you think about this approach?

fearful-symmetry · 2021-12-08T22:21:47Z

x-pack/metricbeat/module/containerd/cpu/cpu.go

+
+var (
+	// HostParser validates Prometheus URLs
+	hostParser = parse.URLHostParserBuilder{


Why is this declared globally? I only see it being used in the init function?

fearful-symmetry · 2021-12-08T22:24:36Z

x-pack/metricbeat/module/containerd/cpu/cpu.go

+// The New method will be called after the setup of the module and before starting to fetch data
+func init() {
+	// Mapping of state metrics
+	mapping := &prometheus.MetricsMapping{


Is there a reason why this map is declared here? I don't think that getMetricsetFactory function is getting reused, so we could just put it in one place?

fearful-symmetry · 2021-12-08T22:30:10Z

x-pack/metricbeat/module/containerd/cpu/metricset.go

+			systemUsageDelta := float64(systemTotalNs) - m.preSystemCpuUsage
+
+			// Calculate cpu total usage percentage
+			cpuUsageTotal, err := event.GetValue("usage.total.ns")


This nearly identical logic block gets duplicated 3 times, we may want to try to abstract this away to the calcCpuTotalUsagePct function so it's a bit easier to wrangle.
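
A rough sketch of what that consolidation could look like, under stated assumptions: usagePct and allUsagePcts are hypothetical helpers, and the usage.user.ns and usage.kernel.ns field names are assumed by analogy with usage.total.ns. It is not the code that ended up in the PR.

```go
package main

import "strings"

// usagePct is a hypothetical shared helper: one counter's delta divided by
// the system delta and by the number of cpus. It reports false when the
// inputs cannot yield a meaningful ratio.
func usagePct(current, previous, systemDelta float64, cpus int) (float64, bool) {
	if systemDelta <= 0 || cpus <= 0 {
		return 0, false
	}
	return (current - previous) / systemDelta / float64(cpus), true
}

// allUsagePcts applies the same helper to the total, user and kernel
// counters instead of repeating the block three times. Field names other
// than usage.total.ns are assumptions made for this illustration.
func allUsagePcts(cur, prev map[string]float64, systemDelta float64, cpus int) map[string]float64 {
	out := make(map[string]float64)
	for _, field := range []string{"usage.total.ns", "usage.user.ns", "usage.kernel.ns"} {
		c, okCur := cur[field]
		p, okPrev := prev[field]
		if !okCur || !okPrev {
			continue // counter missing in one of the readings
		}
		if pct, ok := usagePct(c, p, systemDelta, cpus); ok {
			out[strings.TrimSuffix(field, ".ns")+".pct"] = pct
		}
	}
	return out
}
```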

fearful-symmetry · 2021-12-08T22:34:14Z

x-pack/metricbeat/module/containerd/cpu/metricset.go

+	//}
+}
+
+func calcCpuTotalUsagePct(cpuUsageTotal, systemUsageDelta, contCpus float64,


I'm not clear on why these are three separate functions. They look really similar?

fearful-symmetry · 2021-12-08T22:36:47Z

x-pack/metricbeat/module/containerd/cpu/metricset.go

+			containerFields.Put("id", cID)
+			event.Delete("id")
+		}
+		e, err := util.CreateEvent(event, "containerd.cpu")


If we're gonna borrow functions from other modules, we should move that code outside the module to somewhere generic.

fearful-symmetry · 2021-12-08T22:56:34Z

As far as calculating the CPU percents, your logic seems sound? Generally, we track a previous value, calculate the delta between the previous total and the current, and divide that by the time between the deltas. Pay attention to the values that are being reported upstream, as not all platforms will have a particularly clear-cut idea of what a "total" is.

Normalized CPU usage should be thought of as the total per-CPU. Put another way, the maximum value for norm.pct should be 100%, and the maximum value for pct should usually be 100% * Number_of_cpus

For example, check out https://github.com/elastic/beats/blob/master/metricbeat/module/docker/cpu/helper.go

and https://github.com/elastic/beats/blob/master/metricbeat/internal/metrics/cpu/metrics.go
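
To make the pct versus norm.pct distinction concrete, here is a minimal sketch. cpuPcts is a hypothetical function, not one taken from the linked files, and it returns ratios where 1.0 stands for 100%.

```go
package main

// cpuPcts turns a CPU time delta into the two values described above.
// usedNsDelta is the CPU time consumed between two readings, elapsedNs is
// the time between those readings, and numCPUs is the number of cpus.
func cpuPcts(usedNsDelta, elapsedNs float64, numCPUs int) (pct, normPct float64) {
	if elapsedNs <= 0 || numCPUs <= 0 {
		return 0, 0
	}
	pct = usedNsDelta / elapsedNs    // can reach numCPUs, i.e. 100% * Number_of_cpus
	normPct = pct / float64(numCPUs) // per-CPU view, at most 1.0 (100%)
	return pct, normPct
}
```

For instance, three seconds of CPU time consumed during one second of wall time on a 4-cpu host gives pct = 3.0 (300%) and norm.pct = 0.75 (75%).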

MichaelKatsoulis · 2021-12-23T07:21:32Z

@ChrsMark I updated the PR based on your comments. Could you take another look?

ChrsMark

lgtm

tetianakravchenko · 2022-01-10T10:55:07Z

metricbeat/docs/modules/containerd.asciidoc

+and more specifically fields `containerd.cpu.usage.total.pct`, `containerd.cpu.usage.kernel.pct`, `containerd.cpu.usage.user.pct`.
+Default value is true.
+
+For memory metricset if `calcpct.memory` setting is set to true, memory usage percentages will be calculated


Why were calcpct.cpu and calcpct.memory introduced? Do we have a reason to make this configurable?

The idea came from @ChrsMark's comment. In general I also agree, because those extra calculations (as well as the extra iteration over the events that they require) may overload the system when there are too many containers. So it is safer to have it configurable. By default it is true anyway.

tetianakravchenko · 2022-01-10T11:05:05Z

metricbeat/docs/modules/containerd.asciidoc

+Containerd module collects cpu, memory and blkio statistics about
+running containers controlled by containerd runtime.
+
+The current metricsets are: `cpu`, `blkio` and `memory`. They are not enabled by default.


but in blkio.asciidoc it is:

This is a default metricset. If the host module is unconfigured, this metricset is enabled by default.

Don't these contradict each other?
The same applies to cpu and memory.

I am trying to understand how this text in blkio.asciidoc is generated

Ok this comes from this line.
I believe that all these metricsets should be enabled by default when a user enables the containerd module. We do that in most of the modules. So I made an update in 69f4b5f

tetianakravchenko · 2022-01-11T09:57:45Z

x-pack/metricbeat/module/containerd/cpu/cpu.go

+			}
+			// Calculate cpu total usage percentage
+			cpuUsageTotal, err := event.GetValue("usage.total.ns")
+			if err == nil {


Do you think it could be helpful to add some debug logs if err != nil?

In this case, we iterate over a batch of events and check whether the usage.total.ns field is present. There is always one event in the batch where this field is missing: the one that has the system.total field. The reason is that the process_cpu_seconds_total containerd metric, which is mapped to the system.total field, carries no container id (it is a system-wide metric, not container specific). That leads to an event that has only the system.total field, while the rest of the events have fields that are grouped together mainly by container_id. So with this if I just want to be sure that we skip calculating percentages for the event that only has system.total. It is not an actual error, so there is nothing to log. But I will add a comment.
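
The skip described above can be illustrated with a small sketch. It uses a plain map as a stand-in for the real event type, and splitEvents is a hypothetical helper, so treat it as an illustration of the idea rather than the module's code.

```go
package main

// event is a simplified stand-in for a Metricbeat event with flat keys.
type event map[string]interface{}

// splitEvents separates the per-container events from the single
// system-wide event. An event without usage.total.ns is not treated as an
// error: it is expected to be the one that only carries system.total.
func splitEvents(events []event) (containers []event, systemTotal float64, found bool) {
	for _, ev := range events {
		if _, ok := ev["usage.total.ns"]; !ok {
			// System-wide event: remember its value and skip the
			// percentage calculation for it.
			if v, isFloat := ev["system.total"].(float64); isFloat {
				systemTotal, found = v, true
			}
			continue
		}
		containers = append(containers, ev)
	}
	return containers, systemTotal, found
}
```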

tetianakravchenko · 2022-01-11T10:08:03Z

x-pack/metricbeat/module/containerd/memory/memory.go

+		if m.calcPct {
+			inactiveFiles, err := event.GetValue("inactiveFiles")
+			if err != nil {
+				continue


In which cases can inactiveFiles be missing from the event? Should a debug log be added here to explain why the usage percentage was skipped?

In no case, only if there is an error. I will add a debug message.

MichaelKatsoulis · 2022-01-11T14:30:23Z

@fearful-symmetry you have some requested changes that are blocking the merging of the PR. I can still merge it though, unless you want to make a final review

fearful-symmetry · 2022-01-18T17:39:21Z

@MichaelKatsoulis Sorry about that! No idea how I missed your ping last week.

MichaelKatsoulis · 2022-01-19T07:55:04Z

@MichaelKatsoulis Sorry about that! No idea how I missed your ping last week.
No problem @fearful-symmetry . You can still give it a look and if there is something you think is important I can open a follow up pr to fix it.

MichaelKatsoulis added 5 commits November 24, 2021 09:19

Create cpu metricset of metricbeat containerd module initial commit

6208f0a

Add cpu fields

6550334

Generate containerd cpu metrics

928d3ea

Get cpu usage percentage

64da17e

New fields

31672b5

botelastic bot added the needs_team label (Indicates that the issue/PR needs a Team:* label) Dec 2, 2021

MichaelKatsoulis marked this pull request as draft December 2, 2021 11:56

mergify bot added the backport-skip label (Skip notification from the automated backport with mergify) Dec 2, 2021

MichaelKatsoulis added the Team:Integrations label (Label for the Integrations team) Dec 2, 2021

botelastic bot removed the needs_team label (Indicates that the issue/PR needs a Team:* label) Dec 2, 2021

MichaelKatsoulis requested a review from ChrsMark December 2, 2021 11:57

Make fmt

512f5d4

ChrsMark reviewed Dec 3, 2021

MichaelKatsoulis added 6 commits December 6, 2021 13:21

Add config option for cpu pct calculation

287c9c4

Move containerd to xpack

e7f700c

a

3f48a7c

list common

4b90344

Update

e61077c

Fmt

5e5ee78

MichaelKatsoulis added 2 commits December 8, 2021 11:39

Add memory metricset

3ac6dca

Add more memory fields

5be637e

fearful-symmetry requested changes Dec 8, 2021

MichaelKatsoulis added 6 commits December 22, 2021 10:35

Review updates

5be9960

Update data.json files

f75539b

Update yml files

4cdb805

Merge remote-tracking branch 'upstream/master' into containerd_metric…

9e80313

…beat_module

Make update

c641c38

Update containerd documentation

3ca0046

This was referenced Dec 23, 2021

Dashboard for containerd module #29592

Closed

Integrations tests for containerd #29593

Closed

Create integration for containerd module elastic/integrations#2373

Closed

hendry-lim reviewed Dec 30, 2021

metricbeat/docs/modules/containerd.asciidoc (outdated)

MichaelKatsoulis and others added 4 commits January 4, 2022 12:54

Update metricbeat/docs/modules/containerd.asciidoc

5d30573

Co-authored-by: hendry-lim <[email protected]>

Merge remote-tracking branch 'upstream/master' into containerd_metric…

942c0d9

…beat_module

Make update

0dcb148

Merge remote-tracking branch 'upstream/master' into containerd_metric…

e66977f

…beat_module

ChrsMark approved these changes Jan 10, 2022

tetianakravchenko reviewed Jan 10, 2022

Metricsets enabled by default

69f4b5f

tetianakravchenko reviewed Jan 11, 2022

MichaelKatsoulis added 2 commits January 11, 2022 12:12

Merge remote-tracking branch 'upstream/master' into containerd_metric…

edd33ab

…beat_module

Add debug message in case of inactiveFiles,usage.total or usage.limit…

34badbe

… not present in memory event

tetianakravchenko approved these changes Jan 11, 2022

MichaelKatsoulis added the backport-v8.1.0 label (Automated backport with mergify) and removed the backport-skip label (Skip notification from the automated backport with mergify) Jan 17, 2022

MichaelKatsoulis merged commit 181b83a into elastic:master Jan 17, 2022

MichaelKatsoulis mentioned this pull request Jan 17, 2022

Add containerd package and cpu,memory,blkio data streams elastic/integrations#2522

Merged

4 tasks

MichaelKatsoulis mentioned this pull request Mar 17, 2022

Generic containers module #17699

Closed
