
Use working set bytes if usage bytes is zero #25428

Merged

Conversation

brianharwell
Contributor

What does this PR do?

This will use workingSetBytes for the memory usage if usageMem is zero

Why is it important?

We have Windows containers running in Kubernetes, but the kubelet only reports the working set bytes for a pod, not the memory usage bytes. As a result, the field kubernetes.pod.memory.usage.limit.pct is reported as 0 even though the pod has a memory limit.

This is important because without kubernetes.pod.memory.usage.limit.pct we cannot monitor or alert on how close a pod's memory usage is to its memory limit.

Memory usage bytes are reported for Linux pods.
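The intended fallback can be sketched as follows. This is a minimal illustration, not the actual metricset code; memoryUsage is a hypothetical helper, and the variable names follow the usageMem/workingSet naming used in this discussion:

```go
package main

import "fmt"

// memoryUsage is a hypothetical helper illustrating the fallback:
// use workingSetBytes when the kubelet reports usageBytes as zero,
// which is what happens for Windows pods.
func memoryUsage(usageMem, workingSet uint64) uint64 {
	if usageMem == 0 {
		return workingSet
	}
	return usageMem
}

func main() {
	// Windows pod: usageBytes is zero, only workingSetBytes is populated
	// (value taken from the kubelet sample below).
	fmt.Println(memoryUsage(0, 196808704)) // 196808704
	// Linux pod: usageBytes is reported and takes precedence.
	fmt.Println(memoryUsage(524288000, 196808704)) // 524288000
}
```

With a fallback like this, the limit percentage can be derived even when only the working set is reported.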

Here is a sample from the kubelet; Metricbeat uses this JSON to report pod metrics.

{
   "podRef":{
      "name":"metricbeat-node-kkl29",
      "namespace":"logging",
      "uid":"5be7cd99-712f-47f0-84e1-eba75f00f671"
   },
   "startTime":"2021-04-21T19:56:15Z",
   "containers":[
      {
         "name":"metricbeat",
         "startTime":"2021-04-21T19:56:17Z",
         "cpu":{
            "time":"2021-04-22T18:34:31Z",
            "usageNanoCores":21051518,
            "usageCoreNanoSeconds":367953125000
         },
         "memory":{
            "time":"2021-04-22T18:34:31Z",
            "workingSetBytes":196808704
         },
         "rootfs":{
            "time":"2021-04-22T18:34:31Z",
            "availableBytes":43572236288,
            "capacityBytes":106846744576,
            "usedBytes":0
         },
         "logs":{
            "time":"2021-04-22T18:34:32Z",
            "availableBytes":43572236288,
            "capacityBytes":106846744576,
            "usedBytes":0,
            "inodesUsed":0
         }
      }
   ],
   "cpu":{
      "time":"2021-04-21T19:56:17Z",
      "usageNanoCores":21051518,
      "usageCoreNanoSeconds":367953125000
   },
   "memory":{
      "time":"2021-04-22T18:34:31Z",
      "availableBytes":0,
      "usageBytes":0,
      "workingSetBytes":196808704,
      "rssBytes":0,
      "pageFaults":0,
      "majorPageFaults":0
   },
   "network":{
      "time":"2021-04-22T18:34:32Z",
      "name":"e94f1b6bc8728316148684734375542f05a93e9a5df43fa8392f08af9bf68e1b_cbr0",
      "rxBytes":302076010,
      "txBytes":70445444,
      "interfaces":[
         {
            "name":"e94f1b6bc8728316148684734375542f05a93e9a5df43fa8392f08af9bf68e1b_cbr0",
            "rxBytes":302076010,
            "txBytes":70445444
         }
      ]
   },
   "volume":[
      {
         "time":"2021-04-21T19:56:26Z",
         "availableBytes":43691851776,
         "capacityBytes":106846744576,
         "usedBytes":5229,
         "inodesFree":0,
         "inodes":0,
         "inodesUsed":0,
         "name":"config"
      },
      {
         "time":"2021-04-21T19:56:26Z",
         "availableBytes":43691851776,
         "capacityBytes":106846744576,
         "usedBytes":6050,
         "inodesFree":0,
         "inodes":0,
         "inodesUsed":0,
         "name":"metricbeat-token-dnf9s"
      }
   ],
   "ephemeral-storage":{
      "time":"2021-04-22T18:34:32Z",
      "availableBytes":43572236288,
      "capacityBytes":106846744576,
      "usedBytes":5229,
      "inodesUsed":0
   }
}

Checklist

I have zero experience with Go. I didn't want to ask someone else to make the change because this seemed rather simple.

  • [x] My code follows the style guidelines of this project
  • [ ] I have commented my code, particularly in hard-to-understand areas
  • [ ] I have made corresponding changes to the documentation
  • [ ] I have made corresponding changes to the default configuration files
  • [ ] I have added tests that prove my fix is effective or that my feature works
  • [ ] I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

Author's Checklist

  • Logic is correct

How to test this PR locally

Test using a Linux container and a Windows container. I can assist with both of these.

Related issues

Elastic Support Ticket #00711427

Use cases

N/A

Screenshots

[screenshot]

[screenshot]

Logs

@botelastic botelastic bot added the needs_team Indicates that the issue/PR needs a Team:* label label Apr 29, 2021
@elasticmachine
Collaborator

elasticmachine commented Apr 29, 2021

💚 Build Succeeded


Build stats

  • Build Cause: jsoriano commented: /test

  • Start Time: 2021-04-30T15:44:31.671+0000

  • Duration: 89 min 35 sec

  • Commit: f5bad30

Test stats 🧪

Test Results
Failed 0
Passed 8200
Skipped 2350
Total 10550


💚 Flaky test report

Tests succeeded.


@jsoriano jsoriano assigned jsoriano and unassigned jsoriano Apr 29, 2021
@jsoriano jsoriano added Team:Integrations Label for the Integrations team and removed needs_team Indicates that the issue/PR needs a Team:* label labels Apr 29, 2021
@elasticmachine
Collaborator

Pinging @elastic/integrations (Team:Integrations)

@jsoriano
Member

/test

@jsoriano
Member

Hey @brianharwell, thanks for opening this PR! Could you please also add a changelog entry in CHANGELOG.next.asciidoc?

@brianharwell
Contributor Author

@jsoriano Done.

I can help test if you want.

@jsoriano
Member

@brianharwell I have been thinking about this and I have conflicting thoughts about what to do with these fields. I don't think we can consider the working set to be the same as the usage memory. They are not reported as the same by Docker or Kubernetes, and Metricbeat does not report them as the same in this case or in the docker module case. We had similar discussions when adding these fields to the docker module, in #12172.

But, if a pod is killed when its working set grows over its limit, I guess that we can use this value to calculate the percentage.

Do you know if that is the case? Is a pod killed if its working set grows over its limit?

@@ -46,6 +46,7 @@ https://github.com/elastic/beats/compare/v7.12.0...v7.12.1[View commits]
*Metricbeat*

- Ignore unsupported derive types for filesystem metricset. {issue}22501[22501] {pull}24502[24502]
- Use working set bytes when memory usage is not reported. {pull}25428[25428]
Member

This line should be added in CHANGELOG.next.asciidoc.

Contributor Author

Whoops! Sorry!

@brianharwell
Contributor Author

brianharwell commented Apr 29, 2021

@brianharwell I have been thinking about this and I have conflicting thoughts about what to do with these fields. I don't think we can consider the working set to be the same as the usage memory. They are not reported as the same by Docker or Kubernetes, and Metricbeat does not report them as the same in this case or in the docker module case. We had similar discussions when adding these fields to the docker module, in #12172.

I read through that discussion and I understand your point. This comment stuck out to me: #12172 (comment)

Kubernetes supports limits for Windows pods.

But, if a pod is killed when its working set grows over its limit, I guess that we can use this value to calculate the percentage.

Do you know if that is the case? Is a pod killed if its working set grows over its limit?

Yes, I have test cases that prove that Kubernetes will terminate and restart a pod when the pod goes over a memory threshold.

I created a console application that adds 1M GUIDs to a list, waits a few seconds, then adds another 1M, continuing until it consumes as much memory as possible.
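The stress app can be sketched in Go as well (the original was a console application adding GUIDs; the ~36 MB step size approximating a million GUID strings, and the page-touching loop, are illustrative assumptions):

```go
package main

import (
	"fmt"
	"time"
)

// allocateSteps grows the process's working set in fixed-size steps,
// pausing between allocations, and returns the slices so they stay live.
func allocateSteps(steps, stepMB int, pause time.Duration) [][]byte {
	var held [][]byte
	for i := 0; i < steps; i++ {
		b := make([]byte, stepMB*1024*1024)
		for j := 0; j < len(b); j += 4096 {
			b[j] = 1 // touch each page so the memory is actually committed
		}
		held = append(held, b)
		fmt.Printf("holding ~%d MB\n", (i+1)*stepMB)
		time.Sleep(pause)
	}
	return held
}

func main() {
	// In the real test this ran unbounded until Kubernetes terminated the pod.
	allocateSteps(3, 36, 2*time.Second)
}
```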

It appears that the pod is terminated when the kubernetes.pod.memory.working_set.bytes gets to about 72% of the memory limit. I tried this with a 3,000Mi memory limit and a 6,000Mi memory limit.

Here are the console logs of my test app...
[screenshot]

And here is the memory usage chart from Kibana...
[screenshot]

And here is output from describing the pod...

Containers:
  memory:
    Container ID:  docker://79337d3f0ce1881213ee78aec5d6067f99bcc367356efc8a4df02637b26ba002
    Image:         brianharwell/memorystress:1809
    Image ID:      docker-pullable://brianharwell/memorystress@sha256:272efaef128ad3d3e1ee1ba18e7dff9f8e6958dc662a2453bbba7a2b25a570cb
    Port:          <none>
    Host Port:     <none>
    Command:
    State:          Running
      Started:      Thu, 29 Apr 2021 13:39:25 -0500
    Last State:     Terminated
      Exit Code:    -532462766
      Started:      Thu, 29 Apr 2021 13:37:10 -0500
      Finished:     Thu, 29 Apr 2021 13:39:17 -0500
    Ready:          True
    Restart Count:  1
    Limits:
      cpu:     500m
      memory:  3000Mi
    Requests:
      cpu:     500m
      memory:  3000Mi

Based on this, in Windows the working set bytes may not be synonymous with usage bytes. But the biggest issue, and the reason for this PR, is that I have no way to chart or alert on memory usage compared to the limit. So I am open to ideas.

@jsoriano
Member

jsoriano commented Apr 30, 2021

Thanks for the investigation. Then I think it makes sense to calculate the percentages based on the working set.

I would propose to do the following:

  • Keep the OS-specific metrics as they are, as this is how they are reported by docker and kubernetes and they don't mean exactly the same.
  • But don't report the zero-valued ones; this way it'd be less confusing than seeing so many zeroed values. The idea would be to conditionally set these values, something like this:
if usageMem > 0 {
	// This is a Linux container.
	podEvent["memory"] = common.MapStr{
		"usage": common.MapStr{
			"bytes": usageMem,
		},
		"available": common.MapStr{
			"bytes": availMem,
		},
		"rss": common.MapStr{
			"bytes": rss,
		},
		"page_faults":       pageFaults,
		"major_page_faults": majorPageFaults,
	}
}
if workingSet > 0 {
	// This is a Windows container.
	podEvent["memory"] = common.MapStr{
		"working_set": common.MapStr{
			"bytes": workingSet,
		},
	}
}
  • Calculate the percentages with the OS-specific values depending on the case. It'd be something like this:
		if usageMem > 0 {
			if nodeMem > 0 {
				podEvent.Put("memory.usage.node.pct", float64(usageMem)/nodeMem)
			}
			if memLimit > 0 {
				podEvent.Put("memory.usage.limit.pct", float64(usageMem)/memLimit)
			}
		}
		if workingSet > 0 {
			if nodeMem > 0 {
				podEvent.Put("memory.usage.node.pct", float64(workingSet)/nodeMem)
			}
			if memLimit > 0 {
				podEvent.Put("memory.usage.limit.pct", float64(workingSet)/memLimit)
			}
		}

@jsoriano jsoriano self-assigned this Apr 30, 2021
@brianharwell
Contributor Author

  • Keep the OS-specific metrics as they are, as this is how they are reported by docker and kubernetes and they don't mean exactly the same.

I agree.

  • But don't report the zero-valued ones; this way it'd be less confusing than seeing so many zeroed values. The idea would be to conditionally set these values, something like this:

I am not sure about this. Ideally we would be able to tell whether this is a Linux pod or a Windows pod and then show fields based on that. That would make things clearer and less confusing. If (or when) a change is made on the Kubernetes side, some of these values may start to appear. For example, usageMem and availMem may be reported, but rss, being a Linux concept, would not be, so its value would show up as 0 while workingSet would disappear. For me, a consistent document, with notes in the documentation that some fields are not reported for Windows pods, would be better.

Isn't this a breaking change? Would the removal of these fields potentially break usages in charts, queries, and watchers?

@jsoriano
Member

I am not sure about this. Ideally we would be able to tell whether this is a Linux pod or a Windows pod and then show fields based on that. That would make things clearer and less confusing. If (or when) a change is made on the Kubernetes side, some of these values may start to appear. For example, usageMem and availMem may be reported, but rss, being a Linux concept, would not be, so its value would show up as 0 while workingSet would disappear. For me, a consistent document, with notes in the documentation that some fields are not reported for Windows pods, would be better.

Isn't this a breaking change? Would the removal of these fields potentially break usages in charts, queries, and watchers?

Well, other modules report metrics only when they are available, to differentiate this from zero values. But you are right, this could be breaking for some cases, and in any case this change isn't needed to solve the lack of percentages, so we can leave it for now.

@brianharwell
Contributor Author

I see you assigned it to yourself, does that mean you are going to make the required changes? Or do the changes I made cover this?

@jsoriano
Member

I see you assigned it to yourself, does that mean you are going to make the required changes? Or do the changes I made cover this?

No, sorry for the confusion, we sometimes do this to indicate to the team that someone is helping with and reviewing community contributions 🙂 I have assigned it to you too.

I can follow up with this. The pending thing I see would be to not override usageMem on Windows, and to calculate the percentage based on the workingSet, as proposed in the third bullet point in #25428 (comment)

@brianharwell
Contributor Author

I can follow up with this. The pending thing I see would be to not override usageMem on Windows, and to calculate the percentage based on the workingSet, as proposed in the third bullet point in #25428 (comment)

Yeah I like that better, I'm on it

@brianharwell
Contributor Author

I made a slight variation...

if workingSet > 0 && usageMem == 0 {
	if nodeMem > 0 {
		podEvent.Put("memory.usage.node.pct", float64(workingSet)/nodeMem)
	}
	if memLimit > 0 {
		podEvent.Put("memory.usage.limit.pct", float64(workingSet)/memLimit)
	}
}

I added && usageMem == 0 because I think usageMem is the more appropriate field and if it is reported I think it should take precedence. What do you think?

Member

@jsoriano jsoriano left a comment

I added && usageMem == 0 because I think usageMem is the more appropriate field and if it is reported I think it should take precedence. What do you think?

Looks good to me.

If current code solves the problem you were having I think we can go on with it.

metricbeat/module/kubernetes/pod/data.go (resolved)
@mergify
Contributor

mergify bot commented Apr 30, 2021

This pull request now has conflicts. Could you fix it? 🙏
To fix up this pull request, you can check it out locally. See documentation: https://help.github.com/articles/checking-out-pull-requests-locally/

git fetch upstream
git checkout -b windows-memory-usage-bytes-take-2 upstream/windows-memory-usage-bytes-take-2
git merge upstream/master
git push upstream windows-memory-usage-bytes-take-2

Member

@jsoriano jsoriano left a comment

Thanks!

@jsoriano
Member

/test

@jsoriano jsoriano added the backport-v7.14.0 Automated backport with mergify label Apr 30, 2021
@jsoriano jsoriano merged commit 381e062 into elastic:master Apr 30, 2021
mergify bot pushed a commit that referenced this pull request Apr 30, 2021
In Windows native containers usage memory is reported
using the workingSet. Use this value to calculate memory
usage percentage.

(cherry picked from commit 381e062)
@brianharwell
Contributor Author

What version will this be in? 7.14?

@jsoriano jsoriano added the backport-v7.13.0 Automated backport with mergify label Apr 30, 2021
@jsoriano
Member

What version will this be in? 7.14?

We are not accepting new features in 7.13, but I think this can be considered a bugfix; let me backport it to 7.13 too.

mergify bot pushed a commit that referenced this pull request Apr 30, 2021
In Windows native containers usage memory is reported
using the workingSet. Use this value to calculate memory
usage percentage.

(cherry picked from commit 381e062)
@brianharwell
Contributor Author

Sweet thanks! I'd love to implement as soon as I can. 😀

jsoriano added a commit that referenced this pull request May 5, 2021
…#25470)

In Windows native containers usage memory is reported
using the workingSet. Use this value to calculate memory
usage percentage.

(cherry picked from commit 381e062)

Co-authored-by: Brian Harwell <[email protected]>
Co-authored-by: Jaime Soriano Pastor <[email protected]>
jsoriano added a commit that referenced this pull request May 5, 2021
…#25473)

In Windows native containers usage memory is reported
using the workingSet. Use this value to calculate memory
usage percentage.

(cherry picked from commit 381e062)

Co-authored-by: Brian Harwell <[email protected]>
Co-authored-by: Jaime Soriano Pastor <[email protected]>
@brianharwell
Contributor Author

@jsoriano Any idea when the v7.13 release will be cut?

@jsoriano
Member

@jsoriano Any idea when the v7.13 release will be cut?

@brianharwell we don't have closed dates for releases. But as an estimation, 7.13.0 will likely happen in two or three weeks.

@jsoriano
Member

@brianharwell if you want to try with your own build, the 7.13 branch is already open.

@jsoriano
Member

Similar issue has been reported with container metrics: #25657
