
Unexpectedly huge byte values returned for system.memory.swap.used.bytes #6271

Closed
ppf2 opened this issue Feb 3, 2018 · 6 comments
Labels: bug, Metricbeat, module, Team:Integrations

ppf2 (Member) commented Feb 3, 2018

Metricbeat version: 6.1

Metricbeat is returning absurdly large byte values for swap used. There are Docker and Kubernetes processes running on this bare-metal server (Metricbeat runs directly on the host machine, not within containers).

[2018-01-31T17:07:47,804][WARN ][logstash.outputs.elasticsearch] Could not index event to Elasticsearch.
{"type"=>"mapper_parsing_exception", "reason"=>"failed to parse [system.memory.swap.used.bytes]", "caused_by"=>{"type"=>"i_o_exception", "reason"=>"Numeric value (18446744070278606848) out of range of long (-9223372036854775808 - 9223372036854775807)\n at [Source: org.elasticsearch.common.bytes.BytesReference$MarkSupportingStreamInputWrapper@45bfe490; line: 1, column: 342]"}}}}}

The workaround is to use a processor to filter out the fields carrying these large values. If this is not something we can avoid because of Go, it would be nice for Beats to filter out these invalid values automatically, without manual intervention.
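For reference, a sketch of what such a workaround could look like in metricbeat.yml, using a drop_fields processor with a range condition; the field list and the 1 TiB threshold are illustrative choices, not part of the original report:

```yaml
# Sketch only: drop the swap-used fields when the reported value is
# implausibly large. The 1 TiB threshold is illustrative; this host
# only has ~4 GiB of swap.
processors:
  - drop_fields:
      when:
        range:
          system.memory.swap.used.bytes:
            gte: 1099511627776   # 1 TiB
      fields:
        - system.memory.swap.used.bytes
        - system.memory.swap.used.pct
```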

CentOS Linux release 7.4.1708 (Core)

free -m
              total        used        free      shared  buff/cache   available
Mem:         257675        9188      244749          27        3737      247391
Swap:          4083           0        4083

top - 10:38:10 up 3 days,  3:45,  4 users,  load average: 0.53, 0.70, 0.54
Tasks: 895 total,   1 running, 894 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.7 us,  0.9 sy,  0.0 ni, 98.4 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem : 26385961+total, 25061606+free,  9416044 used,  3827500 buff/cache
KiB Swap:  4182012 total,  4182012 free,        0 used. 25332158+avail Mem
ppf2 added the bug and Metricbeat labels on Feb 3, 2018
exekias (Contributor) commented Feb 8, 2018

Hi @ppf2, would it be possible to get the output of cat /proc/meminfo?

ppf2 (Member, Author) commented Feb 8, 2018

cat /proc/meminfo

MemTotal: 263859608 kB
MemFree: 246927004 kB
MemAvailable: 250038484 kB
Buffers: 6256 kB
Cached: 3830644 kB
SwapCached: 0 kB
Active: 12868848 kB
Inactive: 1777640 kB
Active(anon): 10819508 kB
Inactive(anon): 42668 kB
Active(file): 2049340 kB
Inactive(file): 1734972 kB
Unevictable: 24528 kB
Mlocked: 24528 kB
SwapTotal: 4182012 kB
SwapFree: 4182012 kB
Dirty: 72 kB
Writeback: 0 kB
AnonPages: 10834488 kB
Mapped: 374148 kB
Shmem: 44252 kB
Slab: 430668 kB
SReclaimable: 197468 kB
SUnreclaim: 233200 kB
KernelStack: 59168 kB
PageTables: 46832 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 136111816 kB
Committed_AS: 28181764 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 762848 kB
VmallocChunk: 34224596664 kB
HardwareCorrupted: 0 kB
AnonHugePages: 6123520 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
DirectMap4k: 462824 kB
DirectMap2M: 21444608 kB
DirectMap1G: 248512512 kB

ruflin added the module label on Feb 26, 2018
jsoriano (Member) commented

A similar issue has been reported on Discuss (https://discuss.elastic.co/t/metricbeat-5-3-0-reporting-strange-memory-value/124825).
I have been taking a look, and the huge values are close to the maximum uint64. These "used" values are calculated as Total - Free, so it seems that in some circumstances the reported free memory is somehow larger than the total, and the subtraction overflows. I think this is something to investigate in gosigar.
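To see the mechanics, here is a minimal Go sketch of that wraparound. The free value is hypothetical, picked so that the result matches the number in the indexing error: 2^64 - 3430944768 = 18446744070278606848, i.e. free exceeding total by about 3.2 GiB:

```go
package main

import "fmt"

func main() {
	// SwapTotal from this host's /proc/meminfo: 4182012 kB = 4282380288 bytes.
	var total uint64 = 4282380288
	// Hypothetical free reading that exceeds total by 3430944768 bytes (~3.2 GiB).
	var free uint64 = 7713325056

	// Unsigned subtraction cannot go negative; it wraps modulo 2^64.
	used := total - free
	fmt.Println(used) // 18446744070278606848, the value from the error above
}
```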

alvarolobato commented

Related to #8991

jsoriano (Member) commented Dec 4, 2018

I have been trying to reproduce this issue without any success. With the swap values reported here it should work: Metricbeat only reads SwapTotal and SwapFree and calculates the used value as the difference. I have seen reports of free swap being greater than total swap due to bugs in very old kernels (2.6), but this shouldn't be happening on CentOS 7 (likely running 3.10).

I am opening #9383 as a workaround for this. That PR assumes that if this happens, then no swap is being used, which matches the values shared here.
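The clamp presumably looks something like the following sketch (the function name is illustrative, not the actual diff from #9383):

```go
package main

import "fmt"

// swapUsed clamps the computed swap usage: if the kernel ever reports
// more free swap than total swap, report zero used swap instead of
// letting the unsigned subtraction wrap around to a huge value.
func swapUsed(total, free uint64) uint64 {
	if free > total {
		return 0
	}
	return total - free
}

func main() {
	fmt.Println(swapUsed(4282380288, 4282380288)) // 0: no swap used
	fmt.Println(swapUsed(4282380288, 7713325056)) // 0, instead of ~1.8e19
}
```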

jsoriano (Member) commented Dec 4, 2018

@ppf2 is there any chance to check whether this is still happening on this server? Or do you know of other cases of this issue?
#9383 would work around it, but as we haven't had more reports about this, I'd be more inclined to close this issue and the PR without merging, and to assume that there was some problem with this particular server.

(There were other similar reports, but they were on FreeBSD, and were fixed by elastic/gosigar#106)
