API memory leak with signalilo #8290

Closed
yoshi314 opened this issue Sep 25, 2020 · 30 comments

@yoshi314

Describe the bug

I use icinga2 on lxc, and signalilo runs in k8s.

I have installed signalilo as follows:

checkout https://github.com/appuio/charts/
cd signalilo

Edit values.yaml to your liking, e.g.:

config:
  uuid: 9ec06d59-aa0c-4434-b5e2-1aeaf93cd925
  icinga_hostname: kubernetes-dev-cluster
  icinga_url: https://192.168.16.18:5665
  icinga_username: signalilo
  icinga_password: signalilopass
  # icinga_password_secret:
  alertmanager_port: 8888
  alertmanager_bearer_token: alertmanagertoken
  # alertmanager_bearer_token_secret:

Define a passive check in icinga2

template Service "signalilo-heartbeat" {
	vars.dummy_text = {{
		var service = get_service(macro("$host.name$"), macro("$service.name$"))
		var lastCheck = DateTime(service.last_check).to_string()

		return "Last heartbeat from Signalilo @ " + lastCheck
	}}
}

object Service "heartbeat" {
	import "signalilo-heartbeat"
	display_name = "Heartbeat from signalilo at k8s-dev"
	check_command = "dummy"
	check_interval = 300s
	max_check_attempts = 1
	enable_perfdata = false
	vars.dummy_state = 1
	host_name = "kubernetes-dev-cluster"
}

To Reproduce

  1. Configure icinga2 and signalilo
  2. Deploy signalilo to your k8s cluster and watch memory usage slowly climb on your icinga2 machine. It takes a while until the OOM killer strikes.

Disabling signalilo makes icinga2 work with no issues.
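
To put numbers on the memory growth described above, one option is to sample the daemon's resident set size from /proc. The following is a minimal sketch, assuming a Linux host and that the PID stored in /run/icinga2/icinga2.pid (the PID path listed in the environment section) belongs to the process whose memory grows; if the daemon forks worker processes, those would need to be sampled too. All function names are made up for this example.

```python
import re
import time
from pathlib import Path


def parse_vm_rss_kib(status_text: str) -> int:
    """Extract the VmRSS value (in KiB) from the text of /proc/<pid>/status."""
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB\s*$", status_text, re.MULTILINE)
    if match is None:
        raise ValueError("no VmRSS line found")
    return int(match.group(1))


def read_rss_kib(pid: int) -> int:
    """Read the current resident set size of a process (Linux only)."""
    return parse_vm_rss_kib(Path(f"/proc/{pid}/status").read_text())


def sample_rss(pid: int, samples: int, interval_s: float) -> list[tuple[float, int]]:
    """Collect (unix_timestamp, rss_kib) pairs, one every interval_s seconds."""
    result = []
    for i in range(samples):
        result.append((time.time(), read_rss_kib(pid)))
        if i < samples - 1:
            time.sleep(interval_s)
    return result
```

A call such as `sample_rss(int(Path("/run/icinga2/icinga2.pid").read_text()), samples=60, interval_s=60.0)` would then give roughly an hour of readings to compare with and without signalilo running.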

Expected behavior

Icinga2 works.

Screenshots

https://imgur.com/a/zKZlRUm

You can notice the OOM in the flat area of the plot.

Your Environment

  • Version used (icinga2 --version):
icinga2 - The Icinga 2 network monitoring daemon (version: r2.11.4-1)

Copyright (c) 2012-2020 Icinga GmbH (https://icinga.com/)
License GPLv2+: GNU GPL version 2 or later <http://gnu.org/licenses/gpl2.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

System information:
  Platform: Debian GNU/Linux
  Platform version: 10 (buster)
  Kernel: Linux
  Kernel version: 4.19.0-8-amd64
  Architecture: x86_64

Build information:
  Compiler: GNU 8.3.0
  Build host: runner-ltrjqz9n-project-298-concurrent-0

Application information:

General paths:
  Config directory: /etc/icinga2
  Data directory: /var/lib/icinga2
  Log directory: /var/log/icinga2
  Cache directory: /var/cache/icinga2
  Spool directory: /var/spool/icinga2
  Run directory: /run/icinga2

Old paths (deprecated):
  Installation root: /usr
  Sysconf directory: /etc
  Run directory (base): /run
  Local state directory: /var

Internal paths:
  Package data directory: /usr/share/icinga2
  State path: /var/lib/icinga2/icinga2.state
  Modified attributes path: /var/lib/icinga2/modified-attributes.conf
  Objects path: /var/cache/icinga2/icinga2.debug
  Vars path: /var/cache/icinga2/icinga2.vars
  PID path: /run/icinga2/icinga2.pid

  • Operating System and version: Debian 10
  • Enabled features (icinga2 feature list):
Enabled features: api checker command compatlog graphite ido-mysql livestatus mainlog notification statusdata syslog
@Al2Klimov
Member

Hello @yoshi314 and thank you for reporting!

Please provide instructions on how to reproduce this without K8s, LXC or signalilo. Ideally just Icinga 2, a shell and curl.

Best,
AK

@Al2Klimov added the "needs feedback" label Sep 25, 2020
@yoshi314
Author

well, i have no clue how to install signalilo outside of k8s, but you might simply deploy it in minikube with helm.

use the provided values.yaml and run (in the git checkout directory):

helm install --namespace monitoring signalilo . -f values.yaml --debug

i am not sure how to use helm with minikube, but i'll take a look.

@yoshi314
Author

technically, signalilo only sends a passive check result every minute or so, so any other API client will likely do the trick.
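
As an illustration of what such a stand-in API client might look like, here is a minimal sketch that builds a passive check result for the heartbeat service using the Icinga 2 REST API action `process-check-result`. It assumes this is the kind of request signalilo sends, which the thread does not confirm; the function name and the plugin output text are invented for the example, and the request is only constructed here, not sent.

```python
import base64
import json
import urllib.request


def build_heartbeat_request(base_url: str, username: str, password: str,
                            host: str, service: str) -> urllib.request.Request:
    """Build (but do not send) a process-check-result request for one service."""
    payload = {
        "type": "Service",
        # Icinga 2 filter expression selecting exactly one service
        "filter": f'host.name=="{host}" && service.name=="{service}"',
        "exit_status": 0,  # 0 = OK for services
        "plugin_output": "heartbeat from test client",  # hypothetical text
    }
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    return urllib.request.Request(
        url=f"{base_url}/v1/actions/process-check-result",
        data=json.dumps(payload).encode(),
        headers={
            "Accept": "application/json",
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
        method="POST",
    )
```

Passing the result to `urllib.request.urlopen` once a minute would approximate the heartbeat; with a self-signed API certificate an `ssl.SSLContext` that trusts the Icinga CA would also be needed.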

@Al2Klimov removed the "needs feedback" label Sep 28, 2020
@Al2Klimov assigned Al2Klimov and unassigned yoshi314 Sep 28, 2020
@Al2Klimov
Member

Hello again @yoshi314!

Please could you try #7864?

https://git.icinga.com/packaging/deb-icinga2/-/jobs/66019 / Job artifacts / Download

Best,
AK

@Al2Klimov
Member

Note: I caught the symptom that parallel "while curl .../v1" loops increase memory usage up to about 2.5x (but not higher), and the memory doesn't get freed after the loops end. #7864 seems to fix it.
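
For anyone wanting to repeat this kind of load test without a shell, the parallel loops can be sketched as below. This is an assumption-laden illustration, not the exact procedure used above: the number of loops, the iteration count, and the function name `run_parallel_loops` are all invented, and the request itself is passed in as a callable because the full URL and credentials are not given in the thread.

```python
import threading
from typing import Callable


def run_parallel_loops(fetch: Callable[[], None], workers: int, iterations: int) -> int:
    """Call fetch() `iterations` times in each of `workers` threads.

    Returns the number of calls that completed without a network error.
    """
    completed = 0
    lock = threading.Lock()

    def loop() -> None:
        nonlocal completed
        for _ in range(iterations):
            try:
                fetch()
            except OSError:
                # urllib's URLError/HTTPError derive from OSError; keep looping
                continue
            with lock:
                completed += 1

    threads = [threading.Thread(target=loop) for _ in range(workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return completed
```

Here `fetch` could be a small function that opens the API URL with `urllib.request.urlopen` and reads the response. Comparing the daemon's memory before the run, during it, and some minutes after it ends would show whether the usage plateaus and whether it is released.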

@yoshi314
Author

yoshi314 commented Oct 1, 2020

ok, i'll try this in a moment and report in a few days. (leak is pretty slow)

@yoshi314
Author

yoshi314 commented Oct 1, 2020

i am testing now with this version. i'll keep you posted.

@yoshi314
Author

yoshi314 commented Oct 1, 2020

https://imgur.com/a/GWwITwn

it's not looking promising so far, but i'll wait and see if it oom's. That "W" in the plot is the moment of upgrade.

edit: it looks more or less stable for now, but i'll check back in a few hours.

@yoshi314
Author

yoshi314 commented Oct 1, 2020

Unfortunately i am still hitting the memory leak.

I'll disable signalilo and keep this build running.

https://imgur.com/a/GwEAyFj

Test build was installed at 8:10 and it had two reloads in the meantime. The dip at ~17:00 is when icinga2 got oom'd.

@yoshi314
Author

yoshi314 commented Oct 2, 2020

https://imgur.com/a/ow2U4Rw

This is the situation after i disabled signalilo (~20:00). As you can see, things have evened out almost instantly.

@Al2Klimov removed the "needs feedback" label Oct 2, 2020
@Al2Klimov
Member

@N-o-X What shall we do with #7864 now?

@Al2Klimov added the "needs feedback" label Oct 2, 2020
@tpo

tpo commented Oct 2, 2020

No idea if this is related, it's just a post by a random user (me) getting the impression, that the central hub icinga2 process looks like it's blowing up like Akira.

Maybe to quote from that post:

snip 8<

$ dpkg -s icinga2 | grep Version
Version: 2.12.0-1.bionic

$ smem -k
10112 nagios   /usr/lib/x86_64-linux-gnu/i     5.5G   386.5M   387.8M   390.5M

$ ps faux|grep 10112
nagios   10112  1.5 39.7 6765656 401316 ?      Sl   Sep29  69:00  \_ /usr/lib/x86_64-linux-gnu/icinga2/sbin/icinga2 --no-stack-rlimit daemon --close-stdio -e /var/log/icinga2/error.log

We are monitoring about 40 systems and have about 450 checks going.

Last week the system started whining about swap space running out. So I doubled the swap space. Now swap space is used up again. And the memory used by the main icinga process only keeps on growing.

This is our central monitoring hub, which all satellites connect to and report their statuses to.

The massive growth of the memory consumed by the main icinga2 process only seems to have started since a week or so and does not correlate to anything I’m aware of (except restarting the host server around the same time, but that correlation doesn’t make sense to me).

So, is there maybe a memory leak at work in the main icinga2 process? Is that much memory consumption normal? Any hints or clues?

snap 8<
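
One way to tell steady growth from a one-off jump in reports like the one quoted above is to fit a straight line through periodic memory readings. The sketch below is an ordinary least-squares slope over (time, memory) samples; the function name is invented for this example, and a positive slope on its own does not prove a leak, since allocator caching or a growing object count can look the same.

```python
def growth_rate(samples: list[tuple[float, float]]) -> float:
    """Least-squares slope of memory over time.

    samples: (time, memory) pairs; the result is in memory units per time unit.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_m = sum(m for _, m in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        raise ValueError("all samples share the same timestamp")
    cov = sum((t - mean_t) * (m - mean_m) for t, m in samples)
    return cov / var_t
```

With readings taken in seconds and KiB, multiplying the result by 3600 gives growth in KiB per hour, which can be set against the free memory to estimate how long the host has until it runs out.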

@yoshi314
Author

yoshi314 commented Oct 4, 2020

I would check if you have some kind of passive check or api client that you enabled recently.

@yoshi314
Author

yoshi314 commented Oct 4, 2020

The signalilo use case is basically to poke icinga2 once per minute with a heartbeat check. It may occasionally add new checks from prometheus' alertmanager but i have not set that up yet.

It's worth noting that i am not getting such issues with Dashing for icinga2 (i used to, maybe two years ago). Perhaps the client itself is misbehaving somehow?

@N-o-X
Contributor

N-o-X commented Oct 13, 2020

> @N-o-X What shall we do with #7864 now?

If it still fixes #7203, I'd still include it.

@N-o-X removed the "needs feedback" label Oct 13, 2020
@N-o-X removed their assignment Oct 13, 2020
@tpo

tpo commented Oct 13, 2020

> Hello again @yoshi314!
>
> Please could you try #7864?
>
> https://git.icinga.com/packaging/deb-icinga2/-/jobs/66019 / Job artifacts / Download
>
> Best,
> AK

I've tried to install those, to see if it maybe fixes my problem, however, my system is an Ubuntu bionic system, and so I am getting:

# dpkg -i icinga2*deb
(Reading database ... 50981 files and directories currently installed.)
Preparing to unpack icinga2_2.12.0+rc1.40.g19c632e44.20200928.1256+buster-0_amd64.deb ...
Unpacking icinga2 (2.12.0+rc1.40.g19c632e44.20200928.1256+buster-0) over (2.12.0+rc1.40.g19c632e44.20200928.1256+buster-0) ...
Preparing to unpack icinga2-bin_2.12.0+rc1.40.g19c632e44.20200928.1256+buster-0_amd64.deb ...
Unpacking icinga2-bin (2.12.0+rc1.40.g19c632e44.20200928.1256+buster-0) over (2.12.0+rc1.40.g19c632e44.20200928.1256+buster-0) ...
Preparing to unpack icinga2-common_2.12.0+rc1.40.g19c632e44.20200928.1256+buster-0_all.deb ...
Unpacking icinga2-common (2.12.0+rc1.40.g19c632e44.20200928.1256+buster-0) over (2.12.0+rc1.40.g19c632e44.20200928.1256+buster-0) ...
Preparing to unpack icinga2-doc_2.12.0+rc1.40.g19c632e44.20200928.1256+buster-0_all.deb ...
Unpacking icinga2-doc (2.12.0+rc1.40.g19c632e44.20200928.1256+buster-0) over (2.12.0+rc1.40.g19c632e44.20200928.1256+buster-0) ...
Preparing to unpack icinga2-ido-pgsql_2.12.0+rc1.40.g19c632e44.20200928.1256+buster-0_amd64.deb ...
Unpacking icinga2-ido-pgsql (2.12.0+rc1.40.g19c632e44.20200928.1256+buster-0) over (2.12.0+rc1.40.g19c632e44.20200928.1256+buster-0) ...
dpkg: dependency problems prevent configuration of icinga2-bin:
 icinga2-bin depends on libboost-atomic1.67.0; however:
  Package libboost-atomic1.67.0 is not installed.
 icinga2-bin depends on libboost-chrono1.67.0; however:
  Package libboost-chrono1.67.0 is not installed.
 icinga2-bin depends on libboost-context1.67.0; however:
  Package libboost-context1.67.0 is not installed.
 icinga2-bin depends on libboost-coroutine1.67.0; however:
  Package libboost-coroutine1.67.0 is not installed.
 icinga2-bin depends on libboost-date-time1.67.0; however:
  Package libboost-date-time1.67.0 is not installed.
 icinga2-bin depends on libboost-filesystem1.67.0; however:
  Package libboost-filesystem1.67.0 is not installed.
 icinga2-bin depends on libboost-program-options1.67.0; however:
  Package libboost-program-options1.67.0 is not installed.
 icinga2-bin depends on libboost-regex1.67.0 (>= 1.67.0-10); however:
  Package libboost-regex1.67.0 is not installed.
 icinga2-bin depends on libboost-system1.67.0; however:
  Package libboost-system1.67.0 is not installed.
 icinga2-bin depends on libboost-thread1.67.0; however:
  Package libboost-thread1.67.0 is not installed.
 icinga2-bin depends on libtinfo6 (>= 6); however:
  Package libtinfo6 is not installed.

Do you by chance have packages for bionic lying around somewhere? The icinga package on Ubuntu bionic has these boost dependencies:

    --- libboost-coroutine1.67.0-icinga
    --- libboost-filesystem1.67.0-icinga
    --- libboost-program-options1.67.0-icinga
    --- libboost-regex1.67.0-icinga (>= 1.67.0-10)
    --- libboost-system1.67.0-icinga
    --- libboost-thread1.67.0-icinga
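
The mismatch is easier to see once the missing package names are pulled out of the dpkg output and set against the "-icinga" suffixed names above. A small sketch, assuming the error lines keep the "Package X is not installed." wording shown in this comment (the function name is invented for the example):

```python
import re


def missing_packages(dpkg_output: str) -> list[str]:
    """Return the unique package names dpkg reports as not installed, in order."""
    pattern = re.compile(r"^\s*Package (\S+) is not installed\.$", re.MULTILINE)
    seen: list[str] = []
    for name in pattern.findall(dpkg_output):
        if name not in seen:
            seen.append(name)
    return seen
```

Run against the output above, this would list the plain libboost-*1.67.0 packages and libtinfo6, none of which carry the "-icinga" suffix that the bionic packages depend on.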

@Al2Klimov
Member

> my system is an Ubuntu bionic system

You need these ones: https://git.icinga.com/packaging/deb-icinga2/-/jobs/66027

@Al2Klimov
Member

> I use icinga2 on lxc

So Icinga is managed by systemd, right?

@Al2Klimov added the "needs feedback" label Oct 14, 2020
@yoshi314
Author

yes, it is. i might be able to switch to classic init, since it's debian.

@tpo

tpo commented Oct 16, 2020

> > my system is an Ubuntu bionic system
>
> You need these ones: https://git.icinga.com/packaging/deb-icinga2/-/jobs/66027

Thanks a lot @Al2Klimov - unfortunately that doesn't fix the leak/huge memory consumption.

@tpo

tpo commented Dec 6, 2020

release/package 2.12.2-1.bionic seems to have fixed my problem.

To all of you who are improving/fixing icinga: thanks a lot!

@yoshi314
Author

i have a suspicion that 2.12.2 might have fixed it, since my cluster survived so far. I'll be testing it a bit more over the week.

@tpo

tpo commented Dec 16, 2020

> release/package 2.12.2-1.bionic seems to have fixed my problem.

I'm afraid, that was only temporary. Icinga2 is still using huge amounts of memory:

root@host ~ # smem -k
  703 nagios   /usr/lib/x86_64-linux-gnu/i     4.6G   501.5M   502.6M   506.2M 

@Nathaniel-Donahue

ref/NC/700554

@dgoetz
Contributor

dgoetz commented Mar 23, 2021

A fix for this (on the signalilo side) is available; it is already merged but not yet released: vshn/signalilo#63
I can confirm it fixed the issue in the environment which @Nathaniel-Donahue referenced, but it would be great if others could confirm as well!

@tpo

tpo commented Mar 23, 2021

@dgoetz are there any *deb packages or executables for me to test?

@dgoetz
Contributor

dgoetz commented Mar 24, 2021

Our customer is using docker images. Unfortunately not sure about other installation methods as I have not worked with signalilo personally.

@tpo

tpo commented Jun 3, 2021

OK folks, release/package/version 2.12.4-1.bionic seems to have fixed the memory leak problem for me: icinga is not using memory like crazy any more. Thanks a lot to all involved!!! @dgoetz @Al2Klimov @N-o-X!

@yoshi314 maybe you want to recheck whether your problem is still persisting, otherwise this ticket could be closed?

@yoshi314
Author

yoshi314 commented Jun 3, 2021

i haven't seen the leak in a long while, but i will keep an eye out

@yoshi314
Author

considering this fixed for now. haven't seen a memleak in a while.

@Al2Klimov removed the "needs feedback" label Jun 21, 2021
6 participants