
Nessus Scanner crashes Icinga #6559

Closed · stevie-sy opened this issue Aug 20, 2018 · 39 comments
Labels
area/api (REST API) · area/distributed (Distributed monitoring: master, satellites, clients) · bug (Something isn't working) · core/crash (Shouldn't happen, requires attention)
Milestone
2.11.0
Comments

@stevie-sy
Contributor

Current Behavior

We use Icinga r2.9.1-1 in an HA setup. When our security department scans our IT infrastructure with the Nessus security scanner for vulnerabilities, the Icinga nodes crash: systemctl reports the status as "reloading" and Icinga Web 2 loses its connection. We configured the service daemon for automatic restart as suggested in the documentation, but it seems that this didn't help.
Our old setup with version r2.8.4-1, without the HA setup, survives the scan.

It looks like the now-closed issue for Windows: #6097

At the moment my colleagues from the security department slow Nessus down a little, so Icinga survived the last scan. But I don't think slowing down a security scanner (e.g. fewer requests per second) is a real solution.
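For reference, the "automatic restart" we configured is a systemd drop-in along these lines. This is only a minimal sketch; the `Restart=` values here are assumptions, not the documentation's exact wording:

```sh
# Hedged sketch: systemd drop-in for automatically restarting icinga2.
# The exact values recommended by the docs may differ.
systemctl edit icinga2
# In the override file that opens, add:
#   [Service]
#   Restart=on-failure
#   RestartSec=10
systemctl daemon-reload   # make systemd pick up the override
```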

Your Environment

Director version (System - About): Git Master 71ad855
Icinga Web 2 version and modules (System - About): 2.6.1
Icinga 2 version (icinga2 --version): r2.9.1-1
Operating System and version: CentOS 7
Webserver, PHP versions: Apache 2.4.6-80.el7, rh-php 7.1.8-1.el7
@Crunsher
Contributor

It just keeps happening :rage4:

Do you happen to have a log around the time of the crash? Maybe even a log from Nessus so we can see what it's doing?

@Crunsher added the core/crash (Shouldn't happen, requires attention) and needs feedback (We'll only proceed once we hear from you again) labels on Aug 20, 2018
@stevie-sy
Contributor Author

I have to ask my colleagues from the security group to get it. Give us a little time to consolidate the logs from Icinga, Apache, and Nessus.

@sjlucas

sjlucas commented Aug 21, 2018

We also see this issue where the Nessus security scan crashes the Icinga 2 service. I included the crash report and other information in #6562 (which was marked as a duplicate of this issue).

@dnsmichi
Contributor

dnsmichi commented Sep 6, 2018

What exactly does Nessus do in this specific case? Does it open a TCP socket, or does it do more than a TLS handshake? Are there any Wireshark dumps so we can see the packets?

@stevie-sy
Contributor Author

A little status report:
I talked with my colleagues from security. If they run a Nessus scan with 30 requests per second, Icinga crashes afterwards. When they reduce it to 5 requests per second, Icinga survives.

How does it happen?
To us it looks like Nessus opens a connection to Icinga port 5665. Icinga closes it, but Nessus replies only with an ACK frame; the connection is never closed with a FIN frame, so the port stays open. At first glance Icinga seems to survive the scan.
But it breaks when you reload the Icinga daemon (e.g. via an automatic deployment of a new config with the Director). Icinga creates a new process with a new PID and tries to stop the old process, but that doesn't work. With ps axu you can see two processes with two PIDs, and the old one never disappears. If you run systemctl status icinga2 on the shell, the status is "reloading" and never changes.
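The stuck state is easy to spot from the shell (a sketch; the exact output will vary per host):

```sh
# Symptoms of the stuck reload (sketch; output varies):
ps axu | grep '[i]cinga2'   # two icinga2 processes; the old PID never exits
systemctl status icinga2    # the state stays "reloading"
ss -tanp | grep 5665        # lingering API connections that never got a FIN
```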

Our problem is that there are no log files like a crash log. In journalctl we don't find an entry for this.

My colleagues are trying to reproduce this scenario without starting Nessus each time, but at the moment it doesn't work.

Maybe this information helps you for the moment.

@stevie-sy
Contributor Author

@dnsmichi telepathy :-)

@stevie-sy
Contributor Author

@dnsmichi just thinking: is this problem simply a result of this issue: #6517?
I read what you wrote there. To me it looks like it could be the same problem, something similar, or a consequence of it.

My colleague will check next week whether there are TLS handshakes from the Nessus server in the Icinga log.

@dnsmichi
Contributor

It may be related, if the scanner doesn't close the TLS connection cleanly. That's why I want to see more logs and a tcpdump from that scanner - especially the final packets of such a connection.
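If anyone wants to capture such a dump, a minimal sketch (interface and file name are only examples):

```sh
# Capture all traffic on the Icinga 2 API port during a scan,
# then inspect the connection teardown (FIN/ACK) in Wireshark.
tcpdump -i any -w nessus-scan.pcap 'tcp port 5665'
```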

@phil-or

phil-or commented Sep 27, 2018

Sorry for the delay, but now we have more logs about the problem. (I am the colleague of stevie-sy.)

In this use case our Windows agent "MSLI01-036" (10.1.41.224) crashes when "NESSUS" (10.1.36.101) scans it.
The Icinga parent zone is called "network" and its endpoints are "zmon-satellite3" and "zmon-satellite4".

A short timeline:
13:56 - Nessus scan starts
14:04 - the Windows agent is no longer connected to "zmon-satellite4", and all services that should deliver check results to "zmon-satellite4" are UNKNOWN. Services that deliver their check results to "zmon-satellite3" are OK.
14:11 - Nessus scan stops
14:14 - manually stopped and started Icinga on the Windows agent, and the connection worked again

All satellites and the agent have already been updated to Icinga 2.9.2.

icinga-crash.zip

@dnsmichi
Contributor

dnsmichi commented Oct 8, 2018

I forgot to click "comment" before vacation ... thanks a lot, that's exactly what I wanted to see :)

It boils down to Nessus sending some crafted TCP packets which are interpreted as a netstring, but actually aren't. This forces an immediate Disconnect() when parsing fails.

The majority of the scan uses HTTP requests though, and those requests are not authenticated.

[2018-09-27 14:02:50 +0200] warning/HttpServerConnection: Unauthorized request: GET /favicon.iso
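Any unauthenticated request reproduces this warning, e.g. with curl against the REST API (the host name is a placeholder):

```sh
# Request without credentials or a client certificate; Icinga answers
# HTTP 401 and logs the "Unauthorized request" warning seen above.
curl -k https://icinga-host:5665/v1/status
```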

[2018-09-27 14:03:02 +0200] information/ApiListener: New client connection from [::ffff:10.1.36.101]:50996 (no client certificate)
[2018-09-27 14:03:02 +0200] warning/JsonRpcConnection: Error while reading JSON-RPC message for identity '': Error: Invalid NetString (missing :)
[2018-09-27 14:03:02 +0200] warning/JsonRpcConnection: API client disconnected for identity ''
[2018-09-27 14:03:02 +0200] warning/JsonRpcConnection: API client disconnected for identity ''
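For context: JSON-RPC messages on the API port are framed as netstrings, i.e. `<length>:<payload>,`, so any payload without the leading length and colon fails exactly like this. A hedged reproduction sketch (the host name is a placeholder):

```sh
# A well-formed frame would look like: 20:{"jsonrpc":"2.0",...},
# Arbitrary bytes sent over TLS should trigger the same
# "Invalid NetString (missing :)" warning the scanner caused.
printf 'not-a-netstring' | openssl s_client -connect icinga-host:5665 -quiet
```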

[2018-09-27 14:03:04 +0200] information/HttpServerConnection: No messages for Http connection have been received in the last 10 seconds.
[2018-09-27 14:03:12 +0200] information/ApiListener: New client connection from [::ffff:10.1.36.101]:51016 (no client certificate)
[2018-09-27 14:03:12 +0200] information/HttpServerConnection: Request: GET / (from [::ffff:10.1.36.101]:51016, user: <unauthenticated>)
[2018-09-27 14:03:12 +0200] warning/HttpServerConnection: Unauthorized request: GET /
[2018-09-27 14:03:12 +0200] information/ApiListener: New client connection from [::ffff:10.1.36.101]:51018 (no client certificate)
[2018-09-27 14:03:12 +0200] information/HttpServerConnection: Request: GET /profilemanager (from [::ffff:10.1.36.101]:51018, user: <unauthenticated>)
[2018-09-27 14:03:12 +0200] warning/HttpServerConnection: Unauthorized request: GET /profilemanager
[2018-09-27 14:03:24 +0200] information/ApiListener: New client connection from [::ffff:10.1.36.101]:51042 (no client certificate)
[2018-09-27 14:03:24 +0200] information/HttpServerConnection: Request: GET / (from [::ffff:10.1.36.101]:51042, user: <unauthenticated>)
[2018-09-27 14:03:24 +0200] warning/HttpServerConnection: Unauthorized request: GET /
[2018-09-27 14:03:24 +0200] information/ApiListener: New client connection from [::ffff:10.1.36.101]:51044 (no client certificate)
[2018-09-27 14:03:24 +0200] information/HttpServerConnection: Request: POST /sdk (from [::ffff:10.1.36.101]:51044, user: <unauthenticated>)
[2018-09-27 14:03:24 +0200] information/ApiListener: New client connection from [::ffff:10.1.36.101]:51048 (no client certificate)
[2018-09-27 14:03:24 +0200] information/HttpServerConnection: Request: GET / (from [::ffff:10.1.36.101]:51048, user: <unauthenticated>)
[2018-09-27 14:03:24 +0200] warning/HttpServerConnection: Unauthorized request: GET /
[2018-09-27 14:03:26 +0200] information/ApiListener: New client connection from [::ffff:10.1.36.101]:51076 (no client certificate)
[2018-09-27 14:03:26 +0200] information/ApiListener: New client connection from [::ffff:10.1.36.101]:51082 (no client certificate)
[2018-09-27 14:03:26 +0200] information/HttpServerConnection: Request: GET / (from [::ffff:10.1.36.101]:51076, user: <unauthenticated>)
[2018-09-27 14:03:26 +0200] information/HttpServerConnection: Request: GET / (from [::ffff:10.1.36.101]:51082, user: <unauthenticated>)
[2018-09-27 14:03:26 +0200] warning/HttpServerConnection: Unauthorized request: GET /
[2018-09-27 14:03:26 +0200] warning/HttpServerConnection: Unauthorized request: GET /

In the end, it completely fails to disconnect the remaining connections and likely just stalls everything.

[2018-09-27 14:03:26 +0200] information/HttpServerConnection: Unable to disconnect Http client, I/O thread busy

@stevie-sy
Contributor Author

OK, thanks for the answer and the explanation. Now I understand why the load increases and why Icinga crashes after an automatic deployment with the Director.
We are glad that our logs are helping you. I hope you find a solution for this.

@dnsmichi
Contributor

dnsmichi commented Oct 9, 2018

Not yet, but at least I know where to look inside the code :)

https://github.com/Icinga/icinga2/blob/master/lib/remote/httpserverconnection.cpp#L78

@dnsmichi
Contributor

dnsmichi commented Oct 9, 2018

Maybe it is related to #6514, where connections are not properly closed upon a header request. I need to analyse further what exactly is sent in the raw pcap later.

@dnsmichi
Contributor

dnsmichi commented Oct 9, 2018

The fix for #6517 likely improves the situation as well, with a dynamic connection thread pool instead of spawning endless threads. @stevie-sy, can you test the snapshot packages on such a client, with Nessus scanning it?

@dnsmichi self-assigned this on Oct 9, 2018
@dnsmichi added the bug (Something isn't working), area/distributed (Distributed monitoring: master, satellites, clients) and area/api (REST API) labels on Oct 9, 2018
@stevie-sy
Contributor Author

Thank you, we'll test it as soon as possible.

@dnsmichi
Contributor

Please do so with 2.10.1 too :)

@stevie-sy
Contributor Author

Yes, we will! :-)
At the moment we have a lot to do, and some colleagues are on vacation right now, so we need some more time to get a new result.
But as soon as we have one, we will tell you immediately.

@dnsmichi
Contributor

Did you get the chance to do so already?

@dnsmichi removed their assignment on Nov 22, 2018
@stevie-sy
Contributor Author

Sorry, we didn't find the time because of other problems we had to fix or find a solution for, e.g. as I commented here: #6514 (comment). But in the end we got the same result.

@stevie-sy
Contributor Author

@Al2Klimov you've assigned this issue to me. What should we do?

@stevie-sy
Contributor Author

@dnsmichi after my vacation, and with our new test setup, we can do this for you ;-)
The same goes for the other issue with the log files you wrote about yesterday.

But for the moment my colleague and I are a little busy :-(

@dnsmichi added this to the 2.11.0 milestone on Feb 14, 2019
@dnsmichi assigned Al2Klimov and unassigned stevie-sy on Mar 12, 2019
@Al2Klimov
Member

This issue seems to have been addressed by #7005.

@dnsmichi
Contributor

dnsmichi commented Apr 8, 2019

Hi @stevie-sy,

any chance you'll deploy the current snapshot packages on a test VM and let your Nessus scanner run against it?

Cheers,
Michael

@stevie-sy
Contributor Author

Hi @dnsmichi! Of course, we want to help. Which version from https://packages.icinga.com/epel/ should we test in our test environment?
Stefan

@dnsmichi
Contributor

dnsmichi commented Apr 8, 2019

Hi,

you can either use the release RPM, which allows you to enable the snapshot repo, or go with the snapshot RPMs located here: https://packages.icinga.com/epel/7/snapshot/x86_64/

Note: You'll need EPEL enabled, which provides Boost 1.66+.

yum -y install https://packages.icinga.com/epel/icinga-rpm-release-7-latest.noarch.rpm
yum -y install epel-release
yum makecache

yum install --enablerepo=icinga-snapshot-builds icinga2

The output looks something like this:

======================================================================================================================================================
 Package                            Arch             Version                                                   Repository                        Size
======================================================================================================================================================
Installing:
 icinga2                            x86_64           2.10.4.517.g6a29861-0.2019.04.06+1.el7.icinga             icinga-snapshot-builds            29 k
Installing for dependencies:
 boost169-chrono                    x86_64           1.69.0-1.el7                                              epel                              17 k
 boost169-context                   x86_64           1.69.0-1.el7                                              epel                              16 k
 boost169-coroutine                 x86_64           1.69.0-1.el7                                              epel                              16 k
 boost169-date-time                 x86_64           1.69.0-1.el7                                              epel                              21 k
 boost169-program-options           x86_64           1.69.0-1.el7                                              epel                             125 k
 boost169-regex                     x86_64           1.69.0-1.el7                                              epel                             261 k
 boost169-system                    x86_64           1.69.0-1.el7                                              epel                             7.4 k
 boost169-thread                    x86_64           1.69.0-1.el7                                              epel                              44 k
 icinga2-bin                        x86_64           2.10.4.517.g6a29861-0.2019.04.06+1.el7.icinga             icinga-snapshot-builds           3.7 M
 icinga2-common                     x86_64           2.10.4.517.g6a29861-0.2019.04.06+1.el7.icinga             icinga-snapshot-builds           142 k
 libedit                            x86_64           3.0-12.20121213cvs.el7                                    base                              92 k
 libicu                             x86_64           50.1.2-17.el7                                             base                             6.9 M

Transaction Summary
======================================================================================================================================================
Install  1 Package (+12 Dependent packages)

Note: snapshot builds run every night whenever we've pushed to git master during the day.

Cheers,
Michael

@Al2Klimov removed their assignment on Apr 9, 2019
@stevie-sy
Contributor Author

Our colleagues from security have scheduled the scan for the weekend. On Monday we will know more... The tension is rising :-)

@stevie-sy
Contributor Author

A first overview of the scan:
After deployment with the Director on the config master, every node survived except the master2 node. But I have to check the logs, because this is irritating me a little:
[screenshot: Icinga Web 2 status of the master2 node]
It looks like the last state is from last Friday, after I updated to the latest snapshot, but there are a lot of entries in the icinga2 log since then.

This is from master1 (the config master):
[screenshot: Icinga Web 2 status of the master1/config-master node]

The restarts are deployments, or happened after the icinga2 update.

By the way:
Logstash is also running with the Icinga output plugin. I send a test SNMP trap every hour, and here too everything is fine.

So at first glance:
You did a great job.

@stevie-sy
Contributor Author

We did another test with today's snapshot. Everything was fine during the scan, and Icinga is still running. So thumbs up! Great job! Congratulations! Bravo!

@dnsmichi removed the needs feedback (We'll only proceed once we hear from you again) label on Apr 24, 2019
@dnsmichi
Contributor

Many thanks for the test and the kind feedback; this helps a lot and strengthens our decision to move forward with Boost ASIO, Coroutine and Beast :-)

@stevie-sy
Contributor Author

You're welcome.
If it helps, we could also test another future version before you release 2.11. Just let us know ;-)

@dnsmichi
Contributor

Thanks, I'll get back to you once everything is implemented and merged :-)

@tushyjw

tushyjw commented Jun 5, 2019

> We use Icinga r2.9.1-1 in an HA setup. When our security department scans our IT infrastructure with the Nessus security scanner for vulnerabilities, the Icinga nodes crash. […] (quoting the original issue description above)

How did you slow Nessus down, and which parameters did you change? Can you let me know? We are facing similar issues, and since the new version of Icinga is not released yet, it's creating trouble for us.

@stevie-sy
Contributor Author

@tushyjw in the end it didn't really help. My colleague found some options while creating new scans (e.g. not sending so many requests per second). We are still waiting for 2.11.
So for the moment you have these options (see the sketch below):

  • (automatically) restart icinga2 after the scan
  • exclude the server from the scan, if that is OK with your boss
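For the first option, a hedged cron sketch (the schedule is an assumption; adjust it to your own scan window):

```sh
# Hypothetical: restart icinga2 shortly after the weekly scan window closes.
cat > /etc/cron.d/icinga2-post-scan <<'EOF'
30 15 * * 6 root systemctl restart icinga2
EOF
```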


@Gleng1212

We are seeing a similar or the same problem. We are able to deal with the master by stopping it before and restarting it after the scan.

My question is about the clients. They are running r2.10.1-1 (the master is r2.10.5-1). I have seen the suggestion that r2.8.2-1 does not have the problem. Can I simply install 2.8.2-1, replacing 2.10.1-1?

Thanks for any clues,
GlenG

@dnsmichi
Contributor

2.8.2 has different problems. I would suggest waiting for the 2.11 release.
