
Nessus Scanner crashes Icinga #6559

Closed · stevie-sy opened this issue Aug 20, 2018 · 39 comments
Labels
area/api (REST API) · area/distributed (Distributed monitoring: master, satellites, clients) · bug (Something isn't working) · core/crash (Shouldn't happen, requires attention)
Milestone
2.11.0
Comments

@stevie-sy
Contributor

Current Behavior

We use Icinga r2.9.1-1 in an HA setup. When our security department scans our IT infrastructure with the Nessus security scanner for vulnerabilities, the Icinga nodes crash: systemctl reports the status as "reloading" and Icinga Web 2 loses its connection. We configured the service daemon for automatic restart as suggested in the documentation, but it seems that this didn't help.
Our old setup with version r2.8.4-1, without the HA setup, survives the scan.

It looks like the now-closed issue for Windows: #6097

At the moment my colleagues from the security department slow Nessus down a little, so Icinga survived the last scan. But I don't think slowing down a security scanner (e.g. fewer requests per second) is a real solution.
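For reference, the "automatic restart" we configured is a systemd drop-in along these lines. This is only a minimal sketch; the `Restart=` values here are assumptions, not the documentation's exact wording:

```sh
# Hedged sketch: systemd drop-in for automatically restarting icinga2.
# The exact values recommended by the docs may differ.
systemctl edit icinga2
# In the override file that opens, add:
#   [Service]
#   Restart=on-failure
#   RestartSec=10
systemctl daemon-reload   # make systemd pick up the override
```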

Your Environment

Director version (System - About): Git Master 71ad855
Icinga Web 2 version and modules (System - About): 2.6.1
Icinga 2 version (icinga2 --version): r2.9.1-1
Operating System and version: CentOS 7
Webserver, PHP versions: Apache 2.4.6-80.el7, rh-php 7.1.8-1.el7
@Crunsher
Contributor

It just keeps happening :rage4:

Do you happen to have a log around the time of the crash? Maybe even a log from Nessus so we can see what it's doing?

@Crunsher added the core/crash (Shouldn't happen, requires attention) and needs feedback (We'll only proceed once we hear from you again) labels on Aug 20, 2018
@stevie-sy
Contributor Author

I have to ask my colleagues from the security group to get it. Give us a little time to consolidate the logs from Icinga, Apache, and Nessus.

@sjlucas

sjlucas commented Aug 21, 2018

We also see this issue where the Nessus security scan crashes the Icinga 2 service. I included the crash report and other information in #6562 (which was marked as a duplicate of this issue).

@dnsmichi
Contributor

dnsmichi commented Sep 6, 2018

What exactly does Nessus do in this specific case? Does it open a TCP socket, or does it do more than a TLS handshake? Are there any Wireshark dumps so we can see the packets?

@stevie-sy
Contributor Author

A little status report:
I talked with my colleagues from security. If they run a Nessus scan with 30 requests per second, Icinga crashes afterwards. When they reduce it to 5 requests per second, Icinga survives.

How does it happen?
To us it looks like Nessus opens a connection to Icinga port 5665. Icinga closes it, but Nessus replies only with an ACK frame; the connection is never closed with a FIN frame, so the port stays open. At first glance Icinga seems to survive the scan.
But it breaks when you reload the Icinga daemon (e.g. via an automatic deployment of a new config with the Director). Icinga creates a new process with a new PID and tries to stop the old process, but that doesn't work. With ps axu you can see two processes with two PIDs, and the old one never disappears. If you run systemctl status icinga2 on the shell, the status is "reloading" and never changes.
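The stuck state is easy to spot from the shell (a sketch; the exact output will vary per host):

```sh
# Symptoms of the stuck reload (sketch; output varies):
ps axu | grep '[i]cinga2'   # two icinga2 processes; the old PID never exits
systemctl status icinga2    # the state stays "reloading"
ss -tanp | grep 5665        # lingering API connections that never got a FIN
```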

Our problem is that there are no log files like a crash log. In journalctl we don't find an entry for this.

My colleagues are trying to reproduce this scenario without starting Nessus each time, but at the moment it doesn't work.

Maybe this information helps you for the moment.

@stevie-sy
Contributor Author

@dnsmichi telepathy :-)

@stevie-sy
Contributor Author

@dnsmichi just thinking: is this problem simply a result of this issue: #6517?
I read what you wrote there. To me it looks like it could be the same problem, something similar, or a consequence of it.

My colleague will check next week whether there are TLS handshakes from the Nessus server in the Icinga log.

@dnsmichi
Contributor

It may be related, if the scanner doesn't close the TLS connection cleanly. That's why I want to see more logs and a tcpdump from that scanner - especially the final packets of such a connection.
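If anyone wants to capture such a dump, a minimal sketch (interface and file name are only examples):

```sh
# Capture all traffic on the Icinga 2 API port during a scan,
# then inspect the connection teardown (FIN/ACK) in Wireshark.
tcpdump -i any -w nessus-scan.pcap 'tcp port 5665'
```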

@phil-or

phil-or commented Sep 27, 2018

Sorry for the delay, but now we have more logs about the problem. (I am the colleague of stevie-sy.)

In this use case our Windows agent "MSLI01-036" (10.1.41.224) crashes when "NESSUS" (10.1.36.101) scans it.
The Icinga parent zone is called "network" and its endpoints are "zmon-satellite3" and "zmon-satellite4".

A short timeline:
13:56 - Nessus scan starts
14:04 - the Windows agent is no longer connected to "zmon-satellite4", and all services that should deliver check results to "zmon-satellite4" are UNKNOWN. Services that deliver their check results to "zmon-satellite3" are OK.
14:11 - Nessus scan stops
14:14 - manually stopped and started Icinga on the Windows agent, and the connection worked again

All satellites and the agent have already been updated to Icinga 2.9.2.

icinga-crash.zip

@dnsmichi
Contributor

dnsmichi commented Oct 8, 2018

I forgot to click "comment" before vacation ... thanks a lot, that's exactly what I wanted to see :)

It boils down to Nessus sending some crafted TCP packets which are interpreted as a netstring, but actually aren't. This forces an immediate Disconnect() when parsing fails.

The majority of the scan uses HTTP requests though, and those requests are not authenticated.

[2018-09-27 14:02:50 +0200] warning/HttpServerConnection: Unauthorized request: GET /favicon.iso
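Any unauthenticated request reproduces this warning, e.g. with curl against the REST API (the host name is a placeholder):

```sh
# Request without credentials or a client certificate; Icinga answers
# HTTP 401 and logs the "Unauthorized request" warning seen above.
curl -k https://icinga-host:5665/v1/status
```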

[2018-09-27 14:03:02 +0200] information/ApiListener: New client connection from [::ffff:10.1.36.101]:50996 (no client certificate)
[2018-09-27 14:03:02 +0200] warning/JsonRpcConnection: Error while reading JSON-RPC message for identity '': Error: Invalid NetString (missing :)
[2018-09-27 14:03:02 +0200] warning/JsonRpcConnection: API client disconnected for identity ''
[2018-09-27 14:03:02 +0200] warning/JsonRpcConnection: API client disconnected for identity ''
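For context: JSON-RPC messages on the API port are framed as netstrings, i.e. `<length>:<payload>,`, so any payload without the leading length and colon fails exactly like this. A hedged reproduction sketch (the host name is a placeholder):

```sh
# A well-formed frame would look like: 20:{"jsonrpc":"2.0",...},
# Arbitrary bytes sent over TLS should trigger the same
# "Invalid NetString (missing :)" warning the scanner caused.
printf 'not-a-netstring' | openssl s_client -connect icinga-host:5665 -quiet
```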

[2018-09-27 14:03:04 +0200] information/HttpServerConnection: No messages for Http connection have been received in the last 10 seconds.
[2018-09-27 14:03:12 +0200] information/ApiListener: New client connection from [::ffff:10.1.36.101]:51016 (no client certificate)
[2018-09-27 14:03:12 +0200] information/HttpServerConnection: Request: GET / (from [::ffff:10.1.36.101]:51016, user: <unauthenticated>)
[2018-09-27 14:03:12 +0200] warning/HttpServerConnection: Unauthorized request: GET /
[2018-09-27 14:03:12 +0200] information/ApiListener: New client connection from [::ffff:10.1.36.101]:51018 (no client certificate)
[2018-09-27 14:03:12 +0200] information/HttpServerConnection: Request: GET /profilemanager (from [::ffff:10.1.36.101]:51018, user: <unauthenticated>)
[2018-09-27 14:03:12 +0200] warning/HttpServerConnection: Unauthorized request: GET /profilemanager
[2018-09-27 14:03:24 +0200] information/ApiListener: New client connection from [::ffff:10.1.36.101]:51042 (no client certificate)
[2018-09-27 14:03:24 +0200] information/HttpServerConnection: Request: GET / (from [::ffff:10.1.36.101]:51042, user: <unauthenticated>)
[2018-09-27 14:03:24 +0200] warning/HttpServerConnection: Unauthorized request: GET /
[2018-09-27 14:03:24 +0200] information/ApiListener: New client connection from [::ffff:10.1.36.101]:51044 (no client certificate)
[2018-09-27 14:03:24 +0200] information/HttpServerConnection: Request: POST /sdk (from [::ffff:10.1.36.101]:51044, user: <unauthenticated>)
[2018-09-27 14:03:24 +0200] information/ApiListener: New client connection from [::ffff:10.1.36.101]:51048 (no client certificate)
[2018-09-27 14:03:24 +0200] information/HttpServerConnection: Request: GET / (from [::ffff:10.1.36.101]:51048, user: <unauthenticated>)
[2018-09-27 14:03:24 +0200] warning/HttpServerConnection: Unauthorized request: GET /
[2018-09-27 14:03:26 +0200] information/ApiListener: New client connection from [::ffff:10.1.36.101]:51076 (no client certificate)
[2018-09-27 14:03:26 +0200] information/ApiListener: New client connection from [::ffff:10.1.36.101]:51082 (no client certificate)
[2018-09-27 14:03:26 +0200] information/HttpServerConnection: Request: GET / (from [::ffff:10.1.36.101]:51076, user: <unauthenticated>)
[2018-09-27 14:03:26 +0200] information/HttpServerConnection: Request: GET / (from [::ffff:10.1.36.101]:51082, user: <unauthenticated>)
[2018-09-27 14:03:26 +0200] warning/HttpServerConnection: Unauthorized request: GET /
[2018-09-27 14:03:26 +0200] warning/HttpServerConnection: Unauthorized request: GET /

In the end, it completely fails to disconnect the remaining connections and likely just stalls everything.

[2018-09-27 14:03:26 +0200] information/HttpServerConnection: Unable to disconnect Http client, I/O thread busy

@stevie-sy
Contributor Author

OK, thanks for the answer and the explanation. Now I understand why the load increases and why Icinga crashes after an automatic deployment with the Director.
We are glad that our logs are helping you. I hope you find a solution for this.

@dnsmichi
Contributor

dnsmichi commented Oct 9, 2018

Not yet, but at least I know where to look inside the code :)

https://github.com/Icinga/icinga2/blob/master/lib/remote/httpserverconnection.cpp#L78

@dnsmichi
Contributor

dnsmichi commented Oct 9, 2018

Maybe it is related to #6514, where connections are not properly closed upon a header request. I need to analyse further what exactly is sent in the raw pcap later.

@dnsmichi
Contributor

dnsmichi commented Oct 9, 2018

The fix for #6517 likely improves the situation as well, with a dynamic connection thread pool instead of spawning endless threads. @stevie-sy, can you test the snapshot packages on such a client, with Nessus scanning it?

@dnsmichi self-assigned this on Oct 9, 2018
@dnsmichi added the bug (Something isn't working), area/distributed (Distributed monitoring: master, satellites, clients) and area/api (REST API) labels on Oct 9, 2018
@stevie-sy
Contributor Author

Thank you, we'll test it as soon as possible.

@dnsmichi
Contributor

Please do so with 2.10.1 too :)

@stevie-sy
Contributor Author

Yes, we will! :-)
At the moment we have a lot to do, and some colleagues are on vacation right now, so we need some more time to get a new result.
But as soon as we have one, we will tell you immediately.

@dnsmichi
Contributor

Did you get the chance to do so already?

@dnsmichi removed their assignment on Nov 22, 2018
@stevie-sy
Contributor Author

Sorry, we didn't find the time because of other problems we had to fix or find a solution for, e.g. as I commented here: #6514 (comment). But in the end we got the same result.

@stevie-sy
Contributor Author

@Al2Klimov you've assigned this issue to me. What should we do?

@stevie-sy
Contributor Author

@dnsmichi after my vacation, and with our new test setup, we can do this for you ;-)
The same goes for the other issue with the log files you wrote about yesterday.

But for the moment my colleague and I are a little busy :-(

@dnsmichi added this to the 2.11.0 milestone on Feb 14, 2019
@dnsmichi assigned Al2Klimov and unassigned stevie-sy on Mar 12, 2019
@Al2Klimov
Member

This issue seems to have been addressed by #7005.

@dnsmichi
Contributor

dnsmichi commented Apr 8, 2019

Hi @stevie-sy,

any chance you'll deploy the current snapshot packages on a test VM and let your Nessus scanner run against it?

Cheers,
Michael

@stevie-sy
Contributor Author

Hi @dnsmichi! Of course, we want to help. Which version from https://packages.icinga.com/epel/ should we test in our test environment?
Stefan

@dnsmichi
Contributor

dnsmichi commented Apr 8, 2019

Hi,

you can either use the release RPM, which allows you to enable the snapshot repo, or go with the snapshot RPMs located here: https://packages.icinga.com/epel/7/snapshot/x86_64/

Note: You'll need EPEL enabled, which provides Boost 1.66+.

yum -y install https://packages.icinga.com/epel/icinga-rpm-release-7-latest.noarch.rpm
yum -y install epel-release
yum makecache

yum install --enablerepo=icinga-snapshot-builds icinga2

The output looks something like this:

======================================================================================================================================================
 Package                            Arch             Version                                                   Repository                        Size
======================================================================================================================================================
Installing:
 icinga2                            x86_64           2.10.4.517.g6a29861-0.2019.04.06+1.el7.icinga             icinga-snapshot-builds            29 k
Installing for dependencies:
 boost169-chrono                    x86_64           1.69.0-1.el7                                              epel                              17 k
 boost169-context                   x86_64           1.69.0-1.el7                                              epel                              16 k
 boost169-coroutine                 x86_64           1.69.0-1.el7                                              epel                              16 k
 boost169-date-time                 x86_64           1.69.0-1.el7                                              epel                              21 k
 boost169-program-options           x86_64           1.69.0-1.el7                                              epel                             125 k
 boost169-regex                     x86_64           1.69.0-1.el7                                              epel                             261 k
 boost169-system                    x86_64           1.69.0-1.el7                                              epel                             7.4 k
 boost169-thread                    x86_64           1.69.0-1.el7                                              epel                              44 k
 icinga2-bin                        x86_64           2.10.4.517.g6a29861-0.2019.04.06+1.el7.icinga             icinga-snapshot-builds           3.7 M
 icinga2-common                     x86_64           2.10.4.517.g6a29861-0.2019.04.06+1.el7.icinga             icinga-snapshot-builds           142 k
 libedit                            x86_64           3.0-12.20121213cvs.el7                                    base                              92 k
 libicu                             x86_64           50.1.2-17.el7                                             base                             6.9 M

Transaction Summary
======================================================================================================================================================
Install  1 Package (+12 Dependent packages)

Note: snapshot builds run every night whenever we've pushed to git master during the day.

Cheers,
Michael

@Al2Klimov removed their assignment on Apr 9, 2019
@stevie-sy
Contributor Author

Our colleagues from security have scheduled the scan for the weekend. On Monday we will know more... The tension is rising :-)

@stevie-sy
Contributor Author

A first overview of the scan:
After deployment with the Director on the config master, every node survived except the master2 node. But I have to check the logs, because this is irritating me a little:
[screenshot: Icinga Web 2 status of the master2 node]
It looks like the last state is from last Friday, after I updated to the latest snapshot, but there are a lot of entries in the icinga2 log since then.

This is from master1 (the config master):
[screenshot: Icinga Web 2 status of the master1/config-master node]

The restarts are deployments, or happened after the icinga2 update.

By the way:
Logstash is also running with the Icinga output plugin. I send a test SNMP trap every hour, and here too everything is fine.

So at first glance:
You did a great job.

@stevie-sy
Contributor Author

We did another test with today's snapshot. Everything was fine during the scan, and Icinga is still running. So thumbs up! Great job! Congratulations! Bravo!

@dnsmichi removed the needs feedback (We'll only proceed once we hear from you again) label on Apr 24, 2019
@dnsmichi
Contributor

Many thanks for the test and the kind feedback; this helps a lot and strengthens our decision to move forward with Boost ASIO, Coroutine and Beast :-)

@stevie-sy
Contributor Author

You're welcome.
If it helps, we could also test another future version before you release 2.11. Just let us know ;-)

@dnsmichi
Contributor

Thanks, I'll get back to you once everything is implemented and merged :-)

@tushyjw

tushyjw commented Jun 5, 2019

> We use Icinga r2.9.1-1 in an HA setup. When our security department scans our IT infrastructure with the Nessus security scanner for vulnerabilities, the Icinga nodes crash. […] (quoting the original issue description above)

How did you slow Nessus down, and which parameters did you change? Can you let me know? We are facing similar issues, and since the new version of Icinga is not released yet, it's creating trouble for us.

@stevie-sy
Contributor Author

@tushyjw in the end it didn't really help. My colleague found some options while creating new scans (e.g. not sending so many requests per second). We are still waiting for 2.11.
So for the moment you have these options (see the sketch below):

  • (automatically) restart icinga2 after the scan
  • exclude the server from the scan, if that is OK with your boss
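For the first option, a hedged cron sketch (the schedule is an assumption; adjust it to your own scan window):

```sh
# Hypothetical: restart icinga2 shortly after the weekly scan window closes.
cat > /etc/cron.d/icinga2-post-scan <<'EOF'
30 15 * * 6 root systemctl restart icinga2
EOF
```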


@Gleng1212

We are seeing a similar or the same problem. We are able to deal with the master by stopping it before and restarting it after the scan.

My question is about the clients. They are running r2.10.1-1 (the master is r2.10.5-1). I have seen the suggestion that r2.8.2-1 does not have the problem. Can I simply install 2.8.2-1, replacing 2.10.1-1?

Thanks for any clues,
GlenG

@dnsmichi
Contributor

2.8.2 has different problems. I would suggest waiting for the 2.11 release.
