ICMP probes fail continually after short DNS outages, until the blackbox-exporter container is manually restarted #591
Comments
If the pings are going out and nothing is coming back, that's not a problem with the blackbox exporter.
I would also suspect a broken network, but I can't understand why the ping tool inside the blackbox-exporter container works without problems, as does the traceroute command. Outside the container, ping and traceroute also work fine (for the same target). This is the tcpdump output from inside the container (with just blackbox-exporter running):
And this is the tcpdump output when I start ping manually inside the container alongside blackbox-exporter (blackbox-exporter id==33313):
The tcpdump output outside the container, on the host machine, looks the same. We added a couple of other public IP targets (such as 1.1.1.1), and after a day they started to fail too. I would like to dig deeper into what's going on, but I have no idea where to look now. Do you have any suggestions on what else to check or how to debug it?
I'm afraid I can't really help with issues that are not with the blackbox exporter itself.
Well, if the blackbox exporter is the only tool that sends packets and doesn't receive a reply, I guess it might actually be a problem with Blackbox itself. Maybe the payload is bad? Or maybe the reply payload from the other side is something that only Blackbox can't handle.
If tcpdump isn't showing a packet coming back, it can't be a problem with the blackbox exporter.
Are you sure? Blackbox forges its own ICMP packets, and if they were invalid in some way, I wouldn't be surprised not to get a reply...
You have not demonstrated that, however. As-is, this appears to be a network issue on your end. It makes more sense to ask questions like this on the prometheus-users mailing list rather than in a GitHub issue. On the mailing list, more people are available to potentially respond to your question, and the whole community can benefit from the answers provided.
Agreed, the mailing list might be a better place for further discussion. Thanks for the hint.
We are also being hit by this issue in EC2.
We confirm the same behavior: once blackbox ping encounters trouble, it keeps returning failures, even when pings run from the command line succeed. Blackbox does not recover until it is restarted.
I'm still seeing nothing here to indicate a blackbox exporter issue; this looks like an EC2 networking issue.
Why would blackbox fail when ping works, if it were a networking issue?
If the echo request is being sent but echo replies aren't making it back to the machine, that's not a blackbox issue.
Not trying to be obtuse here; it just comes naturally to me.
Per the tcpdump output above, no replies are coming back, so the blackbox exporter never even has the opportunity to receive them.
Good answer! Thank you.
Update for people who hit the same issue: for us, this was caused by a firewall that started blocking ICMP packets with the same ID. This explains why restarting blackbox temporarily fixed the issue.
Thanks for the update. I presume it's similar for other users, so I'm going to close this.
@jeremybz We are having the same issue. If you can share, I'd be interested in hearing: What was your firewall? What policy or change did you make to resolve the issue? Thanks.
For us, this was caused by our router's ICMP TTL. For some reason, packets from the Blackbox exporter were not creating a new session on the router (a FortiGate in this case), most likely because they reused the same ICMP ID, and were being routed incorrectly. In our case, increasing the scrape interval to 90s (above our 60s TTL) seems to have resolved the issue for now.
Host operating system: output of `uname -a`
blackbox_exporter version: output of `blackbox_exporter -version`
What is the blackbox.yml module config?
What is the prometheus.yml scrape config?
What logging output did you get from adding `&debug=true` to the probe URL?
What did you do that produced an error?
We run blackbox-exporter inside a Docker container. Suddenly, without any changes to the host machine or the container, the ping probe starts failing for one or more of the targets we monitor, while other targets remain OK. When I manually run the ping tool inside the Docker container and on the host OS outside the container, both succeed.
So far we have experienced this behavior for two of our internal IP targets simultaneously (both in the same datacenter), and later for the 8.8.8.8 target alone.
I examined the problem with tcpdump, and it shows only request packets (no reply packets):
I also checked whether the ID field in the IP header is zero-filled, as discussed in a very similar issue (#360), but that is not the case here.
The only correlation we found in Grafana is very short connection outages from the blackbox-exporter machine to some of our internal DNS servers, which are monitored by the same blackbox-exporter (the spikes occur at the same time the probes start failing)...
What did you expect to see?
Maybe some failed probes during a potential outage, but then successful probes again.
What did you see instead?
The probes fail continually, for hours, until I manually restart the Docker container.