ICMP probes fail continually after several target hosts go down and come back up, until blackbox-exporter is manually restarted. #360
Comments
What does tcpdump show when this happens?
In tcpdump I see only ICMP requests and replies for other hosts, which were not caught by this error. There is no activity for the hosts with probe_success=0. I think the error happens somewhere at a deeper level, before any actual network activity, because of this string from the failed probe's log:
The logs you shared show that ICMP requests are being sent, which doesn't agree with that. What does strace show?
You're right, I didn't see that in tcpdump at first glance. I rechecked tcpdump and grepped it for one particular host, by domain name and IP address:
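For reference, a filter of the kind that can be used here (the interface name and target address below are placeholders, not values from this report):

```sh
# Capture only ICMP traffic to/from one target; -n avoids DNS lookups.
tcpdump -n -i eth0 'icmp and host 192.168.10.1'
```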
Additional info, in case it helps: I run blackbox-exporter under a separate non-root user, as a systemd service, with this capability:
I don't see any icmp responses there, it looks like there's a network issue on your end.
But
And the new probe's log:
This is my systemd service:
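For anyone reproducing a non-root setup, a minimal sketch of such a unit (the user name, paths, and unit contents here are assumptions, not the reporter's actual file; the key point is granting CAP_NET_RAW so the ICMP prober can open raw sockets without root):

```ini
[Unit]
Description=Blackbox exporter
After=network-online.target

[Service]
# Hypothetical non-root user and install path.
User=blackbox
ExecStart=/opt/blackbox_exporter/blackbox_exporter --config.file=/etc/blackbox_exporter/blackbox.yml
# Allow the unprivileged process to send ICMP echo requests via a raw socket.
AmbientCapabilities=CAP_NET_RAW
Restart=on-failure

[Install]
WantedBy=multi-user.target
```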
That'd imply some bad state in your network stack to me, likely something to do with connection tracking.
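If connection tracking is the suspect, one way to inspect and, if necessary, clear the ICMP entries (this assumes the conntrack-tools package is installed; flushing is disruptive and affects all tracked connections):

```sh
# List tracked ICMP flows to see whether stale entries exist.
conntrack -L -p icmp

# Flush the entire conntrack table (use with care).
conntrack -F
```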
I have a similar error, though it may be caused by something else. For me, the only way I can get the ICMP probe to succeed is by trying it against a target of 127.0.0.1; any other IP address, whether inside the local LAN or outside it, seems to fail. I've also added the capability with:
And I've instructed systemd to run it as root.
Host operating system: output of uname -a
blackbox_exporter version: output of blackbox_exporter -version
What is the blackbox.yml module config.
modules:
  http_2xx:
    prober: http
    timeout: 5s
    http:
      valid_status_codes: []
      method: GET
  icmp_ipv4:
    prober: icmp
    timeout: 15s
    icmp:
      preferred_ip_protocol: "ip4"
      source_ip_address: "127.0.0.1"
What is the prometheus.yml scrape config.
global:
  scrape_interval: 15s
  evaluation_interval: 30s
  # scrape_timeout is set to the global default (10s).
  external_labels:
    monitor: nuc
rule_files:
  - "rules.d/*.rules"
scrape_configs:
  - job_name: "prometheus"
    scrape_interval: 5s
    static_configs:
      - targets: ["localhost:9090"]
  - job_name: "node_exporter"
    scrape_interval: 5s
    static_configs:
      - targets: ["localhost:9100"]
  - job_name: "blackbox"
    scrape_interval: 5s
    metrics_path: /probe
    params:
      module: [http_2xx]
    static_configs:
      - targets:
          - google.com
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: 127.0.0.1:9115  # The blackbox exporter's real hostname:port.
  - job_name: "pingtime"
    scrape_interval: 1s
    metrics_path: /probe
    params:
      module: [icmp_ipv4]
    static_configs:
      - targets:
          - 127.0.0.1
          - 192.168.10.1
          - google.com
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: 127.0.0.1:9115  # The blackbox exporter's real hostname:port.
What logging output did you get from adding &debug=true to the probe URL?
That's unrelated to this issue, 127.0.0.1 won't have a route to elsewhere.
xealot, I think your problem is caused by this line in your blackbox config: source_ip_address: "127.0.0.1"
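A possible correction (a sketch, not a config from the thread): either drop source_ip_address entirely so the kernel picks the outgoing address, or set it to an address that is actually routable from the host, never the loopback:

```yaml
icmp_ipv4:
  prober: icmp
  timeout: 15s
  icmp:
    preferred_ip_protocol: "ip4"
    # Omit source_ip_address, or set it to a real local address
    # (192.168.10.5 here is a placeholder), not 127.0.0.1.
    # source_ip_address: "192.168.10.5"
```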
@fessmage you're right, sorry for the tracker noise and thanks for the tip
I see a similar problem in an AWS environment. tcpdump shows requests, but there were no answers from the target machine. I tracked the problem down to the zero-filled ID field in the IP header of packets generated by blackbox_exporter. I've attached a patch that fixes the problem in my case. Not sure, though, whether this fix will work for the original issue reporter.
I don't think that patch is in line with RFC 6864, and if this is the issue then the problem is with your network rather than the blackbox exporter.
Well, we don't have much control to fix Amazon's networking, I think. If you compare the packets emitted by the standard Linux ping utility and by blackbox_exporter, you will surely see a difference in the ID header field. The question of whether the ping utility is RFC-compliant or not remains open :)
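For illustration only (this is not the attached patch): a minimal Go sketch of sending an ICMP echo request with a non-zero IP ID by writing the IPv4 header explicitly via golang.org/x/net/ipv4. It assumes raw-socket privileges (root or CAP_NET_RAW) and uses a placeholder target address.

```go
package main

import (
	"log"
	"math/rand"
	"net"
	"os"

	"golang.org/x/net/icmp"
	"golang.org/x/net/ipv4"
)

func main() {
	dst := net.ParseIP("192.0.2.10") // placeholder target, not from the thread

	// Build the ICMP echo request payload.
	msg := icmp.Message{
		Type: ipv4.ICMPTypeEcho,
		Code: 0,
		Body: &icmp.Echo{ID: os.Getpid() & 0xffff, Seq: 1, Data: []byte("HELLO")},
	}
	payload, err := msg.Marshal(nil)
	if err != nil {
		log.Fatal(err)
	}

	// A raw IPv4 connection lets us supply the IP header ourselves,
	// including a non-zero Identification field.
	pc, err := net.ListenPacket("ip4:icmp", "0.0.0.0")
	if err != nil {
		log.Fatal(err)
	}
	defer pc.Close()
	rc, err := ipv4.NewRawConn(pc)
	if err != nil {
		log.Fatal(err)
	}

	hdr := &ipv4.Header{
		Version:  ipv4.Version,
		Len:      ipv4.HeaderLen,
		TotalLen: ipv4.HeaderLen + len(payload),
		ID:       1 + rand.Intn(65535), // random, guaranteed non-zero IP ID
		TTL:      64,
		Protocol: 1, // ICMP
		Dst:      dst,
	}
	// On Linux the kernel fills in the header checksum for raw sends.
	if err := rc.WriteTo(hdr, payload, nil); err != nil {
		log.Fatal(err)
	}
	log.Printf("sent echo request to %s with IP ID %d", dst, hdr.ID)
}
```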
I'm experiencing the same issue running the exporter on Ubuntu 16.04 hosts. Is there anything I can do to help troubleshoot?
I see nothing in RFC 6864 which says you may not set differing / random IDs on packets - only that they are no longer required to be unique. Interestingly, if I ping 8.8.8.8 from Linux, I see the Linux sender putting different values in this field, but the responses have zero ID. So at least Google thinks that zero ID is acceptable.
But pinging 9.9.9.9, I see responses with a different ID in each packet.
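One way to observe this field directly: tcpdump in verbose mode prints the IP header, including the id value, for every packet (8.8.8.8 here is just the target discussed above).

```sh
# -v prints the IP header fields ("ttl", "id", flags, ...) for each ICMP packet.
tcpdump -n -v icmp and host 8.8.8.8
```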
Did anyone find a workaround for this?
None of the various issues reported here appear to be issues in the blackbox exporter; all look like true positives caused by bugs in the network or by configuration errors. There's nothing the blackbox exporter can do if pings aren't making it back to the machine, so I'm going to close this.
Update from us, in case it helps other people googling for this: our problem was caused by the payload of the blackbox ICMP probe being 36 bytes. When we increased it to 64 bytes (using the payload_size parameter), our probes were successful.
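For reference, a sketch of an ICMP module with that parameter set (the module name and timeout are arbitrary; payload_size is the option referred to above):

```yaml
icmp_64b:
  prober: icmp
  timeout: 5s
  icmp:
    preferred_ip_protocol: "ip4"
    # Send 64-byte ICMP payloads instead of the smaller default.
    payload_size: 64
```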
Host operating system: output of
uname -a
Linux prometheus 4.4.0-134-generic #160-Ubuntu SMP Wed Aug 15 14:58:00 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
blackbox_exporter version: output of
blackbox_exporter -version
blackbox_exporter, version 0.12.0 (branch: HEAD, revision: 4a22506)
build user: root@634195974c8e
build date: 20180227-11:50:29
go version: go1.10
What is the blackbox.yml module config.
What is the prometheus.yml scrape config.
What logging output did you get from adding &debug=true to the probe URL?
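That debug output can be fetched directly from the exporter; a hypothetical example with the module and a target from this setup (9115 is the exporter's default port):

```sh
# Ask the exporter to run one probe and return its full debug log.
curl 'http://127.0.0.1:9115/probe?module=icmp_ipv4&target=192.168.10.1&debug=true'
```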
What did you do that produced an error?
Restarting the openvpn client on the hypervisor host, which runs the virtual machine with prometheus and blackbox-exporter. The blackbox-exporter target file has around 70 entries, more than 50 of them behind that VPN connection.
What did you expect to see?
Some failed probes during the vpn restart on the hypervisor, and then successful probes again.
What did you see instead?
Probes failed continually, for tens of minutes, until I manually restarted blackbox-exporter.