ICMP RTT not accurate #315
Comments
probe_duration_seconds is how long the probe as a whole takes.
I looked into this previously. I can confirm that the exporter is very bad at handling very low latency results for ICMP. It seems like a problem with Go just taking a long time to get the packet off the network. I need to do some more careful testing to see if there is something we can improve here.
@SuperQ Measurement happens here:
The application measures the execution time of the called prober function, not the time spent on the socket. I think a solution might be to move the measurement from main.go to icmp.go and calculate the time difference between the socket write and read calls:
I think this could give precise times and get rid of the execution overhead.
Yes, I'm thinking if we adjust the interface to the prober functions slightly, that we return
I don't think we should be changing the interface in that way, probe_duration_seconds should remain the length of the entire probe. That isn't to say the icmp module couldn't get a new metric.
No, the probe duration needs to be the duration of the probe, not the blackbox_exporter. We shouldn't be measuring the exporter itself.
probe_duration_seconds is in line with how every other exporter does it.
@SuperQ I'm with you mate
An icmp specific metric would be probe_icmp_something. reply_seconds maybe?
I don't see how this is comparable to other exporters. Most other exporters expose raw data from some other system, pre-measured. This is creating a new measurement, which should be close to the origin of the measurement, which is the duration of the probe, not the duration of the exporter. We have
scrape_duration_seconds is what Prometheus sees, not what the exporter sees. The exporter guidelines also recommend a metric from the exporter's perspective as differences between them help measure network latency to the exporter and spot exporter overload. Keep in mind also that this is the blackbox exporter, not the icmp exporter. What you are suggesting makes no sense for the other probes.
Also, probe_duration_seconds is the duration of the probe.
Exporter overload/overhead should come from Yes, it does make sense for all prober functions, as they should not include exporter overhead in their measurements either.
Look, I'm not going to argue this further. probe_duration_seconds is doing exactly what it's meant to be doing. If you want to measure a sub-part of a probe, you need a new metric.
@brian-brazil I think this is a general issue for blackbox_exporter. How about implementing it across all modules? return true, rttTime
HTTP already has its own thing for this, I wouldn't worry much about making it generic. Propose something that makes sense for icmp.
@brian-brazil I'm OK with probe_icmp_rtt_seconds
Sounds good, though I'd make it a bit clearer in the description exactly what duration is being measured.
@brian-brazil do you need any help implementing this, mate?
A PR would be welcome.
@brian-brazil do you have any recommendations on how to implement it in line with the design?
It's just another metric, look at how it's done in other probes.
Host operating system: output of
uname -a
blackbox_exporter version: output of
blackbox_exporter -version
What is the blackbox.yml module config.
What is the prometheus.yml scrape config.
What logging output did you get from adding
&debug=true
to the probe URL?
Issue
ICMP RTT for localhost:
[root@ise-basic ~]# ping 127.0.0.1 -c 5
Meanwhile blackbox returns:
0.000377028 s equals 0.377028 ms, which looks wrong to me.
Nevertheless, in debug mode the log timestamps are nearly identical:
Based on the timestamps between "Waiting for reply packets" and "Found matching reply packet", the wait takes 747779592 - 747746052 = 33540 ns, which is about 0.034 ms.
I also noticed a difference between these two timestamps:
ts=2018-04-28T12:42:14.74765198Z caller=main.go:176 module=icmp target=127.0.0.1 level=debug msg="Writing out packet"
ts=2018-04-28T12:42:14.747746052Z caller=main.go:176 module=icmp target=127.0.0.1 level=debug msg="Waiting for reply packets"
Based on the source code I don't believe this is the cause of the issue, but who knows.
Any advice on this?