icmp: Both A and AAAA records are queried regardless of preferred_ip_protocol/ip_protocol_fallback #724

Closed
yegle opened this issue Dec 21, 2020 · 6 comments · Fixed by #728

Comments

@yegle
Contributor

yegle commented Dec 21, 2020

My local DNS cache server doesn't cache results for records that don't exist. E.g. if a domain doesn't have a configured AAAA record, the response to an AAAA query for it will not be cached. I have no idea if this is the expected behavior (the server is https://github.com/AdguardTeam/AdGuardHome, FWIW).

This leads to a higher icmp probe latency for domain names that don't have an AAAA record.

Reading the code, it looks like the prober always sends both A and AAAA DNS queries, regardless of preferred_ip_protocol and ip_protocol_fallback, because it calls resolver.LookupIPAddr(...):

ips, err := resolver.LookupIPAddr(ctx, target)

Would it make more sense to call resolver.LookupIP with the preferred_ip_protocol, and then make follow up DNS queries only if ip_protocol_fallback is true?

Host operating system: output of uname -a

Linux nuc 4.19.0-9-amd64 #1 SMP Debian 4.19.118-2+deb10u1 (2020-06-07) x86_64 GNU/Linux

blackbox_exporter version: output of blackbox_exporter --version

blackbox_exporter, version 0.18.0 (branch: HEAD, revision: 60c86e6)
build user: root@53d72328d93f
build date: 20201012-09:46:31
go version: go1.15.2

What is the blackbox.yml module config.

modules:
  icmp:
    prober: icmp
    timeout: 60s
    icmp:
      preferred_ip_protocol: "ip4"

What is the prometheus.yml scrape config.

- job_name: internet-ping
  metrics_path: /probe
  params:
    module: [icmp]
  static_configs:
  - targets:
    - your-dns.run
    - google.com

What logging output did you get from adding &debug=true to the probe URL?

Logs for the probe:
ts=2020-12-21T18:01:21.831656621Z caller=main.go:304 module=icmp target=your-dns.run level=info msg="Beginning probe" probe=icmp timeout_seconds=60
ts=2020-12-21T18:01:21.831869869Z caller=icmp.go:91 module=icmp target=your-dns.run level=info msg="Resolving target address" ip_protocol=ip4
ts=2020-12-21T18:01:21.875056949Z caller=icmp.go:91 module=icmp target=your-dns.run level=info msg="Resolved target address" ip=35.247.39.128
ts=2020-12-21T18:01:21.875134792Z caller=main.go:119 module=icmp target=your-dns.run level=info msg="Creating socket"
ts=2020-12-21T18:01:21.875204367Z caller=main.go:119 module=icmp target=your-dns.run level=debug msg="Unable to do unprivileged listen on socket, will attempt privileged" err="socket: permission denied"
ts=2020-12-21T18:01:21.875373799Z caller=main.go:119 module=icmp target=your-dns.run level=info msg="Creating ICMP packet" seq=52293 id=50814
ts=2020-12-21T18:01:21.875425358Z caller=main.go:119 module=icmp target=your-dns.run level=info msg="Writing out packet"
ts=2020-12-21T18:01:21.875576684Z caller=main.go:119 module=icmp target=your-dns.run level=info msg="Waiting for reply packets"
ts=2020-12-21T18:01:21.900723776Z caller=main.go:119 module=icmp target=your-dns.run level=info msg="Found matching reply packet"
ts=2020-12-21T18:01:21.900897095Z caller=main.go:304 module=icmp target=your-dns.run level=info msg="Probe succeeded" duration_seconds=0.069158382



Metrics that would have been returned:
# HELP probe_dns_lookup_time_seconds Returns the time taken for probe dns lookup in seconds
# TYPE probe_dns_lookup_time_seconds gauge
probe_dns_lookup_time_seconds 0.043219132
# HELP probe_duration_seconds Returns how long the probe took to complete in seconds
# TYPE probe_duration_seconds gauge
probe_duration_seconds 0.069158382
# HELP probe_icmp_duration_seconds Duration of icmp request by phase
# TYPE probe_icmp_duration_seconds gauge
probe_icmp_duration_seconds{phase="resolve"} 0.043219132
probe_icmp_duration_seconds{phase="rtt"} 0.025207606
probe_icmp_duration_seconds{phase="setup"} 0.000290841
# HELP probe_icmp_reply_hop_limit Replied packet hop limit (TTL for ipv4)
# TYPE probe_icmp_reply_hop_limit gauge
probe_icmp_reply_hop_limit 57
# HELP probe_ip_addr_hash Specifies the hash of IP address. It's useful to detect if the IP address changes.
# TYPE probe_ip_addr_hash gauge
probe_ip_addr_hash 7.89474712e+08
# HELP probe_ip_protocol Specifies whether probe ip protocol is IP4 or IP6
# TYPE probe_ip_protocol gauge
probe_ip_protocol 4
# HELP probe_success Displays whether or not the probe was a success
# TYPE probe_success gauge
probe_success 1



Module configuration:
prober: icmp
timeout: 1m0s
http:
    ip_protocol_fallback: true
tcp:
    ip_protocol_fallback: true
icmp:
    preferred_ip_protocol: ip4
    ip_protocol_fallback: true
dns:
    ip_protocol_fallback: true

What did you do that produced an error?

N/A (no error)

What did you expect to see?

DNS resolve latency should be very low (using a local caching DNS server)

What did you see instead?

DNS resolve latency is higher than expected:

probe_icmp_duration_seconds{phase="resolve"} 0.043219132

@brian-brazil
Contributor

I have no idea if this is the expected behavior

You can do negative caching, as long as it doesn't go over the TTL in the SOA, but there's no requirement to do so last I checked.

Would it make more sense to call resolver.LookupIP with the preferred_ip_protocol, and then make follow up DNS queries only if ip_protocol_fallback is true?

It seems like a bit of an odd situation, but we could do that and it might even make the code a tad simpler.

@hferreira23

hferreira23 commented Dec 27, 2020

I'm noticing the same behavior. Also, setting ip_protocol_fallback to false doesn't seem to stop IPv6 queries.

My config:

modules:
  icmp:
    prober: icmp
    icmp:
      preferred_ip_protocol: "ip4"
      ip_protocol_fallback: false

@brian-brazil
Contributor

To be clear, are you saying the above is sending ipv6 ICMP queries?

@hferreira23

DNS queries. I'm pinging an FQDN, and during name resolution it queries both the A record (which resolves just fine) and the AAAA record (which resolves to NO DATA, as I don't have an IPv6 resolver), even though preferred_ip_protocol and ip_protocol_fallback are set up as shown above.

I also see this behavior with the http_2xx module if it helps narrow down the issue.

@brian-brazil
Contributor

This is what this issue is about, but it's not a bug per se. The only promise is that the icmp/http request itself goes over the given protocol.

@ee-github

ee-github commented Jan 4, 2022

Apologies if replying here on a closed issue is not appropriate, but did this ever make it into the builds? I'm seeing both A and AAAA DNS queries for ICMP probe targets.

blackbox_exporter_build_info{branch="master",goversion="go1.16.10",revision="70bff7941301753b125a40bcf6b3ed28935a9a94",version="0.19.0"} 1

tcpdump:

17:24:03.680670 IP 192.168.1.87.52172 > 192.168.1.1.53: 5781+ [1au] A? gw.[redacted].com. (46)
17:24:03.680784 IP 192.168.1.87.57526 > 192.168.1.1.53: 18002+ [1au] AAAA? gw.[redacted].com. (46)
17:24:03.681040 IP 192.168.1.1.53 > 192.168.1.87.52172: 5781* 1/0/1 A 192.168.1.1 (62)
17:24:03.681119 IP 192.168.1.1.53 > 192.168.1.87.57526: 18002 0/0/1 (46)

Configured in .yml as follows:

  icmp_probe:
    prober: icmp
    timeout: 5s
    icmp:
      preferred_ip_protocol: "ip4"
      ip_protocol_fallback: false

/config does not show ip_protocol_fallback: false under the dns: heading. Is this expected?

    icmp_probe:
        prober: icmp
        timeout: 5s
        http:
            ip_protocol_fallback: true
            follow_redirects: true
        tcp:
            ip_protocol_fallback: true
        icmp:
            preferred_ip_protocol: ip4
        dns:
            ip_protocol_fallback: true

Impact: In this environment, A records are answered locally by a forwarding resolver (the network router), while AAAA queries are forwarded upstream. During an upstream outage there is no network route to the upstream resolver, so the AAAA query is answered as "refused" while the A query is answered just fine. Blackbox then falsely fails the probe at the DNS step, even though a valid A record was received and the target is perfectly reachable by ICMP in the un-severed part of the network.

Is my config/expectation wrong, or did I stumble on an unaddressed scenario?

[edit] Additional information. Here is the failed probe log from blackbox (looks like a timeout):

Logs for the probe:
ts=2022-01-04T22:55:03.679136956Z caller=main.go:347 module=icmp_probe target=gw.[redacted].com level=info msg="Beginning probe" probe=icmp timeout_seconds=5
ts=2022-01-04T22:55:03.679207497Z caller=icmp.go:91 module=icmp_probe target=gw.[redacted].com level=info msg="Resolving target address" ip_protocol=ip4
ts=2022-01-04T22:55:08.679738419Z caller=icmp.go:91 module=icmp_probe target=gw.[redacted].com level=error msg="Resolution with IP protocol failed" err="i/o timeout"
ts=2022-01-04T22:55:08.679769476Z caller=main.go:140 module=icmp_probe target=gw.[redacted].com level=warn msg="Error resolving address" err="i/o timeout"
ts=2022-01-04T22:55:08.679780535Z caller=main.go:347 module=icmp_probe target=gw.[redacted].com level=error msg="Probe failed" duration_seconds=5.000612452

And here is the protocol decode snipped from Wireshark for the DNS response during the failure condition:

Domain Name System (response)
    [Request In: 116]
    [Time: 0.000255000 seconds]
    Transaction ID: 0x5fc5
    Flags: 0x8185 Standard query response, Refused
        1... .... .... .... = Response: Message is a response
        .000 0... .... .... = Opcode: Standard query (0)
        .... .0.. .... .... = Authoritative: Server is not an authority for domain
        .... ..0. .... .... = Truncated: Message is not truncated
        .... ...1 .... .... = Recursion desired: Do query recursively
        .... .... 1... .... = Recursion available: Server can do recursive queries
        .... .... .0.. .... = Z: reserved (0)
        .... .... ..0. .... = Answer authenticated: Answer/authority portion was not authenticated by the server
        .... .... ...0 .... = Non-authenticated data: Unacceptable
        .... .... .... 0101 = Reply code: Refused (5)
    Questions: 1
    Answer RRs: 0
    Authority RRs: 0
    Additional RRs: 0
    Queries

Thanks in advance for any guidance.


4 participants