Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(dns): Expand SRV resolver to account for IPv6 #2864

Merged
merged 2 commits into from
Apr 8, 2024
Merged

Conversation

alpeb
Copy link
Member

@alpeb alpeb commented Apr 4, 2024

Given the limitations of the dns resolver lib, to retrieve a service IP we currently take the SRV response target, which has the form <host|IP>.svc.ns.svc.cluster-domain., and extract the first segment replacing dashes with dots. This fails for IPv6 address, so in this change we attempt parsing for IPv4 and if that fails fallback to IPv6 by instead replacing dots with colons.

The trust-dns-resolver has been renamed to hickory-resolver upstream, so we follow suit and also upgrade the version. (dealt with separately in #2872).

Given the limitations of the dns resolver lib, to retrieve a service IP
we currently take the SRV response target, which has the form
`<host|IP>.svc.ns.svc.cluster-domain.`, and extract the first segment
replacing dashes with dots. This fails for IPv6 address, so in this
change we attempt parsing for IPv4 and if that fails fallback to IPv6 by
instead replacing dots with colons.

The `trust-dns-resolver` has been renamed to `hickory-resolver`
upstream, so we follow suit and also upgrade the version.
@alpeb alpeb requested a review from a team as a code owner April 4, 2024 13:03
@olix0r
Copy link
Member

olix0r commented Apr 4, 2024

I'm pretty uncomfortable with parsing IPs out of DNS names. I know it's the existing approach that we hacked together years ago. Do we have any idea how hard it is to use DNS to discover the IPs? cc @zaharidichev

Also, can you split out the hickory update into a distinct PR (e.g. so that this could be reverted independently)?

@zaharidichev
Copy link
Member

Do we have any idea how hard it is to use DNS to discover the IPs? cc @zaharidichev

@olix0r @alpeb looking at it

@zaharidichev
Copy link
Member

So after spending some time looking at that, here are my thoughts:

  • it appears that (according to the RFC), that it is fine for an SRV record to not contain the ip address but instead end up with a hostname as in this situation. Parsing ip addresses out of this hostname is quite brittle. We can go ahead and do a further recursive lookup on these pod DNS labels to get the ip addresses. This is to me the most principled way to do it if we want to rely on SRV records.
  • we can contribute upstream. I am not sure there is much motivation for accepting this contribution as this is technically not a bug (subjective opinion)
  • another option is to stop relying on SRV records altogether. This resolver is only used for control plane components. Seems each CP component now has a headless service corresponding to it. Using A/AAA on a headless service should just give us the ip address, correct? here is a small test.
use hickory_resolver::Resolver;

fn main() {
    let hosts = vec![
        "linkerd-identity-headless.linkerd.svc.cluster.local",
        "linkerd-dst-headless.linkerd.svc.cluster.local",
        "linkerd-policy.linkerd.svc.cluster.local",
    ];
    let resolver = Resolver::from_system_conf().unwrap();

    for host in hosts {
        println!("looking up {}", host);
        let results = resolver.lookup_ip(host).unwrap();
        dbg!(results);
    }
}
zaharidichev@zahari-dev-2:~$ kubectl get pods -n linkerd -owide
NAME                                      READY   STATUS    RESTARTS   AGE   IP            NODE                 NOMINATED NODE   READINESS GATES
linkerd-destination-59756c78b-jpj2g       4/4     Running   0          45h   10.244.0.7    kind-control-plane   <none>           <none>
linkerd-destination-59756c78b-zrrmz       4/4     Running   0          20m   10.244.0.10   kind-control-plane   <none>           <none>
linkerd-identity-b8cf46c8d-687n8          2/2     Running   0          45h   10.244.0.5    kind-control-plane   <none>           <none>
linkerd-identity-b8cf46c8d-v9lfh          2/2     Running   0          20m   10.244.0.11   kind-control-plane   <none>           <none>
linkerd-proxy-injector-854f76f9bd-dp46g   2/2     Running   0          45h   10.244.0.6    kind-control-plane   <none>           <none>
zaharidichev@zahari-dev-2:~$
root@dns-test:/# ./dnstest
looking up linkerd-identity-headless.linkerd.svc.cluster.local
[dnstest/src/main.rs:14:9] results = LookupIp(
    Lookup {
        query: Query {
            name: Name("linkerd-identity-headless.linkerd.svc.cluster.local"),
            query_type: A,
            query_class: IN,
        },
        records: [
            Record {
                name_labels: Name("linkerd-identity-headless.linkerd.svc.cluster.local."),
                rr_type: A,
                dns_class: IN,
                ttl: 30,
                rdata: Some(
                    A(
                        A(
                            10.244.0.11,
                        ),
                    ),
                ),
            },
            Record {
                name_labels: Name("linkerd-identity-headless.linkerd.svc.cluster.local."),
                rr_type: A,
                dns_class: IN,
                ttl: 30,
                rdata: Some(
                    A(
                        A(
                            10.244.0.5,
                        ),
                    ),
                ),
            },
        ],
        valid_until: Instant {
            tv_sec: 244851,
            tv_nsec: 59158692,
        },
    },
)
looking up linkerd-dst-headless.linkerd.svc.cluster.local
[dnstest/src/main.rs:14:9] results = LookupIp(
    Lookup {
        query: Query {
            name: Name("linkerd-dst-headless.linkerd.svc.cluster.local"),
            query_type: A,
            query_class: IN,
        },
        records: [
            Record {
                name_labels: Name("linkerd-dst-headless.linkerd.svc.cluster.local."),
                rr_type: A,
                dns_class: IN,
                ttl: 30,
                rdata: Some(
                    A(
                        A(
                            10.244.0.10,
                        ),
                    ),
                ),
            },
            Record {
                name_labels: Name("linkerd-dst-headless.linkerd.svc.cluster.local."),
                rr_type: A,
                dns_class: IN,
                ttl: 30,
                rdata: Some(
                    A(
                        A(
                            10.244.0.7,
                        ),
                    ),
                ),
            },
        ],
        valid_until: Instant {
            tv_sec: 244851,
            tv_nsec: 78318836,
        },
    },
)
looking up linkerd-policy.linkerd.svc.cluster.local
[dnstest/src/main.rs:14:9] results = LookupIp(
    Lookup {
        query: Query {
            name: Name("linkerd-policy.linkerd.svc.cluster.local"),
            query_type: A,
            query_class: IN,
        },
        records: [
            Record {
                name_labels: Name("linkerd-policy.linkerd.svc.cluster.local."),
                rr_type: A,
                dns_class: IN,
                ttl: 30,
                rdata: Some(
                    A(
                        A(
                            10.244.0.10,
                        ),
                    ),
                ),
            },
            Record {
                name_labels: Name("linkerd-policy.linkerd.svc.cluster.local."),
                rr_type: A,
                dns_class: IN,
                ttl: 30,
                rdata: Some(
                    A(
                        A(
                            10.244.0.7,
                        ),
                    ),
                ),
            },
        ],
        valid_until: Instant {
            tv_sec: 244851,
            tv_nsec: 104219797,
        },
    },
)

So my question is, cant we just get rid of the SRV lookup logic? I mean, sure we get a port out of these. But do we really need it?

@alpeb
Copy link
Member Author

alpeb commented Apr 5, 2024

Thanks for the detailed respose @zaharidichev

we can contribute upstream. I am not sure there is much motivation for accepting this contribution as this is technically not a bug (subjective opinion)

My opinion is that it's indeed a bug, as I tested using the ip_iter method on the SRV lookup result and it just returns the first IP, contrary to their docs claim.

So my question is, cant we just get rid of the SRV lookup logic? I mean, sure we get a port out of these. But do we really need it?

I also wonder that, so looking forward to @olix0r's comments. But to clarify my understanding, digging into #594, it seems the reasoning for using SRV was that it gives us the actual pod port, even though for all the control plane services, their port is the same as the pod's, right?

@olix0r
Copy link
Member

olix0r commented Apr 5, 2024

So my question is, cant we just get rid of the SRV lookup logic? I mean, sure we get a port out of these. But do we really need it?

Can we find any docs on Headless services that describe the behavior we can rely on across k8s DNS providers?

@zaharidichev
Copy link
Member

zaharidichev commented Apr 5, 2024

Can we find any docs on Headless services that describe the behavior we can rely on across k8s DNS providers?

So there is this: https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/#srv-records, which elaborates on SRV records for Headless services. Not sure whether this is something that we can rely on. And it is in line with what we are seeing, right ?

More specifically:

For a headless Service, this resolves to multiple answers, one for each Pod that is backing the Service, and contains the port number and the domain name of the Pod of the form hostname.my-svc.my-namespace.svc.cluster-domain.example

@alpeb
Copy link
Member Author

alpeb commented Apr 8, 2024

In my last commit I've split out the library change, and created #2872 instead.

@olix0r
Copy link
Member

olix0r commented Apr 8, 2024

To summarize the conversation we had last week:

  • We are going to leave the hacky string parsing approach for the short-term because the library has not yet been updated to recurse. That should still be done but separately.
  • Switching to A/AAAA records is not really any better than the current approach because it's still not reliable if port and targetPort differ.

@olix0r olix0r changed the title Expand SRV resolver to account for IPv6 feat(dns): Expand SRV resolver to account for IPv6 Apr 8, 2024
@olix0r olix0r merged commit 4daa1f4 into main Apr 8, 2024
17 checks passed
@olix0r olix0r deleted the alpeb/fix-srv-lookup branch April 8, 2024 15:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants