Don't attempt to create sockets for unavailable IP address family #8659

Open
sanmai-NL opened this issue Apr 25, 2024 · 5 comments

Comments

@sanmai-NL
Contributor

sanmai-NL commented Apr 25, 2024

Feature Request

Description

On a single-stack IPv6 Talos Linux deployment, I noticed that the A DNS Resource Record (IPv4) for discovery.talos.dev is resolved and an attempt is made to connect to the resulting IPv4 address. However, on this node no network interface has an IPv4 address assigned (with the possible exception of the loopback interface). This happens even though discovery.talos.dev also has an AAAA Resource Record (IPv6).

[talos] 2024/04/25 20:21:00 hello failed {"component": "controller-runtime", "controller": "cluster.DiscoveryServiceController", "error": "rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing: dial tcp 172.174.35.21:443: connect: network is unreachable\"", "endpoint": "discovery.talos.dev:443"}

This issue is an epic to track all instances where Talos Linux components create sockets for unavailable IP address families.

Value

The value of this feature is reduced logging noise, better performance, and better energy efficiency.

@smira
Member

smira commented Apr 26, 2024

The standard behavior of all dialers is to try both IPv4 and IPv6 concurrently, with a short delay before the second family is tried, and to use whichever responds first. If you see an error, both attempts failed, but only one error is reported. This is how the Go standard library works and has nothing to do with Talos specifically; the behavior is the same in any other Go-based component such as containerd or Kubernetes.

Userspace programs can't guess which network is available, and whether or not addresses are assigned might not be enough to make a valid guess.

Talos Linux retries on all failures, so there's no functional issue here, nor is there any real delay in reaching the working state. If IPv6 had worked at that moment, there would have been no error at all.

@smira closed this as not planned Apr 26, 2024
@sanmai-NL
Contributor Author

sanmai-NL commented May 17, 2024

@smira Can you please reopen this?

See golang/go#25321 (comment). This has been reported as an issue by users of multiple other Go-based products, and an easy fix is possible.

@smira
Member

smira commented May 17, 2024

I can re-open this issue, but I don't quite see what can be done unless there's a specific bug here (which we'd be happy to look into).

Any static pre-check of availability doesn't make sense in Talos, as it reconfigures networking on the fly: what seemed to be an IPv4 environment might become IPv6, and vice versa. Doing checks on every dial operation is more expensive than just trying to dial.

Talos performs a small number of network operations in general (compared to other components running on the machine).

@smira reopened this May 17, 2024
@sanmai-NL
Contributor Author

sanmai-NL commented May 18, 2024

@smira the kernel parameters aren't reconfigured on-the-fly, are they? These can be set to enforce single-stack IPv4, for example. I don't expect full support of all dynamic conditions, nor do I restrict the design to a single check per lifecycle (init stage). Another improvement towards this would be to handle exceptions from dialers so that only true faults (errors) are logged as such.

@smira
Member

smira commented May 20, 2024

> @smira the kernel parameters aren't reconfigured on-the-fly, are they? These can be set to enforce single-stack IPv4, for example. I don't expect full support of all dynamic conditions, nor do I restrict the design to a single check per lifecycle (init stage). Another improvement towards this would be to handle exceptions from dialers so that only true faults (errors) are logged as such.

One can disable IPv6, but this is too much of an outlier these days. Both SideroLink and KubeSpan rely on IPv6 addressing (not connectivity), so I don't expect many people to disable IPv6.
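For reference on the kernel-parameter route: on Linux, whether IPv6 has been switched off via sysctl can be read from `/proc/sys/net/ipv6/conf/all/disable_ipv6`. The sketch below reads it once; the function names are hypothetical, and treating a missing file as "disabled" (for example on a kernel booted with `ipv6.disable=1`) is an assumption. Given SideroLink's and KubeSpan's reliance on IPv6 addressing, a Talos node would rarely report IPv6 as disabled.

```go
package main

import (
	"errors"
	"io/fs"
	"os"
	"strconv"
	"strings"
)

// disableIPv6Path is the Linux sysctl net.ipv6.conf.all.disable_ipv6.
const disableIPv6Path = "/proc/sys/net/ipv6/conf/all/disable_ipv6"

// parseDisableIPv6 interprets the sysctl contents: 1 means IPv6 is disabled
// on all interfaces, 0 means it is enabled.
func parseDisableIPv6(content string) (bool, error) {
	v, err := strconv.Atoi(strings.TrimSpace(content))
	if err != nil {
		return false, err
	}
	return v != 0, nil
}

// ipv6Disabled reads the sysctl (Linux only). Assumption: a missing file
// means the kernel has no usable IPv6 stack, so it is reported as disabled.
func ipv6Disabled() (bool, error) {
	b, err := os.ReadFile(disableIPv6Path)
	if errors.Is(err, fs.ErrNotExist) {
		return true, nil
	}
	if err != nil {
		return false, err
	}
	return parseDisableIPv6(string(b))
}
```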

I would prefer not to introduce hacks in the OS unless there's a major issue.
