Don't silence errors from `quinn-udp` code #1971

thomaseizinger · 2024-08-21T00:42:55Z

Currently, quinn-udp logs certain errors and returns Ok(()) even if the packet did not get sent. For example, a common one we see is

2024-08-21T00:34:51.104902Z  WARN quinn_udp: sendmsg error: Os { code: 101, kind: NetworkUnreachable, message: "Network unreachable" }, Transmit: { destination: [2600:1900:40b0:1504:0:94::]:3478, src_ip: None, enc: None, len: 108, segment_size: None }

This happens when we have an IPv6 interface but only with a link-local address. Thus, binding the socket to the unspecified address works but sending any kind of packet fails.

I would like to catch this in my application and disable the socket but I cannot because quinn-udp returns Ok(()).

Would you be willing to change this? It appears that ignoring a send error within library code is too opinionated and perhaps a layer above (quinn itself) would be better suited for this.

Note that I am not using quinn, we only use quinn-udp as a socket library.

Related: #1750.

The text was updated successfully, but these errors were encountered:

Ralith · 2024-08-21T03:37:49Z

I'm not sure exposing more errors is useful. The motivation behind the current behavior is that UDP network errors are almost invariably one of two cases: those that can be trivially spoofed by an attacker, and those that may be transient. For example, do you want to disable your socket just because the user's wifi drops out for a moment, or they changed networks?

thomaseizinger · 2024-08-21T06:27:12Z

I'm not sure exposing more errors is useful. The motivation behind the current behavior is that UDP network errors are almost invariably one of two cases: those that can be trivially spoofed by an attacker, and those that may be transient. For example, do you want to disable your socket just because the user's wifi drops out for a moment, or they changed networks?

We don't hard-fail in our app for any IO errors for exactly that reason :)

As far as I know, NetworkReachable gets emitted if the socket is incapable of routing the packet at all. That appears to be permanent (until network connections change at which point we make a new socket).

I am test-driving a change in a fork right now. I can report how that is going in a couple of days!

In general though, I think it wouldn't hurt to expose the errors at this layer. It gives more flexibility to users which makes it IMO a better library.

Ralith · 2024-08-22T03:10:45Z

permanent (until network connections change at which point we make a new socket).

"Permanent until it changes" isn't very permanent. Further, if your socket is bound to the wildcard address, as typical, there's no point recreating it in response to network changes. Detecting network changes is also difficult, especially in portable code.

It gives more flexibility to users

There needs to be a concrete use case that outweighs the footgun and which doesn't have a similar or better solution elsewhere.

Thus, binding the socket to the unspecified address works but sending any kind of packet fails.

Interpreting this as a fatal condition in particular seems like a mistake, because "any kind of packet" isn't quite right. What if someone wants to communicate with a link-local peer?

thomaseizinger · 2024-08-22T03:50:34Z

Further, if your socket is bound to the wildcard address, as typical, there's no point recreating it in response to network changes.

There is. On MacOS, a socket bound to the wildcard address will not be able to use newly attached interfaces.

Thus, binding the socket to the unspecified address works but sending any kind of packet fails.

Interpreting this as a fatal condition in particular seems like a mistake, because "any kind of packet" isn't quite right. What if someone wants to communicate with a link-local peer?

If there is a route (to a link-local peer), then the syscall isn't going to fail with NetworkUnreachable, right?

It gives more flexibility to users

There needs to be a concrete use case that outweighs the footgun and which doesn't have a similar or better solution elsewhere.

I agree with you in principle but what is the footgun that you are seeing here? Anybody building applications on top of UDP will be aware that failure to send an individual packet isn't necessarily fatal to their application.

If anything, I'd consider the decision to not expose errors a footgun because it deviates from what tokio's and std's UDP socket do.

thomaseizinger · 2024-08-22T03:54:10Z

To be clear, if we decide to change this, then I'll obviously move the current behaviour up into quinn so for those users, nothing changes.

Ralith · 2024-08-22T05:58:35Z

There is. On MacOS, a socket bound to the wildcard address will not be able to use newly attached interfaces.

What, really? Is that documented anywhere? That makes it much harder to make a well-behaved client application...

If there is a route (to a link-local peer), then the syscall isn't going to fail with NetworkUnreachable, right?

It will if you try to communicate with someone else concurrently, at least.

what is the footgun that you are seeing here? Anybody building applications on top of UDP will be aware that failure to send an individual packet isn't necessarily fatal to their application.

I'm not sure that's true. It wasn't obvious to us when we started Quinn, and it will certainly be a surprise to any existing users of quinn-udp.

thomaseizinger · 2024-08-22T06:34:49Z

There is. On MacOS, a socket bound to the wildcard address will not be able to use newly attached interfaces.

What, really? Is that documented anywhere? That makes it much harder to make a well-behaved client application...

Yep, welcome to the world of cross-platform networking 🙃

I'd have to dig through our issues to see if I can find something but that is the behaviour we observed. Hence our application rebinds all sockets when the OS notifies us about a network change event.

If there is a route (to a link-local peer), then the syscall isn't going to fail with NetworkUnreachable, right?

It will if you try to communicate with someone else concurrently, at least.

Fair enough. This kind of is a case for exposing the errors though. An application that needs to talk to link-local peers and to the wider network would probably want to know that most of its packets are failing to send. In any case, such an application will have strictly more context to make a good decision on how to handle errors compared to the socket library.

what is the footgun that you are seeing here? Anybody building applications on top of UDP will be aware that failure to send an individual packet isn't necessarily fatal to their application.

I'm not sure that's true. It wasn't obvious to us when we started Quinn, and it will certainly be a surprise to any existing users of quinn-udp.

Communicating such a change can easily be done by breaking / deprecating the API purposely and thus causing compile errors for all existing users.

I am not sure what else to argue 🤷‍♂️

The only error currently being returned from send is WouldBlock. This isn't what tokio and std do which is surprising to say the least. It is also not communicated in the type-signature either because it uses an io::Error. At the very least, the type-signature of send should be changing to a custom type that makes it obvious that all errors are being silenced.
Logging errors is an error handling strategy. It may not be the appropriate error handling strategy for each application. IMO a library shouldn't ship an error handling strategy for something like sending packets unless it can transparently fulfill the user's request (like sending the packet eventually).

thomaseizinger · 2024-08-22T07:02:44Z

If there is a route (to a link-local peer), then the syscall isn't going to fail with NetworkUnreachable, right?

It will if you try to communicate with someone else concurrently, at least.

This is a good point. I think the error handling of NetworkUnreachable needs to be a bit more fine-granular. Instead of discarding the socket, only this particular datagram should be marked as lost. In our case, we need to detect if we are unable to communicate with a certain DNS server.

Being able to discard that server immediately gives a much better UX than having to rely on a (series of) timeouts before it is marked as unresponsive.

Ralith · 2024-09-02T16:26:48Z

Communicating such a change can easily be done by breaking / deprecating the API purposely and thus causing compile errors for all existing users.

This addresses my concern, though it'll be a shame to have to move away from the nice clear send/recv names...

Being able to discard that server immediately gives a much better UX than having to rely on a (series of) timeouts before it is marked as unresponsive.

In the spirit of unlocking at least hypothetically better application behavior, I agree that this would be nice to have. Happy to review a PR.

thomaseizinger · 2024-09-02T18:15:17Z

Being able to discard that server immediately gives a much better UX than having to rely on a (series of) timeouts before it is marked as unresponsive.

In the spirit of unlocking at least hypothetically better application behavior, I agree that this would be nice to have. Happy to review a PR.

Just to be clear, the scope we are talking about is:

Simply forward io::Errors from send
Move the current logging behaviour into quinn

Ralith · 2024-09-02T18:24:00Z

To avoid unpleasant surprises for downstream, I think should introduce a new function rather than drastically changing the semantics of the current one. Open to other ideas, though.

thomaseizinger · 2024-09-03T00:16:48Z

To avoid unpleasant surprises for downstream, I think should introduce a new function rather than drastically changing the semantics of the current one. Open to other ideas, though.

send_fallible?

Ralith · 2024-09-03T00:28:03Z

Sure. And presumably a recv version, too?

thomaseizinger · 2024-09-03T01:23:23Z

Sure. And presumably a recv version, too?

I hadn't looked at recv at all. What can fail in receive that we currently don't expose? Doesn't recv only fail if the FD is closed?

djc · 2024-09-03T01:36:03Z

To avoid unpleasant surprises for downstream, I think should introduce a new function rather than drastically changing the semantics of the current one. Open to other ideas, though.

send_fallible?

try_send() feels more idiomatic.

Ralith · 2024-09-03T01:58:09Z

What can fail in receive that we currently don't expose?

Actually, looks like we don't suppress recv errors at all, so probably no changes needed there.

thomaseizinger · 2024-09-03T04:09:29Z

Should I change the error type of send at the same time to return a custom error that makes it obvious that it only ever "fails" on WouldBlock?

djc · 2024-09-03T08:10:21Z

Should I change the error type of send at the same time to return a custom error that makes it obvious that it only ever "fails" on WouldBlock?

If we're going to add try_send() there don't seem to be good reasons to break compatibility for the crate, so I'm inclined to avoid this until we have other changes that need a semver-incompatible bump.

thomaseizinger · 2024-09-03T18:24:31Z

Should I change the error type of send at the same time to return a custom error that makes it obvious that it only ever "fails" on WouldBlock?

If we're going to add try_send() there don't seem to be good reasons to break compatibility for the crate, so I'm inclined to avoid this until we have other changes that need a semver-incompatible bump.

The current type-signature is very misleading because it is too "wide" and at the same time very common for networking code, meaning pretty much everyone using this API will assume that most io::Errors will be returned here. IMO, this unconventional choice warrants drawing attention to. The fix is easy, users can just map_err to an io::ErrorKind::WouldBlock. We can provide a dedicated conversion function for that too.

impl WouldBlock {
	pub fn into_io_error(self) -> std::io::Error {
		std::io::ErrorKind::WouldBlock.into()
	}
}

Most users will use this with something like tokio::net::UdpSocket::try_io which expects an io::ErrorKind::WouldBlock. That is only gonna happen in a single place though so it is not like the map_err will duplicated a lot.

self.inner.try_io(Interest::WRITABLE, || {
    self.state.send((&self.inner).into(), &transmit).map_err(quinn_udp::WouldBlock::into_io_error)
})

We can provide a From-impl too but that would somewhat miss point of the custom error because the conversion might get hidden behind ?.

I am busy with other things this week but I am hoping to be able to send a PR soon after.

This was referenced Aug 21, 2024

Resolve A or AAAA depending on Gateway's IPv4 and IPv6 connectivity firezone/firezone#6371

Closed

feat(connlib): discard sockets on NetworkUnreachable error firezone/firezone#6382

Closed

mxinden mentioned this issue Sep 23, 2024

feat: Faster UDP/IO on Apple platforms #1993

Merged

thomaseizinger mentioned this issue Oct 10, 2024

Don't use IPv6 DNS upstreams when there's no IPv6 connectivity firezone/firezone#6353

Closed

thomaseizinger mentioned this issue Oct 21, 2024

feat: introduce UdpSocketState::try_send APIs #2017

Merged

Ralith closed this as completed in #2017 Oct 25, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Don't silence errors from `quinn-udp` code #1971

Don't silence errors from `quinn-udp` code #1971

thomaseizinger commented Aug 21, 2024

Ralith commented Aug 21, 2024

thomaseizinger commented Aug 21, 2024

Ralith commented Aug 22, 2024

thomaseizinger commented Aug 22, 2024

thomaseizinger commented Aug 22, 2024

Ralith commented Aug 22, 2024

thomaseizinger commented Aug 22, 2024

thomaseizinger commented Aug 22, 2024

Ralith commented Sep 2, 2024

thomaseizinger commented Sep 2, 2024

Ralith commented Sep 2, 2024

thomaseizinger commented Sep 3, 2024

Ralith commented Sep 3, 2024

thomaseizinger commented Sep 3, 2024

djc commented Sep 3, 2024 •

edited

Loading

Ralith commented Sep 3, 2024

thomaseizinger commented Sep 3, 2024

djc commented Sep 3, 2024

thomaseizinger commented Sep 3, 2024

Don't silence errors from quinn-udp code #1971

Don't silence errors from quinn-udp code #1971

Comments

thomaseizinger commented Aug 21, 2024

Ralith commented Aug 21, 2024

thomaseizinger commented Aug 21, 2024

Ralith commented Aug 22, 2024

thomaseizinger commented Aug 22, 2024

thomaseizinger commented Aug 22, 2024

Ralith commented Aug 22, 2024

thomaseizinger commented Aug 22, 2024

thomaseizinger commented Aug 22, 2024

Ralith commented Sep 2, 2024

thomaseizinger commented Sep 2, 2024

Ralith commented Sep 2, 2024

thomaseizinger commented Sep 3, 2024

Ralith commented Sep 3, 2024

thomaseizinger commented Sep 3, 2024

djc commented Sep 3, 2024 • edited Loading

Ralith commented Sep 3, 2024

thomaseizinger commented Sep 3, 2024

djc commented Sep 3, 2024

thomaseizinger commented Sep 3, 2024

Don't silence errors from `quinn-udp` code #1971

Don't silence errors from `quinn-udp` code #1971

djc commented Sep 3, 2024 •

edited

Loading