Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Don't silence errors from quinn-udp code #1971

Closed
thomaseizinger opened this issue Aug 21, 2024 · 19 comments · Fixed by #2017
Closed

Don't silence errors from quinn-udp code #1971

thomaseizinger opened this issue Aug 21, 2024 · 19 comments · Fixed by #2017

Comments

@thomaseizinger
Copy link
Contributor

Currently, quinn-udp logs certain errors and returns Ok(()) even if the packet did not get sent. For example, a common one we see is

2024-08-21T00:34:51.104902Z  WARN quinn_udp: sendmsg error: Os { code: 101, kind: NetworkUnreachable, message: "Network unreachable" }, Transmit: { destination: [2600:1900:40b0:1504:0:94::]:3478, src_ip: None, enc: None, len: 108, segment_size: None }

This happens when we have an IPv6 interface but only with a link-local address. Thus, binding the socket to the unspecified address works but sending any kind of packet fails.

I would like to catch this in my application and disable the socket but I cannot because quinn-udp returns Ok(()).

Would you be willing to change this? It appears that ignoring a send error within library code is too opinionated and perhaps a layer above (quinn itself) would be better suited for this.

Note that I am not using quinn, we only use quinn-udp as a socket library.

Related: #1750.

@Ralith
Copy link
Collaborator

Ralith commented Aug 21, 2024

I'm not sure exposing more errors is useful. The motivation behind the current behavior is that UDP network errors are almost invariably one of two cases: those that can be trivially spoofed by an attacker, and those that may be transient. For example, do you want to disable your socket just because the user's wifi drops out for a moment, or they changed networks?

@thomaseizinger
Copy link
Contributor Author

I'm not sure exposing more errors is useful. The motivation behind the current behavior is that UDP network errors are almost invariably one of two cases: those that can be trivially spoofed by an attacker, and those that may be transient. For example, do you want to disable your socket just because the user's wifi drops out for a moment, or they changed networks?

We don't hard-fail in our app for any IO errors for exactly that reason :)

As far as I know, NetworkReachable gets emitted if the socket is incapable of routing the packet at all. That appears to be permanent (until network connections change at which point we make a new socket).

I am test-driving a change in a fork right now. I can report how that is going in a couple of days!

In general though, I think it wouldn't hurt to expose the errors at this layer. It gives more flexibility to users which makes it IMO a better library.

@Ralith
Copy link
Collaborator

Ralith commented Aug 22, 2024

permanent (until network connections change at which point we make a new socket).

"Permanent until it changes" isn't very permanent. Further, if your socket is bound to the wildcard address, as typical, there's no point recreating it in response to network changes. Detecting network changes is also difficult, especially in portable code.

It gives more flexibility to users

There needs to be a concrete use case that outweighs the footgun and which doesn't have a similar or better solution elsewhere.

Thus, binding the socket to the unspecified address works but sending any kind of packet fails.

Interpreting this as a fatal condition in particular seems like a mistake, because "any kind of packet" isn't quite right. What if someone wants to communicate with a link-local peer?

@thomaseizinger
Copy link
Contributor Author

Further, if your socket is bound to the wildcard address, as typical, there's no point recreating it in response to network changes.

There is. On MacOS, a socket bound to the wildcard address will not be able to use newly attached interfaces.

Thus, binding the socket to the unspecified address works but sending any kind of packet fails.

Interpreting this as a fatal condition in particular seems like a mistake, because "any kind of packet" isn't quite right. What if someone wants to communicate with a link-local peer?

If there is a route (to a link-local peer), then the syscall isn't going to fail with NetworkUnreachable, right?

It gives more flexibility to users

There needs to be a concrete use case that outweighs the footgun and which doesn't have a similar or better solution elsewhere.

I agree with you in principle but what is the footgun that you are seeing here? Anybody building applications on top of UDP will be aware that failure to send an individual packet isn't necessarily fatal to their application.

If anything, I'd consider the decision to not expose errors a footgun because it deviates from what tokio's and std's UDP socket do.

@thomaseizinger
Copy link
Contributor Author

To be clear, if we decide to change this, then I'll obviously move the current behaviour up into quinn so for those users, nothing changes.

@Ralith
Copy link
Collaborator

Ralith commented Aug 22, 2024

There is. On MacOS, a socket bound to the wildcard address will not be able to use newly attached interfaces.

What, really? Is that documented anywhere? That makes it much harder to make a well-behaved client application...

If there is a route (to a link-local peer), then the syscall isn't going to fail with NetworkUnreachable, right?

It will if you try to communicate with someone else concurrently, at least.

what is the footgun that you are seeing here? Anybody building applications on top of UDP will be aware that failure to send an individual packet isn't necessarily fatal to their application.

I'm not sure that's true. It wasn't obvious to us when we started Quinn, and it will certainly be a surprise to any existing users of quinn-udp.

@thomaseizinger
Copy link
Contributor Author

There is. On MacOS, a socket bound to the wildcard address will not be able to use newly attached interfaces.

What, really? Is that documented anywhere? That makes it much harder to make a well-behaved client application...

Yep, welcome to the world of cross-platform networking 🙃

I'd have to dig through our issues to see if I can find something but that is the behaviour we observed. Hence our application rebinds all sockets when the OS notifies us about a network change event.

If there is a route (to a link-local peer), then the syscall isn't going to fail with NetworkUnreachable, right?

It will if you try to communicate with someone else concurrently, at least.

Fair enough. This kind of is a case for exposing the errors though. An application that needs to talk to link-local peers and to the wider network would probably want to know that most of its packets are failing to send. In any case, such an application will have strictly more context to make a good decision on how to handle errors compared to the socket library.

what is the footgun that you are seeing here? Anybody building applications on top of UDP will be aware that failure to send an individual packet isn't necessarily fatal to their application.

I'm not sure that's true. It wasn't obvious to us when we started Quinn, and it will certainly be a surprise to any existing users of quinn-udp.

Communicating such a change can easily be done by breaking / deprecating the API purposely and thus causing compile errors for all existing users.

I am not sure what else to argue 🤷‍♂️

  • The only error currently being returned from send is WouldBlock. This isn't what tokio and std do which is surprising to say the least. It is also not communicated in the type-signature either because it uses an io::Error. At the very least, the type-signature of send should be changing to a custom type that makes it obvious that all errors are being silenced.
  • Logging errors is an error handling strategy. It may not be the appropriate error handling strategy for each application. IMO a library shouldn't ship an error handling strategy for something like sending packets unless it can transparently fulfill the user's request (like sending the packet eventually).

@thomaseizinger
Copy link
Contributor Author

If there is a route (to a link-local peer), then the syscall isn't going to fail with NetworkUnreachable, right?

It will if you try to communicate with someone else concurrently, at least.

This is a good point. I think the error handling of NetworkUnreachable needs to be a bit more fine-granular. Instead of discarding the socket, only this particular datagram should be marked as lost. In our case, we need to detect if we are unable to communicate with a certain DNS server.

Being able to discard that server immediately gives a much better UX than having to rely on a (series of) timeouts before it is marked as unresponsive.

@Ralith
Copy link
Collaborator

Ralith commented Sep 2, 2024

Communicating such a change can easily be done by breaking / deprecating the API purposely and thus causing compile errors for all existing users.

This addresses my concern, though it'll be a shame to have to move away from the nice clear send/recv names...

Being able to discard that server immediately gives a much better UX than having to rely on a (series of) timeouts before it is marked as unresponsive.

In the spirit of unlocking at least hypothetically better application behavior, I agree that this would be nice to have. Happy to review a PR.

@thomaseizinger
Copy link
Contributor Author

Being able to discard that server immediately gives a much better UX than having to rely on a (series of) timeouts before it is marked as unresponsive.

In the spirit of unlocking at least hypothetically better application behavior, I agree that this would be nice to have. Happy to review a PR.

Just to be clear, the scope we are talking about is:

  • Simply forward io::Errors from send
  • Move the current logging behaviour into quinn

@Ralith
Copy link
Collaborator

Ralith commented Sep 2, 2024

To avoid unpleasant surprises for downstream, I think should introduce a new function rather than drastically changing the semantics of the current one. Open to other ideas, though.

@thomaseizinger
Copy link
Contributor Author

To avoid unpleasant surprises for downstream, I think should introduce a new function rather than drastically changing the semantics of the current one. Open to other ideas, though.

send_fallible?

@Ralith
Copy link
Collaborator

Ralith commented Sep 3, 2024

Sure. And presumably a recv version, too?

@thomaseizinger
Copy link
Contributor Author

Sure. And presumably a recv version, too?

I hadn't looked at recv at all. What can fail in receive that we currently don't expose? Doesn't recv only fail if the FD is closed?

@djc
Copy link
Member

djc commented Sep 3, 2024

To avoid unpleasant surprises for downstream, I think should introduce a new function rather than drastically changing the semantics of the current one. Open to other ideas, though.

send_fallible?

try_send() feels more idiomatic.

@Ralith
Copy link
Collaborator

Ralith commented Sep 3, 2024

What can fail in receive that we currently don't expose?

Actually, looks like we don't suppress recv errors at all, so probably no changes needed there.

@thomaseizinger
Copy link
Contributor Author

Should I change the error type of send at the same time to return a custom error that makes it obvious that it only ever "fails" on WouldBlock?

@djc
Copy link
Member

djc commented Sep 3, 2024

Should I change the error type of send at the same time to return a custom error that makes it obvious that it only ever "fails" on WouldBlock?

If we're going to add try_send() there don't seem to be good reasons to break compatibility for the crate, so I'm inclined to avoid this until we have other changes that need a semver-incompatible bump.

@thomaseizinger
Copy link
Contributor Author

Should I change the error type of send at the same time to return a custom error that makes it obvious that it only ever "fails" on WouldBlock?

If we're going to add try_send() there don't seem to be good reasons to break compatibility for the crate, so I'm inclined to avoid this until we have other changes that need a semver-incompatible bump.

The current type-signature is very misleading because it is too "wide" and at the same time very common for networking code, meaning pretty much everyone using this API will assume that most io::Errors will be returned here. IMO, this unconventional choice warrants drawing attention to. The fix is easy, users can just map_err to an io::ErrorKind::WouldBlock. We can provide a dedicated conversion function for that too.

impl WouldBlock {
	pub fn into_io_error(self) -> std::io::Error {
		std::io::ErrorKind::WouldBlock.into()
	}
}

Most users will use this with something like tokio::net::UdpSocket::try_io which expects an io::ErrorKind::WouldBlock. That is only gonna happen in a single place though so it is not like the map_err will duplicated a lot.

self.inner.try_io(Interest::WRITABLE, || {
    self.state.send((&self.inner).into(), &transmit).map_err(quinn_udp::WouldBlock::into_io_error)
})

We can provide a From-impl too but that would somewhat miss point of the custom error because the conversion might get hidden behind ?.

I am busy with other things this week but I am hoping to be able to send a PR soon after.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants