Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support RFC 8305 (Happy Eyeballs v2) in std.net.dns #795

Closed
yorickpeterse opened this issue Jan 1, 2025 · 6 comments
Closed

Support RFC 8305 (Happy Eyeballs v2) in std.net.dns #795

yorickpeterse opened this issue Jan 1, 2025 · 6 comments
Assignees
Labels
feature New things to add to Inko, such as a new standard library module std Changes related to the standard library
Milestone

Comments

@yorickpeterse
Copy link
Collaborator

yorickpeterse commented Jan 1, 2025

Description

In commit 3d2d34b we added support for performing DNS lookups, resolving #735. While the implementation supports connecting to IPs in the order preferred by the DNS server, we don't actually implement RFC 8503 such as by limiting the amount of time spent on establishing a connection.

Related work

@yorickpeterse yorickpeterse added feature New things to add to Inko, such as a new standard library module std Changes related to the standard library labels Jan 1, 2025
@yorickpeterse
Copy link
Collaborator Author

We can implement this by spawning one process per IP address, then have them use e.g. TcpClient.with_timeout() to connect with a timeout and send the results over a Channel. The first Result.Ok(socket) is then returned to the user.

The problem is that we don't have a way to cancel an existing connection attempt, so if we have 3 sockets with a timeout of 100 msec and one succeeds, the others will continue to try and connect until they time out.

@yorickpeterse yorickpeterse added this to the 0.18.0 milestone Jan 1, 2025
@yorickpeterse yorickpeterse self-assigned this Jan 1, 2025
@yorickpeterse
Copy link
Collaborator Author

Reading through the RFC more, it seems we technically don't need to do anything in parallel:

Once the list of addresses received up to this point has been constructed, the client will attempt to make connections. In order to avoid unreasonable network load, connection attempts SHOULD NOT be made simultaneously. Instead, one connection attempt to a single address is started first, followed by the others in the list, one at a time.

So if I'm understanding it correctly, combined with interleaving IPs (V6 -> V4 -> V6, etc), it seems to just come down to this:

ips.iter.each(fn (ip) { connect(ip, timeout_after: Duration.from_millis(250)) })

This does mean that if you have e.g. 4 IPs and they all time out, you'd have to wait 1 second in total, whereas a parallel connection setup would only require 250 msec in the worst case. The downside of parallel connections is that this can easily lead to network congestion, especially if you end up with M processes trying to make N parallel connections, where either M or N is large (or both are).

@yorickpeterse
Copy link
Collaborator Author

Small correction: it seems that per the RFC, the initial attempts are made sequentially but we still wait for any of those connections to produce a socket. That is, we do something like this:

for ip in ips
  async connect(ip, channel) with 1 second timeout
  if connection = receive(channel) with 250 msec timeout
    return connection

wait for remaining connections to see if any succeeded

yorickpeterse added a commit that referenced this issue Jan 2, 2025
This fixes #795.

Changelog: added
yorickpeterse added a commit that referenced this issue Jan 2, 2025
This changes TcpClient.new and TcpClient.with_timeout to support
connecting to multiple IP addresses based on RFC 8305, also known as
"Happy Eyeballs version 2".

In addition, TcpClient.new now uses a default timeout of 60 seconds
instead of waiting forever, and std.time.Duration is extended with a few
extra methods such as Duration.positive? and Duration./.

This fixes #795.

Changelog: added
@yorickpeterse yorickpeterse changed the title Support RFC 8503 (Happy Eyeballs v2) in std.net.dns Support RFC 8305 (Happy Eyeballs v2) in std.net.dns Jan 2, 2025
yorickpeterse added a commit that referenced this issue Jan 2, 2025
This changes TcpClient.new and TcpClient.with_timeout to support
connecting to multiple IP addresses based on RFC 8305, also known as
"Happy Eyeballs version 2".

In addition, TcpClient.new now uses a default timeout of 60 seconds
instead of waiting forever, and std.time.Duration is extended with a few
extra methods such as Duration.positive? and Duration./.

This fixes #795.

Changelog: added
@yorickpeterse
Copy link
Collaborator Author

yorickpeterse commented Jan 3, 2025

Work on this is taking place in the happy-eyeballs branch. This revealed some issues with Promise.set silently dropping values, and thus Chanel losing out on some.

That case has been fixed, but I think there may be another similar case: if a Future.get_until times out and just after that we resolve it using Promise.set (e.g. the timeout occurs at 250 msec and the write occurs at 251 msec), then Channel.receive_until discards the Future and the value along with it, resulting in the loss of the value:

Thread A       | Thread B
---------------+------------------
lock(future)   |                      |
wait(timeout)  |                      | Timeline
timed out      |                      |
               | resolve(promise)     V
drop(future)   |

Future.get_until is technically correct here and does account for this by returning Result[T, Future[T]], allowing you to reuse the future if necessary. The problem is that we can't tell the difference between a Future that timed out just this once, and a Future that will forever time out because its Promise got dropped. That in turn means we can't reliably determine if the Future can be reused or not.

To fix this, Future.get_until needs to return a dedicated type that can represent these states:

  1. We received a value
  2. We timed out, but the Promise is still connected and thus the Future can be reused
  3. We timed out and the Promise is dropped, so the Future should be discarded

In theory we could represent this using Result[T, Option[Future[T]] but that doesn't clearly express the intent of the type, so a dedicated type would be better.

With this in place, Channel can then use this result to determine if it should reuse the Future on the next call to Channel.receive or if it should discard it and create a new one.

@yorickpeterse
Copy link
Collaborator Author

There's another potential issue with the above: if we start storing reusable Future values in a Channel, it's not clear what Channel.clone should do with this. We can't clone a Future, but not sharing the list somehow would result in one Channel being able to receive a value while the other can't because it doesn't have the same Future.

@yorickpeterse
Copy link
Collaborator Author

yorickpeterse commented Jan 3, 2025

One option is that Future.get_until in the event of a timeout always consumes itself and immediately disconnects itself in the event of a timeout. That is, the timeout logic is basically this:

  1. Wait for the timeout. If we get a value before it expires, return it as usual
  2. If the timeout expires, re-lock the Future and check if there's a value.
  3. If so, return the value
  4. If no value is present at this point, disconnect the Future
  5. Unlock the future

Using this approach a Future can only be resolved once using Future.get_until and we don't have to bother with storing it for later reuse. This does require that we change disconnected from a Bool to a UInt8 with three states:

  • 0: both sides are connected
  • 1: Future disconnected, Promise should clean up the shared state
  • 2: Promise disconnected, Future should clean up the shared state

When calling Future.drop, it checks this state. If 0, it simply sets it to 1. If the value is already 1 we leave the shared state as-is.

When calling Promise.drop, it too checks this state. If 0, it simply sets it to 2, otherwise it cleans up the shared state.

Future.get_until in turn sets the status to 1 upon a timeout. This should ensure we don't encounter any double drops and no longer silently ignore values on the Future side of things.

yorickpeterse added a commit that referenced this issue Jan 3, 2025
This changes TcpClient.new and TcpClient.with_timeout to support
connecting to multiple IP addresses based on RFC 8305, also known as
"Happy Eyeballs version 2".

In addition, TcpClient.new now uses a default timeout of 60 seconds
instead of waiting forever, and std.time.Duration is extended with a few
extra methods such as Duration.positive? and Duration./.

This fixes #795.

Changelog: added
yorickpeterse added a commit that referenced this issue Jan 7, 2025
This changes TcpClient.new and TcpClient.with_timeout to support
connecting to multiple IP addresses based on RFC 8305, also known as
"Happy Eyeballs version 2".

In addition, TcpClient.new now uses a default timeout of 60 seconds
instead of waiting forever, and std.time.Duration is extended with a few
extra methods such as Duration.positive? and Duration./.

This fixes #795.

Changelog: added
yorickpeterse added a commit that referenced this issue Jan 7, 2025
This changes TcpClient.new and TcpClient.with_timeout to support
connecting to multiple IP addresses based on RFC 8305, also known as
"Happy Eyeballs version 2".

In addition, TcpClient.new now uses a default timeout of 60 seconds
instead of waiting forever, and std.time.Duration is extended with a few
extra methods such as Duration.positive? and Duration./.

This fixes #795.

Changelog: added
yorickpeterse added a commit that referenced this issue Jan 8, 2025
Future.get_until now always consumes self and returns an Option[uni T]
instead of Result[uni T, Future[T]]. The reason for this is that the use
of a timeout makes this method very racy and it's difficult to
efficiently handle cases where an Error is returned. For example, if a
value is written to a Promise _just_ after Future.get_until returns, one
would have to resort to a busy loop to observe the value at some point.

Consuming self in Future.get_until means that the Future is dropped upon
a timeout, disconnecting its Promise and forcing calls to Promise.set to
either discard the value or retry the operation in some way (e.g. by
using a new Future/Promise pair).

For more details, refer to
#795 (comment)
and the comments that follow it.

Changelog: changed
yorickpeterse added a commit that referenced this issue Jan 8, 2025
This changes TcpClient.new and TcpClient.with_timeout to support
connecting to multiple IP addresses based on RFC 8305, also known as
"Happy Eyeballs version 2".

In addition, TcpClient.new now uses a default timeout of 60 seconds
instead of waiting forever, and std.time.Duration is extended with a few
extra methods such as Duration.positive? and Duration./.

This fixes #795.

Changelog: added
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature New things to add to Inko, such as a new standard library module std Changes related to the standard library
Projects
None yet
Development

No branches or pull requests

1 participant