Skip to content
This repository has been archived by the owner on May 26, 2022. It is now read-only.

set linger to 0 on OSX in tests #42

Merged
merged 1 commit into from
May 19, 2022
Merged

Conversation

marten-seemann
Copy link
Contributor

Fixes #33. Fixes #40.

This PR fixes the tests on OSX. The problem we were running into is that when trying to reuse an existing port in

con, err = d.DialContext(ctx, network, raddr)
if err == nil {
return con, nil
}
if reuseErrShouldRetry(err) && ctx.Err() == nil {

we'd get an error "connect: address already in use".

This was very surprising to me at first, after all SO_REUSEPORT should allow us to reuse ports. It's explained in this document:

Oddly, using SO_REUSEADDR can actually lead to more difficult "address already in use" errors. SO_REUSADDR permits you to use a port that is stuck in TIME_WAIT, but you still can not use that port to establish a connection to the last place it connected to. What? Suppose I pick local port 1010, and connect to foobar.com port 300, and then close locally, leaving that port in TIME_WAIT. I can reuse local port 1010 right away to connect to anywhere except for foobar.com port 300.

What's happening is that we're connection A <-> B first, then close that connection, and later try to connect A <-> B first. If the FIN is not properly acknowledged, the connection is still in the TIME_WAIT state, and we cannot reuse the port as described in the quote above.

The question is why this only happens on OSX, and here's where I can't really find out. My best guess is differences in Linux's and Mac's TCP stack. Using Wireshark, I could however confirm that this is what's actually going on.

Here's the trace recorded on OSX (trace_osx.pcapng.zip):
image

And here on Ubuntu (trace_ubuntu.pcapng.zip):
image

Note the ACK for the FIN, that's missing on OSX.

For libp2p, this is not a problem, as we always set linger to 0 in https://github.com/libp2p/go-libp2p/blob/39fd9a03b0946f475881ae07038014135e3021f1/p2p/transport/tcp/tcp.go#L188-L191.

@MarcoPolo
Copy link

We should probably document this in the README as well. Just a note for users of the library that they might want to set linger on macOS to 0 for this reason.

@marten-seemann
Copy link
Contributor Author

The README is not transferred by the merge script (we don't have a lot of repos with useful READMEs, and a lot of it repo-specific), but maybe we should document it in the code (such that it shows up in the Go doc).

Copy link
Member

@Stebalien Stebalien left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice find! Maybe we should just set linger to 0 everywhere?

@marten-seemann
Copy link
Contributor Author

Nice find! Maybe we should just set linger to 0 everywhere?

That makes sense, but we already set it in p2p/transport/tcp (and we still need to do it there, in case we're not using reuseport). We should avoid calling SetLinger twice, it's a syscall.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

TestV6V4 fails on OSX failing tests on OSX
3 participants