Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bump to quic-go 0.40 #207

Merged
merged 4 commits into from
Nov 17, 2023
Merged

Bump to quic-go 0.40 #207

merged 4 commits into from
Nov 17, 2023

Conversation

noahlevenson
Copy link
Contributor

This resolves the issues related to new versions of quic-go and bumps us to the newest version, which is 0.40. I'm now able to build everything using Go 1.20.11.

Basically, new versions of quic-go have introduced two nasty lil changes:

  1. They changed the signature and behavior of quic.Dial -- instead of timing out and returning an error if the handshake fails, it now obeys a context. Previously, we relied on the timeout and error behavior.
  2. They're now much more persnickety about the net.PacketConn you build your QUIC connection atop of. Previously, that net.PacketConn wasn't required to implement read or write deadlines. Now it requires correctly implemented read deadlines. So I had to make our BroflakeConn do all the SetReadDeadline stuff and obey the deadline properly. Fun times.

This must be merged alongside this change to Flashlight: getlantern/flashlight#1332

Copy link
Contributor

@myleshorton myleshorton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally LGTM. I want to just take it for a spin to make sure the basic setup also works for me - I'll do that today. Great you got this over the line!

@myleshorton
Copy link
Contributor

The 5 second dial timeout does seem a touch low to me tbh. For me, for example, SSHing to some far away machine can sometimes take in that range. I think Linux defaults TCP connection timeouts to 60 seconds! I think 8 or thereabouts is probably fine?

@Crosse
Copy link
Contributor

Crosse commented Nov 16, 2023

I checked out this branch and it built with go version go1.21.1 linux/amd64 just fine, as well. Then I pointed lantern-cloud at this branch (with a replace statement in l-c's go.mod). Everything built and all tests passed.

(I realize this doesn't exercise the actual functionality too much, but I wanted to make both of those things known.)

@myleshorton
Copy link
Contributor

Nice! The steps in the README work great, although the sequence for running Freddie is slightly different now.

Looking good!

@noahlevenson
Copy link
Contributor Author

The 5 second dial timeout does seem a touch low to me tbh. For me, for example, SSHing to some far away machine can sometimes take in that range. I think Linux defaults TCP connection timeouts to 60 seconds! I think 8 or thereabouts is probably fine?

Cool -- bumped it to 8 here: 36fc51b

@noahlevenson
Copy link
Contributor Author

Nice! The steps in the README work great, although the sequence for running Freddie is slightly different now.

Looking good!

Fixed here: fb8392c

@noahlevenson
Copy link
Contributor Author

Just FYI, I've been giving it an aggressive poke this morning, and I was able to stimulate a panic -- so I'm gonna suss this out before merging.

AFAICT, the repro is: start a desktop client, start a widget, let them signal and establish a QUIC connection. Then disconnect the widget and let the desktop client run through its reset behaviors.

But I haven't been able to stimulate the panic every time that way...

2023/11/16 11:38:36 QUIC connection error (timeout: no recent network activity), closing!
panic: connection already exists

goroutine 2848 [running]:
github.com/quic-go/quic-go.(*connMultiplexer).AddConn(0xc00066a4a0, {0x7fcbe6deed70, 0xc00010ef00})
        /home/noah/go/pkg/mod/github.com/quic-go/[email protected]/multiplexer.go:59 +0x21a
github.com/quic-go/quic-go.(*Transport).init.func1()
        /home/noah/go/pkg/mod/github.com/quic-go/[email protected]/transport.go:241 +0x766
sync.(*Once).doSlow(0xc0002fe248, 0xc0008f7d28)
        /home/noah/sdk/go1.20.11/src/sync/once.go:74 +0x102
sync.(*Once).Do(0xc0002fe248, 0xc0008f7f80?)
        /home/noah/sdk/go1.20.11/src/sync/once.go:65 +0x47
github.com/quic-go/quic-go.(*Transport).init(0xc0002fe1e0, 0x1)
        /home/noah/go/pkg/mod/github.com/quic-go/[email protected]/transport.go:200 +0x68
github.com/quic-go/quic-go.(*Transport).dial(0xc0002fe1e0, {0x1116f68, 0xc00089cde0}, {0x1114fd8, 0x110ea20}, {0x0, 0x0}, 0x19?, 0x0?, 0x0)
        /home/noah/go/pkg/mod/github.com/quic-go/[email protected]/transport.go:186 +0xdb
github.com/quic-go/quic-go.(*Transport).Dial(...)
        /home/noah/go/pkg/mod/github.com/quic-go/[email protected]/transport.go:173
github.com/quic-go/quic-go.Dial({0x1116f68, 0xc00089cde0}, {0x1119d90?, 0xc00010ef00}, {0x1114fd8, 0x110ea20}, 0xc000220180, 0xc000bce600?)
        /home/noah/go/pkg/mod/github.com/quic-go/[email protected]/client.go:117 +0x20d
github.com/getlantern/broflake/clientcore.(*QUICLayer).DialAndMaintainQUICConnection.func1()
        /home/noah/work/broflake/clientcore/quic.go:89 +0x119
created by github.com/getlantern/broflake/clientcore.(*QUICLayer).DialAndMaintainQUICConnection
        /home/noah/work/broflake/clientcore/quic.go:87 +0x3aa




@noahlevenson
Copy link
Contributor Author

The panic seems to be a canary they inserted due to breaking API changes:

https://github.com/quic-go/quic-go/blob/427f53328bb18ed1f84f352914cec805d0051494/multiplexer.go#L55

TBH, I'm kinda confused about what's going on. I think they're using the word "connection" in different ways in different places, hence their TODO to write a more descriptive panic message.

In particular, I'm not sure what "if we're already listening on this connection" means. Skimming the breaking changes notes, it seems this all has to do with reuse of the underlying net.PacketConn, so I'm assuming they're talking about listening on the net.PacketConn, which sort of makes sense.

I'll have to read all those breaking changes notes and see if I can figure out what's going on.

@noahlevenson
Copy link
Contributor Author

Alright, I think I found the explanation:

Don’t use the same net.PacketConn for multiple calls to Dial and / or Listen. Instead, create a DialListener. Since using the same connection was the previously recommended API, and will lead to pretty subtle bugs if not migrated properly, quic-go will panic if the same net.PacketConn is used for calls to Dial and / or Listen, for some transition time (we’ll remove this behavior once most users have completed the migration).

Let's see what I can do...

@noahlevenson
Copy link
Contributor Author

For those following along at home, DialListener seems to have eventually been renamed Transport...

@noahlevenson
Copy link
Contributor Author

OK, this should do it: 15c26ab

Gonna let this marinate until tomorrow morning, then I'll merge. Just want to spend a little more time poking it.

@noahlevenson noahlevenson mentioned this pull request Nov 17, 2023
@Crosse
Copy link
Contributor

Crosse commented Nov 17, 2023

Good lord.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants