Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

crypto/tls: usage of QUICResumeSession in Go 1.23rc1 breaks backwards compatibility #68124

Closed
marten-seemann opened this issue Jun 22, 2024 · 9 comments
Labels
NeedsFix The path to resolution is known, but the work has not been done.
Milestone

Comments

@marten-seemann
Copy link
Contributor

Go version

go version go1.23rc1 (all platforms)

Output of go env in your module/workspace:

GO111MODULE=''
GOARCH='arm64'
GOBIN=''
GOCACHE='/Users/marten/Library/Caches/go-build'
GOENV='/Users/marten/Library/Application Support/go/env'
GOEXE=''
GOEXPERIMENT=''
GOFLAGS=''
GOHOSTARCH='arm64'
GOHOSTOS='darwin'
GOINSECURE=''
GOMODCACHE='/Users/marten/src/go/pkg/mod'
GONOPROXY=''
GONOSUMDB=''
GOOS='darwin'
GOPATH='/Users/marten/src/go'
GOPRIVATE=''
GOPROXY='https://proxy.golang.org,direct'
GOROOT='/Users/marten/bin/go1.23ex'
GOSUMDB='sum.golang.org'
GOTMPDIR=''
GOTOOLCHAIN='auto'
GOTOOLDIR='/Users/marten/bin/go1.23ex/pkg/tool/darwin_arm64'
GOVCS=''
GOVERSION='go1.23rc1'
GODEBUG=''
GOTELEMETRY='local'
GOTELEMETRYDIR='/Users/marten/Library/Application Support/go/telemetry'
GCCGO='gccgo'
GOARM64='v8.0'
AR='ar'
CC='clang'
CXX='clang++'
CGO_ENABLED='1'
GOMOD='/Users/marten/src/go/src/github.com/quic-go/quic-go/go.mod'
GOWORK=''
CGO_CFLAGS='-O2 -g'
CGO_CPPFLAGS=''
CGO_CXXFLAGS='-O2 -g'
CGO_FFLAGS='-O2 -g'
CGO_LDFLAGS='-O2 -g'
PKG_CONFIG='pkg-config'
GOGCCFLAGS='-fPIC -arch arm64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -ffile-prefix-map=/var/folders/q0/b5ynf00142l7bl9sp8y098zr0000gn/T/go-build1369234025=/tmp/go-build -gno-record-gcc-switches -fno-common'

What did you do?

I ran quic-go's test suite using Go 1.23: quic-go/quic-go#4571.

What did you see happen?

Multiple tests related to QUIC session resumption are failing.

What did you expect to see?

As the QUIC API in crypto/tls is covered by Go's backwards compatibility guarantee, I expected the tests to pass.


#63691 introduced two new QUIC events: QUICResumeSession and QUICStoreSession. The usage of QUICStoreSession is gated by the QUICConfig.EnableStoreSessionEvent config option, but the usage of QUICResumeSession isn't.

This means that crypto/tls passes QUICResumeSession to quic-go, which doesn't know how to handle this event, and therefore aborts the handshake.

I believe this is an oversight, and that both events should have been gated by the config option. We might also want to rename the config flag to EnableSessionEvents to make it clear that it covers both QUICStoreSession and QUICResumeSession.

cc @neild @FiloSottile

@mauri870 mauri870 added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Jun 22, 2024
@mauri870
Copy link
Member

cc @neild

@neild
Copy link
Contributor

neild commented Jun 24, 2024

Oops. Thanks for the report. Will fix.

Renaming EnableStoreSessionEvent seems reasonable.

@neild
Copy link
Contributor

neild commented Jun 24, 2024

Hm. So, perhaps this is something we should change, but the oversight is one in documenting the original QUICConn.NextEvent API.

My intent was that implementations ignore events they don't recognize, precisely to permit us to add more events in the future. This probably should have been greased with the implementation providing a random event.

The reason we only have EnableStoreSessionEvent is that this event requires a response from the QUIC layer--when the event is enabled, the caller needs to call QUICConn.StoreSession to store the session, possibly after modifying it. In contrast, QUICResumeSessionEvent doesn't require a response.

But the QUICConn documentation doesn't actually say that implementations need to ignore unknown events, so maybe we should change the control to apply to resumption events as well. I think quic-go should switch to ignoring unknown events, though.

@gopherbot
Copy link
Contributor

Change https://go.dev/cl/594475 mentions this issue: crypto/tls: apply QUIC session event flag to QUICResumeSession events

@marten-seemann
Copy link
Contributor Author

My intent was that implementations ignore events they don't recognize, precisely to permit us to add more events in the future. This probably should have been greased with the implementation providing a random event.

I tried to find any record of this, but I can't seem to find anything. As a QUIC implementation, it seems dangerous to ignore events coming from the TLS stack.

Of course, crypto/tls could guarantee that it's safe to do ignore a particular event, but I'm not sure if that's practical either: Some events definitely can't be ignored, and it's unclear how the combinatorial explosion could be dealt with.

@FiloSottile
Copy link
Contributor

There are three options for APIs like this

  1. all events are critical, clients error out on unknown events, any new event must be gated by some form of opt-in
  2. some existing events are critical, but future ones are not and should be ignored if unsupported, unless the application opts-in to some new implicitly critical event
  3. events have a Critical field, unsupported critical events cause an error

Sounds like the API was meant to be (2) and was interpreted as (1).

(1) hamstrings crypto/tls and will cause a bunch of new Config options to show up.

I disagree that (2) is dangerous: it is the job of crypto/tls to make sure that new events are backwards compatible, which includes being safe to ignore. Having to ensure they are safe to ignore is still much better than not being able to add them at all!

I think (3) doesn't make much sense here, because adding a new critical event without opt-in would break backwards compatibility, so we can never do it, bringing us back to (2) effectively.

I think it's early enough that we should clarify the API as (2), and add grease. Restricting ourselves to (1) would be a pain in the long term.

@marten-seemann
Copy link
Contributor Author

Thanks for weighing in here @FiloSottile! I'm happy to change quic-go to (2), and I can do that in the next release. Should we document though that all the events defined in Go 1.22 are critical and must not be ignored?

With https://go-review.googlesource.com/c/go/+/594475, Go 1.23 won't break backwards compatibility with existing quic-go version. I assume that once we decide to add more events, current quic-go versions will have been phased out to a degree such that they won't cause any major problems.

@neild
Copy link
Contributor

neild commented Jun 25, 2024

I don't think there's anything dangerous about ignoring unknown events here. In general, the failure mode of misusing this API is the handshake failing to complete. If we add new events that can't be ignored (such as QUICStoreSession), then we need to gate them behind an opt-in to avoid breaking backwards compatibility, so an implementation can safely assume that anything it doesn't recognize is unimportant (to it).

I also don't think we need to document that the current events are critical; a QUIC implementation which ignores any just isn't going to function.

I think that we should:

  1. Apply https://go.dev/cl/594475 to avoid breaking existing quic-go releases with Go 1.23.
  2. Document that implementations must ignore unknown events.
  3. Maybe add some grease in Go 1.24 or Go 1.25, once current quic-go versions are phased out.

I don't have any anticipation that we'll be making any more changes to this API in the foreseeable future, but that should keep the door open for the unforeseen.

@dmitshur dmitshur added NeedsFix The path to resolution is known, but the work has not been done. and removed NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. labels Jun 25, 2024
@dmitshur dmitshur added this to the Go1.23 milestone Jun 25, 2024
Mchnan pushed a commit to Mchnan/go-sylixos that referenced this issue Jul 9, 2024
Go 1.23 adds two new events to QUICConns: QUICStoreSessionEvent and
QUICResumeSessionEvent. We added a QUICConfig.EnableStoreSessionEvent
flag to control whether the store-session event is provided or not,
because receiving this event requires additional action from the caller:
the session must be explicitly stored with QUICConn.StoreSession.

We did not add a control for whether the resume-session event is
provided, because this event requires no action and the caller is
expected to ignore unknown events.

However, we never documented the expectation that callers ignore
unknown events, and quic-go produces an error when receiving an
unexpected event. So change the EnableStoreSessionEvent flag to
apply to both new events.

Fixes golang#68124
For golang#63691

Change-Id: I84af487e52b3815f7b648e09884608f8915cd645
Reviewed-on: https://go-review.googlesource.com/c/go/+/594475
Reviewed-by: Marten Seemann <[email protected]>
LUCI-TryBot-Result: Go LUCI <[email protected]>
Reviewed-by: Roland Shoemaker <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
NeedsFix The path to resolution is known, but the work has not been done.
Projects
None yet
Development

No branches or pull requests

7 participants