TLS handshake failing on Windows with 2.13.0 #2294

Closed
augi opened this issue Sep 5, 2024 · 19 comments
Labels
priority: p2 Moderately-important priority. Fix may not be included in next release. type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns.

Comments

@augi

augi commented Sep 5, 2024

### Bug Description

After upgrading from 2.12.0 to 2.13.0, we are getting TLS handshake failed: EOF error with the Windows executable (x64).

### Example code (or command)

cloud-sql-proxy-2.13.0 --auto-iam-authn --private-ip  --impersonate-service-account "[email protected]" "redacted:europe-west1:redacted?port=15434" --run-connection-test --debug-logs

### Stacktrace

TLS handshake failed: EOF
Connection info refresh operation started
Connection test failed
 proxy server error: Dial error: handshake failed (connection name = "redacted"): EOF
The proxy has encountered a terminal error: Dial error: handshake failed (connection name = "redacted"): EOF


### Steps to reproduce?

1. Use a similar command with 2.12.0 (or older) - works well.
2. Use 2.13.0 - crashes on the error.


### Environment

1. OS type and version: Windows 10 Enterprise
2. Cloud SQL Proxy version (`./cloud-sql-proxy --version`): cloud-sql-proxy version 2.13.0+windows.amd64
3. Proxy invocation command (for example, `./cloud-sql-proxy --port 5432 INSTANCE_CONNECTION_NAME`): cloud-sql-proxy-2.13.0 --auto-iam-authn --private-ip  --impersonate-service-account "[email protected]" "redacted:europe-west1:redacted?port=15434" --run-connection-test --debug-logs


### Additional Details

_No response_
@augi augi added the type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns. label Sep 5, 2024
@enocom
Member

enocom commented Sep 5, 2024

If you omit the --run-connection-test, do you see any different behavior?

@augi
Author

augi commented Sep 6, 2024

No. Without this parameter, the proxy starts, but fails on the same errors when trying to establish the first connection.

@jackwotherspoon
Collaborator

@augi Are you seeing the error on every invocation or is it intermittent?

I will give the Windows binary a try.

@jackwotherspoon jackwotherspoon added priority: p2 Moderately-important priority. Fix may not be included in next release. priority: p1 Important issue which blocks shipping the next release. Will be fixed prior to next release. and removed priority: p2 Moderately-important priority. Fix may not be included in next release. labels Sep 6, 2024
@augi
Author

augi commented Sep 6, 2024

We see the error every time. We tried the last few versions, and only the latest one is broken.

@jackwotherspoon
Collaborator

Going to bump this down while I investigate as our integration tests are all passing for Windows currently.

@jackwotherspoon jackwotherspoon added priority: p2 Moderately-important priority. Fix may not be included in next release. and removed priority: p1 Important issue which blocks shipping the next release. Will be fixed prior to next release. labels Sep 6, 2024
@augi
Author

augi commented Sep 6, 2024

Is it possible that the latest proxy version makes a request to a different host/port? We are in a very restrictive environment, and I see EOF-type errors whenever an unexpected endpoint is contacted (and is therefore blocked by a firewall or other network device).

@enocom
Member

enocom commented Sep 6, 2024

That's a good idea and my own hypothesis given we don't see any test failures elsewhere.

The Proxy will be dialing <Private IP>:3307. I believe that address should be available in the Proxy debug logs as well.

@enocom
Member

enocom commented Sep 6, 2024

But of course that doesn't explain why the previous versions work without issue...

@augi
Author

augi commented Sep 6, 2024

> The Proxy will be dialing <Private IP>:3307. I believe that address should be available in the Proxy debug logs as well.

Yes, it is present there.

Isn't it possible that there was a change in an upstream dependency that changed the behavior?

@enocom
Member

enocom commented Sep 6, 2024

We use Go's TLS library exclusively, so unless there's another detail we're not finding here, this is either a problem with Go's TLS implementation on Windows or something about your environment.

@enocom
Member

enocom commented Sep 6, 2024

@jackwotherspoon will try to manually reproduce out of due diligence and then we'll go from there.

@jackwotherspoon
Collaborator

@augi I tested and had no issue with the v2.13.0 x64 Windows Proxy build...

Ran the following command:

cloud-sql-proxy.exe my-project:us-central1:my-instance?port=5432 --run-connection-test

And got the following expected output:

2024/09/08 09:58:56 Authorizing with Application Default Credentials
2024/09/08 09:58:57 [my-project:us-central1:my-instance] Listening on 127.0.0.1:5432
2024/09/08 09:58:57 The proxy has started successfully and is ready for new connections!
2024/09/08 09:58:57 Connection test started
2024/09/08 09:58:57 Connection test passed

This makes me think something has maybe changed in your environment that is unexpectedly causing the new proxy version to fail?

All our latest release changed was bumping to Go 1.23, so I will double-check that nothing with Go's TLS library on Windows changed in the new Go version.

@augi
Author

augi commented Sep 8, 2024

We tested multiple versions in the same environment, and only version 2.13.0 exhibits this issue 🙏

@enocom
Member

enocom commented Sep 9, 2024

@augi have you tried doing a tcpdump on the traffic?

At a high level, the proxy does this:

  1. Opens a TCP socket to the remote instance
  2. Initiates the TLS handshake using TLS 1.3

Judging from the error you're seeing, the TCP socket is connected, but something about the handshake is failing. https://tls13.xargs.org/ is a good reference.

We'll keep digging, but at this point we have integration tests passing for Windows and a manual test that also passes, so we have few leads.

@jackwotherspoon
Collaborator

Worth noting I tested on a Windows 11 Enterprise machine

@ricardohbin

@augi I was also facing this issue with version 2.13.0, using the macOS arm64 build inside our company's VPN.

Looking at the Go changelog between 1.22 and 1.23, I found this change: the experimental post-quantum key exchange mechanism X25519Kyber768Draft00 is now enabled by default when [Config.CurvePreferences](https://tip.golang.org/pkg/crypto/tls#Config.CurvePreferences) is nil. The default can be reverted by adding tlskyber=0 to the GODEBUG environment variable.

Disabling it by running GODEBUG=tlskyber=0 cloud-sql-proxy-2.13.0 ... makes cloud-sql-proxy work again inside the VPN.

I hope this helps!

more details here: golang/go#67061

@enocom
Member

enocom commented Sep 10, 2024

Nice find @ricardohbin! Thanks for posting here.

@augi
Author

augi commented Sep 11, 2024

Good catch, I can confirm that running set GODEBUG=tlskyber=0 before starting the proxy fixed it on Windows.
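
Since the syntax for setting an environment variable differs between shells (`set` in cmd.exe, a `VAR=value` prefix on macOS/Linux), a tiny Go launcher is one shell-independent way to apply the workaround. This is just a sketch: the `withGodebug` helper is made up for illustration, and the proxy binary path and arguments are whatever you pass on the command line.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"strings"
)

// withGodebug (hypothetical helper) returns a copy of env in which GODEBUG
// contains the given setting, appended to any value that was already present.
// It matches the variable name case-sensitively, which is a simplification
// on Windows.
func withGodebug(env []string, setting string) []string {
	out := make([]string, 0, len(env)+1)
	existing := ""
	for _, kv := range env {
		if v, ok := strings.CutPrefix(kv, "GODEBUG="); ok {
			existing = v
			continue
		}
		out = append(out, kv)
	}
	if existing != "" {
		setting = existing + "," + setting
	}
	return append(out, "GODEBUG="+setting)
}

func main() {
	if len(os.Args) < 2 {
		fmt.Fprintln(os.Stderr, "usage: launcher <proxy-binary> [proxy args...]")
		os.Exit(2)
	}
	// Run the given binary with GODEBUG=tlskyber=0 added to its environment.
	cmd := exec.Command(os.Args[1], os.Args[2:]...)
	cmd.Env = withGodebug(os.Environ(), "tlskyber=0")
	cmd.Stdin, cmd.Stdout, cmd.Stderr = os.Stdin, os.Stdout, os.Stderr
	if err := cmd.Run(); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```

Merging with an existing `GODEBUG` value instead of overwriting it keeps any other debug settings that were already in place.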

@jackwotherspoon
Collaborator

Thanks @augi for confirming! Will close this issue as a fix has been found 👍
