s2a: Add gRPC S2A #11113
e83fee8 to 1991e37
Thanks @rmehta19. Can you PTAL at the test failures?
604f9a0 to 3c867c9
@matthewstevenson88, I made 2 changes and it looks like 2/3 Linux runs are passing. tests(11) is failing with an error in code that was not affected by this PR, so I don't think that failure is related. The changes:
b7b51bc to 73cc37a
Done -- thank you for the review @matthewstevenson88!
s2a/src/main/java/io/grpc/s2a/handshaker/ConnectionIsClosedException.java
s2a/src/main/java/io/grpc/s2a/handshaker/tokenmanager/SingleTokenFetcher.java
s2a/src/test/java/io/grpc/s2a/handshaker/GetAuthenticationMechanismsTest.java
To make sure I understand, s2a is the public API, and s2a/channel and s2a/handshaker are internal APIs. Are those internal APIs to be used anywhere, even within Google?
Thanks for the review @larry-safran! Please let me know if there is anything else to address.
Some random things I noticed.
s2a/src/main/java/io/grpc/s2a/handshaker/S2AProtocolNegotiatorFactory.java
More random stuff. This is mostly from just skimming randomly and noticing unexpected code shapes.
subjectAltName = @alt_names

[alt_names]
IP.1 = ::
What is happening here? How would this ever work?
This is so that the generated certs have localhost IP addr, so that certificate verification passes in tests
Isn't localhost ::1?
This is true. I ran a local test modifying the IP to be ::1 and regenerating these certs, confirming our tests still pass. I'll modify this in a separate PR along with the other comments on the EC cert data: #11540 (comment).
I'll also update our go client tests for consistency: https://github.com/google/s2a-go/blob/main/testdata/config.cnf
Done in 7a879a2
Sorry for the confusion and thanks for pointing this out. To be clear, our tests rely on the fact that Common Name is set to localhost (not actually in this config, but when this config is used to create the CSR, it is set; I noted this in the README). So that's why this field was left as ::, and had no effect on tests when set to ::1.
Then let me repeat my initial question: What is happening here? What is it doing?
We use the config to generate certs to be used for client & S2A when testing mTLS to S2A. This is used in an integration test (also in S2AChannelCredentialsTest unit test, but that's just a unit test, no channel actually gets established between the client and S2A). As part of mTLS peer cert verification process, it is required that certificate CommonName matches the hostname (which is localhost).
I don't understand. This is alt_names aka SAN, which is not CommonName. And :: is not ::1 and that would not validate if the client used "localhost". So that all seems irrelevant.
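The difference between `::` and `::1` can be checked with the JDK alone. The following is a minimal sketch, not code from this PR; the class and method names (`AddressKindSketch`, `isLoopback`, `isWildcard`) are hypothetical. It shows that `::` parses as the unspecified (wildcard) address, while `::1` parses as the IPv6 loopback address.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical helper, for illustration only.
final class AddressKindSketch {
  private AddressKindSketch() {}

  // Parses an IP literal. For literals, getByName only validates the format.
  private static InetAddress parse(String ipLiteral) {
    try {
      return InetAddress.getByName(ipLiteral);
    } catch (UnknownHostException e) {
      throw new IllegalArgumentException("not a valid IP literal: " + ipLiteral, e);
    }
  }

  /** Returns true if the literal is a loopback address such as ::1 or 127.0.0.1. */
  static boolean isLoopback(String ipLiteral) {
    return parse(ipLiteral).isLoopbackAddress();
  }

  /** Returns true if the literal is the unspecified (wildcard) address such as ::. */
  static boolean isWildcard(String ipLiteral) {
    return parse(ipLiteral).isAnyLocalAddress();
  }

  public static void main(String[] args) {
    System.out.println(":: loopback=" + isLoopback("::") + " wildcard=" + isWildcard("::"));
    System.out.println("::1 loopback=" + isLoopback("::1") + " wildcard=" + isWildcard("::1"));
  }
}
```

An IP SAN entry is compared against the address the client connected to, so under this reading an entry of `::` would not cover a connection made to `::1`.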
(As I said before, as a hint for integration tests, don't put localhost in the cert, which is actually invalid. Instead, use managedChannelBuilder.overrideAuthority(). It is available in all languages; WithAuthority() in Go.)
s2a/src/main/java/io/grpc/s2a/handshaker/tokenmanager/AccessTokenManager.java
Channel ch = channelPool.getChannel();
S2AServiceGrpc.S2AServiceStub stub = S2AServiceGrpc.newStub(ch);
S2AStub s2aStub = S2AStub.newInstance(stub);
This Stub is never closed. This causes IntegrationTest (after Larry's reordering of awaitTermination) to not complete tests quickly, as the RPC is outstanding so awaitTermination() hangs. I think once we shut down the top-level channel that releases the S2AHandshakerServiceChannel.ChannelResource after 1 second, which then calls shutdownNow() on the s2a channel which cancels the RPC.
Yes, we never call close on the stub directly. When the
close gets invoked, everything you mentioned happens.
That's not okay. That means we leak an RPC per connection for the life of all s2a channels, which is probably the lifetime of the process.
Would closing the stub in close resolve the issue?
It resolves "never" but that is still busted. The stub needs to be closed when it is no longer needed. That could be when the future completes (e.g., sslContextFuture.addListener(s2aStub::close, service)). But I don't see a reason to split up the lifetimes between the threads like this. It seems clearer to:
@Override
protected void handlerAdded0(ChannelHandlerContext ctx) {
  // Buffer all reads until the TLS Handler is added.
  BufferReadsHandler bufferReads = new BufferReadsHandler();
  ctx.pipeline().addBefore(ctx.name(), /* name= */ null, bufferReads);
  ListenableFuture<SslContext> sslContextFuture = service.submit(this::createSslContext);
  ...
}

private SslContext createSslContext() {
  Channel ch = channelPool.getChannel();
  try (S2AStub s2aStub = S2AStub.newInstance(S2AServiceGrpc.newStub(ch))) {
    return SslContextFactory.createForClient(s2aStub, hostname, localIdentity);
  } finally {
    channelPool.returnToPool(ch);
  }
}
The channel is still needed after the SslContext future is complete (we make RPCs to S2A to verify the peer, and perform private key operations during the handshake). The SslContextBuilder configures the trustManager and PrivateKeyMethod with the stub. In this example, the channel gets returned to the pool before the handshake is complete, which causes an error when S2ATrustManager uses the stub to verify the peer cert.
The stub is no longer needed when the handshake is complete. A probably not great option would be to pass around the channel instead, and create a new stub for every RPC? Or perhaps we plumb the stub down to the ClientTlsProtocolNegotiator and close it when the handshake is done?
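The lifetime problem described here can be reproduced without gRPC or Netty. The following is a minimal sketch under stated assumptions: `FakeStub` is a hypothetical stand-in for S2AStub, and the returned `Supplier` stands in for the trust manager that keeps using the stub after the SslContext has been built. None of these names come from the PR.

```java
import java.util.function.Supplier;

// Hypothetical model of a stub that is captured by a callback and used later.
final class StubLifetimeSketch {
  private StubLifetimeSketch() {}

  /** Stand-in for S2AStub: usable until close() is called. */
  static final class FakeStub implements AutoCloseable {
    private boolean closed;

    String verifyPeer() {
      if (closed) {
        throw new IllegalStateException("stub already closed");
      }
      return "verified";
    }

    @Override
    public void close() {
      closed = true;
    }
  }

  /** Builds the callback inside try-with-resources, so the stub is closed on return. */
  static Supplier<String> createClosingTooEarly() {
    try (FakeStub stub = new FakeStub()) {
      return stub::verifyPeer; // The callback outlives the try block.
    }
  }

  /** Builds the callback and leaves closing the stub to the caller. */
  static Supplier<String> createWithCallerOwnedStub(FakeStub stub) {
    return stub::verifyPeer;
  }
}
```

In this model, calling the callback from `createClosingTooEarly()` fails because the stub was closed when the method returned, whereas the caller-owned variant works until the caller decides the handshake is over and closes the stub.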
@ejona86, WDYT?
I see. A new stub for every RPC is fine. I thought y'all needed to use the same streaming RPC to preserve context. If separate RPCs are fine, that is definitely superior.
We don't want to inject into the handshake completion code itself, but that code is triggering a ProtocolNegotiationEvent. We can use that.
private static final class S2AStubCleanupNegotiationHandler extends ProtocolNegotiationHandler {
  private final S2AStub s2aStub; // pass in constructor

  @Override
  protected void protocolNegotiationEventTriggered(ChannelHandlerContext ctx) {
    s2aStub.close();
    fireProtocolNegotiationEvent(ctx);
    ctx.pipeline().remove(ctx);
  }
}
And then in S2aProtocolNegotiationHandler, ctx.pipeline().addAfter() that S2AStubCleanupNegotiationHandler before adding the tls handler. (So the TLS handler will be between that cleanup handler and the s2a handler.)
We could do alternatives like adding the tls handler before the s2a handler, but they take a bit more reorganization.
Thanks Eric! I've implemented what you mention in #11600
I initially did think that we could do a new stub for each RPC, since we attach the localIdentity to each RPC, but chatted with @matthewstevenson88 and he mentioned there is a requirement from S2A that the lifetime of the bidirectional stream from the client to the S2A must be tied to the lifetime of the handshake.
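The requirement that the stream live exactly as long as the handshake can be modeled with a completion callback. The following is a rough sketch using only the JDK, not the Netty-based change in #11600: `FakeStream` and `closeWhenDone` are hypothetical names, and a `CompletableFuture` stands in for the handshake-complete event.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical model: a stream whose lifetime is tied to a handshake future.
final class HandshakeScopedStreamSketch {
  private HandshakeScopedStreamSketch() {}

  /** Stand-in for the bidirectional stream to S2A. */
  static final class FakeStream implements AutoCloseable {
    private final AtomicBoolean closed = new AtomicBoolean();

    boolean isClosed() {
      return closed.get();
    }

    @Override
    public void close() {
      closed.set(true);
    }
  }

  /** Closes the stream once the handshake finishes, whether it succeeds or fails. */
  static void closeWhenDone(CompletableFuture<?> handshake, FakeStream stream) {
    handshake.whenComplete((result, error) -> stream.close());
  }
}
```

The stream stays open while the future is pending and is closed as soon as the future completes, normally or exceptionally, which is the same shape as closing the stub from a handler that reacts to the negotiation event.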
There's plenty of smaller/isolated items. Feel free to group those together as you wish, as they are easy to review. A few of the comments might turn into larger/plumbing changes. Some of those may be best to be separate. But overall, if you do lots of the "easy" stuff together that'd be fine to review/merge while you work on something more difficult.
@ejona86, @larry-safran I have addressed all the comments on this PR in the following 3 PRs:
PTAL at these 3 and leave any comments. I will address the comment to move things to internal package and combine the MTLS and non-MTLS apis in the following 2 PRs, once the above 3 are merged.
Besides these changes, please let me know anything else we can do to enable s2a to be included in a grpc-java release. Thank you!
Add S2A Java client to gRPC Java.
Context: https://github.com/google/s2a-go/blob/main/README.md