feat: revert #3400: reintroduce experimental S2A integration in client libraries grpc transport #3548

Merged
merged 25 commits into from
Jan 24, 2025


rmehta19
Contributor

@rmehta19 rmehta19 commented Jan 7, 2025

Revert #3400.

This PR re-introduces the S2A integration into the Java Cloud SDK (initially introduced in #3326, and temporarily reverted in #3400).

This PR does this by reverting #3400 with the following patches:

  • load the S2A APIs via reflection. This allows us to merge the code while the S2A API is still experimental in gRPC-Java without introducing a diamond dependency conflict. Once the S2A APIs are stable, the reflection logic can be removed and the S2A API can be used directly (via a dependency on S2A API)
  • fix NPE (s2a fix: fix NPE. #3401)
  • use a different env var name for enabling the feature
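
A minimal sketch of the reflection pattern mentioned in the first bullet, assuming a static factory that takes a single String. The class and method names passed in `main` are hypothetical stand-ins, not the actual S2A API in gRPC-Java; a JDK class is used so the sketch runs without gRPC on the classpath.

```java
import java.lang.reflect.Method;
import java.util.Optional;

/** Sketch: call a static factory on an optional dependency without linking against it. */
final class ReflectiveFactorySketch {
  private ReflectiveFactorySketch() {}

  /**
   * Invokes a public static method taking one String on the named class. Returns
   * Optional.empty() when the class or method cannot be loaded or invoked.
   */
  static Optional<Object> invokeStatic(String className, String methodName, String arg) {
    try {
      Class<?> clazz = Class.forName(className);
      Method method = clazz.getMethod(methodName, String.class);
      return Optional.ofNullable(method.invoke(null, arg));
    } catch (ReflectiveOperationException | LinkageError e) {
      // Class absent or incompatible: the caller falls back to its default path.
      return Optional.empty();
    }
  }

  public static void main(String[] args) {
    // JDK class used as a stand-in for the experimental API.
    System.out.println(invokeStatic("java.lang.Integer", "valueOf", "42"));
    // Hypothetical class name that is not on the classpath.
    System.out.println(invokeStatic("com.example.MissingS2ACredentials", "create", "addr"));
  }
}
```

Because nothing references the experimental type at compile time, the calling library does not need to pin a version of it, which is the diamond-dependency concern the bullet describes.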

Below is the original description from #3326

Modify the Client Libraries gRPC channel builder to use mTLS via S2A if the experimental environment variable is set, S2A is available (checked with the SecureSessionAgent utility), and a few more conditions hold (see shouldUseS2A).

Following https://google.aip.dev/auth/4115, only attempt to use S2A after DirectPath and DCA (https://google.aip.dev/auth/4114) are ruled out as options. If the conditions to use S2A are not met (the env variable is not set, S2A is not running in the environment, etc., so that shouldUseS2A returns false), fall back to the default TLS connection.
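
The fallback order in the paragraph above can be sketched as a single decision function. This is an illustration only: the enum, the method and the four boolean inputs are hypothetical names, and the real shouldUseS2A checks more conditions than the two shown here.

```java
/** Sketch of the fallback order described above; all names are illustrative. */
final class TransportSelectionSketch {
  enum Transport { DIRECT_PATH, DCA_MTLS, S2A_MTLS, DEFAULT_TLS }

  private TransportSelectionSketch() {}

  static Transport select(
      boolean canUseDirectPath, boolean dcaAvailable, boolean s2aEnvEnabled, boolean s2aAvailable) {
    if (canUseDirectPath) {
      return Transport.DIRECT_PATH; // DirectPath is considered first
    }
    if (dcaAvailable) {
      return Transport.DCA_MTLS; // then device certificate authentication (AIP-4114)
    }
    if (s2aEnvEnabled && s2aAvailable) {
      return Transport.S2A_MTLS; // S2A only after the two options above are ruled out
    }
    return Transport.DEFAULT_TLS; // otherwise fall back to the default TLS connection
  }

  public static void main(String[] args) {
    System.out.println(select(false, false, true, true));
    System.out.println(select(false, false, false, true));
  }
}
```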

When creating S2A-enabled gRPC channel credentials, we first try to secure the connection between the client and the S2A via mTLS, using MTLS-MDS credentials. If the MTLS-MDS credentials can't be loaded, we fall back to a plaintext connection between the client and the S2A.
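
A rough sketch of the client-to-S2A fallback described above, assuming the credential loading is abstracted behind a supplier. The `Link` enum and the loader shape are hypothetical; the actual code builds gRPC channel credentials rather than returning a label.

```java
import java.util.Optional;
import java.util.function.Supplier;

/** Sketch: prefer mTLS to the S2A, fall back to plaintext when credentials are unavailable. */
final class S2ALinkSketch {
  enum Link { MTLS, PLAINTEXT }

  private S2ALinkSketch() {}

  /** The loader may return empty or throw when MTLS-MDS credentials cannot be loaded. */
  static Link chooseLink(Supplier<Optional<byte[]>> mtlsMdsCredentialLoader) {
    try {
      if (mtlsMdsCredentialLoader.get().isPresent()) {
        return Link.MTLS;
      }
    } catch (RuntimeException e) {
      // A load failure is treated the same as "no credentials available".
    }
    return Link.PLAINTEXT;
  }

  public static void main(String[] args) {
    System.out.println(chooseLink(() -> Optional.of(new byte[] {1})));
    System.out.println(chooseLink(Optional::empty));
  }
}
```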

The parallel Go implementation: googleapis/google-api-go-client#1874 (now lives here: https://github.com/googleapis/google-cloud-go/blob/main/auth/internal/transport/cba.go)

S2A Java client: https://github.com/grpc/grpc-java/tree/master/s2a

Resolving b/376258193 means that S2A.java is no longer experimental

@@ -1,5 +1,5 @@
/*
* Copyright 2024 Google LLC
Contributor Author

Looks like running mvn fmt:format resulted in all these changes. Perhaps these changes could be made in another PR and then can be removed from this PR.

Contributor

Looks like these are missed in #3513.
We will do a separate PR for these and you can remove from this PR

Contributor Author

Thanks! with the latest merge of main, these changes have been removed from this PR.

@rmehta19
Contributor Author

rmehta19 commented Jan 7, 2025

@lqiu96 @blakeli0 @zhumin8 , please review, thanks!

@product-auto-label product-auto-label bot added size: l Pull request size is large. and removed size: m Pull request size is medium. labels Jan 9, 2025
Comment on lines 318 to 319
String s2AEnv;
s2AEnv = envProvider().getenv(S2A_ENV_ENABLE_USE_S2A);

nit: this can be one line.

Contributor Author

Done.

@@ -288,6 +306,37 @@ private String determineEndpoint() throws IOException {
return endpoint;
}

/** Determine if S2A can be used */
@VisibleForTesting
boolean shouldUseS2A() {

There was an issue raised re: netty-tcnative dropping support for Windows and macOS Intel platforms. Do we want to add runtime checks here and skip S2A if the code is running on an unsupported platform?

We can also do that in a followup PR, but just to mention it here so we don't forget.

Contributor Author

@rmehta19 rmehta19 Jan 13, 2025

Thanks for flagging this Kui, I think this is fine to get to in a followup PR. Sent you a ping with some details.

(We can probably use System.getProperty("os.name") to get this info. Precedent for doing so is in the existing DirectPath logic: https://github.com/googleapis/sdk-platform-java/blob/main/gax-java/gax-grpc/src/main/java/com/google/api/gax/grpc/InstantiatingGrpcChannelProvider.java#L368)
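
A possible shape for such a runtime check, as a sketch only. The set of unsupported platforms is taken from the review comment above and is not verified here; the class and method names are hypothetical, and the follow-up PR may do this differently.

```java
import java.util.Locale;

/** Sketch: skip S2A on platforms the review comment lists as unsupported. */
final class PlatformCheckSketch {
  private PlatformCheckSketch() {}

  static boolean isLikelySupported(String osName, String osArch) {
    String os = osName.toLowerCase(Locale.ROOT);
    String arch = osArch.toLowerCase(Locale.ROOT);
    if (os.contains("windows")) {
      return false; // assumption from the comment: Windows is unsupported
    }
    boolean isMac = os.contains("mac");
    boolean isIntel = arch.equals("x86_64") || arch.equals("amd64");
    return !(isMac && isIntel); // assumption from the comment: macOS on Intel is unsupported
  }

  public static void main(String[] args) {
    // os.name and os.arch are standard JVM system properties.
    System.out.println(
        isLikelySupported(System.getProperty("os.name"), System.getProperty("os.arch")));
  }
}
```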

}
if (channelCredentials != null) {
// Create the channel using S2A-secured channel credentials.
builder = Grpc.newChannelBuilder(mtlsEndpoint, channelCredentials);

I assume this is what we discussed re: only picking the mTLS endpoint when we know S2A will be used and DirectPath will not be used, but let me know if otherwise. Thanks.

Contributor Author

@rmehta19 rmehta19 Jan 10, 2025

Yes this is correct. We removed the logic in determineEndpoint in EndpointContext to set the mtls endpoint if shouldUseS2A returns true. We instead plumb down the mtls endpoint so that we use it only when we know DirectPath is not being used.

This is because the decision to use S2A and the decision to use DirectPath happen in different places (EndpointContext vs InstantiatingGrpcChannelProvider).

Comment on lines 93 to 108
  /** True if the TransportProvider has no mtlsEndpoint set. */
  boolean needsMtlsEndpoint();

  /**
   * Sets the endpoint to use when constructing a new {@link TransportChannel}.
   *
   * <p>This method should only be called if {@link #needsEndpoint()} returns true.
   */
  TransportChannelProvider withEndpoint(String endpoint);

  /**
   * Sets the mtlsEndpoint to use when constructing a new {@link TransportChannel}.
   *
   * <p>This method should only be called if {@link #needsMtlsEndpoint()} returns true.
   */
  TransportChannelProvider withMtlsEndpoint(String mtlsEndpoint);
Contributor

Let me think about this a bit more. I'm not the biggest fan of adding these two public methods in needsMtlsEndpoint + withMtlsEndpoint and would prefer not to if possible.

I know this is a limitation regarding DirectPath and how that's determined. If we can't find a reasonable alternative, then I think I'm fine with this.

Contributor

Took a look at this and I think there are possibly a few alternatives to what is in this PR. I'm not entirely sure what all the pros and cons are as of now, but I'm going to pose them as possibilities.

  1. Remove the .mtls substring when using directpath flow. This is probably the simplest option, but is pretty much a workaround. Given that S2A may end up being used much more, I don't think I would like endpoint resolution to be based on string find and replace.
  2. Create a default method canUseDirectPath() in the TransportChannelProvider interface
  default boolean canUseDirectPath() {
    return false;
  }

I believe this allows us to use it via the ClientContext to resolve the endpoint before we create the TransportChannel:
i.e.

    if (transportChannelProvider.needsEndpoint()) {
      if (transportChannelProvider.canUseDirectPath()) {
        transportChannelProvider =
            transportChannelProvider.withEndpoint(endpointContext.mtlsEndpoint());
      } else {
        transportChannelProvider = transportChannelProvider.withEndpoint(endpoint);
      }
    }

It shouldn't affect users who manually create InstantiatingGrpcChannelProvider since the logic still resides in there and is used during channel creation.

Let me talk with the team and see if there are any other potential concerns that I'm missing with this. I think option 2 might be a possibility and if not, then I think we can proceed with this.

Contributor Author

@rmehta19 rmehta19 Jan 14, 2025

Thanks for looking into this Lawrence.

> Remove the .mtls substring when using directpath flow. This is probably the simplest option, but is pretty much a workaround. Given that S2A may end up being used much more, I don't think I would like endpoint resolution to be based on string find and replace.

Agreed that this is probably the simplest option. However I also am hesitant about modifying the endpoint using string find and replace, as you pointed out.

> Create a default method canUseDirectPath() in the TransportChannelProvider interface

I think I understand this, however I think it might not be possible without reworking a few things with canUseDirectPath:

Looking at canUseDirectPath, it calls:

The second one is easy to resolve, just be sure to call transportChannelProvider.needsCredentials() before transportChannelProvider.canUseDirectPath() in ClientContext. The first one will probably require moving that check out of canUseDirectPath and into ClientContext. I think this list is complete, but we may find more when we go to implement this.

Also, I think there is a small typo in the example you provided. Should it be the following?

    if (transportChannelProvider.needsEndpoint()) {
      if (transportChannelProvider.canUseDirectPath()) {
        transportChannelProvider = transportChannelProvider.withEndpoint(endpoint);
      } else {
        transportChannelProvider =
            transportChannelProvider.withEndpoint(endpointContext.mtlsEndpoint());
      }
    }

Also, perhaps we could change it to set the mtls endpoint only if shouldUseS2A and !canUseDirectPath, and use endpoint derived via endpointResolution in all other cases?
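
One reading of that last suggestion, sketched as a pure function. The parameter names and the sample endpoints are hypothetical, and this is not what was ultimately merged; it only illustrates the proposed condition.

```java
/** Sketch: pick the mTLS endpoint only when S2A will be used and DirectPath will not. */
final class EndpointChoiceSketch {
  private EndpointChoiceSketch() {}

  static String choose(
      boolean shouldUseS2A, boolean canUseDirectPath, String endpoint, String mtlsEndpoint) {
    // In every other case, keep the endpoint produced by normal endpoint resolution.
    return (shouldUseS2A && !canUseDirectPath) ? mtlsEndpoint : endpoint;
  }

  public static void main(String[] args) {
    String endpoint = "example.googleapis.com:443"; // hypothetical
    String mtlsEndpoint = "example.mtls.googleapis.com:443"; // hypothetical
    System.out.println(choose(true, false, endpoint, mtlsEndpoint));
    System.out.println(choose(true, true, endpoint, mtlsEndpoint));
  }
}
```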

Contributor Author

As discussed offline, we will not plumb the mtls endpoint for now, and just set the mtls endpoint if S2A can be used in EndpointContext in 25445d3

</difference>
<!-- Ignore this as this was part of s2a-grpc ExperimentalApi revert -->
<!-- Ignore method addition to an TransportChannelProvider interface -->
Contributor

Suggested change
<!-- Ignore method addition to an TransportChannelProvider interface -->
<!-- Ignore method addition to TransportChannelProvider interface (InternalExtensionOnly) -->

Contributor Author

Done in e921696

@@ -106,16 +106,26 @@
<className>com/google/api/gax/batching/Batcher</className>
<method>*</method>
</difference>
<!-- Ignore this as this was part of s2a-grpc ExperimentalApi revert -->
<!-- Ignore abstract method addition to an EndpointContext -->
Contributor

@lqiu96 lqiu96 Jan 16, 2025

nit: Adding the why (i.e. it's marked as internal)

Suggested change
<!-- Ignore abstract method addition to an EndpointContext -->
<!-- Ignore abstract method addition to an EndpointContext (InternalApi) -->

Contributor Author

Done in e921696

@lqiu96
Contributor

lqiu96 commented Jan 16, 2025

/gcbrun

@rmehta19
Contributor Author

@blakeli0 , @zhumin8 would you be able to review? Thanks!

@BetaApi(
"The S2A feature is not stable yet and may change in the future. https://github.com/grpc/grpc-java/issues/11533.")
default TransportChannelProvider withUseS2A(boolean useS2A) {
throw new UnsupportedOperationException("S2A is not supported");
Collaborator

I know we decided to throw exception last time we reviewed it, but we discovered that this could break downstream libraries' or customers' tests if they have a mock TransportChannelProvider. To prevent such breaking changes, can we change this to return this; instead? Then we don't have to override this method in other implementations (LocalChannelProvider, InstantiatingHttpJsonChannelProvider etc.) of this interface as well.

Contributor Author

Thanks for bringing this up! Changed the default behavior to return this and removed overriding of the method in all other implementations except gRPC.

1703f51
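
To illustrate why returning `this` from the default method keeps existing implementations and mocks working, here is a small sketch. The interface and class names are hypothetical stand-ins for TransportChannelProvider and its implementations.

```java
/** Sketch: a default method that returns this is a safe no-op for existing implementations. */
interface ChannelProviderSketch {
  default ChannelProviderSketch withUseS2A(boolean useS2A) {
    return this; // implementations that do not support S2A ignore the setting
  }
}

/** Stands in for a downstream mock that was written before the method existed. */
final class MockProviderSketch implements ChannelProviderSketch {}

/** Stands in for the gRPC provider, the only implementation that overrides the method. */
final class GrpcProviderSketch implements ChannelProviderSketch {
  final boolean useS2A;

  GrpcProviderSketch(boolean useS2A) {
    this.useS2A = useS2A;
  }

  @Override
  public ChannelProviderSketch withUseS2A(boolean useS2A) {
    return new GrpcProviderSketch(useS2A);
  }

  public static void main(String[] args) {
    ChannelProviderSketch mock = new MockProviderSketch();
    System.out.println(mock.withUseS2A(true) == mock);
    System.out.println(((GrpcProviderSketch) new GrpcProviderSketch(false).withUseS2A(true)).useS2A);
  }
}
```

With a throwing default, the mock above would fail as soon as calling code invoked withUseS2A, which is the breakage the review comment describes.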

@blakeli0
Collaborator

/gcbrun

@rmehta19
Contributor Author

@blakeli0 I see there are 2 failing checks. Looking at the logs, the failures seem unrelated to the changes in this PR, WDYT?

@zhumin8
Contributor

zhumin8 commented Jan 22, 2025

sonar check is not supported for forked branches. And you can ignore "Build with Airlock" check for now.

@rmehta19 rmehta19 changed the title Revert #3400: Reintroduce experimental S2A integration in client libraries grpc transport feat: revert #3400: reintroduce experimental S2A integration in client libraries grpc transport Jan 22, 2025
@rmehta19
Contributor Author

@zhumin8 does this PR look ok to you? If so, are you able to merge (as I don't have write access)? Thank you!

@blakeli0
Collaborator

@rmehta19 Please resolve the conflict in InstantiatingGrpcChannelProvider, which is probably caused by #3467. We'll merge it once it is resolved.

@rmehta19
Contributor Author

> @rmehta19 Please resolve the conflict in InstantiatingGrpcChannelProvider, which is probably caused by #3467. We'll merge it once it is resolved.

Thanks @blakeli0, the conflict is resolved.

@blakeli0
Collaborator

/gcbrun

@rmehta19
Contributor Author

rmehta19 commented Jan 22, 2025

I was looking through resolved comments and found this one; I think my comment and @lqiu96's may have crossed.

I think the resolution is that we should be putting the

    if (shouldUseS2A()) {
      return mtlsEndpoint();
    }

at the end of determineEndpoint() as "universe domain support with mtls may require this additional logic". Right now it's at the beginning. WDYT @lqiu96, @blakeli0?

I tested in an internal patch: having this at the beginning or end of determineEndpoint() works for our GCP use cases, and while S2A is experimental, shouldUseS2A() will return false anyway, so it won't impact customers (on or off GCP).

@lqiu96
Contributor

lqiu96 commented Jan 23, 2025

> I was looking through resolved comments and I found this one, I think mine and @lqiu96 's comments might have crossed at the same time..
>
> I think the resolution is that we should be putting the
>
>     if (shouldUseS2A()) {
>       return mtlsEndpoint();
>     }
>
> at the end of determineEndpoint() as "universe domain support with mtls may require this additional logic". Right now it's at the beginning. WDYT @lqiu96, @blakeli0?
>
> I tested in an internal patch, having this at the beginning or end of determineEndpoint() works for our GCP use cases, and while S2A is experimental, shouldUseS2A() will anyways return false, so it won't impact customers (on or off GCP).

Took another look at this and I'm curious about the behavior regarding S2A and having the resolved endpoint be the non-mtls endpoint. mtlsResolver simply determines to use the mtls endpoint or the non-mtls endpoint. Based on the logic, I think it's possible that the resolved endpoint ends up being non-mtls endpoint, but S2A is true.

Should we guard against this or return an error? I think in this case, we may need to put this towards the end.

@rmehta19
Contributor Author

rmehta19 commented Jan 23, 2025

> > I was looking through resolved comments and I found this one, I think mine and @lqiu96 's comments might have crossed at the same time..
> > I think the resolution is that we should be putting the
> >
> >     if (shouldUseS2A()) {
> >       return mtlsEndpoint();
> >     }
> >
> > at the end of determineEndpoint() as "universe domain support with mtls may require this additional logic". Right now it's at the beginning. WDYT @lqiu96, @blakeli0?
> > I tested in an internal patch, having this at the beginning or end of determineEndpoint() works for our GCP use cases, and while S2A is experimental, shouldUseS2A() will anyways return false, so it won't impact customers (on or off GCP).
>
> Took another look at this and I'm curious about the behavior regarding S2A and having the resolved endpoint be the non-mtls endpoint. mtlsResolver simply determines to use the mtls endpoint or the non-mtls endpoint. Based on the logic, I think it's possible that the resolved endpoint ends up being non-mtls endpoint, but S2A is true.
>
> Should we guard against this or return an error? I think in this case, we may need to put this towards the end.

Thanks for looking @lqiu96. It would be an error to use S2A with a non-MTLS endpoint, because when we start using the MTLS_S2A hard bound tokens on the connection, the requests would get rejected. To be clear: if S2A is used, the resolved endpoint needs to be an MTLS endpoint.

> Based on the logic, I think it's possible that the resolved endpoint ends up being non-mtls endpoint, but S2A is true.

I'm not sure I am following this. With this check at the beginning or the end of determineEndpoint, if shouldUseS2A is true, the resolved endpoint will be an mTLS endpoint. There is, however, an early return for GDCH, so I do think it makes sense to move the check to the end of determineEndpoint, because if GDCH is being used, a custom endpoint gets returned.

In 6f56cff I moved the check to the end of determineEndpoint.
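
The ordering discussed here can be sketched as follows. This is a simplified illustration with hypothetical parameters; the real determineEndpoint in EndpointContext takes no such arguments and has more resolution steps between the early return and the final check.

```java
/** Sketch: GDC-H returns early with its custom endpoint; the S2A check runs last. */
final class DetermineEndpointSketch {
  private DetermineEndpointSketch() {}

  static String determineEndpoint(
      boolean usingGdch,
      String customEndpoint,
      boolean shouldUseS2A,
      String resolvedEndpoint,
      String mtlsEndpoint) {
    if (usingGdch) {
      return customEndpoint; // early return, so the S2A check below never applies
    }
    // (other endpoint resolution steps would run here)
    if (shouldUseS2A) {
      return mtlsEndpoint; // when S2A is used, the endpoint must be the mTLS endpoint
    }
    return resolvedEndpoint;
  }

  public static void main(String[] args) {
    System.out.println(determineEndpoint(false, "custom:443", true, "plain:443", "mtls:443"));
    System.out.println(determineEndpoint(true, "custom:443", true, "plain:443", "mtls:443"));
  }
}
```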

@lqiu96
Contributor

lqiu96 commented Jan 23, 2025

> if S2A(shouldUseS2A) is true, the resolved endpoint will be a MTLS endpoint.

Oh sorry, what I wrote was confusing. What I meant is that the mtlsResolver may return a non-mTLS endpoint while shouldUseS2A is true (enabled). That would be confusing, since mtlsResolver checks the things that are required to enable mTLS (i.e. the required env vars, keystore, etc.). So a user may have set up S2A while the environment is accidentally configured for non-mTLS. In that case the environment can use non-mTLS but the user is set up for S2A (i.e. trying to use mTLS), which looks like a configuration mismatch. In this case, I think we would want to error out?

Hopefully that is clearer about what I was trying to point out.

@rmehta19
Contributor Author

> > if S2A(shouldUseS2A) is true, the resolved endpoint will be a MTLS endpoint.
>
> Oh sorry, what I wrote was confusing. What I meant is that it is possible that the mtlsResolver may return a non-mtls endpoint while shouldUseS2A is true (enabled). It would be confusing since mtlsResolver ends up checking the things that are required to enable MTLS (i.e. the required env vars and keystore, etc). So user may have set up for S2A, but the environment is accidentally configured to non-mtls. In this case, it would be that the environment can use non-mtls, but the user is set up for S2A (trying to mTLS), which seems to be a whole set of configuration issues/ mismatch. In this case, I think we would want to error out?
>
> Hopefully that is clearer about what I was trying to point out.

Thanks for elaborating @lqiu96! I responded with detail on chat internally.

@lqiu96
Contributor

lqiu96 commented Jan 24, 2025

Thanks for the explanation. The changes LGTM! Will merge after the Graalvm checks. Sonar and airlock CI issues can be ignored.

@lqiu96
Contributor

lqiu96 commented Jan 24, 2025

/gcbrun

@lqiu96 lqiu96 merged commit 65a0f11 into googleapis:main Jan 24, 2025
45 of 47 checks passed
diegomarquezp pushed a commit that referenced this pull request Jan 25, 2025
🤖 I have created a release *beep* *boop*
---


<details><summary>2.52.0</summary>

##
[2.52.0](v2.51.1...v2.52.0)
(2025-01-24)


### Features

* add support for new setAllowHardBoundTokens field.
([#3467](#3467))
([38431a2](38431a2))
* revert
[#3400](#3400):
reintroduce experimental S2A integration in client libraries grpc
transport
([#3548](#3548))
([65a0f11](65a0f11))


### Dependencies

* update dependency com.google.api-client:google-api-client-bom to
v2.7.2
([#3578](#3578))
([f6e5ad9](f6e5ad9))
* update dependency commons-codec:commons-codec to v1.17.2
([#3557](#3557))
([07ce801](07ce801))
* update dependency gitpython to v3.1.44
([#3559](#3559))
([e924db0](e924db0))
* update dependency org.checkerframework:checker-qual to v3.48.4
([#3560](#3560))
([a4726e9](a4726e9))
* update dependency smmap to v5.0.2
([#3561](#3561))
([6cd5d0d](6cd5d0d))
* update docker.io/library/alpine docker tag to v3.21.1
([#3551](#3551))
([edd5a4c](edd5a4c))
* update docker.io/library/alpine docker tag to v3.21.2
([#3580](#3580))
([f577ecd](f577ecd))
* update docker.io/library/maven:3.9.9-eclipse-temurin-11-alpine docker
digest to 9a259c6
([#3554](#3554))
([eb2cbd6](eb2cbd6))
* update docker.io/library/python:3.13.1-alpine3.20 docker digest to
9ab3b6e
([#3555](#3555))
([40a74fe](40a74fe))
* update google auth library dependencies to v1.31.0
([#3577](#3577))
([7fa879a](7fa879a))
* update googleapis/java-cloud-bom digest to c7c443f
([#3579](#3579))
([fcf40b7](fcf40b7))
* update repo-automation-bots digest to 0a12b5d
([#3464](#3464))
([b9c9d21](b9c9d21))
</details>

---
This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).

Co-authored-by: release-please[bot] <55107282+release-please[bot]@users.noreply.github.com>