Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Re-authentication fails with OAUTHBEARER when awsRoleArn is specified #143

Closed
jamielwhite opened this issue Nov 28, 2023 · 15 comments
Closed
Labels
bug Something isn't working

Comments

@jamielwhite
Copy link

I have a Kafka consumer which is failing to re-authenticate. The consumer works for the first 15 minutes, but it fails once the credentials expire despite the logs indicating it has refreshed the credentials. I've replicated this issue with a Java application as well as the kafka-console-consumer.

Here are the logs indicating the login is refreshed at the 10 minute mark, but the consumer fails to re-authenticate at 15 minutes:

2023-11-28T16:26:46.206730136Z [main] INFO org.apache.kafka.common.security.oauthbearer.internals.expiring.ExpiringCredentialRefreshingLogin - Successfully logged in.
2023-11-28T16:26:46.210633781Z [kafka-expiring-relogin-thread-kafka-cluster] INFO org.apache.kafka.common.security.oauthbearer.internals.expiring.ExpiringCredentialRefreshingLogin - [Principal=:kafka-cluster]: Expiring credential re-login thread started.
2023-11-28T16:26:46.227207550Z [kafka-expiring-relogin-thread-kafka-cluster] INFO org.apache.kafka.common.security.oauthbearer.internals.expiring.ExpiringCredentialRefreshingLogin - [Principal=kafka-cluster]: Expiring credential valid from Tue Nov 28 16:26:46 GMT 2023 to Tue Nov 28 16:41:45 GMT 2023
2023-11-28T16:26:46.227312837Z [kafka-expiring-relogin-thread-kafka-cluster] INFO org.apache.kafka.common.security.oauthbearer.internals.expiring.ExpiringCredentialRefreshingLogin - [Principal=kafka-cluster]: Proposed refresh time of Tue Nov 28 16:39:17 GMT 2023 extends into the desired buffer time of 300 seconds before expiration, so refresh it at the desired buffer begin point, at Tue Nov 28 16:36:45 GMT 2023
2023-11-28T16:26:46.227352451Z [kafka-expiring-relogin-thread-kafka-cluster] INFO org.apache.kafka.common.security.oauthbearer.internals.expiring.ExpiringCredentialRefreshingLogin - [Principal=:kafka-cluster]: Expiring credential re-login sleeping until: Tue Nov 28 16:36:45 GMT 2023
2023-11-28T16:26:46.309291641Z [main] INFO org.apache.kafka.common.utils.AppInfoParser - Kafka version: 3.6.0
...
2023-11-28T16:36:45.017686979Z [kafka-expiring-relogin-thread-kafka-cluster] INFO org.apache.kafka.common.security.oauthbearer.internals.expiring.ExpiringCredentialRefreshingLogin - Initiating re-login for kafka-cluster, logout() still needs to be called on a previous login = true
2023-11-28T16:36:45.020997012Z [kafka-expiring-relogin-thread-kafka-cluster] INFO org.apache.kafka.common.security.oauthbearer.internals.expiring.ExpiringCredentialRefreshingLogin - [Principal=kafka-cluster]: Expiring credential valid from Tue Nov 28 16:36:45 GMT 2023 to Tue Nov 28 16:51:45 GMT 2023
2023-11-28T16:36:45.021021626Z [kafka-expiring-relogin-thread-kafka-cluster] INFO org.apache.kafka.common.security.oauthbearer.internals.expiring.ExpiringCredentialRefreshingLogin - [Principal=kafka-cluster]: Proposed refresh time of Tue Nov 28 16:49:18 GMT 2023 extends into the desired buffer time of 300 seconds before expiration, so refresh it at the desired buffer begin point, at Tue Nov 28 16:46:45 GMT 2023
2023-11-28T16:36:45.021094277Z [kafka-expiring-relogin-thread-kafka-cluster] INFO org.apache.kafka.common.security.oauthbearer.internals.expiring.ExpiringCredentialRefreshingLogin - [Principal=:kafka-cluster]: Expiring credential re-login sleeping until: Tue Nov 28 16:46:45 GMT 2023
2023-11-28T16:41:45.975747819Z [Thread-1] INFO org.apache.kafka.common.network.Selector - [Consumer clientId=consumer-test-group-2-1, groupId=test-group-2] Failed re-authentication with <cluster endpoint> (channelId=2147483645) (Unexpected handshake request with client mechanism OAUTHBEARER, enabled mechanisms are [])
2023-11-28T16:41:45.976673530Z [Thread-1] ERROR org.apache.kafka.clients.NetworkClient - [Consumer clientId=consumer-test-group-2-1, groupId=test-group-2] Connection to node 2147483645 (<cluster endpoint>) failed authentication due to: Unexpected handshake request with client mechanism OAUTHBEARER, enabled mechanisms are []
2023-11-28T16:41:45.977670864Z org.apache.kafka.common.errors.IllegalSaslStateException: Unexpected handshake request with client mechanism OAUTHBEARER, enabled mechanisms are []
2023-11-28T16:41:46.776948991Z [Thread-1] ERROR org.apache.kafka.clients.NetworkClient - [Consumer clientId=consumer-test-group-2-1, groupId=test-group-2] Connection to node 1 (<cluster endpoint>) failed authentication due to: [7f0b6b38-62f2-41a7-87df-f542194eb8b7]: Session too short

This issue does not occur if I remove awsRoleArn from sasl.jaas.config, but the re-authentication fails if I include it. Here's what the properties file looks like:

security.protocol=SASL_SSL
sasl.mechanism=OAUTHBEARER
sasl.jaas.config=org.apache.kafka.common.security.oauthbearer.OAuthBearerLoginModule required awsRoleArn="<role arn>" awsStsRegion="<region>";
sasl.login.callback.handler.class=software.amazon.msk.auth.iam.IAMOAuthBearerLoginCallbackHandler

Kafka version: 3.6.0
aws-msk-iam-auth version: 2.0.0

@jamielwhite
Copy link
Author

I enabled DEBUG logging, and I see the application re-authenticates successfully but with a decreasing session timeout each time, until it's too short:

2023-11-28T23:33:16.653079751Z [Thread-1] DEBUG org.apache.kafka.common.security.authenticator.SaslClientAuthenticator - [Consumer clientId=consumer-test-group-2-1, groupId=test-group-2] Finished authentication with session expiration in 898349 ms and session re-authentication on or after 836777 ms
2023-11-28T23:33:16.733016243Z [Thread-1] DEBUG org.apache.kafka.common.security.authenticator.SaslClientAuthenticator - [Consumer clientId=consumer-test-group-2-1, groupId=test-group-2] Finished authentication with session expiration in 898268 ms and session re-authentication on or after 796444 ms
2023-11-28T23:33:19.903645191Z [Thread-1] DEBUG org.apache.kafka.common.security.authenticator.SaslClientAuthenticator - [Consumer clientId=consumer-test-group-2-1, groupId=test-group-2] Finished authentication with session expiration in 895098 ms and session re-authentication on or after 763044 ms
2023-11-28T23:46:03.117576552Z [Thread-1] DEBUG org.apache.kafka.common.security.authenticator.SaslClientAuthenticator - [Consumer clientId=consumer-test-group-2-1, groupId=test-group-2] Finished re-authentication with session expiration in 131883 ms and session re-authentication on or after 113503 ms
2023-11-28T23:46:33.305042537Z [Thread-1] DEBUG org.apache.kafka.common.security.authenticator.SaslClientAuthenticator - [Consumer clientId=consumer-test-group-2-1, groupId=test-group-2] Finished re-authentication with session expiration in 101696 ms and session re-authentication on or after 89661 ms
2023-11-28T23:47:56.915128578Z [Thread-1] DEBUG org.apache.kafka.common.security.authenticator.SaslClientAuthenticator - [Consumer clientId=consumer-test-group-2-1, groupId=test-group-2] Finished re-authentication with session expiration in 18086 ms and session re-authentication on or after 16443 ms
2023-11-28T23:48:03.451621377Z [Thread-1] DEBUG org.apache.kafka.common.security.authenticator.SaslClientAuthenticator - [Consumer clientId=consumer-test-group-2-1, groupId=test-group-2] Finished re-authentication with session expiration in 11549 ms and session re-authentication on or after 9974 ms
2023-11-28T23:48:13.552575825Z [Thread-1] DEBUG org.apache.kafka.common.security.authenticator.SaslClientAuthenticator - [Consumer clientId=consumer-test-group-2-1, groupId=test-group-2] Finished re-authentication with session expiration in 1449 ms and session re-authentication on or after 1329 ms
2023-11-28T23:48:13.591081907Z [Thread-1] DEBUG org.apache.kafka.common.security.authenticator.SaslClientAuthenticator - [Consumer clientId=consumer-test-group-2-1, groupId=test-group-2] Finished re-authentication with session expiration in 1410 ms and session re-authentication on or after 1302 ms
...
2023-11-28T23:48:15.382982876Z [Thread-1] ERROR org.apache.kafka.clients.NetworkClient - [Consumer clientId=consumer-test-group-2-1, groupId=test-group-2] Connection to node 2147483645 (<cluster endpoint>) failed authentication due to: [18de9fc3-f444-4302-8487-1c38b629702c]: Session too short

@sankalpbhatia
Copy link
Contributor

Thank you for raising this. Great details!

We will deep dive on this asap and get back.

@jamielwhite
Copy link
Author

I updated our application to leave off awsRoleArn, and it shows a much longer expiration time (1 hour vs 15 minutes). So it's possible I didn't wait long enough to see if the issue occurs when awsRoleArn is not included in sasl.jaas.config. I'll keep it running with this configuration to see if there are any reductions in the session expiration.

2023-11-29T00:53:09.024953992Z [Thread-1] DEBUG org.apache.kafka.common.security.authenticator.SaslClientAuthenticator - [Consumer clientId=consumer-test-group-2-1, groupId=test-group-2] Finished authentication with session expiration in 3598977 ms and session re-authentication on or after 3228285 ms
2023-11-29T00:53:09.113288090Z [Thread-1] DEBUG org.apache.kafka.common.security.authenticator.SaslClientAuthenticator - [Consumer clientId=consumer-test-group-2-1, groupId=test-group-2] Finished authentication with session expiration in 3598888 ms and session re-authentication on or after 3194801 ms
2023-11-29T00:53:12.238079044Z [Thread-1] DEBUG org.apache.kafka.common.security.authenticator.SaslClientAuthenticator - [Consumer clientId=consumer-test-group-2-1, groupId=test-group-2] Finished authentication with session expiration in 3595763 ms and session re-authentication on or after 3284138 ms

@sankalpbhatia
Copy link
Contributor

sankalpbhatia commented Nov 29, 2023

I was able to repro the issue locally, and I have a suspicion it might only be isolated to cases where you provide an awsRoleArn to the jaas config.

My hypothesis is that since STSAssumeRoleSessionCredentialsProvider used in https://github.com/aws/aws-msk-iam-auth/blob/main/src/main/java/software/amazon/msk/auth/iam/internals/MSKCredentialProvider.java holds onto aws credentials until sometime before expiry, it returns the same credentials when the ExpiringCredentialRefreshingLogin class looks for new credentials after 10 mins.

@jamielwhite were you able to get results for your last run (without passing awsRoleArn) ? If not, I will try to run a similar consumer app and observe results tomorrow.

@sankalpbhatia sankalpbhatia added the bug Something isn't working label Nov 29, 2023
@jamielwhite
Copy link
Author

Yes, the same issue eventually happened without passing awsRoleArn.

@sankalpbhatia
Copy link
Contributor

@jamielwhite we are trying a couple of things for this. Will share an update soon.

@sankalpbhatia
Copy link
Contributor

@jamielwhite can you share the setup which led to this issue without using awsRoleArn? Would be helpful to know the client properties you used in that case, and what credentials you ended up using at that point? Also, can you also share the session expiration in those cases?

@jamielwhite
Copy link
Author

The setup was the same with awsRoleArn excluded, and the credentials we used for that case had full kafka-cluster:* permissions. The session expiration was 1 hour (we're running Kubernetes pods and getting our credentials through a service account).

security.protocol=SASL_SSL
sasl.mechanism=OAUTHBEARER
sasl.jaas.config=org.apache.kafka.common.security.oauthbearer.OAuthBearerLoginModule required;
sasl.login.callback.handler.class=software.amazon.msk.auth.iam.IAMOAuthBearerLoginCallbackHandler

@sankalpbhatia
Copy link
Contributor

Hi @jamielwhite, I just pushed a fix for this issue. Would you be able to build a jar locally and verify if this works for you? We will have a release for this in the next few days.

@jamielwhite
Copy link
Author

Thanks @sankalpbhatia! We've made it a few minutes past the 15 minute mark where it initially failed, so it looks like this is working. I also have a consumer running without awsRoleArn, so I'll let you know if we run into any issues there after an hour.

@jamielwhite
Copy link
Author

This is unrelated to the issue, but does AWS have any plans to release a SASL signer library in Ruby now that it's supported in other languages like Python and Go?

@sankalpbhatia
Copy link
Contributor

There are no plans to release a signer library in Ruby right now.

@sankalpbhatia
Copy link
Contributor

Fixed in https://github.com/aws/aws-msk-iam-auth/releases/tag/v2.0.2

Copy link

⚠️COMMENT VISIBILITY WARNING⚠️

Comments on closed issues are hard for our team to see.
If you need more assistance, please either tag a team member or open a new issue that references this one.
If you wish to keep having a conversation with other community members under this issue feel free to do so.

@jjatinggoyal
Copy link

@sankalpbhatia someone wrote the ruby signer library: https://github.com/bruce-szalwinski-he/aws-msk-iam-sasl-signer-ruby. Please consider moving this to aws org.

More details are present in aws/aws-sdk-ruby#2985

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

3 participants