Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add retry and jmx metrics while generating Thrift's delegation token #21000

Merged
merged 1 commit into from
Oct 18, 2024

Conversation

osscm
Copy link
Contributor

@osscm osscm commented Mar 9, 2024

Description

Additional context and related issues

fixes : #20999

Release notes

(x) This is not user-visible or is docs only, and no release notes are required.

@cla-bot cla-bot bot added the cla-signed label Mar 9, 2024
@github-actions github-actions bot added tests:hive hive Hive connector labels Mar 9, 2024
@osscm osscm changed the title Add retry and jmx means for the Thrift's delegation token generation Add retry and jmx metrics while generating Thrift's delegation token Mar 9, 2024
@osscm
Copy link
Contributor Author

osscm commented Mar 9, 2024

cc @electrum if you can please help with the review, thanks a lot!

@osscm osscm force-pushed the thrift-dt-retry branch 3 times, most recently from ba0ed22 to 7b7d7b8 Compare April 8, 2024 21:43
@osscm
Copy link
Contributor Author

osscm commented Apr 9, 2024

@electrum will cross check the pt failures.

@osscm osscm force-pushed the thrift-dt-retry branch from 59beb06 to 52c6491 Compare April 10, 2024 09:01
.withMaxDuration(java.time.Duration.of(30, SECONDS))
.withMaxAttempts(maxRetries + 1)
.withBackoff(minBackoffDelay.toMillis(), maxBackoffDelay.toMillis(), MILLIS, backoffScaleFactor)
.abortOn(throwable -> Throwables.getCausalChain(throwable).stream().anyMatch(TrinoException.class::isInstance))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what errors the metastore throws for getDelegationToken(). The only declared exception is MetaException. If that is always a permanent error, then we should abort on that. Otherwise, we should retry everything. So this should be

.abortOn(MetaException.class)

or remove it if MetaException is thrown for transient errors

Copy link
Contributor Author

@osscm osscm Apr 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wondering, if we can abort on TException?

as getDT throws MetaException, TException,
and MetaException extends TException that extends Exception

public String getDelegationToken(String token_owner, String renewer_kerberos_principal_name) throws MetaException, TException {
            this.sendGetDelegationToken(token_owner, renewer_kerberos_principal_name);
            return this.recvGetDelegationToken();
        }

so, .abortOn(TException.class)

like it is done above, make sense?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@electrum thanks for the review and help!

Apart from this, I have tried to take care of all the other comments, if you can please check once.

Please see if .abortOn(TException.class) make sense or not as TException is also thrown.
Or may be I am missing something?

thanks

Copy link

This pull request has gone a while without any activity. Tagging the Trino developer relations team: @bitsondatadev @colebow @mosabua

@github-actions github-actions bot added the stale label May 10, 2024
@mosabua
Copy link
Member

mosabua commented May 13, 2024

@osscm .. can you address the final remarks from @electrum .. I think this is ready to merge afterwards since he approved. Any concerns @findinpath ?

@github-actions github-actions bot removed the stale label May 13, 2024
@osscm
Copy link
Contributor Author

osscm commented May 15, 2024 via email

@osscm osscm force-pushed the thrift-dt-retry branch 3 times, most recently from 7024c36 to d21f431 Compare May 28, 2024 06:52
Copy link

This pull request has gone a while without any activity. Tagging the Trino developer relations team: @bitsondatadev @colebow @mosabua

@github-actions github-actions bot added the stale label Jun 18, 2024
Copy link

Closing this pull request, as it has been stale for six weeks. Feel free to re-open at any time.

@github-actions github-actions bot closed this Jul 10, 2024
@mosabua mosabua reopened this Jul 10, 2024
@mosabua
Copy link
Member

mosabua commented Jul 10, 2024

@electrum can you look .. seems like you approved already. Could you merge? I will see what the CI build does ...

@mosabua mosabua added stale-ignore Use this label on PRs that should be ignored by the stale bot so they are not flagged or closed. and removed stale labels Jul 10, 2024
@@ -58,6 +66,12 @@ public TokenFetchingMetastoreClientFactory(
.maximumSize(thriftConfig.getDelegationTokenCacheMaximumSize()),
CacheLoader.from(this::loadDelegationToken));
this.refreshPeriod = Duration.ofMinutes(1).toNanos();
retryPolicy = RetryPolicy.builder()
.withMaxDuration(java.time.Duration.of(30, SECONDS))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use maxRetryTime from ThriftHiveMetastore

.withMaxDuration(java.time.Duration.of(30, SECONDS))
.withMaxAttempts(thriftConfig.getMaxRetries() + 1)
.withBackoff(thriftConfig.getMinBackoffDelay().toMillis(), thriftConfig.getMaxBackoffDelay().toMillis(), MILLIS, thriftConfig.getBackoffScaleFactor())
.abortOn(throwable -> Throwables.getCausalChain(throwable).stream().anyMatch(TrinoException.class::isInstance))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did we figure which exceptions are permanent or temporary?
Is MetaException permanent?
Can TException be temporary?

Would getDelegationToken throw TrinoExceptipon at all (I don't think it does)?

Copy link
Contributor Author

@osscm osscm Aug 31, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, we do not need to use TrinoException,

we can Abort on TException based on how it is done for other operation's retries as well,
like createDatabase at the Metastore side throws three exceptions and Trino side we abort for these three exceptions only.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I though that maybe we can narrow it more, but it's fine for me

Copy link

cla-bot bot commented Sep 1, 2024

Thank you for your pull request and welcome to our community. We could not parse the GitHub identity of the following contributors: mmalhotra.
This is most likely caused by a git client misconfiguration; please make sure to:

  1. check if your git client is configured with an email to sign commits git config --list | grep email
  2. If not, set it up using git config --global user.email [email protected]
  3. Make sure that the git commit email is configured in your GitHub account settings, see https://github.com/settings/emails

@cla-bot cla-bot bot removed the cla-signed label Sep 1, 2024
@cla-bot cla-bot bot added the cla-signed label Sep 2, 2024
Copy link
Member

@sopel39 sopel39 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm % comments

.withMaxDuration(java.time.Duration.of(30, SECONDS))
.withMaxAttempts(thriftConfig.getMaxRetries() + 1)
.withBackoff(thriftConfig.getMinBackoffDelay().toMillis(), thriftConfig.getMaxBackoffDelay().toMillis(), MILLIS, thriftConfig.getBackoffScaleFactor())
.abortOn(throwable -> Throwables.getCausalChain(throwable).stream().anyMatch(TrinoException.class::isInstance))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I though that maybe we can narrow it more, but it's fine for me

@sopel39
Copy link
Member

sopel39 commented Sep 2, 2024

Please squash commits

@@ -121,4 +144,13 @@ private record DelegationToken(long writeTimeNanos, String delegationToken)
requireNonNull(delegationToken, "delegationToken is null");
}
}

private static RuntimeException propagate(Throwable throwable)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please remove this method as Failsafe does handle InterruptiodException correctly, see: dev.failsafe.internal.RetryPolicyExecutor#apply

}
catch (TException e) {
throw new TrinoException(HIVE_METASTORE_ERROR, e);
catch (Exception e) {
Copy link
Member

@sopel39 sopel39 Sep 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should be:

catch (FailsafeException e) {
            if (e.getCause() instanceof TException) {
                throw new TrinoException(HIVE_METASTORE_ERROR, e.getCause());
            }
            throw e;
}

per dev.failsafe.FailsafeExecutor#get(dev.failsafe.function.CheckedSupplier<T>) javadoc:

   * @throws NullPointerException if the {@code supplier} is null
   * @throws FailsafeException if the execution fails with a checked Exception. {@link FailsafeException#getCause()} can
   * be used to learn the underlying checked exception.

Copy link

cla-bot bot commented Sep 24, 2024

Thank you for your pull request and welcome to our community. We could not parse the GitHub identity of the following contributors: mmalhotra.
This is most likely caused by a git client misconfiguration; please make sure to:

  1. check if your git client is configured with an email to sign commits git config --list | grep email
  2. If not, set it up using git config --global user.email [email protected]
  3. Make sure that the git commit email is configured in your GitHub account settings, see https://github.com/settings/emails

Copy link
Member

@sopel39 sopel39 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm % failure is related % please squash commits

@osscm
Copy link
Contributor Author

osscm commented Oct 17, 2024

lgtm % failure is related % please squash commits

updated, can we merge it now, please?

@sopel39 sopel39 force-pushed the thrift-dt-retry branch 2 times, most recently from 31f0928 to a3e599c Compare October 18, 2024 17:41
@sopel39 sopel39 added the no-release-notes This pull request does not require release notes entry label Oct 18, 2024
@sopel39 sopel39 merged commit f78c99f into trinodb:master Oct 18, 2024
56 checks passed
@github-actions github-actions bot added this to the 463 milestone Oct 18, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla-signed hive Hive connector no-release-notes This pull request does not require release notes entry stale-ignore Use this label on PRs that should be ignored by the stale bot so they are not flagged or closed.
Development

Successfully merging this pull request may close these issues.

Add retry and jmx metrics while generating Thrift's delegation token
6 participants