-
Notifications
You must be signed in to change notification settings - Fork 3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add retry and jmx metrics while generating Thrift's delegation token #21000
Conversation
cc @electrum if you can please help with the review, thanks a lot! |
...src/main/java/io/trino/plugin/hive/metastore/thrift/TokenFetchingMetastoreClientFactory.java
Outdated
Show resolved
Hide resolved
...src/main/java/io/trino/plugin/hive/metastore/thrift/TokenFetchingMetastoreClientFactory.java
Outdated
Show resolved
Hide resolved
ba0ed22
to
7b7d7b8
Compare
...src/main/java/io/trino/plugin/hive/metastore/thrift/ThriftMetastoreAuthenticationModule.java
Show resolved
Hide resolved
@electrum will cross check the pt failures. |
...src/main/java/io/trino/plugin/hive/metastore/thrift/TokenFetchingMetastoreClientFactory.java
Outdated
Show resolved
Hide resolved
...src/main/java/io/trino/plugin/hive/metastore/thrift/TokenFetchingMetastoreClientFactory.java
Outdated
Show resolved
Hide resolved
.withMaxDuration(java.time.Duration.of(30, SECONDS)) | ||
.withMaxAttempts(maxRetries + 1) | ||
.withBackoff(minBackoffDelay.toMillis(), maxBackoffDelay.toMillis(), MILLIS, backoffScaleFactor) | ||
.abortOn(throwable -> Throwables.getCausalChain(throwable).stream().anyMatch(TrinoException.class::isInstance)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure what errors the metastore throws for getDelegationToken()
. The only declared exception is MetaException
. If that is always a permanent error, then we should abort on that. Otherwise, we should retry everything. So this should be
.abortOn(MetaException.class)
or remove it if MetaException
is thrown for transient errors
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wondering, if we can abort on TException
?
as getDT throws MetaException
, TException
,
and MetaException extends TException
that extends Exception
public String getDelegationToken(String token_owner, String renewer_kerberos_principal_name) throws MetaException, TException {
this.sendGetDelegationToken(token_owner, renewer_kerberos_principal_name);
return this.recvGetDelegationToken();
}
so, .abortOn(TException.class)
like it is done above, make sense?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@electrum thanks for the review and help!
Apart from this, I have tried to take care of all the other comments, if you can please check once.
Please see if .abortOn(TException.class)
make sense or not as TException
is also thrown.
Or may be I am missing something?
thanks
...src/main/java/io/trino/plugin/hive/metastore/thrift/TokenFetchingMetastoreClientFactory.java
Show resolved
Hide resolved
This pull request has gone a while without any activity. Tagging the Trino developer relations team: @bitsondatadev @colebow @mosabua |
@osscm .. can you address the final remarks from @electrum .. I think this is ready to merge afterwards since he approved. Any concerns @findinpath ? |
Ack @Manfred
…On Mon, May 13, 2024 at 9:16 AM Manfred Moser ***@***.***> wrote:
@osscm <https://github.com/osscm> .. can you address the final remarks
from @electrum <https://github.com/electrum> .. I think this is ready to
merge afterwards since he approved. Any concerns @findinpath
<https://github.com/findinpath> ?
—
Reply to this email directly, view it on GitHub
<#21000 (comment)>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/AXQ2PYWXQAF3TZKZTRUITK3ZCDRPLAVCNFSM6AAAAABEN6XRJOVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDCMBYGEZDSNBSGQ>
.
You are receiving this because you were mentioned.Message ID:
***@***.***>
|
7024c36
to
d21f431
Compare
This pull request has gone a while without any activity. Tagging the Trino developer relations team: @bitsondatadev @colebow @mosabua |
Closing this pull request, as it has been stale for six weeks. Feel free to re-open at any time. |
@electrum can you look .. seems like you approved already. Could you merge? I will see what the CI build does ... |
@@ -58,6 +66,12 @@ public TokenFetchingMetastoreClientFactory( | |||
.maximumSize(thriftConfig.getDelegationTokenCacheMaximumSize()), | |||
CacheLoader.from(this::loadDelegationToken)); | |||
this.refreshPeriod = Duration.ofMinutes(1).toNanos(); | |||
retryPolicy = RetryPolicy.builder() | |||
.withMaxDuration(java.time.Duration.of(30, SECONDS)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use maxRetryTime
from ThriftHiveMetastore
.withMaxDuration(java.time.Duration.of(30, SECONDS)) | ||
.withMaxAttempts(thriftConfig.getMaxRetries() + 1) | ||
.withBackoff(thriftConfig.getMinBackoffDelay().toMillis(), thriftConfig.getMaxBackoffDelay().toMillis(), MILLIS, thriftConfig.getBackoffScaleFactor()) | ||
.abortOn(throwable -> Throwables.getCausalChain(throwable).stream().anyMatch(TrinoException.class::isInstance)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did we figure which exceptions are permanent or temporary?
Is MetaException
permanent?
Can TException
be temporary?
Would getDelegationToken
throw TrinoExceptipon
at all (I don't think it does)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, we do not need to use TrinoException
,
we can Abort on TException
based on how it is done for other operation's retries as well,
like createDatabase
at the Metastore side throws three exceptions and Trino side we abort for these three exceptions only.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I though that maybe we can narrow it more, but it's fine for me
...src/main/java/io/trino/plugin/hive/metastore/thrift/TokenFetchingMetastoreClientFactory.java
Show resolved
Hide resolved
Thank you for your pull request and welcome to our community. We could not parse the GitHub identity of the following contributors: mmalhotra.
|
9cb45ac
to
e1343f7
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm % comments
...in/trino-hive/src/main/java/io/trino/plugin/hive/metastore/thrift/ThriftMetastoreConfig.java
Show resolved
Hide resolved
...src/main/java/io/trino/plugin/hive/metastore/thrift/TokenFetchingMetastoreClientFactory.java
Show resolved
Hide resolved
...src/main/java/io/trino/plugin/hive/metastore/thrift/TokenFetchingMetastoreClientFactory.java
Outdated
Show resolved
Hide resolved
plugin/trino-hive/src/main/java/io/trino/plugin/hive/util/RetryDriver.java
Show resolved
Hide resolved
.withMaxDuration(java.time.Duration.of(30, SECONDS)) | ||
.withMaxAttempts(thriftConfig.getMaxRetries() + 1) | ||
.withBackoff(thriftConfig.getMinBackoffDelay().toMillis(), thriftConfig.getMaxBackoffDelay().toMillis(), MILLIS, thriftConfig.getBackoffScaleFactor()) | ||
.abortOn(throwable -> Throwables.getCausalChain(throwable).stream().anyMatch(TrinoException.class::isInstance)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I though that maybe we can narrow it more, but it's fine for me
Please squash commits |
e1343f7
to
0c20882
Compare
@@ -121,4 +144,13 @@ private record DelegationToken(long writeTimeNanos, String delegationToken) | |||
requireNonNull(delegationToken, "delegationToken is null"); | |||
} | |||
} | |||
|
|||
private static RuntimeException propagate(Throwable throwable) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please remove this method as Failsafe
does handle InterruptiodException
correctly, see: dev.failsafe.internal.RetryPolicyExecutor#apply
} | ||
catch (TException e) { | ||
throw new TrinoException(HIVE_METASTORE_ERROR, e); | ||
catch (Exception e) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this should be:
catch (FailsafeException e) {
if (e.getCause() instanceof TException) {
throw new TrinoException(HIVE_METASTORE_ERROR, e.getCause());
}
throw e;
}
per dev.failsafe.FailsafeExecutor#get(dev.failsafe.function.CheckedSupplier<T>)
javadoc:
* @throws NullPointerException if the {@code supplier} is null
* @throws FailsafeException if the execution fails with a checked Exception. {@link FailsafeException#getCause()} can
* be used to learn the underlying checked exception.
Thank you for your pull request and welcome to our community. We could not parse the GitHub identity of the following contributors: mmalhotra.
|
10d490b
to
08d8e64
Compare
08d8e64
to
59c19e0
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm % failure is related % please squash commits
59c19e0
to
3202480
Compare
updated, can we merge it now, please? |
31f0928
to
a3e599c
Compare
a3e599c
to
6a691da
Compare
Description
Additional context and related issues
fixes : #20999
Release notes
(x) This is not user-visible or is docs only, and no release notes are required.