-
Notifications
You must be signed in to change notification settings - Fork 3.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Do not retry iceberg operations on unrecoverable exceptions #19307
Do not retry iceberg operations on unrecoverable exceptions #19307
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need a better way to do this than depending on a file system implementation class.
@@ -237,7 +238,8 @@ protected void refreshFromMetadataLocation(String newLocation) | |||
.withMaxRetries(20) | |||
.withBackoff(100, 5000, MILLIS, 4.0) | |||
.withMaxDuration(Duration.ofMinutes(10)) | |||
.abortOn(failure -> failure instanceof ValidationException || isNotFoundException(failure)) | |||
.abortOn(ValidationException.class, UnrecoverableS3OperationException.class) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This exception should be considered private to TrinoS3FileSystem
(we should make it private) and won't be accessible after we complete the Hadoop removal project. Also, this is only for the legacy S3 file system.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we have some more top level exception for filesystem errors that should not be retried? seems like common requirement, or maybe there is already exception that UnrecoverableS3OperationException
could extend?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could create a new exception for this, but we should first see if this is applicable to any of the new file systems. Can you investigate that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have introduced io.trino.filesystem.UnrecoverableTrinoFileSystemException
, and make this private one extending from it
I can investigate at least some of them, I'll make a list of all implementors of TrinoFileSystem
that I went / not went through, but let me create another issue/PR for it once this is merged and io.trino.filesystem.UnrecoverableTrinoFileSystemException
is available
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I want to make sure that this will be useful in the future and won't become dead code after we remove the legacy S3 file system.
6c03ad9
to
189dbfd
Compare
...rino-filesystem/src/main/java/io/trino/filesystem/UnrecoverableTrinoFileSystemException.java
Outdated
Show resolved
Hide resolved
...rino-filesystem/src/main/java/io/trino/filesystem/UnrecoverableTrinoFileSystemException.java
Outdated
Show resolved
Hide resolved
...rino-filesystem/src/main/java/io/trino/filesystem/UnrecoverableTrinoFileSystemException.java
Outdated
Show resolved
Hide resolved
@@ -237,7 +238,8 @@ protected void refreshFromMetadataLocation(String newLocation) | |||
.withMaxRetries(20) | |||
.withBackoff(100, 5000, MILLIS, 4.0) | |||
.withMaxDuration(Duration.ofMinutes(10)) | |||
.abortOn(failure -> failure instanceof ValidationException || isNotFoundException(failure)) | |||
.abortOn(ValidationException.class, UnrecoverableS3OperationException.class) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I want to make sure that this will be useful in the future and won't become dead code after we remove the legacy S3 file system.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See comments
668130b
to
dcbae7c
Compare
@electrum could you take a look again? |
This pull request has gone a while without any activity. Tagging the Trino developer relations team: @bitsondatadev @colebow @mosabua |
👋 @oskar-szwajkowski could you ensure any rebase necessary is done and CI passes. @electrum could you have a look again. |
dcbae7c
to
7440f5d
Compare
Rebased original branch, but there were no conflicts. |
My comment about making this applicable to the new file systems has not been addressed. This feature is only useful for the deprecated S3 file system, so I'd rather not add it just for that. |
@oskar-szwajkowski could you address the request from @electrum please? |
7440f5d
to
19071b1
Compare
@electrum I added handling of retryable / non retryable exceptions in new s3 based file system |
This pull request has gone a while without any activity. Tagging the Trino developer relations team: @bitsondatadev @colebow @mosabua |
@electrum I think this is ready for another look from you. |
This pull request has gone a while without any activity. Tagging the Trino developer relations team: @bitsondatadev @colebow @mosabua |
Closing this pull request, as it has been stale for six weeks. Feel free to re-open at any time. |
@oskar-szwajkowski @electrum @findepi @amogh-jahagirdar @bitsondatadev .. can you help out here to get this towards merge? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had some minor comments, but from an Iceberg connector perspective this seems good; definitley want to avoid wasting compute from retrying reading a file which will always fail.
@@ -924,17 +925,12 @@ private static boolean isHadoopFolderMarker(S3ObjectSummary object) | |||
return object.getKey().endsWith("_$folder$"); | |||
} | |||
|
|||
/** |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I'd still leave the comment, that still applies no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh I see you moved it to UnrecoverableIOException. Seems reasonable but I'd keep some of the examples like Forbidden in the comment in UnrecoverableIOException
. That looks to be missing.
@@ -255,6 +256,13 @@ public Optional<Location> createTemporaryDirectory(Location targetPath, String t | |||
return Optional.empty(); | |||
} | |||
|
|||
private IOException asIOException(String message, SdkException exception) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: I think this method can be static?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One thing I'd suggest but I also understand if it's hard to do for this case, can we see about adding a test? This case seems tricky since we'd need to spoof bad auth or something in the smoke tests and then validate how many file IO interactions there were for metadata. Forcing a bad auth maybe hard given the current test setup but maybe I'm missing something. If there's a simple way to do that currently, I'd recommend that as well!
This pull request has gone a while without any activity. Tagging the Trino developer relations team: @bitsondatadev @colebow @mosabua |
Closing this pull request, as it has been stale for six weeks. Feel free to re-open at any time. |
Reopening to allow @electrum to review and chime in. |
Convert terminal s3 filesystem exceptions to extend UnrecoverableIOException Convert SDKException to UnrecoverableIOException if they are not supposed to be retryable
19071b1
to
6b44756
Compare
This PR is superseded by #22814 |
Description
As in title
Additional context and related issues
Release notes
(X ) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
( ) Release notes are required, with the following suggested text: