Do not retry iceberg operations on unrecoverable exceptions #19307

oskar-szwajkowski
Contributor

Description

As in title

Additional context and related issues

Release notes

(X ) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
( ) Release notes are required, with the following suggested text:

# Section
* Fix some things. ({issue}`issuenumber`)

@cla-bot cla-bot bot added the cla-signed label Oct 7, 2023
@github-actions github-actions bot added the iceberg Iceberg connector label Oct 7, 2023
Member

@electrum electrum left a comment

We need a better way to do this than depending on a file system implementation class.

@@ -237,7 +238,8 @@ protected void refreshFromMetadataLocation(String newLocation)
.withMaxRetries(20)
.withBackoff(100, 5000, MILLIS, 4.0)
.withMaxDuration(Duration.ofMinutes(10))
.abortOn(failure -> failure instanceof ValidationException || isNotFoundException(failure))
.abortOn(ValidationException.class, UnrecoverableS3OperationException.class)
Member

This exception should be considered private to TrinoS3FileSystem (we should make it private) and won't be accessible after we complete the Hadoop removal project. Also, this is only for the legacy S3 file system.

Contributor Author

Should we have a more top-level exception for file system errors that should not be retried? It seems like a common requirement. Or maybe there is already an exception that UnrecoverableS3OperationException could extend?

Member

We could create a new exception for this, but we should first see if this is applicable to any of the new file systems. Can you investigate that?

Contributor Author

I have introduced io.trino.filesystem.UnrecoverableTrinoFileSystemException and made the private exception extend it.

I can investigate at least some of the new file systems. I'll make a list of the TrinoFileSystem implementations I have and have not gone through, but let me open a separate issue/PR for that once this one is merged and io.trino.filesystem.UnrecoverableTrinoFileSystemException is available.
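
A minimal sketch of the idea under discussion, written in plain Java rather than with the retry library shown in the diff above: a marker exception that a file system can throw when retrying cannot help, and a retry loop that gives up as soon as it sees one. The names `RetrySketch`, `UnrecoverableFileSystemException`, and `callWithRetries` are hypothetical stand-ins for illustration, not the classes from this PR, and backoff is omitted for brevity.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

public class RetrySketch
{
    // Hypothetical marker type: thrown when the failure is permanent
    // (for example, access denied), so retrying only wastes time.
    public static class UnrecoverableFileSystemException
            extends IOException
    {
        public UnrecoverableFileSystemException(String message)
        {
            super(message);
        }
    }

    // Runs the operation up to maxAttempts times, but rethrows immediately
    // when the failure is marked as unrecoverable.
    public static <T> T callWithRetries(int maxAttempts, Callable<T> operation)
            throws Exception
    {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return operation.call();
            }
            catch (UnrecoverableFileSystemException e) {
                // Abort: another attempt would fail the same way.
                throw e;
            }
            catch (Exception e) {
                // Possibly transient: remember it and try again.
                last = e;
            }
        }
        // All attempts failed with retryable errors; surface the last one.
        throw last;
    }
}
```

Because the marker type lives outside any single file system implementation, a caller can stop retrying without depending on a class such as the legacy S3 file system.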

Member

I want to make sure that this will be useful in the future and won't become dead code after we remove the legacy S3 file system.

@oskar-szwajkowski oskar-szwajkowski force-pushed the osz/do-not-rety-iceberg-on-terminal-exceptions branch 2 times, most recently from 6c03ad9 to 189dbfd Compare October 9, 2023 09:45
Member

@electrum electrum left a comment

See comments

@oskar-szwajkowski oskar-szwajkowski force-pushed the osz/do-not-rety-iceberg-on-terminal-exceptions branch 2 times, most recently from 668130b to dcbae7c Compare October 10, 2023 08:58
@oskar-szwajkowski
Contributor Author

@electrum could you take a look again?

This pull request has gone a while without any activity. Tagging the Trino developer relations team: @bitsondatadev @colebow @mosabua

@github-actions github-actions bot added the stale label Jan 10, 2024
@mosabua
Member

mosabua commented Jan 11, 2024

👋 @oskar-szwajkowski could you ensure any rebase necessary is done and CI passes? @electrum could you have a look again?

@github-actions github-actions bot removed the stale label Jan 12, 2024
@oskar-szwajkowski oskar-szwajkowski force-pushed the osz/do-not-rety-iceberg-on-terminal-exceptions branch from dcbae7c to 7440f5d Compare January 18, 2024 14:00
@oskar-szwajkowski
Contributor Author

> 👋 @oskar-szwajkowski could you ensure any rebase necessary is done and CI passes.

Rebased original branch, but there were no conflicts.

@mosabua
Member

mosabua commented Jan 18, 2024

@findepi and @electrum can you chime in here and figure out what's next?

@findepi
Member

findepi commented Jan 31, 2024

> @findepi and @electrum can you chime in here and figure out whats next?

Since there is a red mark, my slight preference would be for David to follow up.

@electrum
Member

My comment about making this applicable to the new file systems has not been addressed. This feature is only useful for the deprecated S3 file system, so I'd rather not add it just for that.

@mosabua
Member

mosabua commented Jan 31, 2024

@oskar-szwajkowski could you address the request from @electrum please?

@oskar-szwajkowski oskar-szwajkowski force-pushed the osz/do-not-rety-iceberg-on-terminal-exceptions branch from 7440f5d to 19071b1 Compare February 1, 2024 13:43
@oskar-szwajkowski
Contributor Author

> @oskar-szwajkowski could you address the request from @electrum please?

@electrum I added handling of retryable and non-retryable exceptions in the new S3-based file system.

This pull request has gone a while without any activity. Tagging the Trino developer relations team: @bitsondatadev @colebow @mosabua

@github-actions github-actions bot added the stale label Feb 22, 2024
@mosabua
Member

mosabua commented Feb 22, 2024

@electrum I think this is ready for another look from you.

@github-actions github-actions bot removed the stale label Feb 28, 2024

This pull request has gone a while without any activity. Tagging the Trino developer relations team: @bitsondatadev @colebow @mosabua

@github-actions github-actions bot added the stale label Mar 21, 2024

Closing this pull request, as it has been stale for six weeks. Feel free to re-open at any time.

@github-actions github-actions bot closed this Apr 11, 2024
@mosabua mosabua reopened this Apr 11, 2024
@mosabua
Member

mosabua commented Apr 11, 2024

@oskar-szwajkowski @electrum @findepi @amogh-jahagirdar @bitsondatadev .. can you help out here to get this towards merge?

Contributor

@amogh-jahagirdar amogh-jahagirdar left a comment

I had some minor comments, but from an Iceberg connector perspective this seems good; we definitely want to avoid wasting compute on retrying a file read that will always fail.

@@ -924,17 +925,12 @@ private static boolean isHadoopFolderMarker(S3ObjectSummary object)
return object.getKey().endsWith("_$folder$");
}

/**
Contributor

I think I'd still leave the comment; it still applies, no?

Contributor

Oh, I see you moved it to UnrecoverableIOException. That seems reasonable, but I'd keep some of the examples, like Forbidden, in the comment on UnrecoverableIOException. They appear to be missing.

@@ -255,6 +256,13 @@ public Optional<Location> createTemporaryDirectory(Location targetPath, String t
return Optional.empty();
}

private IOException asIOException(String message, SdkException exception)
Contributor

Nit: I think this method can be static?
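
For illustration, a rough sketch of what a static conversion helper along these lines could look like. It does not use the AWS SDK: `StubSdkException` is a hypothetical stand-in for an SDK exception that exposes an HTTP status code, and the `UnrecoverableIOException` defined here is a local placeholder, not the class from this PR. Treating 403 as unrecoverable follows the "Forbidden" example mentioned in this thread; treating 400 the same way is an assumption.

```java
import java.io.IOException;
import java.net.HttpURLConnection;

public class S3ErrorMappingSketch
{
    // Local placeholder marker: callers with retry loops should give up when they see this.
    public static class UnrecoverableIOException
            extends IOException
    {
        public UnrecoverableIOException(String message, Throwable cause)
        {
            super(message, cause);
        }
    }

    // Hypothetical stand-in for an SDK exception that carries an HTTP status code.
    public static class StubSdkException
            extends RuntimeException
    {
        private final int statusCode;

        public StubSdkException(String message, int statusCode)
        {
            super(message);
            this.statusCode = statusCode;
        }

        public int statusCode()
        {
            return statusCode;
        }
    }

    // Static, as the nit suggests: the mapping depends only on its arguments.
    public static IOException asIOException(String message, StubSdkException exception)
    {
        switch (exception.statusCode()) {
            case HttpURLConnection.HTTP_FORBIDDEN: // 403, the "Forbidden" example from the thread
            case HttpURLConnection.HTTP_BAD_REQUEST: // 400, assumed here to be equally permanent
                return new UnrecoverableIOException(message, exception);
            default:
                // Anything else stays a plain IOException and remains eligible for retry.
                return new IOException(message, exception);
        }
    }
}
```

Keeping the original exception as the cause preserves the SDK's error details for whoever eventually logs the failure.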

Contributor

@amogh-jahagirdar amogh-jahagirdar left a comment

One thing I'd suggest, though I understand if it's hard to do in this case: can we look into adding a test? This case seems tricky, since we'd need to spoof bad auth or something similar in the smoke tests and then validate how many file IO interactions there were for metadata. Forcing bad auth may be hard given the current test setup, but maybe I'm missing something. If there's a simple way to do that currently, I'd recommend that as well!

@github-actions github-actions bot removed the stale label Apr 12, 2024

This pull request has gone a while without any activity. Tagging the Trino developer relations team: @bitsondatadev @colebow @mosabua

@github-actions github-actions bot added the stale label May 10, 2024

github-actions bot commented Jun 3, 2024

Closing this pull request, as it has been stale for six weeks. Feel free to re-open at any time.

@github-actions github-actions bot closed this Jun 3, 2024
@mosabua mosabua reopened this Jun 3, 2024
@mosabua
Member

mosabua commented Jun 3, 2024

Reopening to allow @electrum to review and chime in.

@mosabua mosabua added stale-ignore Use this label on PRs that should be ignored by the stale bot so they are not flagged or closed. and removed stale labels Jun 3, 2024
Convert terminal s3 filesystem exceptions to extend UnrecoverableIOException

Convert SDKException to UnrecoverableIOException
if they are not supposed to be retryable
@oskar-szwajkowski
Contributor Author

This PR is superseded by #22814

Labels
cla-signed iceberg Iceberg connector stale-ignore Use this label on PRs that should be ignored by the stale bot so they are not flagged or closed.
5 participants