GH-36346: [C++][S3] Shutdown aws-sdk-cpp related resources on finalize #36437
I'm OK with the principle, but I'm curious to know @westonpace's opinion.
@@ -2562,6 +2626,8 @@ Result<std::shared_ptr<io::OutputStream>> S3FileSystem::OpenOutputStream(
  ARROW_ASSIGN_OR_RAISE(auto path, S3Path::FromString(s));
  RETURN_NOT_OK(ValidateFilePath(path));

  RETURN_NOT_OK(CheckS3Finalized());
Can you put this consistently at the start of the method?
Yes.
(I chose this position so that as much non-aws-sdk-cpp code as possible still runs even after arrow::fs::FinalizeS3(), but that is probably not important in most cases.)
        "before carrying out any S3-related operation");
  }
  return Status::OK();
}

Status CheckS3Finalized() {
Is this actually useful? is_initialized_ is set to false when finalizing, so CheckS3Initialized will already fail.
In any case, please call this CheckS3NotFinalized, as the current name is misleading.
Ah, you're right.
We don't need this, and I should have added Not...
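The point made in this thread can be sketched as follows: if one flag is set on initialization and cleared on finalization, a single guard at the start of each operation rejects calls both before initialization and after finalization, so a separate "finalized" check adds nothing. This is only an illustration, not the Arrow implementation. The `Status` struct is a reduced stand-in for `arrow::Status`, and `g_s3_initialized`, `InitializeS3Sketch`, `FinalizeS3Sketch`, and `OpenOutputStreamSketch` are hypothetical names.

```cpp
#include <atomic>
#include <string>
#include <utility>

// Hypothetical stand-in for arrow::Status, reduced to what this sketch needs.
struct Status {
  bool ok;
  std::string message;
  static Status OK() { return {true, ""}; }
  static Status Invalid(std::string msg) { return {false, std::move(msg)}; }
};

// Assumed to play the role of the is_initialized_ flag mentioned in the review.
std::atomic<bool> g_s3_initialized{false};

void InitializeS3Sketch() { g_s3_initialized.store(true); }

// Finalizing clears the same flag, so the guard below also covers
// "called after finalize" without a second check.
void FinalizeS3Sketch() { g_s3_initialized.store(false); }

Status CheckS3Initialized() {
  if (!g_s3_initialized.load()) {
    return Status::Invalid("S3 subsystem is not initialized");
  }
  return Status::OK();
}

// Hypothetical operation: the guard comes first, before any other work.
Status OpenOutputStreamSketch(const std::string& path) {
  Status st = CheckS3Initialized();
  if (!st.ok) return st;
  if (path.empty()) return Status::Invalid("empty path");
  return Status::OK();
}
```

Calling `OpenOutputStreamSketch` before `InitializeS3Sketch()` or after `FinalizeS3Sketch()` returns a failed status through the same guard.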
It appears this fix doesn't work properly, as the test I added crashes:
I'm working on a slightly different approach.
Alternative PR: #36442
Ah, we need to destruct all S3 clients in arrow::fs::FinalizeS3() to avoid calling S3 client destructors after exit(). (Invalidating S3 filesystems isn't enough.)
I think the #36442 approach (which destructs all S3 clients in arrow::fs::FinalizeS3()) is better than this one. I'll close this.
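One way to picture "destruct all S3 clients on finalize" is a registry that tracks every client holder and releases the clients when finalization runs, so no client destructor is left to run at process exit. The sketch below is an assumption-laden illustration and is not taken from #36442: `FakeS3Client`, `ClientHolder`, and `ClientRegistry` are hypothetical names, and `FakeS3Client` stands in for a real aws-sdk-cpp client.

```cpp
#include <memory>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical stand-in for an aws-sdk-cpp S3 client.
struct FakeS3Client {
  static int live_count;  // number of clients currently alive
  FakeS3Client() { ++live_count; }
  ~FakeS3Client() { --live_count; }
};
int FakeS3Client::live_count = 0;

// Filesystems keep a holder rather than the client itself, so the client
// can be destroyed while filesystem objects are still alive.
class ClientHolder {
 public:
  explicit ClientHolder(std::shared_ptr<FakeS3Client> client)
      : client_(std::move(client)) {}

  // Returns nullptr once the client has been released.
  std::shared_ptr<FakeS3Client> Lock() {
    std::lock_guard<std::mutex> lock(mutex_);
    return client_;
  }

  void Release() {
    std::lock_guard<std::mutex> lock(mutex_);
    client_.reset();
  }

 private:
  std::mutex mutex_;
  std::shared_ptr<FakeS3Client> client_;
};

// Tracks holders weakly; Finalize() drops every client still registered.
class ClientRegistry {
 public:
  std::shared_ptr<ClientHolder> Add(std::shared_ptr<FakeS3Client> client) {
    auto holder = std::make_shared<ClientHolder>(std::move(client));
    std::lock_guard<std::mutex> lock(mutex_);
    holders_.push_back(holder);
    return holder;
  }

  void Finalize() {
    std::vector<std::weak_ptr<ClientHolder>> holders;
    {
      std::lock_guard<std::mutex> lock(mutex_);
      holders.swap(holders_);
    }
    for (auto& weak : holders) {
      if (auto holder = weak.lock()) holder->Release();
    }
  }

 private:
  std::mutex mutex_;
  std::vector<std::weak_ptr<ClientHolder>> holders_;
};
```

After `Finalize()`, a holder's `Lock()` returns `nullptr`, which an operation could translate into an error status instead of touching a destroyed client.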
Rationale for this change
Since #33858, arrow::fs::FinalizeS3() calls neither RegionResolver::ResetDefaultInstance() nor Aws::ShutdownAPI(). Without those two calls, some aws-sdk-cpp related objects are destroyed at process exit, which may cause a crash.
What changes are included in this PR?
arrow::fs::FinalizeS3() calls both of them again to prevent a crash on exit. All S3-related operations fail after arrow::fs::FinalizeS3() has been called.
Are these changes tested?
Yes.
Are there any user-facing changes?
Yes.