Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GH-36346: [C++][S3] Shutdown aws-sdk-cpp related resources on finalize #36437

Closed
wants to merge 2 commits into from

Conversation

kou
Copy link
Member

@kou kou commented Jul 3, 2023

Rationale for this change

arrow::FinalizeS3() doesn't call both of RegionResolver::ResetDefaultInstance() and Aws::ShutdownAPI() by #33858.

If we don't call both of them, some aws-sdk-cpp related objects are destroyed on exit. It may cause a crash.

What changes are included in this PR?

This calls both of them by arrow::FinalizeS3() again to prevent crash on exit. All S3 related operations are failed after we call arrow::fs::FinalizeS3().

Are these changes tested?

Yes.

Are there any user-facing changes?

Yes.

…inalize

All S3 related operations are failed after we call
arrow::fs::FinalizeS3().
@kou kou requested review from westonpace and pitrou July 3, 2023 05:15
@github-actions
Copy link

github-actions bot commented Jul 3, 2023

⚠️ GitHub issue #36346 has been automatically assigned in GitHub to PR creator.

@kou kou mentioned this pull request Jul 3, 2023
Copy link
Member

@pitrou pitrou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm ok on the principle, but curious to know @westonpace 's opinion.

@@ -2562,6 +2626,8 @@ Result<std::shared_ptr<io::OutputStream>> S3FileSystem::OpenOutputStream(
ARROW_ASSIGN_OR_RAISE(auto path, S3Path::FromString(s));
RETURN_NOT_OK(ValidateFilePath(path));

RETURN_NOT_OK(CheckS3Finalized());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you put this consistently at the start of the method?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes.
(I chose this position to use non aws-sdk-cpp related codes as much as possible even after arrow::fs::FinalizeS3(). But it may not be important in most cases.)

"before carrying out any S3-related operation");
}
return Status::OK();
}

Status CheckS3Finalized() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this actually useful? is_initialized_ is set to false when finalizing, so CheckS3Initialized will already fail.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In any case, please call this CheckS3NotFinalized as the current name is misleading.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, you're right.
We don't need this and I should have added Not...

@pitrou pitrou requested a review from AlenkaF as a code owner July 3, 2023 14:13
@pitrou
Copy link
Member

pitrou commented Jul 3, 2023

It appears this fix doesn't work properly, as the test I added crashes:
https://gist.github.com/pitrou/bde45f7a213bf4016cff454ad69a3653

@pitrou
Copy link
Member

pitrou commented Jul 3, 2023

I'm working on a slightly different approach.

@pitrou
Copy link
Member

pitrou commented Jul 3, 2023

Alternative PR: #36442

Copy link
Member Author

@kou kou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, we need to destruct all S3 clients on arrow::fs::FinalizeS3() to avoid calling S3 client destructors after exit(). (Invalidating S3 filesystems isn't enough.)

I think that the #36442 approach (that destructs all S3 clients on arrow::fs::FinalizeS3()) is better than this approach. I close this.

"before carrying out any S3-related operation");
}
return Status::OK();
}

Status CheckS3Finalized() {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, you're right.
We don't need this and I should have added Not...

@@ -2562,6 +2626,8 @@ Result<std::shared_ptr<io::OutputStream>> S3FileSystem::OpenOutputStream(
ARROW_ASSIGN_OR_RAISE(auto path, S3Path::FromString(s));
RETURN_NOT_OK(ValidateFilePath(path));

RETURN_NOT_OK(CheckS3Finalized());
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes.
(I chose this position to use non aws-sdk-cpp related codes as much as possible even after arrow::fs::FinalizeS3(). But it may not be important in most cases.)

@kou kou closed this Jul 3, 2023
@kou kou deleted the cpp-filesystem-s3-finalize branch July 3, 2023 21:23
@github-actions github-actions bot added awaiting changes Awaiting changes and removed awaiting committer review Awaiting committer review labels Jul 3, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[C++][S3] Crash on exit
2 participants