Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GH-15054: [C++] Change s3 finalization to happen after arrow threads finished, add pyarrow exit hook #33858

Merged
merged 9 commits into from
Apr 15, 2023

Conversation

westonpace
Copy link
Member

@westonpace westonpace commented Jan 25, 2023

CRITICAL FIX: When statically linking error with AWS it was possible to have a crash on shutdown/exit. Now that should no longer be possible.

BREAKING CHANGE: S3 can only be initialized and finalized once.

BREAKING CHANGE: S3 (the AWS SDK) will not be finalized until after all CPU & I/O threads are finished.

@westonpace westonpace requested a review from AlenkaF as a code owner January 25, 2023 01:04
@github-actions
Copy link

Thanks for opening a pull request!

If this is not a minor PR. Could you open an issue for this pull request on GitHub? https://github.com/apache/arrow/issues/new/choose

Opening GitHub issues ahead of time contributes to the Openness of the Apache Arrow project.

Then could you also rename the pull request title in the following format?

GH-${GITHUB_ISSUE_ID}: [${COMPONENT}] ${SUMMARY}

or

MINOR: [${COMPONENT}] ${SUMMARY}

In the case of PARQUET issues on JIRA the title also supports:

PARQUET-${JIRA_ISSUE_ID}: [${COMPONENT}] ${SUMMARY}

See also:

Comment on lines +2582 to +2620
switch (options.log_level) {
LOG_LEVEL_CASE(Fatal)
LOG_LEVEL_CASE(Error)
LOG_LEVEL_CASE(Warn)
LOG_LEVEL_CASE(Info)
LOG_LEVEL_CASE(Debug)
LOG_LEVEL_CASE(Trace)
default:
aws_log_level = Aws::Utils::Logging::LogLevel::Off;
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is the new tab an expected change?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. The initialize and finalize methods were file-scoped methods previously. Now they have become the constructor/destructor of the AwsInstance class. Since they are in a class they get some indentation.

@westonpace westonpace force-pushed the bugfix/GH-15054--s3-crash-on-exit branch from ac63d43 to c6ab7cb Compare February 4, 2023 13:24
@amol- amol- changed the title [C++] GH-15054: Change s3 finalization to happen after arrow threads finished, add pyarrow exit hook GH-15054: [C++] Change s3 finalization to happen after arrow threads finished, add pyarrow exit hook Mar 27, 2023
@github-actions
Copy link

@jorisvandenbossche
Copy link
Member

@westonpace do we want to try to get this into 12.0.0? (don't know the state of the PR though)

@westonpace
Copy link
Member Author

@jorisvandenbossche Yes, I think we do. I'll rebase this right now but I think the main concern here was just getting a review / second opinion. I'm going to label it a blocker.

@westonpace westonpace force-pushed the bugfix/GH-15054--s3-crash-on-exit branch from c6ab7cb to 95b1458 Compare April 11, 2023 16:04
@github-actions github-actions bot added the awaiting review Awaiting review label Apr 11, 2023
Copy link
Member

@zeroshade zeroshade left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know if I'm the best to review here, but at least at the moment I don't see anything glaringly problematic with this and it seems like a good idea in general.

It feels like it should be better tested, but I'm not particularly sure how would be a good way to do so. But that might just be because it's the end of the day lol.

@github-actions github-actions bot added awaiting merge Awaiting merge and removed awaiting review Awaiting review labels Apr 12, 2023
}

bool IsAwsInitialized() { return !!GetAwsInstance({}); }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this gets called before InitializeS3, the options passed there will be ignored and an instance based on {} options will stick around. I think bringing back the bool flag can avoid these risks.

@westonpace
Copy link
Member Author

Thanks @felipecrv , that is a good point regarding IsInitialized. I added the flag back but put it inside the instance object. I put everything inside the instance object and I think the code is simpler now. What do you think of this version?

Copy link
Contributor

@felipecrv felipecrv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This LGTM. Thank you!

@westonpace
Copy link
Member Author

These R crashes are a little concerning since they seem to be crashes at shutdown. Maybe it is related.

@westonpace
Copy link
Member Author

Yes, the R crashes are relevant. With this fix we ensure that AWS::ShutdownAPI is the last call we make to AWS. However, AWS has static global objects of its own. So it is entirely possible that these have been destroyed by the time AWS::ShutdownAPI is called. In fact, by prolonging the call to ShutdownAPI here it seems we are making the R situation worse because it is more likely these other static globals have been destroyed. There is an open AWS issue for this at aws/aws-sdk-cpp#1550 . However, it does not sound like it will be resolved quickly. In the meantime an R atexit hook of some kind will be required (technically it has always been required but we've gotten lucky).

Comment on lines 156 to 163
# Registers a callback to run at session exit
.onLoad <- function(libname, pkgname) {
print(parent.env(environment()))
reg.finalizer(parent.env(environment()), finalize_s3, onexit=TRUE)
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know you asked about this earlier and found .onUnload()...just putting a note here in case this gets forgotten about.

If .onUnload() isn't working you can create your own environment an register a finalizer on that:

s3_finalizer <- new.env(parent = emptyenv())

.onLoad <- function(...) {
  # ...
  reg.finalizer(s3_finalizer, finalize_s3, onexit=TRUE)
}

You probably shouldn't register a finalizer on the package namespace because somebody else might have done that (maybe even R).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is also .onLoad() defined twice here as written (you should be able to use the existing .onLoad() to register this hook if you can't in fact use .onUnload().

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, ok. I will change to this pattern instead of parent.env(environment())

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, I just realized I left the print in >_<

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And if .onUnload() really isn't working make sure to put a comment in explaining that (so I don't forget about this and try to consolidate our namespace unload incorrectly)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I think I've resolved all your concerns now. Thanks for looking at this.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And if .onUnload() really isn't working make sure to put a comment in explaining that (so I don't forget about this and try to consolidate our namespace unload incorrectly)

Added a comment

@westonpace
Copy link
Member Author

The broken C++ tests are unrelated (will be fixed by #35145). It seems the R tests are passing now. Would appreciate a review from @paleolimbot or @thisisnic on the R changes.

@kou do you know if cglib / ruby is using S3? If so, is it calling FinalizeS3 in some kind of shutdown hook? Or do we need a change there too.

@github-actions github-actions bot added awaiting changes Awaiting changes awaiting change review Awaiting change review and removed awaiting merge Awaiting merge awaiting changes Awaiting changes labels Apr 14, 2023
@github-actions github-actions bot added awaiting change review Awaiting change review awaiting changes Awaiting changes and removed awaiting changes Awaiting changes awaiting change review Awaiting change review labels Apr 14, 2023
@kou
Copy link
Member

kou commented Apr 14, 2023

do you know if cglib / ruby is using S3? If so, is it calling FinalizeS3 in some kind of shutdown hook? Or do we need a change there too.

Yes, I know. GLib/Ruby is using S3.
It doesn't call FinalizeS3() automatically but it provides APIs that can call InitializeS3() and FinalizeS3() explicitly. So we don't need changes for GLib/Ruby here for now.
If a report from community is raised, we can consider this.

@github-actions github-actions bot added awaiting change review Awaiting change review and removed awaiting changes Awaiting changes labels Apr 14, 2023
@westonpace
Copy link
Member Author

Thanks, that sounds reasonable. I'll merge when CI passes then.

@westonpace westonpace merged commit 1de159d into apache:main Apr 15, 2023
@westonpace westonpace added this to the 12.0.0 milestone Apr 15, 2023
raulcd pushed a commit that referenced this pull request Apr 17, 2023
…finished, add pyarrow exit hook (#33858)

CRITICAL FIX: When statically linking error with AWS it was possible to have a crash on shutdown/exit.  Now that should no longer be possible.

BREAKING CHANGE: S3 can only be initialized and finalized once.

BREAKING CHANGE: S3 (the AWS SDK) will not be finalized until after all CPU & I/O threads are finished.
* Closes: #15054

Authored-by: Weston Pace <[email protected]>
Signed-off-by: Weston Pace <[email protected]>
@assignUser assignUser added Breaking Change Includes a breaking change to the API Critical Fix Bugfixes for security vulnerabilities, crashes, or invalid data. labels Apr 17, 2023
@assignUser
Copy link
Member

I have added the appropriate labels for visibility

@ursabot
Copy link

ursabot commented Apr 17, 2023

Benchmark runs are scheduled for baseline = 9626c7d and contender = 1de159d. 1de159d is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Finished ⬇️0.0% ⬆️0.0%] ec2-t3-xlarge-us-east-2
[Failed] test-mac-arm
[Finished ⬇️0.51% ⬆️0.51%] ursa-i9-9960x
[Finished ⬇️3.42% ⬆️0.89%] ursa-thinkcentre-m75q
Buildkite builds:
[Finished] 1de159d0 ec2-t3-xlarge-us-east-2
[Failed] 1de159d0 test-mac-arm
[Finished] 1de159d0 ursa-i9-9960x
[Finished] 1de159d0 ursa-thinkcentre-m75q
[Finished] 9626c7d5 ec2-t3-xlarge-us-east-2
[Failed] 9626c7d5 test-mac-arm
[Finished] 9626c7d5 ursa-i9-9960x
[Finished] 9626c7d5 ursa-thinkcentre-m75q
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

liujiacheng777 pushed a commit to LoongArch-Python/arrow that referenced this pull request May 11, 2023
…reads finished, add pyarrow exit hook (apache#33858)

CRITICAL FIX: When statically linking error with AWS it was possible to have a crash on shutdown/exit.  Now that should no longer be possible.

BREAKING CHANGE: S3 can only be initialized and finalized once.

BREAKING CHANGE: S3 (the AWS SDK) will not be finalized until after all CPU & I/O threads are finished.
* Closes: apache#15054

Authored-by: Weston Pace <[email protected]>
Signed-off-by: Weston Pace <[email protected]>
ArgusLi pushed a commit to Bit-Quill/arrow that referenced this pull request May 15, 2023
…reads finished, add pyarrow exit hook (apache#33858)

CRITICAL FIX: When statically linking error with AWS it was possible to have a crash on shutdown/exit.  Now that should no longer be possible.

BREAKING CHANGE: S3 can only be initialized and finalized once.

BREAKING CHANGE: S3 (the AWS SDK) will not be finalized until after all CPU & I/O threads are finished.
* Closes: apache#15054

Authored-by: Weston Pace <[email protected]>
Signed-off-by: Weston Pace <[email protected]>
rtpsw pushed a commit to rtpsw/arrow that referenced this pull request May 16, 2023
…reads finished, add pyarrow exit hook (apache#33858)

CRITICAL FIX: When statically linking error with AWS it was possible to have a crash on shutdown/exit.  Now that should no longer be possible.

BREAKING CHANGE: S3 can only be initialized and finalized once.

BREAKING CHANGE: S3 (the AWS SDK) will not be finalized until after all CPU & I/O threads are finished.
* Closes: apache#15054

Authored-by: Weston Pace <[email protected]>
Signed-off-by: Weston Pace <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
awaiting change review Awaiting change review Breaking Change Includes a breaking change to the API Component: C++ Component: Python Component: R Critical Fix Bugfixes for security vulnerabilities, crashes, or invalid data.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[CI][Python] wheel-manylinux2014-* sometimes crashed on pytest exit
9 participants