Segfault in CRT HTTP engine connection shutdown #413
Comments
I'm experiencing the same issue on Linux with OpenJDK 11 and 17. It fails when I launch 20 coroutines concurrently, but works with 5. I'm using the SDK to upload about 2000 files of 100 KB to 2 MB each to S3.
@stiost Thanks for reporting this. Do you have some example code you might be able to share that triggers this on Linux? So far we have only been able to reproduce this in CI on Windows. Any other details about your environment may help as well.
Same thing happening on Mac:

    val s3 = S3Client.fromEnvironment()
    repeat(10000) {
        launch {
            val number = it.toString().padStart(8, '0')
            setOf("foo", "bar", "baz").map {
                async {
                    gate.withPermit {
                        s3.putObject {
                            bucket = bucketName
                            this.key = "$number/$it"
                            body = ByteStream.fromString("boop")
                        }
                    }
                }
            }.awaitAll()
        }
    }
Status update: I've added a bunch of debugging output to crt-java to track close/shutdown calls. It looks like the Kotlin SDK is in fact invoking The coroutine that gets cancelled causes an exception to be handled here. The second time is triggered from the completion handler registered here (which ends up in this function).
These were meant to handle exceptions at different times in the execution of a request to ensure a connection is returned to the pool. It turns out an exception can trigger both paths (since one is registered on the coroutine job). Depending on the state of the connection we can end up calling
fixes: #413 Fixes the segfault that can happen when an exception is handled twice, leading to a connection being closed after it has been freed. This change refactors the connection close logic so that it is handled in a single place regardless of why the connection is being closed.
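For readers following along, here is a minimal sketch of the general idea of routing every shutdown path through one guarded close. It is not the code from the actual fix: `FakeNativeConnection` and `PooledConnection` are hypothetical names invented for illustration, and the `AtomicBoolean` guard is just one common way to make a close idempotent, assumed here rather than taken from the SDK.

```kotlin
import java.util.concurrent.atomic.AtomicBoolean

// Hypothetical stand-in for a native-backed connection; not a crt-java type.
class FakeNativeConnection {
    var releaseCount = 0
        private set

    // In the real bug, a second release would touch already-freed native memory.
    // Here we only count calls so the effect is observable.
    fun release() {
        releaseCount++
    }
}

// Hypothetical wrapper: every shutdown path calls close(), and only the
// first caller actually releases the underlying connection.
class PooledConnection(private val inner: FakeNativeConnection) {
    private val closed = AtomicBoolean(false)

    // Returns true only for the caller that performed the release.
    fun close(): Boolean {
        if (!closed.compareAndSet(false, true)) return false
        inner.release()
        return true
    }
}

fun main() {
    val native = FakeNativeConnection()
    val conn = PooledConnection(native)

    // Path 1: the exception handler around the request.
    val first = conn.close()
    // Path 2: the completion handler registered on the coroutine job.
    val second = conn.close()

    println("first=$first second=$second releases=${native.releaseCount}")
}
```

With a guard like this, it no longer matters whether one or both handlers fire for the same exception, since the underlying release happens at most once.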
This should be fixed in
Describe the bug
One of our async stress tests for the CRT HTTP engine is segfaulting on Windows.
Steps to Reproduce
I've only been able to reproduce this in CI so far. Here is a failing CI run with full logs and JVM dump as an artifact:
https://github.com/awslabs/aws-sdk-kotlin/actions/runs/1446034029
Possible Solution
The CRT team is currently investigating.
Context