Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix segmentation fault on canceled requests that are not awaited #153

Merged
merged 6 commits into from
Jan 9, 2024

Conversation

pentschev
Copy link
Member

Calling InflightRequests::cancelAll() does not guarantee the requests to be canceled immediately, just like any other request we must check their status before releasing the ucxx::Request object. To prevent the user from releasing a request that has not yet completed and may still call the completion callback we must make sure we still keep a reference
to it until its status is set. With this change we now ensure both ucxx::Endpoint and ucxx::Worker will only release references after those requests that have issued cancelation complete.

It is also important that all requests have a valid status before destroying the object, thus we should cancel a request if all references to the Cython UCXRequest object have been dropped but the request has not completed yet.

Additionally use std::unique_ptr<InflightRequests> in ucxx::Worker and reenable test_ucxx_unreachable.

Calling `InflightRequests::cancelAll()` does not guarantee the requests
to be canceled immediately, just like any other request we must check
their status before releasing the `ucxx::Request` object. To prevent the
user from releasing a request that has not yet completed and may still
call the completion callback we must make sure we still keep a reference
to it until its status is set. With this change we now ensure both
`ucxx::Endpoint` and `ucxx::Worker` will only release references after
those requests that have issued cancelation complete.
It is important that all requests have a valid status before objects can
be destroyed, thus we should cancel it if all references to the Cython
`UCXRequest` object have been dropped but the request has not completed
yet.
@pentschev pentschev added bug Something isn't working non-breaking Introduces a non-breaking change labels Jan 8, 2024
@pentschev pentschev requested review from a team as code owners January 8, 2024 16:36
cpp/include/ucxx/inflight_requests.h Outdated Show resolved Hide resolved
cpp/src/inflight_requests.cpp Outdated Show resolved Hide resolved
cpp/src/inflight_requests.cpp Outdated Show resolved Hide resolved
cpp/src/inflight_requests.cpp Show resolved Hide resolved
Copy link
Contributor

@wence- wence- left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Peter!

@pentschev
Copy link
Member Author

Thanks for reviewing this one too, @wence- !

@pentschev
Copy link
Member Author

/merge

@rapids-bot rapids-bot bot merged commit 94c5dd7 into rapidsai:branch-0.36 Jan 9, 2024
20 checks passed
@pentschev pentschev deleted the fix-inflight-cancel branch January 9, 2024 20:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working non-breaking Introduces a non-breaking change
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants