Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Active Messages support #48

Merged
merged 40 commits into from
Jun 12, 2023

Conversation

pentschev
Copy link
Member

Introduce new ucxx::RequestAm class to allow transferring via the Active Messages UCX API.

@pentschev pentschev requested review from a team as code owners May 19, 2023 13:58
Copy link
Contributor

@wence- wence- left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A partial look, I haven't gone through the logical of the implementation properly yet.

cpp/include/ucxx/endpoint.h Outdated Show resolved Hide resolved
cpp/include/ucxx/endpoint.h Outdated Show resolved Hide resolved
cpp/include/ucxx/internal/request_am.h Show resolved Hide resolved
cpp/include/ucxx/worker.h Outdated Show resolved Hide resolved
cpp/src/request.cpp Outdated Show resolved Hide resolved
cpp/include/ucxx/worker.h Show resolved Hide resolved
Copy link
Member Author

@pentschev pentschev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @wence- for looking at this, apologies for not addressing it earlier but somehow I completely missed the notification for this PR. I've addressed/responded to your comments now, please have a look when you've got the chance.

cpp/include/ucxx/endpoint.h Outdated Show resolved Hide resolved
cpp/include/ucxx/endpoint.h Outdated Show resolved Hide resolved
cpp/include/ucxx/internal/request_am.h Show resolved Hide resolved
cpp/include/ucxx/worker.h Show resolved Hide resolved
cpp/include/ucxx/worker.h Outdated Show resolved Hide resolved
cpp/src/request.cpp Outdated Show resolved Hide resolved
Copy link
Contributor

@wence- wence- left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there's a little cleanup we can do in the recvWait/recvPool mapping handling, and then some other minor cleanups/queries in the logic side of things.

cpp/include/ucxx/worker.h Show resolved Hide resolved
cpp/src/request_am.cpp Show resolved Hide resolved
cpp/src/request_am.cpp Outdated Show resolved Hide resolved
cpp/src/request_am.cpp Show resolved Hide resolved
cpp/src/request_am.cpp Outdated Show resolved Hide resolved
python/ucxx/_lib/libucxx.pyx Show resolved Hide resolved
python/ucxx/_lib/libucxx.pyx Show resolved Hide resolved
python/ucxx/_lib_async/endpoint.py Outdated Show resolved Hide resolved
python/ucxx/_lib_async/endpoint.py Outdated Show resolved Hide resolved
python/ucxx/_lib_async/endpoint.py Outdated Show resolved Hide resolved
Copy link
Member Author

@pentschev pentschev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @wence- for another great, thorough review. I've addressed your comments, could you please take another look when you have a chance?

cpp/src/endpoint.cpp Show resolved Hide resolved
cpp/src/internal/request_am.cpp Show resolved Hide resolved
cpp/src/internal/request_am.cpp Show resolved Hide resolved

void RecvAmMessage::callback(void* request, ucs_status_t status)
{
_request->_buffer = _buffer;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can _request be used for something else during the lifetime of this? if not, why not set up the _request->_buffer pointer in the constructor of RecvAmMessage, if yes, then this seems potentially racy.

It shouldn't be used for anything else. The reason I wrote the code in this way is to prevent the user from calling getRecvBuffer before isCompleted() == true && getStatus() == UCS_OK and get a proper Buffer that has invalid data. Do you think it would be better to handle it differently?

cpp/src/request.cpp Show resolved Hide resolved
python/ucxx/_lib/libucxx.pyx Show resolved Hide resolved
python/ucxx/_lib/libucxx.pyx Show resolved Hide resolved
python/ucxx/_lib_async/endpoint.py Outdated Show resolved Hide resolved
python/ucxx/_lib_async/endpoint.py Outdated Show resolved Hide resolved
python/ucxx/_lib_async/endpoint.py Outdated Show resolved Hide resolved
Copy link
Member Author

@pentschev pentschev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @wence- for looking at it again. I think I've addressed all your comments, but please let me know if I missed something.

cpp/src/internal/request_am.cpp Show resolved Hide resolved

void RecvAmMessage::callback(void* request, ucs_status_t status)
{
_request->_buffer = _buffer;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good idea, done in 6eb5f95.

cpp/src/worker.cpp Show resolved Hide resolved
@wence-
Copy link
Contributor

wence- commented Jun 5, 2023

Thanks, not sure what went on in the tests though

@pentschev pentschev requested a review from a team as a code owner June 6, 2023 19:20
@pentschev pentschev changed the base branch from branch-0.32 to branch-0.33 June 6, 2023 19:20
@pentschev pentschev mentioned this pull request Jun 7, 2023
@ajschmidt8 ajschmidt8 removed the request for review from a team June 8, 2023 16:21
Copy link
Contributor

@wence- wence- left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A very minor comment.

cpp/src/delayed_submission.cpp Outdated Show resolved Hide resolved
// delayed to allow the worker progress thread to set its status, and more
// importantly the Python future later on, so that we don't need the GIL here.
req->_worker->registerDelayedSubmission(
req, std::bind(std::mem_fn(&Request::populateDelayedSubmission), req.get()));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, this needs to be moved from the constructor because we need the shared_ptr to this which isn't available.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it's not the friendliest design, but it is the best we can do here (I think).

@@ -35,26 +35,25 @@ void DelayedSubmissionCollection::process()
toProcess = std::move(_collection);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does std::move do to _collection? Does it just drain the vector, leaving it "as-if" it were empty again? I guess so.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exactly, that is the standard behavior for the move constructor and operator.

pentschev and others added 3 commits June 9, 2023 09:58
Required until progressing the worker is not necessary within the call
anymore.
Follow the default of other tests for now, relying on the default
progress mode or that specified via environment variables.
Copy link
Member Author

@pentschev pentschev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @wence- , responded and addressed your suggestion.

@@ -35,26 +35,25 @@ void DelayedSubmissionCollection::process()
toProcess = std::move(_collection);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exactly, that is the standard behavior for the move constructor and operator.

// delayed to allow the worker progress thread to set its status, and more
// importantly the Python future later on, so that we don't need the GIL here.
req->_worker->registerDelayedSubmission(
req, std::bind(std::mem_fn(&Request::populateDelayedSubmission), req.get()));
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it's not the friendliest design, but it is the best we can do here (I think).

@pentschev
Copy link
Member Author

All tests pass now, I'll leave this to be merged Monday in case you have any last comments @wence- , thanks again for reviewing!

@pentschev
Copy link
Member Author

Thanks @wence- and @robertmaynard for reviews and approvals.

@pentschev
Copy link
Member Author

/merge

@rapids-bot rapids-bot bot merged commit 24ad286 into rapidsai:branch-0.33 Jun 12, 2023
@pentschev pentschev deleted the active-messages branch June 12, 2023 17:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants