Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ADAPT: Event-driven collective implementation #7716

Merged
merged 7 commits into from
Sep 1, 2020

Conversation

bosilca
Copy link
Member

@bosilca bosilca commented May 9, 2020

The designs of MPI collectives rely on implicit synchronizations between processes due to message ordering; these designs magnify noise across the participating processes, resulting in significant performance slowdown. ADAPT presents a new perspective , using event-driven techniques to match the collective needs of data movement with the available bandwidth on the platform. The core concept of ADAPT is to relax synchronizations, while maintaining the minimal data dependencies of MPI collectives.

For more details and a performance assessment: DOI 10.1145/3208040.3208054.

@awlauria

This comment has been minimized.

@awlauria awlauria linked an issue May 12, 2020 that may be closed by this pull request
Copy link
Contributor

@devreal devreal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks mostly good to me, some (probably minor) issues in the comments.

ompi/mca/coll/adapt/coll_adapt_ibcast.c Outdated Show resolved Hide resolved
ompi/mca/coll/adapt/coll_adapt_ibcast.c Show resolved Hide resolved
ompi/mca/coll/adapt/coll_adapt_ibcast.c Outdated Show resolved Hide resolved
ompi/mca/coll/adapt/coll_adapt_ibcast.c Outdated Show resolved Hide resolved
ompi/mca/coll/adapt/coll_adapt_ibcast.c Outdated Show resolved Hide resolved
ompi/mca/coll/adapt/coll_adapt_inbuf.h Outdated Show resolved Hide resolved
ompi/mca/coll/adapt/coll_adapt_ireduce.c Show resolved Hide resolved
ompi/mca/coll/adapt/coll_adapt_ireduce.c Outdated Show resolved Hide resolved
ompi/mca/coll/adapt/coll_adapt_ireduce.c Outdated Show resolved Hide resolved
ompi/request/request.h Show resolved Hide resolved
ompi_request_set_callback(send_req, send_cb, send_context);
}

OPAL_THREAD_LOCK(context->con->mutex_num_sent);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is that mutex really needed? It seems like an atomic operation would do just fine here

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that technically you are correct, we could have done it with an atomic operation instead. We will revisit this code in the future and clean such cases, but I don't think it has any impact neither on correctness nor performance.

Copy link
Contributor

@devreal devreal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All points I had have been addressed, thanks @bosilca 👍

@shijin-aws
Copy link
Contributor

We tested this PR with applications HPCG, LAMMPS, WRFV, OPENFOAM on a 8 node cluster, with both TCP and EFA provider. ADAPT passed all the tests we did.

dycz0fx and others added 7 commits August 24, 2020 12:13
Add comments in the ADAPT module

Signed-off-by: Xi Luo <[email protected]>
Signed-off-by: George Bosilca <[email protected]>
* piggybacking Bull functionalities

* coll/adapt: Fix naming conventions and C11 atomic use

This commit fixes some naming convention issues, such as function names
which should follow the naming ompi_coll_adapt instead of
mca_coll_adapt, reserved for component and module naming (cf. tuned
collective component);

It also fixes the use of _Atomic construct, which is only valid in C11.
OPAL constructs have already been adapted to that use, so use
opal_atomic_* types instead.

* coll/adapt: Remove unused component field in module

This commit removes an unneeded field referencing the component in the
module of adapt, as it is already available through the
mca_coll_adapt_component global variable.

Signed-off-by: Marc Sergent <[email protected]>
Co-authored-by: Lemarinier, Pierre <[email protected]>
Co-authored-by: pierrele <[email protected]>
API consistent with other collective modules
Add comments
Other minor cleanups.

Signed-off-by: George Bosilca <[email protected]>
As it is possible to have multiple outstanding non-blocking collectives
provided by different collective modules, we need a consistent
mechanism to allow them to select unique tags for each instance of a
collective.

Signed-off-by: George Bosilca <[email protected]>
Set request in ibcast.c to empty when the count is 0.

Signed-off-by: Xi Luo <[email protected]>
Signed-off-by: George Bosilca <[email protected]>
@jsquyres jsquyres merged commit 560ebc5 into open-mpi:master Sep 1, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Improved Collectives Communications
6 participants