-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
WIP: sicm refresh #3
base: main
Are you sure you want to change the base?
Conversation
EFA incorrectly implements FI_DELIVERY_COMPLETE in earlier libfabric versions. While FI_DELIVERY_COMPLETE would be advertised by the provider, completions would return too early by not accounting for bounce buffers on the receive side. This would cause the BTL to receive early completions that lead to correctness issues. This is not an issue in the mtl/ofi as it does not require FI_DELIVERY_COMPLETE. Signed-off-by: William Zhang <[email protected]>
This is a rebased version of |
The ofi_rxm provider is dependent upon the underlying hardware for its implementation of FI_DELIVERY_COMPLETE. Since this can lead to early completions, we disable the provider to avoid correctness issues. This is not an issue in the mtl/ofi as it does not require FI_DELIVERY_COMPLETE. Signed-off-by: William Zhang <[email protected]>
btl/ofi: Disable EFA provider in versions earlier than libfabric 1.12.0
The C++ bindings were removed a while ago; MPI::ERRORS_THROW_EXCEPTIONS and MPI_ERRORS_THROW_EXCEPTIONS no longer exist. Signed-off-by: Jeff Squyres <[email protected]>
MPI-4 is finally cleaning up its language: an MPI "exception" does not actually exist. The only thing that exists is an MPI "error" (and associated handlers). This commit replaces all relevant uses of the word "exception" with "error". Note that this is still applicable in versions of the MPI standard less than MPI-4.0 (indeed, nearly all the cases fixed in this commit are just changes to comments, anyway). One exception to this is the Java bindings, where there's an MPIException class. In hindsight, it probably should have been named MPIError, but changing it now would break anyone who is using the Java bindings. Signed-off-by: Jeff Squyres <[email protected]>
…rs-and-exceptions Cleanup of MPI errors and exceptions
the ofi mtl mrecv was not properly setting the message in/out arg to MPI_MRECV to MPI_MESSAGE_NULL. Signed-off-by: Howard Pritchard <[email protected]>
…_fix ofi mtl: fix problem with mrecv
improve configury to check whether icc is handling no long double. This prevents seeing 100s of messages like this: icc: command line warning open-mpi#10148: option '-Wno-long-double' not supported A similar patch will be needed for pmix. Signed-off-by: Howard Pritchard <[email protected]>
Add comments in the ADAPT module Signed-off-by: Xi Luo <[email protected]> Signed-off-by: George Bosilca <[email protected]>
* piggybacking Bull functionalities * coll/adapt: Fix naming conventions and C11 atomic use This commit fixes some naming convention issues, such as function names which should follow the naming ompi_coll_adapt instead of mca_coll_adapt, reserved for component and module naming (cf. tuned collective component); It also fixes the use of _Atomic construct, which is only valid in C11. OPAL constructs have already been adapted to that use, so use opal_atomic_* types instead. * coll/adapt: Remove unused component field in module This commit removes an unneeded field referencing the component in the module of adapt, as it is already available through the mca_coll_adapt_component global variable. Signed-off-by: Marc Sergent <[email protected]> Co-authored-by: Lemarinier, Pierre <[email protected]> Co-authored-by: pierrele <[email protected]>
API consistent with other collective modules Add comments Other minor cleanups. Signed-off-by: George Bosilca <[email protected]>
Signed-off-by: George Bosilca <[email protected]>
As it is possible to have multiple outstanding non-blocking collectives provided by different collective modules, we need a consistent mechanism to allow them to select unique tags for each instance of a collective. Signed-off-by: George Bosilca <[email protected]>
Set request in ibcast.c to empty when the count is 0. Signed-off-by: Xi Luo <[email protected]> Signed-off-by: George Bosilca <[email protected]>
Signed-off-by: George Bosilca <[email protected]>
Reduce scatter block and reduce scatter algorithms were hitting correctness issues for non commutative strided tests. We will revert to the original default algorithms for those two collectives (basic linear and non overlapping respectively) in the non commutative op case. See open-mpi#8010 Signed-off-by: William Zhang <[email protected]>
coll/tuned: Revert RSB and RS default algorithms
Uncovered a problem using the GNI provider with the OFI MTL. See ofiwg/libfabric#6194. Related to open-mpi#8001 Signed-off-by: Howard Pritchard <[email protected]>
…og_warning suppress icc long double message
…tch_for_mtl OFI: patch OFI MTL for GNI provider
input. Signed-off-by: Aurelien Bouteiller <[email protected]>
errors_are_fatal_comm_handler takes a pointer to the error constant
Thanks to Andy Riebs for reporting that on the Open MPI user mailing list (https://www.mail-archive.com/[email protected]/msg34103.html) Signed-off-by: Joseph Schuchart <[email protected]>
Fix MPI versions in MPI.3 manpage
This PR removes the MCA_BTL_DES_FLAGS_PUT and MCA_BTL_DES_FLAGS_GET descriptor flags. At some point these had some meaning but they were replaced by the rcache access flags. Signed-off-by: Nathan Hjelm <[email protected]>
btl: remove unused descriptor flags
ADAPT: Event-driven collective implementation
Flushing or freeing a newly created dynamic window causes NULL to be passed. Signed-off-by: Joseph Schuchart <[email protected]>
…-segfault UCX: do not dereference NULL pointer in wpmem_[free|flush]
Signed-off-by: Joseph Schuchart <[email protected]>
Make the C MPI_F08_status type definition match the updated mpi_f08 type(MPI_Status) definition. This fix the inconsistency introduced in open-mpi/ompi@98bc7af Signed-off-by: Gilles Gouaillardet <[email protected]>
manually cleanup the generated .mod file in OMPI_FORTRAN_CHECK_BIND_C_TYPE Signed-off-by: Gilles Gouaillardet <[email protected]>
Only in C bindings: - MPI_Status_c2f08() - MPI_Status_f082c() In all bindings but mpif.h - MPI_Status_f082f() - MPI_Status_f2f08() and the PMPI_* related subroutines As initially inteded by the MPI forum, the Fortran to/from Fortran 2008 conversion subtoutines are *not* implemented in the mpif.h bindings. See the discussion at mpi-forum/mpi-issues#298 Refs. open-mpi#1475 Signed-off-by: Gilles Gouaillardet <[email protected]>
Convert the MPI_Status_f082f, MPI_Status_f082c, and MPI_Status_f2c man pages to Markdown. Fix some typos and improve the text a bit along the way. Left the raw NROFF redirect pages MPI_Status_f2f08, MPI_Status_c2f08, and MPI_Status_c2f files as they were -- they're 1-line redirects, and it seems simpler to leave those (vs. duplicating the Markdown). Signed-off-by: Jeff Squyres <[email protected]>
…f08_c Add missing MPI_Status conversion subroutines
Signed-off-by: George Bosilca <[email protected]>
Signed-off-by: Gilles Gouaillardet <[email protected]>
mpif-h: fix a typo in MPI_Status_f2f08()
Use correct conditional variable initializer in opal/mca/pmix/base
Fix a copy/paste in the RDMA emulation.
- Add arg for newer SICM API in sicm_arena_create() - Fix header path for OMPI master include 'mutex_unix.h' Signed-off-by: Thomas Naughton <[email protected]>
78fcbca
to
6969b99
Compare
I rebased against master again to pull in fixes for RMA problems I was encountering. I also needed to move up to |
Note - there is still some bug in here (outside of mpool changes I want to test) b/c a debug build will fail an assert in |
- Fixes issues encountered while testing on Summit w/ HBM - Avoids multiple calls to sicm_init() to set device list - Adds ability to have more than one device type (module list) - Adds call to sicm_arena_destroy() during free - Adds opal outputs, verbose/debug (dump device info, etc.) Signed-off-by: Thomas Naughton <[email protected]>
Set envvar `OMPIX_MPOOL_NO_FALLBACK` to avoid fallback to malloc(). This should probably be an MCA but quick fix for mpool testing. Signed-off-by: Thomas Naughton <[email protected]>
Signed-off-by: Thomas Naughton <[email protected]>
No description provided.