Openib not supported or replaced in openmpi v5.0.x? #8831

Closed
oleotiger opened this issue Apr 20, 2021 · 10 comments

Comments

@oleotiger

Background information

What version of Open MPI are you using? (e.g., v3.0.5, v4.0.2, git branch name and hash, etc.)

git branch master
head : 6d237e8

Describe how Open MPI was installed (e.g., from a source/distribution tarball, from a git clone, from an operating system distribution package, etc.)

From a git clone

If you are building/installing from a git clone, please copy-n-paste the output from git submodule status.

e02b84c 3rd-party/openpmix (v1.1.3-2952-ge02b84c)
e2dce0f74a965144024f06a5ec9f517161b2437b 3rd-party/prrte (dev-31142-ge2dce0f)

Please describe the system on which you are running

  • Operating system/version: Centos 7.6
  • Computer hardware: Intel(R) Xeon(R) Gold 6248R CPU @ 3.00GHz
  • Network type: Mellanox CX-5

Details of the problem

How I installed:
./configure --prefix=/home/software/libs/openmpi-master --enable-mpi1-compatibility --with-ucx=/home/software/libs/ucx-1.9.0/ --with-knem=/home/software/libs/knem/1.1.4 | tee config.out && make -j |tee make.out && make install | tee install.out

After compiling and installing, Open MPI does not show support for openib.

[root@localhost ompi]# ompi_info |grep btl
                 MCA btl: self (MCA v2.1.0, API v3.3.0, Component v5.1.0)
                 MCA btl: ofi (MCA v2.1.0, API v3.3.0, Component v5.1.0)
                 MCA btl: sm (MCA v2.1.0, API v3.3.0, Component v5.1.0)
                 MCA btl: tcp (MCA v2.1.0, API v3.3.0, Component v5.1.0)
                 MCA btl: uct (MCA v2.1.0, API v3.3.0, Component v5.1.0)
                MCA fbtl: posix (MCA v2.1.0, API v2.0.0, Component v5.1.0)

If I configure with --with-openib, I get this warning: configure: WARNING: unrecognized options: --with-openib

How should I enable openib with openmpi v5.0.x?

@awlauria
Contributor

Hi @oleotiger. The openib component was removed from master/v5.0.x because it was unmaintained.
All networks supported by the openib BTL (IB, RoCE, iWARP) should be supported by Libfabric/UCX.

Are you seeing otherwise?

@jsquyres
Member

Specifically: Mellanox CX-5 cards should be supported by configuring and building Open MPI with UCX.
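One way to confirm that a build actually picked up UCX is to look for the `ucx` PML in the `ompi_info` listing before launching a job. The following is only a sketch under assumptions: `has_component` is a hypothetical helper name, not part of Open MPI, and `./a.out` stands in for any MPI program.

```shell
#!/bin/sh
# has_component FRAMEWORK COMPONENT   (hypothetical helper, not an Open MPI tool)
# Reads an ompi_info listing on stdin and succeeds if it contains a line
# such as "MCA pml: ucx (MCA v2.1.0, ...)".
has_component() {
    grep -Eq "MCA[[:space:]]+$1:[[:space:]]+$2[[:space:]]"
}

# Only query Open MPI when it is actually installed on this machine.
if command -v ompi_info >/dev/null 2>&1; then
    if ompi_info | has_component pml ucx; then
        echo "UCX PML available; try: mpirun --mca pml ucx -np 2 ./a.out"
    else
        echo "UCX PML not found; check the --with-ucx part of the configure output"
    fi
fi
```

The pattern anchors on the framework name, so `fbtl` lines are not mistaken for `btl` lines when the same helper is used as `has_component btl uct`.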

@oleotiger
Author

There are some specific -mca knobs for openib.
Is there a document that describes the knobs for openfabric?

@gpaulsen
Member

@hppritcha Do you know where openfabric knobs are documented?

@hppritcha
Member

@gpaulsen let's not confuse openib with openfabric (OFI). They are two different things. There are no longer any openib-related MCA parameters; they were removed as part of the removal of the openib BTL.
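Since the openib parameters no longer exist, it can help to look for leftovers from an older setup before moving to v5.0.x. The sketch below assumes the usual `OMPI_MCA_` environment prefix and the per-user file `$HOME/.openmpi/mca-params.conf`; `list_stale_openib_params` is a hypothetical helper name, and `btl_openib_if_include` in the usage line is just an example of an old v4.x-era parameter.

```shell
#!/bin/sh
# list_stale_openib_params   (hypothetical helper, not an Open MPI tool)
# Prints openib-related settings that may be left over from an older install:
#   1. OMPI_MCA_btl_openib_* variables in the current environment
#   2. lines mentioning openib in the per-user MCA parameter file, if present
list_stale_openib_params() {
    env | grep '^OMPI_MCA_btl_openib_' || true

    conf="$HOME/.openmpi/mca-params.conf"
    if [ -f "$conf" ]; then
        grep -n 'openib' "$conf" || true
    fi
}

list_stale_openib_params
```

For instance, after `export OMPI_MCA_btl_openib_if_include=mlx5_0` the function prints that variable, which is a hint that a job script still carries openib-era tuning.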

@awlauria
Contributor

awlauria commented May 27, 2021

@hppritcha I think @gpaulsen was asking about this question:

There are some specific -mca knobs for openib.
Is there a document that describes the knobs for openfabric?

I think @oleotiger is asking about the knobs ofi has as a replacement for openib.

@hppritcha
Member

MCA parameters affecting the use of OFI libfabric can be listed with the ompi_info command:

             MCA btl ofi: ---------------------------------------------------
             MCA btl ofi: parameter "btl_ofi_mode" (current value: "0", data source: default, level: 5 tuner/detail, type: int)
             MCA btl ofi: parameter "btl_ofi_num_cq_read" (current value: "64", data source: default, level: 5 tuner/detail, type: int)
             MCA btl ofi: parameter "btl_ofi_progress_mode" (current value: "unspec", data source: default, level: 5 tuner/detail, type: string)
             MCA btl ofi: parameter "btl_ofi_num_contexts_per_module" (current value: "1", data source: default, level: 5 tuner/detail, type: int)
             MCA btl ofi: parameter "btl_ofi_disable_sep" (current value: "false", data source: default, level: 5 tuner/detail, type: bool)
                          force btl/ofi to never use scalable endpoint.
             MCA btl ofi: parameter "btl_ofi_progress_threshold" (current value: "64", data source: default, level: 5 tuner/detail, type: int)
             MCA btl ofi: parameter "btl_ofi_rd_num" (current value: "10", data source: default, level: 5 tuner/detail, type: int)
             MCA btl ofi: parameter "btl_ofi_provider_include" (current value: "rxm;verbs", data source: default, level: 1 user/basic, type: string, synonym of: opal_common_ofi_provider_include)
                          Comma-delimited list of OFI providers that are considered for use (e.g., "psm,psm2"; an empty value means that all providers will be considered). Mutually exclusive with mtl_ofi_provider_exclude.
             MCA btl ofi: parameter "btl_ofi_provider_exclude" (current value: "shm,sockets,tcp,udp,rstream", data source: default, level: 1 user/basic, type: string, synonym of: opal_common_ofi_provider_exclude)
                          Comma-delimited list of OFI providers that are not considered for use (default: "sockets,mxm"; empty value means that all providers will be considered). Mutually exclusive with mtl_ofi_provider_include.
             MCA btl ofi: parameter "btl_ofi_exclusivity" (current value: "65486", data source: default, level: 7 dev/basic, type: unsigned_int)
             MCA btl ofi: parameter "btl_ofi_flags" (current value: "", data source: default, level: 5 tuner/detail, type: unsigned_int)
             MCA btl ofi: informational "btl_ofi_atomic_flags" (current value: "", data source: default, level: 5 tuner/detail, type: unsigned_int)
             MCA btl ofi: parameter "btl_ofi_rndv_eager_limit" (current value: "0", data source: default, level: 4 tuner/basic, type: size_t)
             MCA btl ofi: parameter "btl_ofi_eager_limit" (current value: "0", data source: default, level: 4 tuner/basic, type: size_t)
             MCA btl ofi: parameter "btl_ofi_max_send_size" (current value: "0", data source: default, level: 4 tuner/basic, type: size_t)
             MCA mtl ofi: ---------------------------------------------------
             MCA mtl ofi: parameter "mtl_ofi_priority" (current value: "25", data source: default, level: 9 dev/all, type: int)
             MCA mtl ofi: parameter "mtl_ofi_progress_event_cnt" (current value: "100", data source: default, level: 6 tuner/all, type: int)
             MCA mtl ofi: parameter "mtl_ofi_tag_mode" (current value: "auto", data source: default, level: 6 tuner/all, type: int)
                          Mode specifying how many bits to use for various MPI values in OFI/Libfabric communications. Some Libfabric provider network types can support most of Open MPI needs; others can only supply a limited number of bits, which then must be split across the MPI communicator ID, MPI source rank, and MPI tag. Three different splitting schemes are available: ofi_tag_full (30 bits for the communicator, 32 bits for the source rank, and 32 bits for the tag), ofi_tag_1 (12 bits for the communicator, 18 bits source rank, 32 bits tag), ofi_tag_2 (24 bits for the communicator, 18 bits source rank, 20 bits tag). By default, this MCA variable is set to "auto", which will first try to use ofi_tag_full, and if that fails, fall back to ofi_tag_1.
                          Valid values: 1:"auto", 2:"ofi_tag_1", 3:"ofi_tag_2", 4:"ofi_tag_full"
             MCA mtl ofi: parameter "mtl_ofi_control_progress" (current value: "unspec", data source: default, level: 3 user/all, type: int)
             MCA mtl ofi: parameter "mtl_ofi_data_progress" (current value: "unspec", data source: default, level: 3 user/all, type: int)
             MCA mtl ofi: parameter "mtl_ofi_av" (current value: "map", data source: default, level: 3 user/all, type: int)
             MCA mtl ofi: parameter "mtl_ofi_enable_sep" (current value: "0", data source: default, level: 3 user/all, type: int)
             MCA mtl ofi: parameter "mtl_ofi_thread_grouping" (current value: "0", data source: default, level: 3 user/all, type: int)
             MCA mtl ofi: parameter "mtl_ofi_num_ctxts" (current value: "1", data source: default, level: 4 tuner/basic, type: int)
             MCA mtl ofi: parameter "mtl_ofi_provider_include" (current value: "rxm;verbs", data source: default, level: 1 user/basic, type: string, synonym of: opal_common_ofi_provider_include)
                          Comma-delimited list of OFI providers that are considered for use (e.g., "psm,psm2"; an empty value means that all providers will be considered). Mutually exclusive with mtl_ofi_provider_exclude.
             MCA mtl ofi: parameter "mtl_ofi_provider_exclude" (current value: "shm,sockets,tcp,udp,rstream", data source: default, level: 1 user/basic, type: string, synonym of: opal_common_ofi_provider_exclude)
                          Comma-delimited list of OFI providers that are not considered for use (default: "sockets,mxm"; empty value means that all providers will be considered). Mutually exclusive with mtl_ofi_provider_include.

Note that in my environment I had set the provider_include MCA params to "rxm;verbs", which is why that value shows up in the cut/paste output here.
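To reproduce a listing like the one above and reduce it to name/value pairs, something along these lines may work. It is a sketch under assumptions: `param_values` is a hypothetical helper name, the `--param mtl ofi --level 9` form of `ompi_info` is used to request all parameter levels, and the `rxm;verbs` value is simply the one quoted in this thread, set here through the `OMPI_MCA_` environment prefix.

```shell
#!/bin/sh
# param_values   (hypothetical helper, not an Open MPI tool)
# Reads `ompi_info --param ...` output on stdin and prints NAME=VALUE for
# each 'parameter "NAME" (current value: "VALUE", ...' line.
# Description lines and "informational" entries are skipped.
param_values() {
    sed -n 's/.*parameter "\([^"]*\)" (current value: "\([^"]*\)".*/\1=\2/p'
}

# Set the provider list through the environment. The quotes matter:
# an unquoted ';' would terminate the shell command.
export OMPI_MCA_mtl_ofi_provider_include='rxm;verbs'

# Only query Open MPI when it is actually installed on this machine.
if command -v ompi_info >/dev/null 2>&1; then
    ompi_info --param mtl ofi --level 9 | param_values
fi
```

The same value can also be given per job on the command line, for example `mpirun --mca mtl_ofi_provider_include 'rxm;verbs' ...`, again with the value quoted.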

@awlauria
Contributor

Awesome, thanks @hppritcha

@awlauria
Contributor

I think we can close this issue?

@hppritcha
Member

yes
