Commit

[fix] Lack of HorovodJoin CPU kernels when install Horovod with NCCL, which make unable to run horovod_sync_train_test.
MoFHeka committed Dec 15, 2023
1 parent 9699543 commit 2cd9628
Showing 1 changed file with 11 additions and 0 deletions.
11 changes: 11 additions & 0 deletions tools/testing/build_and_run_tests.sh
@@ -46,8 +46,19 @@ if ! [ -x "$(command -v nvidia-smi)" ]; then
EXTRA_ARGS="-n auto"
fi

# Horovod installed with NCCL lacks HorovodJoin CPU kernels; reinstall it with MPI for the CPU test below.
python -m pip uninstall horovod -y
HOROVOD_WITH_TENSORFLOW=1 \
HOROVOD_WITHOUT_PYTORCH=1 \
HOROVOD_WITHOUT_MXNET=1 \
HOROVOD_WITH_MPI=1 \
HOROVOD_WITHOUT_GLOO=1 \
python -m pip install horovod==$HOROVOD_VERSION
# TODO(jamesrong): Test on GPU.
CUDA_VISIBLE_DEVICES="" mpirun -np 2 -H localhost:2 --allow-run-as-root pytest -v ./tensorflow_recommenders_addons/dynamic_embedding/python/kernel_tests/horovod_sync_train_test.py
# Reinstall Horovod after tests
python -m pip uninstall horovod -y
bash /install/install_horovod.sh $HOROVOD_VERSION

# Only use GPU 0 if available.
if [ -x "$(command -v nvidia-smi)" ]; then
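
The added lines restore the original Horovod install only if control reaches the final two commands. If the script runs under `set -e`, or the test step is interrupted, the environment could be left with the temporary MPI build. One possible hardening, sketched here under assumptions, is to tie the restore to an `EXIT` trap inside a subshell. The names `run`, `restore_horovod`, `with_mpi_only_horovod`, and `DRY_RUN` are hypothetical and not part of the repository; the pip and `install_horovod.sh` commands are copied from the commit.

```shell
#!/usr/bin/env bash
# Sketch only. DRY_RUN defaults to 1, so the install commands are printed
# rather than executed; set DRY_RUN=0 to really run them.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: echo the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Put the original Horovod build back (the same two steps the commit uses).
restore_horovod() {
  run python -m pip uninstall horovod -y
  run bash /install/install_horovod.sh "$HOROVOD_VERSION"
}

# Hypothetical helper: run the given command against the temporary MPI build,
# then restore the original install whether the command passed or failed.
with_mpi_only_horovod() {
  (
    # The EXIT trap belongs to this subshell, so it fires on every exit path.
    trap restore_horovod EXIT
    run python -m pip uninstall horovod -y || exit 1
    run env HOROVOD_WITH_TENSORFLOW=1 HOROVOD_WITHOUT_PYTORCH=1 \
      HOROVOD_WITHOUT_MXNET=1 HOROVOD_WITH_MPI=1 HOROVOD_WITHOUT_GLOO=1 \
      python -m pip install "horovod==$HOROVOD_VERSION" || exit 1
    "$@"
    exit $?
  )
}

# Possible usage, mirroring the test command from the commit:
#   with_mpi_only_horovod env CUDA_VISIBLE_DEVICES="" \
#     mpirun -np 2 -H localhost:2 --allow-run-as-root pytest -v \
#     ./tensorflow_recommenders_addons/dynamic_embedding/python/kernel_tests/horovod_sync_train_test.py
```

The helper returns the exit status of the wrapped command, so a failing test still fails the script after the restore has run. Note that in dry-run mode only the install steps are printed; the wrapped command itself is still executed.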