
build extensions in parallel #1882

Merged: 4 commits into NVIDIA:master on Feb 25, 2025

Conversation

yuantailing (Contributor) commented on Feb 20, 2025:

Previous behaviour: extensions were built serially, even though multiple files within the same extension are compiled in parallel.

This pull request adds support for parallel building of multiple extensions. Benchmark results show:

| CPU | Build Parameters | Build Time |
| --- | --- | --- |
| AMD EPYC 24-Core (48 threads) | Original (no optimizations) | 45m29.243s |
| AMD EPYC 24-Core (48 threads) | `--parallel 4` | 12m55.243s |
| AMD EPYC 24-Core (48 threads) | `--parallel 16` | 6m47.962s |
| AMD EPYC 24-Core (48 threads) | `NVCC_APPEND_FLAGS="--threads 8"` | 19m23.878s |
| AMD EPYC 24-Core (48 threads) | `--parallel 4`, `NVCC_APPEND_FLAGS="--threads 8"` | 7m33.151s |
| AMD EPYC 24-Core (48 threads) | `--parallel 16`, `NVCC_APPEND_FLAGS="--threads 8"` | 5m58.479s |
| Intel Xeon 112-Core (224 threads) | `NVCC_APPEND_FLAGS="--threads 8"` | 14m9.081s |
| Intel Xeon 112-Core (224 threads) | `--parallel 16`, `NVCC_APPEND_FLAGS="--threads 8"` | 2m24.733s |

Memory usage is shown below. The "mem used" values were obtained with the `free` command; background memory usage is included.

[Figure: memory usage over time during the build]

| Build Parameters | Peak mem used |
| --- | --- |
| Original | 24.11 GiB |
| `--parallel 16` | 58.25 GiB |
| `NVCC_APPEND_FLAGS="--threads 8"` | 91.39 GiB |
| `--parallel 16`, `NVCC_APPEND_FLAGS="--threads 8"` | 150.96 GiB |

Image: nvcr.io/nvidia/pytorch:25.01-py3 (or other images with the same CUDA version and TORCH_CUDA_ARCH_LIST)

Command line:

```shell
time NVCC_APPEND_FLAGS="--threads 8" pip wheel -v --disable-pip-version-check --no-cache-dir --no-build-isolation --config-settings "--build-option=--cpp_ext --distributed_adam --distributed_lamb --cuda_ext --permutation_search --bnp --xentropy --focal_loss --group_norm --index_mul_2d --deprecated_fused_adam --deprecated_fused_lamb --fast_layer_norm --fmha --fast_multihead_attn --transducer --cudnn_gbn --peer_memory --nccl_p2p --fast_bottleneck --fused_conv_bias_relu --nccl_allocator --gpu_direct_storage --parallel 16" ./
```
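The core idea of the change can be sketched as below; the names `build_extensions_parallel` and `build_one` are illustrative stand-ins, not apex's actual setup.py code. Because compiling an extension mostly waits on compiler subprocesses (which release the GIL), a thread pool is enough to get real parallelism:

```python
from concurrent.futures import ThreadPoolExecutor

def build_extensions_parallel(extensions, build_one, parallel=4):
    """Build extensions concurrently instead of one after another.

    `extensions` is a list of extension descriptors and `build_one` is a
    callable that compiles a single extension, typically by spawning
    compiler subprocesses. map() preserves input order and re-raises the
    first exception raised in any worker.
    """
    with ThreadPoolExecutor(max_workers=parallel) as pool:
        return list(pool.map(build_one, extensions))
```

With `--parallel 16`, up to sixteen extensions compile at once, which accounts for both the speedup and the higher peak memory in the tables above.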

alpha0422 (Contributor) commented:
@crcrpar Could you help review this PR? This reduces APEX build time a lot.

README.md (outdated diff):

```diff
@@ -130,7 +130,7 @@ CUDA and C++ extensions via
 git clone https://github.com/NVIDIA/apex
 cd apex
 # if pip >= 23.1 (ref: https://pip.pypa.io/en/stable/news/#v23-1) which supports multiple `--config-settings` with the same key...
-pip install -v --disable-pip-version-check --no-cache-dir --no-build-isolation --config-settings "--build-option=--cpp_ext" --config-settings "--build-option=--cuda_ext" ./
+pip install -v --disable-pip-version-check --no-cache-dir --no-build-isolation --config-settings "--build-option=--cpp_ext --cuda_ext --parallel 4" ./
```
A Collaborator commented:

Like the `--threads` option, this would increase CPU memory usage, so could you separately add an example command with `--threads` and `--parallel`?

yuantailing (Contributor, Author) replied:

Updated README.md

README.md (outdated diff, truncated):

````diff
@@ -135,6 +135,13 @@
 pip install -v --disable-pip-version-check --no-cache-dir --no-build-isolation --global-option="--cpp_ext" --global-option="--cuda_ext" ./
 ```
+
+To reduce the build time of APEX, parallel building can be enhanced via
+```bash
+export NVCC_APPEND_FLAGS="--threads 4"
````
A Contributor commented:

I'd suggest not exporting this env; it affects nvcc globally.

yuantailing (Contributor, Author) replied:

Moved it to a temporary environment scope.
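A temporary environment scope means prefixing the assignment to a single command instead of using `export`, so the variable never enters the surrounding shell session. A minimal demonstration of the mechanism (generic shell, not apex-specific):

```shell
# The assignment is visible only to the one command it prefixes:
NVCC_APPEND_FLAGS="--threads 4" sh -c 'echo "child sees: $NVCC_APPEND_FLAGS"'
# Back in the parent shell, the variable remains unset:
echo "parent sees: ${NVCC_APPEND_FLAGS:-<unset>}"
```

Applied to the README example, the install line would start with `NVCC_APPEND_FLAGS="--threads 4" pip install ...` on a single line, leaving later `nvcc` invocations in the same shell unaffected.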

crcrpar (Collaborator) left a review comment:

Thank you for implementing this nice option.

@crcrpar crcrpar merged commit c9e6f05 into NVIDIA:master Feb 25, 2025