build extensions in parallel #1882

yuantailing · 2025-02-20T04:40:42Z

Previous behaviour: extensions are built serially, even though the source files within a single extension are compiled in parallel.
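To illustrate the distinction between building extensions one after another and building several at once, here is a minimal sketch. It is not the code from this PR: `build_extension` is a hypothetical stand-in that only sleeps, and the extension names are placeholders.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def build_extension(name: str) -> str:
    """Hypothetical stand-in for compiling one extension (not the apex build code)."""
    time.sleep(0.1)  # pretend the compiler runs for a while
    return f"{name}: built"


def build_all(names: list[str], parallel: int = 1) -> list[str]:
    """Build extensions one by one, or up to `parallel` at a time when parallel > 1."""
    if parallel <= 1:
        return [build_extension(n) for n in names]
    with ThreadPoolExecutor(max_workers=parallel) as pool:
        # map() preserves input order even though the work overlaps in time.
        return list(pool.map(build_extension, names))


if __name__ == "__main__":
    exts = ["ext_a", "ext_b", "ext_c", "ext_d"]  # placeholder names
    for jobs in (1, 4):
        start = time.perf_counter()
        build_all(exts, parallel=jobs)
        print(f"parallel={jobs}: {time.perf_counter() - start:.2f}s")
```

Threads are a reasonable choice for this kind of sketch on the assumption that the heavy work runs in external compiler processes, so the Python side mostly waits.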

This pull request adds support for parallel building of multiple extensions. Benchmark results show:

| CPU | Build Parameters | Build Time |
| --- | --- | --- |
| AMD EPYC 24-Core (48 threads) | Original (No optimizations) | 45m29.243s |
| AMD EPYC 24-Core (48 threads) | `--parallel 4` | 12m55.243s |
| AMD EPYC 24-Core (48 threads) | `--parallel 16` | 6m47.962s |
| AMD EPYC 24-Core (48 threads) | `NVCC_APPEND_FLAGS="--threads 8"` | 19m23.878s |
| AMD EPYC 24-Core (48 threads) | `--parallel 4`, `NVCC_APPEND_FLAGS="--threads 8"` | 7m33.151s |
| AMD EPYC 24-Core (48 threads) | `--parallel 16`, `NVCC_APPEND_FLAGS="--threads 8"` | 5m58.479s |
| Intel Xeon 112-Core (224 threads) | `NVCC_APPEND_FLAGS="--threads 8"` | 14m9.081s |
| Intel Xeon 112-Core (224 threads) | `--parallel 16`, `NVCC_APPEND_FLAGS="--threads 8"` | 2m24.733s |
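The build times in the table above can be turned into speedup factors relative to the original build. The sketch below does this for the AMD EPYC rows only; the dictionary labels are shorthand chosen for this example, and the durations are copied from the table.

```python
import re


def to_seconds(stamp: str) -> float:
    """Convert a `time`-style duration such as '45m29.243s' to seconds."""
    match = re.fullmatch(r"(\d+)m([\d.]+)s", stamp)
    if match is None:
        raise ValueError(f"unrecognized duration: {stamp!r}")
    return int(match.group(1)) * 60 + float(match.group(2))


# AMD EPYC rows copied from the benchmark table; labels are shorthand.
amd_runs = {
    "original": "45m29.243s",
    "--parallel 4": "12m55.243s",
    "--parallel 16": "6m47.962s",
    "--threads 8": "19m23.878s",
    "--parallel 4, --threads 8": "7m33.151s",
    "--parallel 16, --threads 8": "5m58.479s",
}

baseline = to_seconds(amd_runs["original"])
speedups = {label: baseline / to_seconds(stamp) for label, stamp in amd_runs.items()}

if __name__ == "__main__":
    for label, factor in speedups.items():
        print(f"{label:28s} {factor:5.2f}x")
```

On these numbers, `--parallel 16` alone is roughly 6.7x faster than the original build, and combining it with `--threads 8` gives roughly 7.6x.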

Memory usage is shown below. The "mem used" values are obtained with the `free` command and include background memory usage.

| Build Parameters | Peak mem used |
| --- | --- |
| Original | 24.11 GiB |
| `--parallel 16` | 58.25 GiB |
| `NVCC_APPEND_FLAGS="--threads 8"` | 91.39 GiB |
| `--parallel 16`, `NVCC_APPEND_FLAGS="--threads 8"` | 150.96 GiB |

Image: nvcr.io/nvidia/pytorch:25.01-py3
(or other images with the same CUDA version and `TORCH_CUDA_ARCH_LIST`)

cmdline: time NVCC_APPEND_FLAGS="--threads 8" pip wheel -v --disable-pip-version-check --no-cache-dir --no-build-isolation --config-settings "--build-option=--cpp_ext --distributed_adam --distributed_lamb --cuda_ext --permutation_search --bnp --xentropy --focal_loss --group_norm --index_mul_2d --deprecated_fused_adam --deprecated_fused_lamb --fast_layer_norm --fmha --fast_multihead_attn --transducer --cudnn_gbn --peer_memory --nccl_p2p --fast_bottleneck --fused_conv_bias_relu --nccl_allocator --gpu_direct_storage --parallel 16" ./
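Because all build options in the command above travel inside a single `--config-settings` value, it can help to assemble that value programmatically. This is only a sketch: `pip_wheel_command` is a hypothetical helper written for this example, and it builds the argument list without running pip.

```python
import shlex


def pip_wheel_command(build_flags: list[str], parallel: int) -> list[str]:
    """Assemble the pip argv; all build options go into one --config-settings value."""
    options = " ".join([*build_flags, f"--parallel {parallel}"])
    return [
        "pip", "wheel", "-v",
        "--disable-pip-version-check",
        "--no-cache-dir",
        "--no-build-isolation",
        "--config-settings", f"--build-option={options}",
        "./",
    ]


if __name__ == "__main__":
    cmd = pip_wheel_command(["--cpp_ext", "--cuda_ext"], parallel=16)
    print(shlex.join(cmd))  # shell-quoted form of the assembled command
```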

alpha0422 · 2025-02-20T04:43:50Z

@crcrpar Could you help review this PR? This reduces APEX build time a lot.

crcrpar · 2025-02-21T07:29:15Z

README.md

@@ -130,7 +130,7 @@ CUDA and C++ extensions via
 git clone https://github.com/NVIDIA/apex
 cd apex
 # if pip >= 23.1 (ref: https://pip.pypa.io/en/stable/news/#v23-1) which supports multiple `--config-settings` with the same key... 
-pip install -v --disable-pip-version-check --no-cache-dir --no-build-isolation --config-settings "--build-option=--cpp_ext" --config-settings "--build-option=--cuda_ext" ./
+pip install -v --disable-pip-version-check --no-cache-dir --no-build-isolation --config-settings "--build-option=--cpp_ext --cuda_ext --parallel 4" ./


Like the `--threads` option, this would increase CPU memory usage, so could you add the example command with `--threads` and `--parallel` separately?

yuantailing (Feb 21, 2025): Updated README.md.

alpha0422 · 2025-02-24T01:53:10Z

README.md

@@ -135,6 +135,13 @@ pip install -v --disable-pip-version-check --no-cache-dir --no-build-isolation -
 pip install -v --disable-pip-version-check --no-cache-dir --no-build-isolation --global-option="--cpp_ext" --global-option="--cuda_ext" ./
 ```

+To reduce the build time of APEX, parallel building can be enhanced via
+```bash
+export NVCC_APPEND_FLAGS="--threads 4"


I'd suggest not exporting this environment variable; it affects nvcc globally.

yuantailing (Feb 24, 2025): Moved it to a temporary environment scope, so it applies only to the build command.
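The difference between exporting a variable and scoping it to one command can be sketched as follows. In the shell this is the `VAR=value command` form used in the benchmark command line; the Python version below passes a modified copy of the environment to a single child process. The child here is a stand-in that only prints the variable, where a real build would launch pip instead.

```python
import os
import subprocess
import sys

# Copy the parent environment and add the flag only to the copy.
child_env = dict(os.environ, NVCC_APPEND_FLAGS="--threads 4")

# Stand-in child process that just reports the variable it was given.
probe = [sys.executable, "-c", "import os; print(os.environ.get('NVCC_APPEND_FLAGS'))"]
seen_by_child = subprocess.run(
    probe, env=child_env, capture_output=True, text=True, check=True
).stdout.strip()

if __name__ == "__main__":
    print("child sees: ", seen_by_child)
    # The parent's own environment is untouched by the copy above.
    print("parent sees:", os.environ.get("NVCC_APPEND_FLAGS"))
```

Only the child process receives the flag, so other nvcc invocations from the same session keep their default behaviour.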

crcrpar

Thank you for implementing this nice option

build extensions in parallel

f0df092

crcrpar reviewed Feb 21, 2025

fix: setup.py develop supports --parallel

d9c3507

yuantailing force-pushed the build_in_parallel branch from 1ef7547 to d9c3507 Compare February 21, 2025 08:50

update README.md

f1ee5b1

yuantailing force-pushed the build_in_parallel branch from 8e833de to f1ee5b1 Compare February 21, 2025 18:48

alpha0422 reviewed Feb 24, 2025

Update README.md: Remove an export command

fd28caf

crcrpar approved these changes Feb 25, 2025

crcrpar merged commit c9e6f05 into NVIDIA:master Feb 25, 2025
