diff --git a/docker/Dockerfile b/docker/Dockerfile
index b302a46..2865bef 100644
--- a/docker/Dockerfile
+++ b/docker/Dockerfile
@@ -55,7 +55,10 @@ RUN build_data
 # To work around a known XLA issue causing the compilation time to greatly
 # increase, the following environment variable setting XLA flags must be enabled
-# when running AlphaFold 3:
+# when running AlphaFold 3. Note that if using CUDA Capability 7.x GPUs, it is
+# necessary to set the following XLA_FLAGS value instead:
+# ENV XLA_FLAGS="--xla_disable_hlo_passes=custom-kernel-fusion-rewriter"
+# (no need to disable Triton GEMM in that case; it is not supported on such GPUs).
 ENV XLA_FLAGS="--xla_gpu_enable_triton_gemm=false"
 # Memory settings used for folding up to 5,120 tokens on A100 80 GB.
 ENV XLA_PYTHON_CLIENT_PREALLOCATE=true
diff --git a/docs/known_issues.md b/docs/known_issues.md
index c7b45c3..2f0be4f 100644
--- a/docs/known_issues.md
+++ b/docs/known_issues.md
@@ -1,39 +1,11 @@
 # Known Issues
 
-## Numerical performance for different GPU devices
-
-There are numerical performance issues with some GPU types that are under
-investigation, see
-[this Issue](https://github.com/google-deepmind/alphafold3/issues/59) for
-tracking.
-
-### Verified devices
-
-We have run successful large-scale numerical tests for the following devices and
-maximum number of tokens:
-
-- H100 80 GB: up to 5,120 tokens.
-- A100 80 GB: up to 5,120 tokens.
-- A100 40 GB: up to 4,352 tokens with
-  [unified memory configuration](https://github.com/google-deepmind/alphafold3/blob/main/docs/performance.md#nvidia-a100-40-gb).
-- P100 16 GB: up to 1,024 tokens.
-
-Note that the 80 GB devices can run larger targets using unified memory, but
-outputs have only been verified on particular examples rather than a large-scale
-test set.
-
-#### CUDA Capability 7.x GPUs: known issues
+## Numerical performance for CUDA Capability 7.x GPUs
 
 All CUDA Capability 7.x GPUs (e.g. V100) produce obviously bad output, with lots
-of clashing residues (the clashes cause a ranking score of -99 or lower). With a
-small fix relating to `bfloat16` conversion to `float32` outputs look normal,
-but there are numerical performance regressions for some bucket sizes (tested on
-V100 devices).
-
-#### CUDA Capability 6.x GPUs: no known issues
-
-CUDA Capability 6.x GPUs give reasonable output, but large scale numerical
-testing has only been done for P100.
+of clashing residues (the clashes cause a ranking score of -99 or lower), unless
+the environment variable `XLA_FLAGS` is set to include
+`--xla_disable_hlo_passes=custom-kernel-fusion-rewriter`.
 
 ## Incorrect handling of two-letter atoms in SMILES ligands
 
diff --git a/docs/performance.md b/docs/performance.md
index 2e2508c..8785284 100644
--- a/docs/performance.md
+++ b/docs/performance.md
@@ -98,26 +98,28 @@ AlphaFold 3 can run on inputs of size up to 4,352 tokens on a single NVIDIA A100
 While numerically accurate, this configuration will have lower throughput
 compared to the set up on the NVIDIA A100 (80 GB), due to less available memory.
 
+#### NVIDIA V100
+
+There are known numerical issues with CUDA Capability 7.x devices. To work
+around the issue, set the environment variable `XLA_FLAGS` to include
+`--xla_disable_hlo_passes=custom-kernel-fusion-rewriter`.
+
+With the above flag set, AlphaFold 3 can run on inputs of size up to 1,280
+tokens on a single NVIDIA V100 using [unified memory](#unified-memory).
+
 #### NVIDIA P100
 
 AlphaFold 3 can run on inputs of size up to 1,024 tokens on a single NVIDIA P100
 with no configuration changes needed.
 
-#### NVIDIA V100
-
-There are known issues with V100 devices. See
-[this Issue](https://github.com/google-deepmind/alphafold3/issues/59) for
-tracking.
-
 #### Other devices
 
-There are known issues with CUDA Capability 7.x devices. See
-[this Issue](https://github.com/google-deepmind/alphafold3/issues/59) for
-tracking.
+Large-scale numerical tests have not been performed on any other devices, but
+outputs on them are believed to be numerically accurate.
 
-CUDA Capability 6.x and 8.x devices other than those listed explicitly here are
-believed to work for AlphaFold 3, but large-scale testing has only been
-performed for the devices mentioned above.
+There are known numerical issues with CUDA Capability 7.x devices. To work
+around the issue, set the environment variable `XLA_FLAGS` to include
+`--xla_disable_hlo_passes=custom-kernel-fusion-rewriter`.
 
 ## Compilation Buckets
 
@@ -166,6 +168,17 @@ in the provided `Dockerfile`).
 ENV XLA_FLAGS="--xla_gpu_enable_triton_gemm=false"
 ```
 
+### CUDA Capability 7.x GPUs
+
+For all CUDA Capability 7.x GPUs (e.g. V100) the environment variable
+`XLA_FLAGS` must be changed to include
+`--xla_disable_hlo_passes=custom-kernel-fusion-rewriter`. Disabling the Triton
+GEMM kernels is not necessary, as they are not supported for such GPUs.
+
+```sh
+ENV XLA_FLAGS="--xla_disable_hlo_passes=custom-kernel-fusion-rewriter"
+```
+
 ### GPU Memory
 
 The following environment variables (set by default in the `Dockerfile`) enable
diff --git a/run_alphafold.py b/run_alphafold.py
index 9792cd5..12473ff 100644
--- a/run_alphafold.py
+++ b/run_alphafold.py
@@ -645,13 +645,14 @@ def main(_):
           ' https://developer.nvidia.com/cuda-gpus).'
       )
     elif 7.0 <= compute_capability < 8.0:
-      raise ValueError(
-          'There are currently known unresolved numerical issues with using'
-          ' devices with GPU compute capability 7.x (see'
-          ' https://developer.nvidia.com/cuda-gpus). Follow '
-          ' https://github.com/google-deepmind/alphafold3/issues/59 for'
-          ' tracking.'
-      )
+      xla_flags = os.environ.get('XLA_FLAGS')
+      required_flag = '--xla_disable_hlo_passes=custom-kernel-fusion-rewriter'
+      if not xla_flags or required_flag not in xla_flags:
+        raise ValueError(
+            'For devices with GPU compute capability 7.x (see'
+            ' https://developer.nvidia.com/cuda-gpus), the XLA_FLAGS'
+            f' environment variable must include "{required_flag}".'
+        )
 
   notice = textwrap.wrap(
       'Running AlphaFold 3. Please note that standard AlphaFold 3 model'
diff --git a/src/alphafold3/test_data/alphafold_run_outputs/run_alphafold_test_output_bucket_1024.pkl b/src/alphafold3/test_data/alphafold_run_outputs/run_alphafold_test_output_bucket_1024.pkl
index 12b1caf..39b8313 100644
Binary files a/src/alphafold3/test_data/alphafold_run_outputs/run_alphafold_test_output_bucket_1024.pkl and b/src/alphafold3/test_data/alphafold_run_outputs/run_alphafold_test_output_bucket_1024.pkl differ
diff --git a/src/alphafold3/test_data/alphafold_run_outputs/run_alphafold_test_output_bucket_default.pkl b/src/alphafold3/test_data/alphafold_run_outputs/run_alphafold_test_output_bucket_default.pkl
index 60530a5..29259b8 100644
Binary files a/src/alphafold3/test_data/alphafold_run_outputs/run_alphafold_test_output_bucket_default.pkl and b/src/alphafold3/test_data/alphafold_run_outputs/run_alphafold_test_output_bucket_default.pkl differ
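The `XLA_FLAGS` guard this patch adds to `run_alphafold.py` can be exercised in isolation; the sketch below mirrors that logic without importing JAX or touching a GPU. The helper name `check_xla_flags` and the `if __name__` driver are our own illustration, not part of the patch.

```python
import os

# The workaround flag the patch requires on CUDA Capability 7.x GPUs (e.g. V100).
REQUIRED_FLAG = '--xla_disable_hlo_passes=custom-kernel-fusion-rewriter'


def check_xla_flags(compute_capability: float) -> None:
  """Mirrors the capability-7.x guard added to run_alphafold.py's main()."""
  if 7.0 <= compute_capability < 8.0:
    xla_flags = os.environ.get('XLA_FLAGS')
    if not xla_flags or REQUIRED_FLAG not in xla_flags:
      raise ValueError(
          'For devices with GPU compute capability 7.x, the XLA_FLAGS'
          f' environment variable must include "{REQUIRED_FLAG}".'
      )


if __name__ == '__main__':
  # A V100 (capability 7.0) without the flag set is rejected.
  os.environ.pop('XLA_FLAGS', None)
  try:
    check_xla_flags(7.0)
  except ValueError as e:
    print('rejected:', e)

  # With the flag set, the same device passes the check.
  os.environ['XLA_FLAGS'] = REQUIRED_FLAG
  check_xla_flags(7.0)

  # Capability 8.x devices (e.g. A100) are unaffected either way.
  os.environ.pop('XLA_FLAGS', None)
  check_xla_flags(8.0)
  print('ok')
```

Note the membership test means the flag may coexist with other `XLA_FLAGS` entries (such as `--xla_gpu_enable_triton_gemm=false`); only its presence is required.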