Add XLA flag workaround for CUDA Capability 7.x GPUs
PiperOrigin-RevId: 705559274
jacobjinkelly committed Dec 13, 2024
1 parent 2e56235 commit 781d8d0
Showing 6 changed files with 41 additions and 52 deletions.
5 changes: 4 additions & 1 deletion docker/Dockerfile
@@ -55,7 +55,10 @@ RUN build_data

# To work around a known XLA issue causing the compilation time to greatly
# increase, the following environment variable setting XLA flags must be enabled
-# when running AlphaFold 3:
+# when running AlphaFold 3. Note that if using CUDA Capability 7.x GPUs, it is
+# necessary to set the following XLA_FLAGS value instead:
+# ENV XLA_FLAGS="--xla_disable_hlo_passes=custom-kernel-fusion-rewriter"
+# (no need to disable Triton GEMM in that case, as it is not supported on such GPUs).
ENV XLA_FLAGS="--xla_gpu_enable_triton_gemm=false"
# Memory settings used for folding up to 5,120 tokens on A100 80 GB.
ENV XLA_PYTHON_CLIENT_PREALLOCATE=true
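With this change the image still defaults to the Triton GEMM flag, so users on CUDA Capability 7.x GPUs need to override it at run time or edit the `Dockerfile`. A minimal sketch of a run-time override, assuming the image is tagged `alphafold3` (the tag and `docker run` options here are illustrative, not prescribed by this commit):

```sh
# Override the baked-in XLA_FLAGS when starting the container on a
# CUDA Capability 7.x GPU (e.g. V100); -e replaces the ENV value from the image.
docker run -it --gpus all \
    -e XLA_FLAGS="--xla_disable_hlo_passes=custom-kernel-fusion-rewriter" \
    alphafold3
```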
36 changes: 4 additions & 32 deletions docs/known_issues.md
@@ -1,39 +1,11 @@
# Known Issues

-## Numerical performance for different GPU devices
-
-There are numerical performance issues with some GPU types that are under
-investigation, see
-[this Issue](https://github.com/google-deepmind/alphafold3/issues/59) for
-tracking.
-
-### Verified devices
-
-We have run successful large-scale numerical tests for the following devices and
-maximum number of tokens:
-
-- H100 80 GB: up to 5,120 tokens.
-- A100 80 GB: up to 5,120 tokens.
-- A100 40 GB: up to 4,352 tokens with
-  [unified memory configuration](https://github.com/google-deepmind/alphafold3/blob/main/docs/performance.md#nvidia-a100-40-gb).
-- P100 16 GB: up to 1,024 tokens.
-
-Note that the 80 GB devices can run larger targets using unified memory, but
-outputs have only been verified on particular examples rather than a large-scale
-test set.
-
-#### CUDA Capability 7.x GPUs: known issues
+## Numerical performance for CUDA Capability 7.x GPUs

All CUDA Capability 7.x GPUs (e.g. V100) produce obviously bad output, with lots
-of clashing residues (the clashes cause a ranking score of -99 or lower). With a
-small fix relating to `bfloat16` conversion to `float32` outputs look normal,
-but there are numerical performance regressions for some bucket sizes (tested on
-V100 devices).
-
-#### CUDA Capability 6.x GPUs: no known issues
-
-CUDA Capability 6.x GPUs give reasonable output, but large scale numerical
-testing has only been done for P100.
+of clashing residues (the clashes cause a ranking score of -99 or lower), unless
+the environment variable `XLA_FLAGS` is set to include
+`--xla_disable_hlo_passes=custom-kernel-fusion-rewriter`.

## Incorrect handling of two-letter atoms in SMILES ligands

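For runs outside Docker, the same workaround amounts to exporting the variable before invoking the pipeline. A minimal sketch, assuming a standard `run_alphafold.py` invocation (the paths are placeholders):

```sh
# Workaround for CUDA Capability 7.x GPUs (e.g. V100): set the flag in the
# environment before starting AlphaFold 3.
export XLA_FLAGS="--xla_disable_hlo_passes=custom-kernel-fusion-rewriter"
python run_alphafold.py \
    --json_path=/path/to/fold_input.json \
    --model_dir=/path/to/models \
    --output_dir=/path/to/output
```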
37 changes: 25 additions & 12 deletions docs/performance.md
@@ -98,26 +98,28 @@ AlphaFold 3 can run on inputs of size up to 4,352 tokens on a single NVIDIA A100
While numerically accurate, this configuration will have lower throughput
compared to the set up on the NVIDIA A100 (80 GB), due to less available memory.

+#### NVIDIA V100
+
+There are known numerical issues with CUDA Capability 7.x devices. To work
+around the issue, set the environment variable `XLA_FLAGS` to include
+`--xla_disable_hlo_passes=custom-kernel-fusion-rewriter`.
+
+With the above flag set, AlphaFold 3 can run on inputs of size up to 1,280
+tokens on a single NVIDIA V100 using [unified memory](#unified-memory).
+
#### NVIDIA P100

AlphaFold 3 can run on inputs of size up to 1,024 tokens on a single NVIDIA P100
with no configuration changes needed.

-#### NVIDIA V100
-
-There are known issues with V100 devices. See
-[this Issue](https://github.com/google-deepmind/alphafold3/issues/59) for
-tracking.
-
#### Other devices

-There are known issues with CUDA Capability 7.x devices. See
-[this Issue](https://github.com/google-deepmind/alphafold3/issues/59) for
-tracking.
+Large-scale numerical tests have not been performed on any other devices, but
+they are believed to be numerically accurate.

-CUDA Capability 6.x and 8.x devices other than those listed explicitly here are
-believed to work for AlphaFold 3, but large-scale testing has only been
-performed for the devices mentioned above.
+There are known numerical issues with CUDA Capability 7.x devices. To work
+around the issue, set the environment variable `XLA_FLAGS` to include
+`--xla_disable_hlo_passes=custom-kernel-fusion-rewriter`.

## Compilation Buckets

@@ -166,6 +168,17 @@ in the provided `Dockerfile`).
ENV XLA_FLAGS="--xla_gpu_enable_triton_gemm=false"
```

+### CUDA Capability 7.x GPUs
+
+For all CUDA Capability 7.x GPUs (e.g. V100), the environment variable
+`XLA_FLAGS` must be changed to include
+`--xla_disable_hlo_passes=custom-kernel-fusion-rewriter`. Disabling the Triton
+GEMM kernels is not necessary, as they are not supported on such GPUs.
+
+```sh
+ENV XLA_FLAGS="--xla_disable_hlo_passes=custom-kernel-fusion-rewriter"
+```

### GPU Memory

The following environment variables (set by default in the `Dockerfile`) enable
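Since the new V100 entry relies on unified memory, the 7.x flag will typically be combined with the unified-memory settings described elsewhere in this document. A sketch of the combined `Dockerfile` configuration; the unified-memory values mirror those documented for the A100 (40 GB), and the memory fraction in particular is an assumption to tune per device:

```sh
# Combined sketch for a V100 (16 GB): XLA workaround plus unified memory.
ENV XLA_FLAGS="--xla_disable_hlo_passes=custom-kernel-fusion-rewriter"
ENV XLA_PYTHON_CLIENT_PREALLOCATE=false
ENV TF_FORCE_UNIFIED_MEMORY=true
ENV XLA_CLIENT_MEM_FRACTION=3.2
```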
15 changes: 8 additions & 7 deletions run_alphafold.py
@@ -645,13 +645,14 @@ def main(_):
            ' https://developer.nvidia.com/cuda-gpus).'
        )
      elif 7.0 <= compute_capability < 8.0:
-        raise ValueError(
-            'There are currently known unresolved numerical issues with using'
-            ' devices with GPU compute capability 7.x (see'
-            ' https://developer.nvidia.com/cuda-gpus). Follow'
-            ' https://github.com/google-deepmind/alphafold3/issues/59 for'
-            ' tracking.'
-        )
+        xla_flags = os.environ.get('XLA_FLAGS')
+        required_flag = '--xla_disable_hlo_passes=custom-kernel-fusion-rewriter'
+        if not xla_flags or required_flag not in xla_flags:
+          raise ValueError(
+              'For devices with GPU compute capability 7.x (see'
+              ' https://developer.nvidia.com/cuda-gpus) the environment'
+              f' variable XLA_FLAGS must include "{required_flag}".'
+          )

  notice = textwrap.wrap(
      'Running AlphaFold 3. Please note that standard AlphaFold 3 model'
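The new check reads `XLA_FLAGS` from the process environment, so it can be probed from the shell before launching a long run. A quick sanity check, as a sketch:

```sh
# Confirm the required flag is visible to the Python process before a run;
# prints "ok" when the workaround flag is present in XLA_FLAGS.
python -c 'import os; f = os.environ.get("XLA_FLAGS", ""); print("ok" if "--xla_disable_hlo_passes=custom-kernel-fusion-rewriter" in f else "missing")'
```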
Binary file not shown.
Binary file not shown.
