
Fix unused variable for CUDA EP builds with USE_FLASH_ATTENTION off #14404

Merged: 1 commit into main on Jan 24, 2023

Conversation

adrianlizarraga
Contributor

Description

Fixes the unused `use_memory_efficient_attention` variable in contrib_ops/cuda/bert/attention_impl.cu.

Motivation and Context

ORT with CUDA version < 11.6 fails to build for release configurations due to an unused variable.

```shell
c:\...\onnxruntime\onnxruntime\contrib_ops\cuda\bert\attention_impl.cu(420): error : variable "use_memory_efficient_attention" was declared but never referenced [C:\...\onnxruntime\build\Windows\RelWithDebInfo\onnx
runtime_providers_cuda.vcxproj]
            detected during instantiation of "onnxruntime::common::Status onnxruntime::contrib::cuda::QkvToContext(const cudaDeviceProp &, cublasHandle_t &, cudaStream_t, onnxruntime::contrib::AttentionParameters &, onnxruntime::contrib::cuda::AttentionData<T> &) [wit
  h T=float]"
  (923): here
```

This happens for CUDA < 11.6: our CMake script turns off `onnxruntime_USE_FLASH_ATTENTION` for CUDA < 11.6, which leaves the aforementioned variable referenced only inside asserts, and asserts are compiled out in release builds.
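The failure mode can be sketched in a few lines of C++. This is a hypothetical minimal reproduction, not the actual attention_impl.cu code: when `NDEBUG` is defined (release builds), `assert()` expands to nothing, so a variable referenced only in asserts becomes unused and compilers that treat warnings as errors reject the translation unit. A common remedy is to reference the variable unconditionally, e.g. with a cast to `void`:

```cpp
#include <cassert>

// Hypothetical example: under NDEBUG the assert below compiles away,
// leaving "debug_flag" unreferenced unless we touch it some other way.
int select_kernel(bool feature_enabled) {
  bool debug_flag = feature_enabled;
  assert(debug_flag == feature_enabled);  // only reference in debug builds
  (void)debug_flag;  // fix: counts as a use even when asserts are removed
  return feature_enabled ? 1 : 0;
}
```

C++17 code can instead mark the declaration `[[maybe_unused]]`, which documents the intent more directly than the cast.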

The USE_FLASH_ATTENTION option was added by #14343

@adrianlizarraga adrianlizarraga changed the title Fix unused variable for CUDA EP builds with USE_FLASH_ATTENTION set to OFF Fix unused variable for CUDA EP builds with USE_FLASH_ATTENTION off Jan 23, 2023
@tianleiwu
Contributor

There are a few builds (like the CPU package) with USE_FLASH_ATTENTION off, and those builds are fine because they do not compile the contrib_ops\cuda folder. This error only happens for CUDA builds with CUDA < 11.6.

@adrianlizarraga adrianlizarraga merged commit 85d7e9c into main Jan 24, 2023
@adrianlizarraga adrianlizarraga deleted the adrianl/fix-attention-unused-var branch January 24, 2023 17:32
@rui-ren rui-ren added release:1.14 triage:approved Approved for cherrypicks for release labels Feb 3, 2023
rui-ren pushed a commit that referenced this pull request Feb 3, 2023
…14404)

### Description
Fixes unused `use_memory_efficient_attention` variable in
contrib_ops/cuda/bert/attention_impl.cu.



### Motivation and Context
ORT with CUDA version < 11.6 fails to build for release configurations
due to an unused variable.

```shell
c:\...\onnxruntime\onnxruntime\contrib_ops\cuda\bert\attention_impl.cu(420): error : variable "use_memory_efficient_attention" was declared but never referenced [C:\...\onnxruntime\build\Windows\RelWithDebInfo\onnx
runtime_providers_cuda.vcxproj]
            detected during instantiation of "onnxruntime::common::Status onnxruntime::contrib::cuda::QkvToContext(const cudaDeviceProp &, cublasHandle_t &, cudaStream_t, onnxruntime::contrib::AttentionParameters &, onnxruntime::contrib::cuda::AttentionData<T> &) [wit
  h T=float]"
  (923): here
```

This happens for CUDA < 11.6. Our cmake script turns off
onnxruntime_USE_FLASH_ATTENTION for CUDA < 11.6, which leaves the
aforementioned variable unused outside of asserts (which are removed in
release builds).

The USE_FLASH_ATTENTION option was added by
#14343
rui-ren pushed a commit that referenced this pull request Feb 3, 2023
@faxu faxu removed the release:1.14 label Feb 7, 2023