Fix unused variable for CUDA EP builds with USE_FLASH_ATTENTION off #14404

adrianlizarraga · 2023-01-23T21:49:22Z

Description

Fixes unused use_memory_efficient_attention variable in contrib_ops/cuda/bert/attention_impl.cu.

Motivation and Context

ORT with CUDA version < 11.6 fails to build for release configurations due to an unused variable.

c:\...\onnxruntime\onnxruntime\contrib_ops\cuda\bert\attention_impl.cu(420): error : variable "use_memory_efficient_attention" was declared but never referenced [C:\...\onnxruntime\build\Windows\RelWithDebInfo\onnxruntime_providers_cuda.vcxproj]
            detected during instantiation of "onnxruntime::common::Status onnxruntime::contrib::cuda::QkvToContext(const cudaDeviceProp &, cublasHandle_t &, cudaStream_t, onnxruntime::contrib::AttentionParameters &, onnxruntime::contrib::cuda::AttentionData<T> &) [with T=float]"
  (923): here

This happens for CUDA < 11.6. Our cmake script turns off onnxruntime_USE_FLASH_ATTENTION for CUDA < 11.6, which leaves the aforementioned variable unused outside of asserts (which are removed in release builds).
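
For readers unfamiliar with this failure mode, here is a minimal sketch of the pattern described above. It is not the actual diff in this PR: `DEMO_USE_FLASH_ATTENTION` and `SelectKernel` are hypothetical stand-ins for the real build flag and for `QkvToContext`, and the explicit void cast is just one common way to mark a variable as intentionally unused.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the USE_FLASH_ATTENTION build flag.
// Leave it undefined to mimic a CUDA < 11.6 build.
// #define DEMO_USE_FLASH_ATTENTION 1

// Hypothetical helper for illustration only; not the real QkvToContext.
std::string SelectKernel(bool want_memory_efficient, bool fused_runner_available) {
  const bool use_memory_efficient_attention = want_memory_efficient;

  // In release builds NDEBUG is defined and assert() expands to nothing,
  // so this line does not count as a reference to the variable.
  assert(!(use_memory_efficient_attention && fused_runner_available));

#if defined(DEMO_USE_FLASH_ATTENTION)
  // The only non-assert use is compiled out when the flag is off.
  if (use_memory_efficient_attention) {
    return "memory_efficient";
  }
#else
  // Explicitly reference the variable so release builds with the flag off
  // do not report it as declared but never referenced.
  static_cast<void>(use_memory_efficient_attention);
#endif

  return fused_runner_available ? "fused" : "unfused";
}
```

Without the `#else` branch, a release build with the flag off would have no remaining reference to `use_memory_efficient_attention`, which is the situation the error above reports when warnings are treated as errors.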

The USE_FLASH_ATTENTION option was added by #14343

tianleiwu · 2023-01-23T23:40:44Z

A few builds (such as the CPU package) have USE_FLASH_ATTENTION off, and those builds are fine because they do not build the contrib_ops\cuda folder. This error only happens in CUDA builds with CUDA < 11.6.

Fix unused variable in bert/attention_impl.cu for builds with USE_FLASH_ATTENTION set to OFF. (ffef05f)

adrianlizarraga requested a review from tianleiwu January 23, 2023 21:49

adrianlizarraga changed the title ~~Fix unused variable for CUDA EP builds with USE_FLASH_ATTENTION set to OFF~~ Fix unused variable for CUDA EP builds with USE_FLASH_ATTENTION off Jan 23, 2023

adrianlizarraga requested a review from yufenglee January 23, 2023 22:04

tianleiwu approved these changes Jan 23, 2023

adrianlizarraga merged commit 85d7e9c into main Jan 24, 2023

adrianlizarraga deleted the adrianl/fix-attention-unused-var branch January 24, 2023 17:32

rui-ren added the release:1.14 and triage:approved (Approved for cherrypicks for release) labels Feb 3, 2023

faxu removed the release:1.14 label Feb 7, 2023
