Ort-Genai benchmark run failed at GroupQueryAttention/RunRotaryEmbedding: Exception thrown at 0x00007FFE7576352C (onnxruntime.dll) in model_benchmark.exe: 0xC0000005: Access violation reading location 0x0000018FE4BB8080. #22252
Comments
here's the problematic code: onnxruntime/onnxruntime/contrib_ops/cpu/bert/group_query_attention.cc Lines 149 to 157 in d9de054
annotated:
    if (packed_qkv) {
      // Q is an OrtValue declared in the enclosing scope.
      OrtValue RotaryQKV;
      Tensor::InitOrtValue(element_type, TensorShape({batch_size, num_heads_ + 2 * kv_num_heads_, sequence_length, head_size}), allocator, RotaryQKV);
      // Save pointer to Q's data in q_input.
      q_input = Q.Get<Tensor>().Data<T>();
      k_input = q_input + num_heads_ * sequence_length * head_size;
      q_rotary = RotaryQKV.GetMutable<Tensor>()->MutableData<T>();
      k_rotary = q_rotary + num_heads_ * sequence_length * head_size;
      // Overwrite Q with RotaryQKV (OrtValues contain shared_ptr to contained value).
      // Now, q_input is pointing to freed memory.
      Q = RotaryQKV;
    }
later on, when we use q_input, there is a read access violation: onnxruntime/onnxruntime/contrib_ops/cpu/bert/group_query_attention.cc Lines 170 to 172 in d9de054
this problem showed up when CPU allocator sharing between sessions was enabled. in that case, the CPU allocator's arena was disabled. I suspect that the default usage of the arena hid this issue.
though I debugged into the first branch, this appears to be a problem in both branches: onnxruntime/onnxruntime/contrib_ops/cpu/bert/group_query_attention.cc Lines 149 to 168 in d9de054
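The lifetime problem described above can be reproduced in isolation. The following is a minimal sketch, not the onnxruntime code: `Value` is a hypothetical stand-in for `OrtValue` that only models shared ownership of a buffer, and the function names are made up for illustration. It uses a `std::weak_ptr` to observe when the original buffer is destroyed, so the buggy path never actually dereferences the dangling pointer. The second function shows one possible way to keep the buffer alive (holding an extra owner); this is an assumption for illustration and not necessarily what the merged fix does.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical stand-in for OrtValue: a handle sharing ownership of a buffer.
struct Value {
  std::shared_ptr<std::vector<float>> data;
};

// Buggy pattern: take a raw pointer into q's buffer, then overwrite q.
// Returns true if q's original buffer is still alive afterwards.
bool BufferSurvivesOverwrite() {
  Value q{std::make_shared<std::vector<float>>(8, 1.0f)};
  Value rotary{std::make_shared<std::vector<float>>(8, 0.0f)};
  std::weak_ptr<std::vector<float>> watch = q.data;  // observes the old buffer

  const float* q_input = q.data->data();  // raw pointer into the old buffer
  q = rotary;      // last owner of the old buffer is dropped here
  (void)q_input;   // dangling now; reading through it would be undefined behavior

  return !watch.expired();
}

// One possible remedy: keep an extra owner of the old buffer for as long as
// the raw pointer is in use.
bool BufferSurvivesWithKeepAlive() {
  Value q{std::make_shared<std::vector<float>>(8, 1.0f)};
  Value rotary{std::make_shared<std::vector<float>>(8, 0.0f)};
  std::weak_ptr<std::vector<float>> watch = q.data;

  Value q_keepalive = q;                  // second owner of the old buffer
  const float* q_input = q.data->data();
  q = rotary;                             // old buffer still owned by q_keepalive

  const float first = q_input[0];         // safe: buffer is still alive
  return !watch.expired() && first == 1.0f;
}
```

In the buggy variant the buffer is released at the assignment, which matches the "q_input is pointing to freed memory" annotation. Whether a later read actually faults depends on the allocator, which is consistent with the suspicion that an arena can hide the issue by keeping the freed block mapped.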
microsoft/onnxruntime-genai#945 should be a workaround to unblock testing. it disables the allocator sharing by default.
### Description

In GQA there was a memory issue which was best described by @edgchen1 [here](#22252 (comment))

> here's the problematic code:
>
> https://github.com/microsoft/onnxruntime/blob/d9de054eb53034e3dc18c298e47c6cc08d5aa884/onnxruntime/contrib_ops/cpu/bert/group_query_attention.cc#L149-L157
>
> annotated:
>
> ```c++
> if (packed_qkv) {
>   // Q is an OrtValue declared in the enclosing scope.
>   OrtValue RotaryQKV;
>   Tensor::InitOrtValue(element_type, TensorShape({batch_size, num_heads_ + 2 * kv_num_heads_, sequence_length, head_size}), allocator, RotaryQKV);
>   // Save pointer to Q's data in q_input.
>   q_input = Q.Get<Tensor>().Data<T>();
>   k_input = q_input + num_heads_ * sequence_length * head_size;
>   q_rotary = RotaryQKV.GetMutable<Tensor>()->MutableData<T>();
>   k_rotary = q_rotary + num_heads_ * sequence_length * head_size;
>   // Overwrite Q with RotaryQKV (OrtValues contain shared_ptr to contained value).
>   // Now, q_input is pointing to freed memory.
>   Q = RotaryQKV;
> }
> ```
>
> later on, when we use `q_input`, there is a read access violation.
>
> https://github.com/microsoft/onnxruntime/blob/d9de054eb53034e3dc18c298e47c6cc08d5aa884/onnxruntime/contrib_ops/cpu/bert/group_query_attention.cc#L170-L172
>
> this problem showed up when CPU allocator sharing between sessions was enabled. in that case, the CPU allocator's arena was disabled. I suspect that the default usage of the arena hid this issue.
>
> though I debugged into the first branch, this appears to be a problem in both branches:
>
> https://github.com/microsoft/onnxruntime/blob/d9de054eb53034e3dc18c298e47c6cc08d5aa884/onnxruntime/contrib_ops/cpu/bert/group_query_attention.cc#L149-L168

### Motivation and Context

Fixes a crucial bug. The issue was found here #22252
Should we close this since #22290 is now merged?
Describe the issue
When running the onnxruntime-genai benchmark as model_benchmark -l 128 -g 4 -i C:\example-models\phi2-int4-int8-blklen32-cpu, the program crashes with:
Exception thrown at 0x00007FFE7576352C (onnxruntime.dll) in model_benchmark.exe: 0xC0000005: Access violation reading location 0x0000018FE4BB8080.
To reproduce
Build onnxruntime-genai from source (https://onnxruntime.ai/docs/genai/howto/build-from-source.html) against a locally built onnxruntime. Build both onnxruntime and onnxruntime-genai in Debug for easier investigation.
Run model_benchmark -l 128 -g 4 -i C:\example-models\phi2-int4-int8-blklen32-cp:
got:
Urgency
yes
Platform
Windows
OS Version
2022
ONNX Runtime Installation
Built from Source
ONNX Runtime Version or Commit ID
commit 7880342
ONNX Runtime API
C
Architecture
X64
Execution Provider
Default CPU
Execution Provider Library Version
No response