Fix attn_bias_add bug. #37147
Conversation
Thanks for your contribution!
Will this change have any impact on performance?
@@ -11,10 +11,14 @@ limitations under the License. */

 #pragma once

-#include "paddle/fluid/operators/fused/attn_bias_add.cu.h"
+// #include "paddle/fluid/operators/fused/attn_bias_add.cu.h"
Suggest deleting the unneeded lines in the next PR.
Done.
The previously hand-written bias_add was modeled on the implementation of an existing op. It has now been reverted to calling the existing op's implementation directly, so there is no performance impact.
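For context on what is being reverted: bias_add here broadcasts a per-column bias over an [m, n] activation matrix. A minimal standalone CUDA sketch of that computation (illustrative only; not Paddle's actual kernel, which this PR routes through the existing elementwise-add op) is:

```cuda
// Minimal sketch of what bias_add computes:
// out[i][j] = x[i][j] + bias[j] over a row-major m x n matrix.
// Illustrative only -- the PR calls Paddle's existing elementwise-add
// op rather than a hand-rolled kernel like this one.
__global__ void BiasAddRef(const float* x, const float* bias, float* out,
                           int m, int n) {
  int idx = blockIdx.x * blockDim.x + threadIdx.x;
  if (idx < m * n) {
    out[idx] = x[idx] + bias[idx % n];  // bias is broadcast across rows
  }
}
```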
 }
 }

-void ComputeBackward(const T* input, const T* weight, const T* d_output,
-                     T* d_input, T* d_weight, T* d_bias) {
+// void ComputeBackward(const T* input, const T* weight, const T* d_output,
Suggest deleting the unneeded lines in the next PR.
Done.
LGTM for LaunchElementwiseCudaKernel
Force-pushed from a692a7f to c9e3439.
PR types
Bug fixes
PR changes
OPs
Describe
Problem description:
The implementation of fused_attention_op uses bias_add, which was implemented with kernel primitives. The kernel primitives' WriteData API later changed, in both its interface and its internal implementation: the out-of-bounds check was moved into a template parameter. As a result, the call site took the wrong branch and performed out-of-bounds writes that corrupted the contents of other GPU memory. Concretely, a single run of test_fused_attention_op_api.py almost never fails, but looping over inputs of different shapes yields incorrect results intermittently, which made the bug hard to notice.
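To make the failure mode concrete, here is a minimal hypothetical sketch (names and structure are illustrative, not the actual kernel_primitives code) of a vectorized store whose boundary check is selected by a compile-time template parameter:

```cuda
// Hypothetical WriteData-style helper. The boundary check is a *template*
// parameter, so it is compiled in or out; a call site that passes (or
// defaults to) IsBoundary = false gets no runtime guard at all.
template <typename T, int VecSize, bool IsBoundary>
__device__ void WriteDataSketch(T* dst, const T* src, int num) {
  if (IsBoundary) {
    // Guarded tail path: store only the `num` valid elements.
    for (int i = 0; i < num; ++i) {
      dst[i] = src[i];
    }
  } else {
    // Fast path: assumes a full vector of VecSize elements is in bounds.
    // Reaching the tail block with num < VecSize writes VecSize - num
    // elements past the end of the buffer, silently corrupting whatever
    // tensor happens to be allocated there.
#pragma unroll
    for (int i = 0; i < VecSize; ++i) {
      dst[i] = src[i];
    }
  }
}
```

The stray stores only cause visible failures when they land in memory that another tensor is using, which is why a single run usually passes while looping over many shapes fails intermittently.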
Fix:
At the call site, change
kernel_primitives::WriteData<OutT, VecSize, 1, 1>(out + fix, result, num);
to:
kernel_primitives::WriteData<OutT, VecSize, 1, 1, true>(out + fix, result, num);
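Based on the description above, the added fifth template argument is the boundary flag: with true, WriteData performs the in-kernel bounds check and stores only the num valid elements of a partial tail block, rather than an unconditional full-VecSize vector store.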
To keep this class of problem from recurring, the bias_add and reduce logic in fused_attention_op has been changed to call the outermost interfaces of existing Paddle ops directly.
Additionally, updated the English docs for the API parameters of the fused_attention and fused_feedforward ops (following chenlong's review in #36972).
A preview (showing only the changed parts):
functional.fused_multi_head_attention: [doc preview screenshot]
nn.FusedMultiHeadAttention: [doc preview screenshot]