[XPU] Add fused op for deepseek #9854
Thanks for your contribution!
Codecov Report
❌ Your patch status has failed because the patch coverage (10.52%) is below the target coverage (80.00%). You can increase the patch coverage or adjust the target coverage.
Additional details and impacted files
@@            Coverage Diff             @@
##           develop    #9854     +/-   ##
===========================================
- Coverage    52.06%   51.77%   -0.29%
===========================================
  Files          734      738       +4
  Lines       116445   117106     +661
===========================================
+ Hits         60624    60629       +5
- Misses       55821    56477     +656
@@ -322,6 +322,17 @@ def __init__(self, config: DeepseekV2Config, hidden_size=None, eps=1e-6, use_seq
            mark_as_sequence_parallel_parameter(self.weight)

    def forward(self, hidden_states):
        if self.config.use_fused_rms_norm and get_env_device() == "xpu":
            if self.weight.dtype != hidden_states.dtype:
The GPU logic here casts to fp32 to avoid numerical error. Could writing it this way introduce error on XPU?
The internal computation uses fp32. The precision loss here comes mainly from the hidden_states cast, so it should be fine.
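The exchange above turns on where rounding happens. As a rough illustration only (not the PR's kernel and not PaddleNLP code), the following pure-Python sketch compares an RMSNorm whose input stays at full precision with one whose `hidden_states` are first rounded to a lower-precision dtype before a higher-precision accumulation. The helper names, the choice of fp16 as the low-precision dtype, and the use of Python floats as a stand-in for the fp32 accumulator are all assumptions made for this example.

```python
import math
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE half-precision value."""
    return struct.unpack("e", struct.pack("e", x))[0]


def rms_norm(hidden, weight, eps=1e-6):
    """Hypothetical reference RMSNorm.

    The mean of squares and the scaling are computed with Python floats
    (standing in for an fp32 accumulator); only the output is rounded to fp16.
    """
    variance = sum(h * h for h in hidden) / len(hidden)
    inv_rms = 1.0 / math.sqrt(variance + eps)
    return [to_fp16(h * inv_rms * w) for h, w in zip(hidden, weight)]


# Sample data chosen for illustration; values are of order 1.
hidden = [0.1 * k for k in range(-10, 6)]
weight = [1.0] * len(hidden)

# Path A: input kept at full precision all the way into the accumulation.
reference = rms_norm(hidden, weight)

# Path B: input rounded to fp16 first (the cast under discussion),
# then the same higher-precision accumulation.
cast_first = rms_norm([to_fp16(h) for h in hidden], weight)

max_diff = max(abs(a - b) for a, b in zip(reference, cast_first))
print(f"max abs difference between the two paths: {max_diff:.6f}")
```

For inputs of this size the two paths should differ by no more than a few fp16 rounding steps, which is consistent with the reply's view that the input cast is the main, and small, source of loss. Real activations with a wider dynamic range could behave differently.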
LGTM
9ac8592 to 6667bbe
PR types
Performance optimization
PR changes
Others
Description
Add an XPU fused op for DeepSeek.