
[XPU] Add fused op for deepseek #9854

Merged (1 commit) on Feb 14, 2025
Conversation

QingshuChen (Contributor)

PR types

Performance optimization

PR changes

Others

Description

Add an XPU fused op for DeepSeek.


paddle-bot bot commented Feb 12, 2025

Thanks for your contribution!


codecov bot commented Feb 12, 2025

Codecov Report

Attention: Patch coverage is 10.52632% with 17 lines in your changes missing coverage. Please review.

Project coverage is 51.77%. Comparing base (763c59a) to head (6667bbe).
Report is 7 commits behind head on develop.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| paddlenlp/transformers/deepseek_v2/modeling.py | 5.55% | 17 Missing ⚠️ |

❌ Your patch status has failed because the patch coverage (10.52%) is below the target coverage (80.00%). You can increase the patch coverage or adjust the target coverage.
❌ Your project status has failed because the head coverage (51.77%) is below the target coverage (58.00%). You can increase the head coverage or adjust the target coverage.

Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #9854      +/-   ##
===========================================
- Coverage    52.06%   51.77%   -0.29%     
===========================================
  Files          734      738       +4     
  Lines       116445   117106     +661     
===========================================
+ Hits         60624    60629       +5     
- Misses       55821    56477     +656     


@DrownFish19 DrownFish19 changed the title add xpu fused op for deepseek [XPU] Add fused op for deepseek Feb 13, 2025
```diff
@@ -322,6 +322,17 @@ def __init__(self, config: DeepseekV2Config, hidden_size=None, eps=1e-6, use_seq
             mark_as_sequence_parallel_parameter(self.weight)

     def forward(self, hidden_states):
+        if self.config.use_fused_rms_norm and get_env_device() == "xpu":
+            if self.weight.dtype != hidden_states.dtype:
```
Collaborator commented on the diff:
On GPU, the logic here is to cast to fp32 to avoid precision loss. Could this way of writing it introduce precision errors on XPU?

Contributor Author replied:
The internal computation uses fp32; the precision loss here comes mainly from the hidden_states cast, so it should be acceptable.
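The exchange above is about where precision is lost when the fused RMS norm on XPU casts inputs whose dtype differs from the weight's. A minimal NumPy sketch of the pattern being discussed (not the PaddleNLP kernel; `rms_norm` here is a hypothetical stand-in) shows the idea: compute in fp32 internally, so the only low-precision step is the cast at the input/output boundary:

```python
import numpy as np

def rms_norm(hidden_states, weight, eps=1e-6):
    # Hypothetical sketch of the pattern under discussion, not the
    # PaddleNLP implementation: inputs may arrive in fp16/bf16, the
    # normalization itself runs in fp32, and the result is cast back.
    orig_dtype = hidden_states.dtype
    x = hidden_states.astype(np.float32)
    variance = np.mean(x * x, axis=-1, keepdims=True)
    x = x / np.sqrt(variance + eps)
    return (weight.astype(np.float32) * x).astype(orig_dtype)

x = np.random.randn(2, 4).astype(np.float16)
w = np.ones(4, dtype=np.float16)
y = rms_norm(x, w)
```

With unit weights, the per-row RMS of the output is close to 1 regardless of the input dtype; the residual error comes only from the fp16 casts at the boundaries, which matches the author's point that the fp32 internal compute keeps the loss small.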

DrownFish19 (Collaborator) previously approved these changes Feb 13, 2025:
LGTM

@ZHUI ZHUI merged commit 5ebe42b into PaddlePaddle:develop Feb 14, 2025
9 of 12 checks passed
3 participants