[XPU] Add fused op for deepseek #9854

QingshuChen · 2025-02-12T10:02:40Z

PR types

Performance optimization

PR changes

Others

Description

Add XPU fused op for DeepSeek.

paddle-bot · 2025-02-12T10:02:47Z

Thanks for your contribution!

codecov · 2025-02-12T10:37:59Z

Codecov Report

Attention: Patch coverage is 10.52632% with 17 lines in your changes missing coverage. Please review.

Project coverage is 51.77%. Comparing base (763c59a) to head (6667bbe).
Report is 7 commits behind head on develop.

Files with missing lines	Patch %	Lines
paddlenlp/transformers/deepseek_v2/modeling.py	5.55%	17 Missing ⚠️

❌ Your patch status has failed because the patch coverage (10.52%) is below the target coverage (80.00%). You can increase the patch coverage or adjust the target coverage.
❌ Your project status has failed because the head coverage (51.77%) is below the target coverage (58.00%). You can increase the head coverage or adjust the target coverage.

Additional details and impacted files

@@             Coverage Diff             @@
##           develop    #9854      +/-   ##
===========================================
- Coverage    52.06%   51.77%   -0.29%     
===========================================
  Files          734      738       +4     
  Lines       116445   117106     +661     
===========================================
+ Hits         60624    60629       +5     
- Misses       55821    56477     +656
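The two failing statuses follow from plain ratios of the counts above. Below is a minimal sketch that recomputes them. It assumes line coverage is hits / (hits + misses) with no partially covered lines, since the report lists none. It also assumes the patch has 19 coverable lines, 2 hit and 17 missed, which is what "10.52632% with 17 lines missing" implies. `coverage_pct` is a hypothetical helper and not part of Codecov.

```python
def coverage_pct(hits: int, misses: int) -> float:
    """Line coverage in percent; partially covered lines are ignored (none are reported)."""
    return 100.0 * hits / (hits + misses)


# Patch: 17 missing lines at 10.52632% implies 2 hit lines out of 19.
patch = coverage_pct(hits=2, misses=17)
# Project totals taken from the Coverage Diff table.
base = coverage_pct(hits=60624, misses=55821)  # develop (763c59a): 116445 lines
head = coverage_pct(hits=60629, misses=56477)  # PR head (6667bbe): 117106 lines

print(f"patch {patch:.5f}% (target 80.00%): {'pass' if patch >= 80.0 else 'fail'}")
print(f"base {base:.2f}%, head {head:.2f}%, delta {head - base:+.2f}%")
print(f"head {head:.2f}% (target 58.00%): {'pass' if head >= 58.0 else 'fail'}")
```

Both checks compare against fixed targets (80% for the patch, 58% for the project), so each fails on its own ratio. The -0.29% project delta does not determine either result.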


DrownFish19 · 2025-02-13T05:42:51Z

paddlenlp/transformers/deepseek_v2/modeling.py

@@ -322,6 +322,17 @@ def __init__(self, config: DeepseekV2Config, hidden_size=None, eps=1e-6, use_seq
            mark_as_sequence_parallel_parameter(self.weight)

    def forward(self, hidden_states):
+        if self.config.use_fused_rms_norm and get_env_device() == "xpu":
+            if self.weight.dtype != hidden_states.dtype:

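Here the GPU path casts to fp32 to avoid numerical error. Could this approach introduce error on XPU?

QingshuChen

The internal computation uses fp32. The precision loss here comes mainly from casting hidden_states, so it should be fine.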

DrownFish19

LGTM
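The question and the reply turn on where precision is lost. Below is a minimal, pure-Python sketch that compares two paths for RMSNorm, y_i = w_i * x_i / sqrt(mean(x^2) + eps). One path normalizes the original inputs, like the GPU fp32 path. The other first rounds the inputs to bfloat16 and then normalizes at full precision. It assumes the XPU branch casts `hidden_states` to a bf16 weight dtype before a fused kernel that computes internally in fp32. The excerpt above does not show the cast direction or the dtypes, so this is an assumption. Python floats (float64) stand in for the fp32 internal computation. `to_bf16`, `rms_norm`, and `rms_norm_after_cast` are hypothetical helpers, not Paddle or PaddleNLP APIs.

```python
import math
import struct


def to_bf16(x: float) -> float:
    """Round x to bfloat16 (via float32, round-to-nearest-even) and return it as a Python float.

    Sketch only: NaN/Inf and float32 overflow are not handled.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bits += 0x7FFF + ((bits >> 16) & 1)  # round-to-nearest-even on the 16 dropped bits
    bits = ((bits >> 16) << 16) & 0xFFFFFFFF  # keep sign, exponent, and 7 mantissa bits
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def rms_norm(hidden, weight, eps=1e-6):
    """RMSNorm computed entirely in Python floats (standing in for fp32 internal math)."""
    variance = sum(x * x for x in hidden) / len(hidden)
    inv_rms = 1.0 / math.sqrt(variance + eps)
    return [w * x * inv_rms for x, w in zip(hidden, weight)]


def rms_norm_after_cast(hidden, weight, eps=1e-6):
    """Hypothetical XPU-style path: cast inputs to bf16 first, then normalize at full precision."""
    return rms_norm([to_bf16(x) for x in hidden], weight, eps)


hidden = [0.1 * i - 1.3 for i in range(16)]  # toy activations, not real model data
weight = [1.0] * len(hidden)

reference = rms_norm(hidden, weight)  # GPU-style: full-precision inputs
after_cast = rms_norm_after_cast(hidden, weight)
max_abs_err = max(abs(a - b) for a, b in zip(reference, after_cast))
print(f"max |reference - after_cast| = {max_abs_err:.2e}")
```

In this sketch, the only extra error comes from rounding each input once. The reduction and the scaling stay in high precision, so the deviation is bounded by roughly twice bf16's unit roundoff (2^-8) times the output magnitude. This bound is consistent with the reply, though actual tolerances depend on the real kernel, dtypes, and activations.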

paddle-bot added the contributor label Feb 12, 2025

DrownFish19 changed the title from "add xpu fused op for deepseek" to "[XPU] Add fused op for deepseek" Feb 13, 2025

DrownFish19 reviewed Feb 13, 2025

DrownFish19 previously approved these changes Feb 13, 2025

add xpu fused op for deepseek

6667bbe

QingshuChen dismissed DrownFish19’s stale review via 6667bbe February 13, 2025 05:54

QingshuChen force-pushed the p800_deepseek branch from 9ac8592 to 6667bbe February 13, 2025 05:54

DrownFish19 approved these changes Feb 14, 2025

ZHUI merged commit 5ebe42b into PaddlePaddle:develop Feb 14, 2025
9 of 12 checks passed
