Using explicit GPU upcast for ZeRO-Offload #6962

Merged: 1 commit, merged Jan 21, 2025


xylian86 (Contributor) commented Jan 20, 2025

Following the discussion in PR-6670, an explicit upcast proved much more efficient than an implicit one, so this PR replaces the implicit upcast in ZeRO-Offload with an explicit GPU upcast.
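A minimal sketch of the two paths, assuming gradients are held in bf16 on the GPU and offloaded into a preallocated fp32 CPU buffer. The helpers `offload_implicit` and `offload_explicit` are hypothetical illustrations, not DeepSpeed functions and not this PR's diff.

```python
import torch


# Hypothetical helper (not a DeepSpeed API): implicit upcast.
def offload_implicit(grad: torch.Tensor, host_buf: torch.Tensor) -> None:
    # copy_() receives a low-precision source and an fp32 destination,
    # so PyTorch performs the dtype conversion as part of the copy.
    host_buf.copy_(grad.reshape(-1))


# Hypothetical helper (not a DeepSpeed API): explicit GPU upcast.
def offload_explicit(grad: torch.Tensor, host_buf: torch.Tensor) -> None:
    # Convert to fp32 on the gradient's own device first, so the
    # device-to-host transfer is a same-dtype copy.
    host_buf.copy_(grad.reshape(-1).float())


# Falls back to CPU when no GPU is present, for illustration only.
device = "cuda" if torch.cuda.is_available() else "cpu"
grad = torch.randn(4, 1024, device=device).to(torch.bfloat16)

# Preallocated fp32 buffers on the host, one per path.
host_a = torch.empty(grad.numel(), dtype=torch.float32)
host_b = torch.empty(grad.numel(), dtype=torch.float32)

offload_implicit(grad, host_a)
offload_explicit(grad, host_b)
```

Because bf16 to fp32 is an exact widening, both paths produce identical values; only where the conversion runs differs. One plausible reading of the speedup, not verified here, is that the implicit cross-device copy leaves the conversion to the host, whereas `.float()` runs it as a GPU kernel before the transfer, at the cost of moving twice as many bytes.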

Results on a 3B model are shown below:

| Option | Backward time (ms) | Speedup (vs. before PR-6670) |
|------------|-----|------|
| Before PR-6670 | 25603.30 | 1x |
| After PR-6670 | 1174.31 | 21.8x |
| After this PR | 309.2 | 82.8x |
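The speedup column is the baseline backward time divided by each row's time. The short check below recomputes it from the reported numbers. It also computes the gain of this PR over the post-PR-6670 path, which the table leaves implicit. The dictionary keys simply mirror the table rows.

```python
# Backward-pass times (ms) as reported in the table.
bwd_ms = {
    "Before PR-6670": 25603.30,
    "After PR-6670": 1174.31,
    "After this PR": 309.2,
}

baseline = bwd_ms["Before PR-6670"]

# Speedup = baseline time / row time, rounded like the table.
speedups = {name: round(baseline / ms, 1) for name, ms in bwd_ms.items()}

# Incremental gain of this PR over the state after PR-6670.
incremental = round(bwd_ms["After PR-6670"] / bwd_ms["After this PR"], 1)

for name, s in speedups.items():
    print(f"{name}: {s}x")
print(f"This PR vs. after PR-6670: {incremental}x")
```

The recomputed values match the table (21.8x and 82.8x), and this PR alone is roughly 3.8x faster in the backward pass than the post-PR-6670 path.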

loadams enabled auto-merge on January 21, 2025, 18:16
loadams added this pull request to the merge queue on Jan 21, 2025
Merged via the queue into deepspeedai:master with commit c17dc33 Jan 21, 2025
13 checks passed
tjruwase pushed a commit that referenced this pull request Feb 6, 2025
siqi654321 pushed a commit to siqi654321/DeepSpeed that referenced this pull request Feb 7, 2025
traincheck-team pushed a commit to traincheck-team/DeepSpeed that referenced this pull request Feb 9, 2025
gyou2021 pushed a commit to gyou2021/DeepSpeed that referenced this pull request Feb 18, 2025
gyou2021 pushed a commit to gyou2021/DeepSpeed that referenced this pull request Feb 28, 2025
ys950902 pushed a commit to ys950902/DeepSpeed that referenced this pull request Mar 6, 2025