MP ZeRO++ #3954

HeyangQin · 2023-07-13T19:34:40Z

As a follow-up to and extension of the ZeRO++ release, this mixed precision ZeRO++ PR gives users the option to keep non-trainable weights permanently quantized, which is very useful for LoRA, where the base model weights are frozen. Compared with the standard weight quantization in ZeRO++, it allows for lower memory usage and even better throughput. Many thanks to Sam for helping with this implementation.
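To make the idea concrete, here is a minimal, framework-free sketch of keeping frozen weights quantized while leaving trainable (LoRA-style) weights in full precision. It is not the DeepSpeed implementation: the symmetric per-tensor int8 scheme, the names `QuantizedWeight`, `quantize`, `dequantize`, `prepare_params`, and the `min_numel` threshold are all assumptions made for illustration. The small-parameter skip mirrors the "skip quantization for small parameters" commit in this PR, but the actual cutoff used there is not stated here.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple, Union


@dataclass
class QuantizedWeight:
    """A frozen weight kept as int8-range integers plus one scale (hypothetical type)."""
    values: List[int]  # each value lies in [-127, 127]
    scale: float       # per-tensor scale used to recover approximate floats


def quantize(weights: List[float]) -> QuantizedWeight:
    """Symmetric per-tensor int8 quantization (an assumed scheme, for illustration)."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    return QuantizedWeight([round(w / scale) for w in weights], scale)


def dequantize(q: QuantizedWeight) -> List[float]:
    """Recover approximate float values right before the weight is used."""
    return [v * q.scale for v in q.values]


Stored = Union[QuantizedWeight, List[float]]


def prepare_params(
    params: Dict[str, Tuple[List[float], bool]],
    min_numel: int = 4,  # hypothetical threshold below which quantization is skipped
) -> Dict[str, Stored]:
    """Quantize frozen parameters once; keep trainable or tiny ones in full precision.

    `params` maps a name to (weights, trainable).
    """
    stored: Dict[str, Stored] = {}
    for name, (weights, trainable) in params.items():
        if trainable or len(weights) < min_numel:
            stored[name] = list(weights)
        else:
            stored[name] = quantize(weights)
    return stored


params = {
    "base.weight": ([0.5, -1.0, 0.25, 0.75], False),     # frozen base-model weight
    "lora_A.weight": ([0.01, -0.02, 0.03, 0.04], True),  # trainable adapter weight
    "base.bias": ([0.1], False),                         # frozen, but too small to quantize
}
stored = prepare_params(params)
```

In this sketch only `base.weight` ends up quantized. Because it never receives gradient updates, it can stay in the compact form for the whole run and be dequantized on use, which is where the memory saving described above would come from.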

* fix conv_flops_compute when padding is a str when stride=1 * fix error * change type of paddings to tuple * fix padding calculation * apply formatting check --------- Co-authored-by: Cheng Li <[email protected]> Co-authored-by: Olatunji Ruwase <[email protected]>

Co-authored-by: Stephen Youn <[email protected]> Co-authored-by: Arash Bakhtiari <[email protected]> Co-authored-by: Cheng Li <[email protected]> Co-authored-by: Ethan Doe <[email protected]> Co-authored-by: yidoe <[email protected]> Co-authored-by: Jeff Rasley <[email protected]>

Co-authored-by: HeyangQin <[email protected]> Co-authored-by: GuanhuaWang <[email protected]> Co-authored-by: cmikeh2 <[email protected]> Co-authored-by: Ammar Ahmad Awan <[email protected]> Co-authored-by: Jeff Rasley <[email protected]> Co-authored-by: Michael Wyatt <[email protected]> Co-authored-by: Olatunji Ruwase <[email protected]> Co-authored-by: Reza Yazdani <[email protected]>

* zeropp chinese blog * try better quality images * make title larger * even larger... * various fix * center captions * more fixes * fix format * add ZeRO++ Japanese blog * add links --------- Co-authored-by: HeyangQin <[email protected]> Co-authored-by: Conglong Li <[email protected]>

HeyangQin and others added 30 commits June 21, 2023 11:51

zero++ tutorial PR (#3783)

df1859d

fix interpolate flops compute (#3782)

a8c182a

use Flops Profiler to test model.generate() (#2515)

c4c442f

* Update profiler.py * pre-commit run --all-files * Delete .DS_Store * Delete .DS_Store * Delete .DS_Store --------- Co-authored-by: Jeff Rasley <[email protected]> Co-authored-by: Cheng Li <[email protected]>

revert PR #3611 (#3786)

fc9e1ee

bump to 0.9.6

40045dc

ZeRO++ chinese blog (#3793)

49a0a1b

* zeropp chinese blog * try better quality images * make title larger * even larger... * various fix * center captions * more fixes * fix format

remove staging trigger (#3792)

2c62cb4

adding zero++ to navigation panel of deepspeed.ai (#3796)

01b843a

Bug Fixes for autotuner and flops profiler (#1880)

b4a2c0a

* fix autotuner when backward is not called * fix format --------- Co-authored-by: Olatunji Ruwase <[email protected]>

Missing strided copy for gated MLP (#3788)

b7e1010

Co-authored-by: Ammar Ahmad Awan <[email protected]> Co-authored-by: Jeff Rasley <[email protected]> Co-authored-by: Logan Adams <[email protected]>

Requires grad checking. (#3789)

e5b1ead

Co-authored-by: Jeff Rasley <[email protected]>

bump to 0.10.0

9c756cf

Fix Bug in transform.cu (#3534)

a204edc

* Bug fix * Fixed formatting error --------- Co-authored-by: Logan Adams <[email protected]>

bug fix: triton importing error (#3799)

f6e2e38

Co-authored-by: Stephen Youn <[email protected]> Co-authored-by: Jeff Rasley <[email protected]>

Merge branch 'master' of github.com:microsoft/DeepSpeed

c1a7d3c

Merge branch 'master' of github.com:microsoft/DeepSpeed

65ed548

Merge branch 'master' of github.com:microsoft/DeepSpeed

d7ac329

Merge branch 'master' of github.com:microsoft/DeepSpeed

83f1102

Merge branch 'master' of github.com:microsoft/DeepSpeed

16555b2

Merge branch 'master' of github.com:microsoft/DeepSpeed

9d7b654

init commit for mixed precision lora

2efb73d

fix format

1147885

patch _allgather_params & minor fixes

1bec51f

make sure initial quantization are finished

5b3c460

make sure dequantization is finished

ec1f154

skip quantization for small parameters

9d53168

HeyangQin added 10 commits July 13, 2023 19:42

remove unused async_op

b3ad425

Merge branch 'HeyangQin/mixed_precision_lora' of https://github.com/microsoft/DeepSpeed into HeyangQin/mixed_precision_lora

7b2b6a4

lazy load of quantizer kernels

a06c564

add mixed precision lora tutorial

94cf3c4

Merge branch 'master' into HeyangQin/mixed_precision_lora

ce96d9a

cleanup mics

b1cb597

cleanup mics

3470949

Merge branch 'HeyangQin/mixed_precision_lora' of https://github.com/microsoft/DeepSpeed into HeyangQin/mixed_precision_lora

e0e8cf4

replace get_accelerator().current_device()

c25cf6b

Merge remote-tracking branch 'origin/master' into HeyangQin/mixed_precision_lora

aa4f28a

HeyangQin mentioned this pull request Aug 16, 2023

Mixed Precision ZeRO++ deepspeedai/DeepSpeedExamples#689

Merged

HeyangQin added 3 commits August 17, 2023 04:05

Merge remote-tracking branch 'origin/master' into HeyangQin/mixed_precision_lora

f7cb549

add kwargs to mics

d501309

fix format

b5a41fa

HeyangQin changed the title ~~Mixed precision LoRA release~~ Mixed precision ZeRO++ release Aug 17, 2023

HeyangQin changed the title ~~Mixed precision ZeRO++ release~~ MP ZeRO++ Aug 17, 2023

HeyangQin added 2 commits August 17, 2023 18:55

seperate code and tutorial

74c2760

Merge branch 'master' into HeyangQin/mixed_precision_lora

9f68cda

awan-10 approved these changes Aug 18, 2023

HeyangQin enabled auto-merge August 18, 2023 18:54

awan-10 and others added 4 commits August 18, 2023 16:39

Merge branch 'master' into HeyangQin/mixed_precision_lora

f802011

Merge branch 'master' into HeyangQin/mixed_precision_lora

a6bd454

Merge branch 'master' into HeyangQin/mixed_precision_lora

3d527b2

fix _all_gather in zero3

9e277ba

HeyangQin added this pull request to the merge queue Aug 20, 2023

github-merge-queue bot removed this pull request from the merge queue due to failed status checks Aug 21, 2023

HeyangQin added this pull request to the merge queue Aug 21, 2023

Merged via the queue into master with commit 7711bdb Aug 21, 2023

jeffra deleted the HeyangQin/mixed_precision_lora branch August 31, 2023 16:15
