ZeRO-Inference refresh #722
Conversation
@tjruwase It's wonderful to see the optimizations you've integrated into DeepSpeed ZeRO-Inference, especially without the need for custom APIs. This is great work! One minor suggestion: if you used ideas from FlexGen (e.g., cache offloading), it might be better to also add FlexGen to the reference section. I'm also happy to discuss the other optimizations (partial offloading, cache quantization) if you have any questions.
@Ying1123, thanks for the kind words. You are correct that we should add FlexGen to the references, since it was our inspiration for exploring cache offloading and weight quantization. Sorry about the oversight. Is the following the best reference? https://arxiv.org/abs/2303.06865
Yes, this is the most up-to-date paper. Thanks!
Commits:

* Add zero inference
* Fix scripts
* Fix scripts
* Fix scripts
* Fix versioning text
* Shrink figure
* Shrink figure
* Shrink figure
* Generality
* :q
* Tweak repro scripts and README
* Fix versions
* Fix rqmts
* README tweak
* Cleanup
* Rearrange README
* Versioning
* cleanup
Refresh with two new optimizations: weight quantization and KV cache offloading to CPU.
Companion DS PR: microsoft/DeepSpeed#4197
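
For readers trying this out, here is a minimal sketch of how ZeRO-Inference is typically driven from a Hugging Face model. The ZeRO-3 `offload_param` section and the `deepspeed.initialize` pattern are standard DeepSpeed usage; the `weight_quantization`/`quantized_initialization` keys are an assumed shape for the new weight-quantization path and should be verified against the companion DeepSpeed PR, and the model name is just an example.

```python
# Minimal sketch of ZeRO-Inference on a Hugging Face model.
# NOTE: the "weight_quantization" section below is an ASSUMED config
# shape for the new weight quantization; KV cache offloading is driven
# by the example scripts rather than by a config key shown here.
# Verify both against microsoft/DeepSpeed#4197.
import deepspeed
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "facebook/opt-1.3b"  # example; the repro scripts cover larger models

ds_config = {
    "fp16": {"enabled": True},
    "train_micro_batch_size_per_gpu": 1,
    "zero_optimization": {
        "stage": 3,  # ZeRO-3 parameter partitioning
        "offload_param": {"device": "cpu", "pin_memory": True},  # host weights on CPU
    },
    # Assumed keys for the new weight-quantization path:
    "weight_quantization": {
        "quantized_initialization": {"num_bits": 4, "group_size": 64},
    },
}

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16)

# deepspeed.initialize returns (engine, optimizer, dataloader, scheduler);
# only the engine is needed for inference.
engine = deepspeed.initialize(model=model, config=ds_config)[0]
engine.module.eval()

prompt = "DeepSpeed ZeRO-Inference makes it possible to"
inputs = tokenizer(prompt, return_tensors="pt").to(engine.device)
with torch.no_grad():
    output = engine.module.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

With ZeRO-3 parameter offload, only the working set of weights resides on the GPU at any time, which is what lets a single GPU serve models much larger than its device memory.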