
[CPU] FusedAdam and CPU training support #3991

Merged 24 commits into deepspeedai:master on Jul 25, 2023

Conversation

@delock (Collaborator) commented Jul 19, 2023

This PR implements FusedAdam and enables CPU training with ZeRO stages 0-3. Note that the FP16 data type is not supported for CPU training; BF16 is the preferred data type.

For ZeRO stage 0, this PR depends on #3842.

What's in this PR:

  1. A FusedAdam kernel; the implementation is derived from the CPU Adam kernel and has been modified to suit this case (see the usage sketch after this list).
  2. Fixes for bugs in the CPU accelerator interface implementation that were exposed by training workloads.
  3. Handling of synchronized accelerators in the DeepSpeed runtime.
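As a minimal sketch of what the new kernel enables (not code from this PR; `deepspeed.ops.adam.FusedAdam` is DeepSpeed's existing optimizer entry point, and on the CPU accelerator it would be backed by the kernel added here):

```python
import torch
from deepspeed.ops.adam import FusedAdam  # dispatches to the active accelerator's FusedAdam kernel

# BF16 model and data, since FP16 is not supported for CPU training.
model = torch.nn.Linear(64, 64).to(torch.bfloat16)
optimizer = FusedAdam(model.parameters(), lr=1e-3)

x = torch.randn(8, 64, dtype=torch.bfloat16)
loss = model(x).sum()
loss.backward()
optimizer.step()       # fused Adam update runs in the C++ kernel
optimizer.zero_grad()
```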

This PR has been verified on CIFAR10 in DeepSpeedExamples; a PR has been submitted to support the BF16 data type for CIFAR10 (deepspeedai/DeepSpeedExamples#651). Note that for ZeRO stage 0/1 there is an accuracy bug in DeepSpeed's BF16 training support (#3979); for ZeRO stage 2/3, accuracy is correct.
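For reference, a minimal sketch of the kind of DeepSpeed config such a run would use (standard config keys, assumed here rather than taken from the PR; `net` is a hypothetical torch.nn.Module):

```python
import deepspeed

# Sketch: BF16 training with ZeRO stage 2 on the CPU accelerator.
ds_config = {
    "train_batch_size": 16,
    "bf16": {"enabled": True},           # BF16 instead of FP16, which CPU does not support
    "zero_optimization": {"stage": 2},   # stages 0-3 are enabled for CPU by this PR
    "optimizer": {"type": "Adam", "params": {"lr": 1e-3}},
}

engine, optimizer, _, _ = deepspeed.initialize(
    model=net,                           # `net`: hypothetical torch.nn.Module
    model_parameters=net.parameters(),
    config=ds_config,
)
```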

@delock (Collaborator, Author) commented Jul 24, 2023

@tjruwase note we also have a PR in DeepSpeedExamples to demonstrate CIFAR10 training on CPU. That PR also fixes a test broken by an API change in PyTorch 2.0:
deepspeedai/DeepSpeedExamples#651

@delock (Collaborator, Author) commented Jul 25, 2023

Also added CPUAdamBuilder for the CPU accelerator, so HelloDeepSpeed, which uses CPU offload, can run.
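A minimal sketch of the ZeRO config path that exercises CPUAdamBuilder (standard DeepSpeed config keys; offloading the optimizer to CPU makes DeepSpeed build and use the DeepSpeedCPUAdam kernel):

```python
# Sketch: ZeRO optimizer offload to CPU, the path that loads
# CPUAdamBuilder's DeepSpeedCPUAdam kernel.
ds_config = {
    "train_batch_size": 16,
    "bf16": {"enabled": True},
    "zero_optimization": {
        "stage": 2,
        "offload_optimizer": {"device": "cpu"},
    },
    "optimizer": {"type": "Adam", "params": {"lr": 1e-3}},
}
```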

@tjruwase added this pull request to the merge queue Jul 25, 2023
Merged via the queue into deepspeedai:master with commit 0f54063 on Jul 25, 2023