
figuring out how to enable DeepSpeedCPUAdam #666

Closed
stas00 opened this issue Jan 14, 2021 · 2 comments


stas00 commented Jan 14, 2021

(oops, it looks like I forgot to hit Enter when I wrote this issue yesterday, so submitting it now)

I'm trying to experiment with DeepSpeedCPUAdam on a single GPU.

The tutorial is quite confusing, since it suggests activating --cpu_optimizer, but that flag is relevant to the Megatron-LM example and not to the general user.

It took me a while to figure out how to configure DeepSpeedCPUAdam. I tried setting the optimizer type to DeepSpeedCPUAdam and CPUAdam, and DeepSpeed would crash saying it doesn't support those optimizers. Eventually, looking through the source code, I found that the selection is automatic when the following conditions are met:

            # zero-offload  torch-adam  adam_w_mode optimizer
            # T|F           T           T           torch.optim.AdamW
            # T|F           T           F           torch.optim.Adam
            # T             F           T|F         DeepSpeedCPUAdam(adam_w_mode)
            # F             F           T|F         FusedAdam(adam_w_mode)
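
Putting the table together: with the config optimizer type left as Adam, whether you end up on DeepSpeedCPUAdam comes down to enabling ZeRO offload and not forcing torch_adam. Here is a minimal sketch of the config that I believe triggers it (the cpu_offload key under zero_optimization is what the ZeRO-Offload tutorial shows for the version I'm running, so the exact key names may differ in other releases):

    {
        "zero_optimization": {
            "stage": 2,
            "cpu_offload": true
        },
        "optimizer": {
            "type": "Adam",
            "params": {
                "lr": 3e-5
            }
        }
    }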

Then in the logs I get:

[2021-01-12 13:52:16,666] [INFO] [engine.py:521:_configure_optimizer] DeepSpeed Basic Optimizer = DeepSpeedCPUAdam (
Parameter Group 0
    amsgrad: False
    betas: [0.9, 0.999]
    bias_correction: True
    eps: 1e-08
    lr: 3e-05
    weight_decay: 3e-07
)
Checking ZeRO support for optimizer=DeepSpeedCPUAdam type=<class 'deepspeed.ops.adam.cpu_adam.DeepSpeedCPUAdam'>
[...]
[2021-01-12 13:52:20,588] [INFO] [config.py:709:print]   optimizer_name ............... adam
[2021-01-12 13:52:20,588] [INFO] [config.py:709:print]   optimizer_params ............. {'lr': 3e-05, 'betas': [0.9, 0.999], 'eps': 1e-08, 'weight_decay': 3e-07}

So in one place it says it uses DeepSpeedCPUAdam, but later the configuration dump says adam; perhaps it could say adam (DeepSpeedCPUAdam) instead?

May I suggest having a general ZeRO-Offload tutorial and a separate Megatron-LM-specific tutorial?

After all these attempts I now know that to switch to DeepSpeedCPUAdam all I need is the configuration explained here: https://www.deepspeed.ai/tutorials/zero-offload/#deepspeed-configuration-changes


The Adam vs. AdamW selection isn't documented either; after reading the code I added adam_w_mode:

    "optimizer": {
        "type": "Adam",
        "params": {
            "adam_w_mode": true,
            "lr": 3e-5,
            "betas": [ 0.9, 0.999 ],
            "eps": 1e-8,
            "weight_decay": 3e-7
        }
    },

as a placeholder in case I need to switch from the default AdamW back to plain Adam.
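
And if I'm reading the selection table above correctly (so treat this as my assumption rather than documented behavior), flipping that placeholder to "adam_w_mode": false should give the plain Adam variants, and additionally setting "torch_adam": true should bypass the DeepSpeed/fused implementations and fall back to torch.optim.Adam/AdamW, e.g.:

        "params": {
            "adam_w_mode": false,
            "torch_adam": true,
            "lr": 3e-5,
            "betas": [ 0.9, 0.999 ],
            "eps": 1e-8,
            "weight_decay": 3e-7
        }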

RezaYazdaniAminabadi (Contributor) commented

Hi @stas00

Sorry for the trouble you ran into while enabling this feature! I agree with you that we should have documented it better. 👍
I can work on adding a section to the ZeRO-Offload tutorial that explains this better, covering all the configurations and parameters.

Thank you.
Reza


stas00 commented Jan 14, 2021

That would be fantastic, Reza! Thank you!

And in case this input is useful: from the user's point of view, decoupling this section/tutorial from Megatron-LM would be very helpful, as the latter isn't needed at all unless someone is actually using Megatron-LM.
