
Adam vs AdamW - request to simplify the configuration #668

Closed
stas00 opened this issue Jan 14, 2021 · 1 comment · Fixed by #670
stas00 commented Jan 14, 2021

As a follow-up to #666:

It's clear that AdamW is supported, since it is actually the default unless I set:

    "optimizer": {
        "type": "Adam",
        "params": {
            "adam_w_mode": false,

though the user has no idea that this is the case.

Why then do I have to set:

   "zero_allow_untested_optimizer": true,

If I want to use:

    "optimizer": {
        "type": "AdamW",

Why not make it simple for the user and just let them specify the exact optimizer they want? Moreover, it's not just switching:

            "adam_w_mode": false,

to true or false; the weight decay also needs to be adjusted. So it'd be much simpler to have a separate, full section for Adam and for AdamW, each with the recommended defaults for all the other params.

Otherwise it's more error-prone, if you see what I mean (changing one param but not the others). A sketch of what I have in mind follows below.
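
For example, instead of toggling adam_w_mode under a single "Adam" section, each optimizer could get its own explicit section (just a sketch of the suggested layout; the parameter values below are illustrative placeholders, not actual recommended defaults):

    "optimizer": {
        "type": "Adam",
        "params": {
            "lr": 1e-3,
            "betas": [0.9, 0.999],
            "eps": 1e-8,
            "weight_decay": 0
        }
    }

or, for the decoupled-weight-decay variant:

    "optimizer": {
        "type": "AdamW",
        "params": {
            "lr": 1e-3,
            "betas": [0.9, 0.999],
            "eps": 1e-8,
            "weight_decay": 0.01
        }
    }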

Currently, since HF Trainer uses AdamW by default, the config I used is:

   "zero_allow_untested_optimizer": true,
   "optimizer": {
     "type": "AdamW",
     "params": {
       "lr": 3e-5,
       "betas": [
         0.8,
         0.999
       ],
       "eps": 1e-8,
       "weight_decay": 3e-7
     }
   },

I'm also not sure why the user can't explicitly set DeepSpeedCPUAdam, so it's loud and clear that this is what they want to use, rather than relying on magic combinations of:

            # zero-offload  torch-adam  adam_w_mode optimizer
            # T|F           T           T           torch.optim.AdamW
            # T|F           T           F           torch.optim.Adam
            # T             F           T|F         DeepSpeedCPUAdam(adam_w_mode)
            # F             F           T|F         FusedAdam(adam_w_mode)

The behind-the-scenes magic is probably great for general use, but there is a lot of power in knowing exactly what you're using and not second-guessing yourself. The runtime log does help to see which of these four optimizers was magically selected.

This secondary issue is not a strong need, since I have the log to tell me which optimizer was picked for me. It's just curiosity as to why the explicit selection power is not given to the user.
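
If explicit selection were exposed, something along these lines would make the intent unambiguous (purely a sketch of a possible config; as far as I know "DeepSpeedCPUAdam" is not an accepted "type" value today, and the params here mirror the existing Adam section rather than any documented schema):

    "optimizer": {
        "type": "DeepSpeedCPUAdam",
        "params": {
            "lr": 3e-5,
            "adam_w_mode": true
        }
    }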

Thank you.

tjruwase self-assigned this Jan 14, 2021
tjruwase linked a pull request Jan 15, 2021 that will close this issue
@tjruwase (Contributor) commented:
@stas00 Thanks for reporting this issue. To answer your main question: it is an oversight on our part that "zero_allow_untested_optimizer": true is currently required for "optimizer": { "type": "AdamW" }.

A fix is in the works that adds "AdamW" as a supported optimizer type.
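
Once that lands, a config along these lines should work without the extra flag (a sketch based on the description above, not on the merged change itself):

    "optimizer": {
        "type": "AdamW",
        "params": {
            "lr": 3e-5,
            "betas": [0.8, 0.999],
            "eps": 1e-8,
            "weight_decay": 3e-7
        }
    }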
