
Adam vs AdamW - request to simplify the configuration #668

Closed
stas00 opened this issue Jan 14, 2021 · 1 comment · Fixed by #670
stas00 commented Jan 14, 2021

As a follow-up to #666:

It's clear that AdamW is supported, since it is actually the default unless I set:

    "optimizer": {
        "type": "Adam",
        "params": {
            "adam_w_mode": false,

though the user has no idea that this is the case.

Why then do I have to set:

   "zero_allow_untested_optimizer": true,

If I want to use:

    "optimizer": {
        "type": "AdamW",

Why not make it simple for the user and just let them specify the exact optimizer they want? Moreover, it's not just switching:

            "adam_w_mode": false,

to true or false; the weight decay also needs to be adjusted. So it'd be much simpler to have a separate, full section for Adam and for AdamW, each with the recommended defaults for all the other params.

Otherwise it's more error-prone, if you see what I mean (changing one param but not the others). A sketch of what I have in mind follows below.
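
For example, instead of toggling adam_w_mode under a single "Adam" section, each optimizer could get its own explicit section (just a sketch of the suggested layout; the parameter values below are illustrative placeholders, not actual recommended defaults):

    "optimizer": {
        "type": "Adam",
        "params": {
            "lr": 1e-3,
            "betas": [0.9, 0.999],
            "eps": 1e-8,
            "weight_decay": 0
        }
    }

or, for the decoupled-weight-decay variant:

    "optimizer": {
        "type": "AdamW",
        "params": {
            "lr": 1e-3,
            "betas": [0.9, 0.999],
            "eps": 1e-8,
            "weight_decay": 0.01
        }
    }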

Currently, since HF Trainer uses AdamW by default, the config I used is:

   "zero_allow_untested_optimizer": true,
   "optimizer": {
     "type": "AdamW",
     "params": {
       "lr": 3e-5,
       "betas": [
         0.8,
         0.999
       ],
       "eps": 1e-8,
       "weight_decay": 3e-7
     }
   },

I'm also not sure why the user can't explicitly set DeepSpeedCPUAdam, so it's loud and clear that this is what they want to use, rather than relying on magic combinations of:

            # zero-offload  torch-adam  adam_w_mode optimizer
            # T|F           T           T           torch.optim.AdamW
            # T|F           T           F           torch.optim.Adam
            # T             F           T|F         DeepSpeedCPUAdam(adam_w_mode)
            # F             F           T|F         FusedAdam(adam_w_mode)

The behind-the-scenes magic is probably great for general use, but there is a lot of power in knowing exactly what you're using and not second-guessing yourself. The runtime log does help to see which of these four optimizers was magically selected.

This secondary issue is not a strong need, since I have the log to tell me which optimizer was picked for me. It's just curiosity as to why the explicit selection power is not given to the user.
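
If explicit selection were exposed, something along these lines would make the intent unambiguous (purely a sketch of a possible config; as far as I know "DeepSpeedCPUAdam" is not an accepted "type" value today, and the params here mirror the existing Adam section rather than any documented schema):

    "optimizer": {
        "type": "DeepSpeedCPUAdam",
        "params": {
            "lr": 3e-5,
            "adam_w_mode": true
        }
    }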

Thank you.

tjruwase self-assigned this Jan 14, 2021
tjruwase linked a pull request Jan 15, 2021 that will close this issue
@tjruwase (Contributor) commented:
@stas00 Thanks for reporting this issue. To answer your main question: it is an oversight on our part that "zero_allow_untested_optimizer": true is currently required for "optimizer": { "type": "AdamW" }.

A fix is in the works that adds "AdamW" as a supported optimizer type.
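
Once that lands, a config along these lines should work without the extra flag (a sketch based on the description above, not on the merged change itself):

    "optimizer": {
        "type": "AdamW",
        "params": {
            "lr": 3e-5,
            "betas": [0.8, 0.999],
            "eps": 1e-8,
            "weight_decay": 3e-7
        }
    }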
