As a follow up to #666: it's clear that AdamW is supported, since it is actually the default unless I set "adam_w_mode": false — though the user has no idea that this is the case. Why, then, do I have to set "zero_allow_untested_optimizer": true if I want to use "optimizer": { "type": "AdamW" }?
Why not make it simple for the user and just let them specify the exact optimizer they want? Moreover, it's not just switching:
"adam_w_mode": false,
to true/false — one also needs to adjust the weight decay, so it would be much simpler to have a separate, full section for each of Adam and AdamW, with recommended defaults for all the other params.
Otherwise it's more error-prone, if you see what I mean (changing one param but not the others).
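Such an explicit, self-contained AdamW section might look like the following sketch (the param names follow the usual Adam settings; the values are placeholders, not recommended defaults):

```json
{
  "optimizer": {
    "type": "AdamW",
    "params": {
      "lr": 3e-5,
      "betas": [0.9, 0.999],
      "eps": 1e-8,
      "weight_decay": 0.01
    }
  }
}
```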
Currently, since HF Trainer uses AdamW by default, the config I used is:
I'm also not sure why the user can't explicitly set DeepSpeedCPUAdam, so it's loud and clear that this is what they want to use, rather than relying on magic combinations of:
# zero-offload   torch-adam   adam_w_mode   optimizer
# T|F            T            T             torch.optim.AdamW
# T|F            T            F             torch.optim.Adam
# T              F            T|F           DeepSpeedCPUAdam(adam_w_mode)
# F              F            T|F           FusedAdam(adam_w_mode)
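The combinations in the table above can be sketched as a small decision function — an illustration of the documented behavior, not DeepSpeed's actual code; the returned strings merely name the implementation that would be instantiated:

```python
def select_adam_impl(zero_offload: bool, torch_adam: bool, adam_w_mode: bool) -> str:
    """Mirror the optimizer-selection table above (a sketch, not DeepSpeed source)."""
    if torch_adam:
        # Plain PyTorch optimizers: adam_w_mode decides AdamW vs. Adam.
        return "torch.optim.AdamW" if adam_w_mode else "torch.optim.Adam"
    if zero_offload:
        # CPU offload without torch-adam: DeepSpeed's CPU Adam,
        # with adam_w_mode passed through.
        return f"DeepSpeedCPUAdam(adam_w_mode={adam_w_mode})"
    # No offload, no torch-adam: the fused GPU implementation.
    return f"FusedAdam(adam_w_mode={adam_w_mode})"
```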
The behind-the-scenes magic is probably great for general use, but there is a lot of power in knowing exactly what you're using and not second-guessing yourself. The runtime log does help to see which optimizer was magically selected out of these 4.
This secondary issue is not a strong need, since I have the log to tell me which optimizer was picked for me; I'm just curious why the power of explicit selection isn't given to the user.
Thank you.
@stas00 Thanks for reporting this issue. To answer your main question: it is due to an oversight on our part that "zero_allow_untested_optimizer": true is currently required for "optimizer": { "type": "AdamW" }.
A fix is in the works that adds "AdamW" as a recognized optimizer type.