(oops, it looks like I forgot to hit Enter when I wrote this issue yesterday, so submitting it now)
I'm trying to experiment with DeepSpeedCPUAdam on a single GPU.
The tutorial is very confusing, since it suggests activating --cpu_optimizer, but that flag is relevant to Megatron-LM and not to the general user.
It took me a while to figure out how to configure DeepSpeedCPUAdam. I tried DeepSpeedCPUAdam and CPUAdam, and ds would crash saying it doesn't support those optimizers. Eventually, looking through the source code, I found that the selection is automatic if the following conditions are met:
# zero-offload  torch-adam  adam_w_mode  optimizer
# T|F           T           T            torch.optim.AdamW
# T|F           T           F            torch.optim.Adam
# T             F           T|F          DeepSpeedCPUAdam(adam_w_mode)
# F             F           T|F          FusedAdam(adam_w_mode)
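To make the table easier to read, here is a minimal Python sketch of the same decision logic. It only mirrors the four rows of the table. The function name `select_adam_variant` and its string return values are hypothetical and for illustration only; this is not DeepSpeed's actual code.

```python
def select_adam_variant(zero_offload: bool, torch_adam: bool, adam_w_mode: bool) -> str:
    """Return the optimizer the table maps these three settings to.

    Hypothetical helper that mirrors the comment table; not DeepSpeed's API.
    """
    if torch_adam:
        # Rows 1-2: torch-adam wins regardless of zero-offload.
        return "torch.optim.AdamW" if adam_w_mode else "torch.optim.Adam"
    if zero_offload:
        # Row 3: offload without torch-adam selects the CPU optimizer.
        return f"DeepSpeedCPUAdam(adam_w_mode={adam_w_mode})"
    # Row 4: neither offload nor torch-adam.
    return f"FusedAdam(adam_w_mode={adam_w_mode})"


if __name__ == "__main__":
    for offload in (True, False):
        for torch_adam in (True, False):
            for w_mode in (True, False):
                print(offload, torch_adam, w_mode, "->",
                      select_adam_variant(offload, torch_adam, w_mode))
```

Read this way, the user never names DeepSpeedCPUAdam directly: it follows from enabling ZeRO-Offload while leaving torch-adam off.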
Sorry for the trouble you ran into while enabling this feature! I agree with you that we should have documented it better. 👍
I can work on adding a section for ZeRO-Offload that explains this better, covering all the configurations and parameters.
And if the following input is useful: from the user's point of view, decoupling this section/tutorial from Megatron-LM would be very helpful, as the latter is not needed at all unless someone is using Megatron-LM.
The logging is also inconsistent: in one place it says it uses DeepSpeedCPUAdam, and later in the configuration dump it says adam. Perhaps the dump could say adam (DeepSpeedCPUAdam)?
May I suggest that there should be a general tutorial for ZeRO-Offload and a separate Megatron-LM-specific tutorial?
After all these attempts, I now know that to switch to DeepSpeedCPUAdam all I need is the config explained here: https://www.deepspeed.ai/tutorials/zero-offload/#deepspeed-configuration-changes
The Adam vs. AdamW selection by the user isn't documented. After reading the code, I added adam_w_mode to my config as a place-holder, in case I need to switch to Adam from the default AdamW.
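For readers landing here, a config along these lines might look roughly like the sketch below, written as a Python dict that can be dumped to JSON. Everything in it is an assumption rather than something quoted from this issue: the `cpu_offload` key reflects my understanding of the DeepSpeed config schema at the time (newer releases may use a different offload key), and the `lr` value is purely illustrative.

```python
import json

# A sketch only: key names are assumptions about the DeepSpeed config schema
# of that era, and the values are illustrative.
ds_config = {
    "zero_optimization": {
        "stage": 2,
        "cpu_offload": True,  # assumed key that enables ZeRO-Offload
    },
    "optimizer": {
        "type": "Adam",  # the config names plain Adam; the CPU variant is picked automatically
        "params": {
            "lr": 3e-5,           # hypothetical learning rate
            "adam_w_mode": True,  # place-holder: True for AdamW behaviour, False for Adam
        },
    },
}

if __name__ == "__main__":
    # Emit the JSON that would go into a DeepSpeed config file.
    print(json.dumps(ds_config, indent=2))
```

Please check the key names against the configuration reference for the DeepSpeed version in use before relying on this.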