[BUG] Create zero equivalency unit test #2790
For the (4)th item (training and comparing the loss curve) you can probably use HF Trainer + a pytorch example program, so everything is already done for you - e.g. see the example of how I train opt-1.3b from scratch in huggingface/transformers#21312 - so you just need to add a set of ds_config files. I can help set this one up; it should be very trivial to do, as it's really just a matter of adding one config per setup. And also run the same 3 setups w/o deepspeed - just using DDP - so that would be another baseline for each of the dtypes. To summarize:
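As a rough sketch of what "adding a set of ds_config files" could look like, here is one hypothetical way to generate one DeepSpeed config per dtype so the same HF Trainer script can be launched against each. The file names and the ZeRO stage chosen here are illustrative assumptions, not the actual configs used:

```python
import json

# Assumed base config; stage and "auto" batch settings are illustrative.
BASE = {
    "zero_optimization": {"stage": 3},
    "train_micro_batch_size_per_gpu": "auto",
    "gradient_accumulation_steps": "auto",
}

def make_config(dtype):
    """Return a DeepSpeed config dict enabling the given dtype."""
    cfg = json.loads(json.dumps(BASE))  # cheap deep copy
    if dtype == "fp16":
        cfg["fp16"] = {"enabled": True}
    elif dtype == "bf16":
        cfg["bf16"] = {"enabled": True}
    # fp32: no fp16/bf16 section -> full-precision training
    return cfg

configs = {d: make_config(d) for d in ("fp16", "bf16", "fp32")}
for name, cfg in configs.items():
    with open(f"ds_config_{name}.json", "w") as f:
        json.dump(cfg, f, indent=2)
```

Each generated file would then be passed to the Trainer via its `--deepspeed ds_config_<dtype>.json` argument, one launch per dtype.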
Actually, I forgot I developed a whole tool to do grid search / matrix-of-options runs:
The last line demonstrates an example of 6 variations it will run using the same base script: fp16/bf16/fp32 vs tf32 (on/off) = 3*2 = 6 variations. This may or may not be easier to use - not sure, but we have plenty of working out-of-the-box choices. I think you just need to find a resource allocation for that and we can set up these jobs very quickly. Then tensorboard all the results into the same base directory.
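The matrix expansion described above can be sketched as a simple cross product; the flag spellings and base script name below are hypothetical, not the tool's actual CLI:

```python
from itertools import product

# dtype x tf32 cross product -> the 6 variations mentioned above
dtypes = ["--fp16", "--bf16", "--fp32"]   # illustrative flag names
tf32_modes = ["--tf32 0", "--tf32 1"]

variations = [f"{d} {t}" for d, t in product(dtypes, tf32_modes)]

# one launch command per variation, all sharing the same base script
commands = [f"python run_clm.py {v}" for v in variations]  # hypothetical script
for cmd in commands:
    print(cmd)
```

Six commands come out of one base invocation, which is exactly the 3*2 arithmetic in the comment.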
Hi team, any updates?
Hey folks, is this still an active issue? I'm observing some differences in training between zero2 and zero3 using Llama models with the fixed rotary embedding cache init (#4932 (comment)). |
As you can see from the discussion you linked to, there will be no equivalency in that particular case of Llama-2 due to how the buffers are created. I urge you to file an issue with HF Transformers and ask them to distribute the correct buffers with the model weights rather than leaving them to be recalculated at model init time.
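A toy illustration of why buffers rebuilt at init time break bit-for-bit equivalency: if a cache like the rotary-embedding inverse frequencies is recomputed in whatever dtype the setup happens to use, the fp16 copy differs from the fp32 one. The formula follows the usual RoPE inv_freq definition; the `dim` and `base` values here are illustrative:

```python
import numpy as np

dim, base = 128, 10000.0

# inv_freq as typically computed for rotary embeddings (illustrative values)
exponents = np.arange(0, dim, 2) / dim
inv_freq_fp32 = (1.0 / base ** exponents).astype(np.float32)

# the same buffer after a round trip through half precision,
# as would happen if one setup rebuilds it in fp16
inv_freq_fp16 = inv_freq_fp32.astype(np.float16).astype(np.float32)

max_err = float(np.abs(inv_freq_fp32 - inv_freq_fp16).max())
identical = bool((inv_freq_fp32 == inv_freq_fp16).all())
```

Two runs that recreate this buffer under different dtypes therefore start from slightly different states, so their loss curves cannot be expected to match exactly; shipping the buffer with the checkpoint avoids the recomputation entirely.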
We have scripts to compare DeepSpeed's results with PyTorch.
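The core of such a comparison can be sketched as a tolerance check over per-step losses from the two runs; the tolerance values and the toy numbers below are assumptions, not what the actual scripts use:

```python
import math

def losses_match(ds_losses, ref_losses, rtol=1e-5, atol=1e-6):
    """True if every step's loss agrees within the given tolerances."""
    if len(ds_losses) != len(ref_losses):
        return False
    return all(
        math.isclose(a, b, rel_tol=rtol, abs_tol=atol)
        for a, b in zip(ds_losses, ref_losses)
    )

# Toy usage with fabricated loss values:
ref = [2.3025, 2.2987, 2.2940]        # plain PyTorch/DDP baseline
same = [2.3025, 2.2987, 2.2940]       # equivalent DeepSpeed run
drifted = [2.3025, 2.2987, 2.3100]    # run that diverged at step 3
```

For fp16/bf16 runs the tolerances would need to be loosened, since reduced precision makes exact agreement with an fp32 baseline unrealistic.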
Starting point: #966
Test matrix
@stas00