
semi-weekly 8bit lora zero3 check #1852

Open · wants to merge 6 commits into main

Conversation

winglian (Collaborator)

Adds a check to our semi-weekly GHA run for 8-bit LoRA with DeepSpeed ZeRO-3. Depends on huggingface/transformers#32943 being merged.
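
For context, a minimal sketch of the kind of run this check would exercise, assuming axolotl's YAML config schema, the `deepspeed_configs/zero3_bf16.json` file shipped in the repo, and the `accelerate launch -m axolotl.cli.train` entry point. The model, dataset, and step counts below are illustrative placeholders, not the actual CI config.

```python
# Sketch of an 8-bit LoRA + DeepSpeed ZeRO-3 run like the one this check covers.
# Assumes axolotl's YAML config keys; base_model, dataset, and max_steps are
# placeholders chosen for a quick smoke test, not the values used in CI.
import subprocess
import yaml

cfg = {
    "base_model": "HuggingFaceTB/SmolLM-135M",   # placeholder model
    "load_in_8bit": True,                        # 8-bit base weights
    "adapter": "lora",
    "lora_r": 16,
    "lora_alpha": 32,
    "lora_dropout": 0.05,
    "lora_target_linear": True,
    "sequence_len": 1024,
    "sample_packing": True,
    "micro_batch_size": 1,
    "gradient_accumulation_steps": 2,
    "max_steps": 20,
    "learning_rate": 2e-4,
    "bf16": True,
    "datasets": [{"path": "mhenrichsen/alpaca_2k_test", "type": "alpaca"}],
    "output_dir": "./outputs/lora-8bit-zero3",
    # ZeRO-3 config that ships with axolotl
    "deepspeed": "deepspeed_configs/zero3_bf16.json",
}

with open("lora_8bit_zero3.yml", "w") as f:
    yaml.safe_dump(cfg, f)

subprocess.run(
    ["accelerate", "launch", "-m", "axolotl.cli.train", "lora_8bit_zero3.yml"],
    check=True,
)
```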

winglian (Collaborator, Author)

This also needs the upstream PR huggingface/transformers#32943.

winglian (Collaborator, Author)

This one needs a deeper dive into why the train loss is larger by an order of magnitude.
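
One way a semi-weekly check could surface this kind of regression automatically is to read the Trainer's `trainer_state.json` from the output directory and fail if the final logged train loss is above a sanity threshold. The output path and the threshold below are illustrative assumptions, not values from this PR.

```python
# Sketch: fail the CI job if the final logged train loss is implausibly high.
# Relies on transformers' Trainer writing trainer_state.json with a log_history
# list; the output_dir and the 2.5 threshold are illustrative assumptions.
import json
import sys

def check_final_loss(output_dir: str, max_loss: float = 2.5) -> None:
    with open(f"{output_dir}/trainer_state.json") as f:
        state = json.load(f)
    # Keep only log entries that actually carry a training loss.
    losses = [e["loss"] for e in state["log_history"] if "loss" in e]
    if not losses:
        sys.exit("no train loss logged")
    if losses[-1] > max_loss:
        sys.exit(f"train loss {losses[-1]:.2f} exceeds {max_loss} (zero3 regression?)")

check_final_loss("./outputs/lora-8bit-zero3")
```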

winglian added the hold (don't merge this yet) label on Nov 19, 2024
winglian (Collaborator, Author) commented on Dec 8, 2024

I also tried deepspeed 0.16.1, but that is blocked too by gradient accumulation issues; see #2154.

Trying deepspeed==0.16.1 with grad_accum=1 also results in a train/loss of ~13 on L3-3B.
I also tried disabling sample packing, and the loss was still incorrect at ~13.
zero2 and zero1 with 8-bit LoRAs were fine.
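
For reference, a sketch of how the ablations above could be generated from one base config, toggling the variables mentioned (ZeRO stage, gradient accumulation, sample packing). The config keys follow axolotl's schema and reuse the base config from the earlier sketch; the `deepspeed_configs/zero1.json` and `zero2.json` paths are the stock configs axolotl ships, assumed here to be present.

```python
# Sketch: derive the ablation configs discussed above from a shared base config.
# Observed in this thread: zero1/zero2 train normally, while zero3 lands at ~13
# train loss even with grad_accum=1 or sample_packing disabled.
import copy
import yaml

with open("lora_8bit_zero3.yml") as f:
    base = yaml.safe_load(f)

variants = {
    "zero1": {"deepspeed": "deepspeed_configs/zero1.json"},
    "zero2": {"deepspeed": "deepspeed_configs/zero2.json"},
    "zero3_ga1": {"gradient_accumulation_steps": 1},
    "zero3_no_packing": {"sample_packing": False},
}

for name, overrides in variants.items():
    cfg = copy.deepcopy(base)
    cfg.update(overrides)
    cfg["output_dir"] = f"./outputs/{name}"
    with open(f"{name}.yml", "w") as f:
        yaml.safe_dump(cfg, f)
```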
