Add support for MoE models #60
Conversation
Also, only use a larger instance to build the dev image.
Looks amazing! Some small comments
bias: bool = True
"""
Include bias terms.
"""
We didn't use bias in OLMoE (same as in the dense models), so we could consider setting this to False.
We do set this to False in the actual 1B-7B config I cooked up; I just set the class default to True here to be consistent with the defaults of the other model config classes.
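To illustrate the default-vs-override point, here is a minimal sketch assuming a dataclass-style config; the FeedForwardMoEConfig name and its fields are hypothetical placeholders for illustration, not the exact classes in this PR.

from dataclasses import dataclass, replace


@dataclass
class FeedForwardMoEConfig:
    """Hypothetical MoE feed-forward config for illustration."""

    num_experts: int = 8
    top_k: int = 2
    bias: bool = True
    """
    Include bias terms.
    """


# The class default stays True to match the other model config classes,
# while an OLMoE-style run config overrides it to False.
olmoe_1b_7b_ffn = replace(FeedForwardMoEConfig(), bias=False)
assert olmoe_1b_7b_ffn.bias is False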
save_overwrite=True,
metrics_collect_interval=10,
cancel_check_interval=1,
z_loss_multiplier=1e-5,
Is this the z-loss for the softmax at the output? If so, OLMoE was trained without it.
Yes, I think we should try with both.
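For reference, a minimal PyTorch sketch of the output-softmax z-loss being discussed, with assumed tensor shapes; the output_z_loss helper is hypothetical and only mirrors the z_loss_multiplier=1e-5 value from the config above, not necessarily how this PR wires it in.

import torch


def output_z_loss(logits: torch.Tensor, multiplier: float = 1e-5) -> torch.Tensor:
    """Z-loss on the final output logits: penalizes large log-partition values
    (logsumexp over the vocab) to keep the output softmax numerically stable.

    logits: (batch, seq_len, vocab_size)
    """
    log_z = torch.logsumexp(logits, dim=-1)  # (batch, seq_len)
    return multiplier * (log_z ** 2).mean()


# Example: add it to the cross-entropy loss when z_loss_multiplier > 0.
logits = torch.randn(2, 16, 50304)
targets = torch.randint(0, 50304, (2, 16))
ce = torch.nn.functional.cross_entropy(logits.flatten(0, 1), targets.flatten())
loss = ce + output_z_loss(logits, multiplier=1e-5)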