Add support for MoE models #60

Merged — 61 commits merged into main on Oct 21, 2024
Conversation

epwalsh (Member) commented on Oct 2, 2024:

No description provided.

@Muennighoff (Contributor) left a comment:

Looks amazing! Some small comments.

Comment on lines +118 to +121:

```python
bias: bool = True
"""
Include bias terms.
"""
```
@Muennighoff (Contributor):

We didn't use bias in OLMoE (just like the dense models), so we could consider setting this to False.

@epwalsh (Member, Author):

We do set this to False in the actual 1B-7B config I cooked up; I just set the class default to True here to be consistent with the defaults of the other model config classes.
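
For context, a minimal sketch of the pattern described here: the config class keeps `bias=True` as its default, and the concrete experiment config overrides it to False. The class name, the other fields, and the expert counts are illustrative assumptions, not olmo-core's actual API.

```python
from dataclasses import dataclass

@dataclass
class MoETransformerConfig:
    # Hypothetical fields standing in for the real olmo-core config class.
    num_experts: int = 8
    top_k: int = 2
    bias: bool = True
    """Include bias terms (class default kept True for consistency with other configs)."""

# Experiment-level override, mirroring the 1B-7B config described above,
# where bias is explicitly disabled as in OLMoE (values here are illustrative).
olmoe_1b_7b = MoETransformerConfig(num_experts=64, top_k=8, bias=False)
```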

src/olmo_core/nn/moe/layers.py — outdated review thread (resolved)
```python
save_overwrite=True,
metrics_collect_interval=10,
cancel_check_interval=1,
z_loss_multiplier=1e-5,
```
@Muennighoff (Contributor):

Is this the z-loss for the softmax at the output? If so, OLMoE trained without it.

@epwalsh (Member, Author):

Yes, I think we should try both.
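
For reference, a minimal sketch of the output-softmax z-loss being asked about (the auxiliary loss popularized by PaLM): it penalizes the squared log of the softmax normalizer to keep the final logits from drifting large. The function name and tensor shapes are illustrative; this is not olmo-core's actual implementation.

```python
import torch

def softmax_z_loss(logits: torch.Tensor, multiplier: float = 1e-5) -> torch.Tensor:
    """Auxiliary z-loss on the output softmax.

    logits: raw output logits of shape (batch, seq_len, vocab_size).
    """
    # log of the softmax partition function Z = sum(exp(logits)), per token
    log_z = torch.logsumexp(logits.float(), dim=-1)
    # penalize log(Z)^2, averaged over all tokens, scaled by the multiplier
    return multiplier * (log_z ** 2).mean()
```

This is distinct from the router z-loss sometimes applied to MoE gating logits (as in ST-MoE), which is presumably why the question of which softmax is meant comes up.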

epwalsh marked this pull request as ready for review on October 15, 2024 at 16:09.
epwalsh changed the title from "[WIP] Add support for MoE models" to "Add support for MoE models" on Oct 15, 2024.
epwalsh merged commit 4d3b231 into main on Oct 21, 2024 — 15 checks passed.
epwalsh deleted the epwalsh/moe branch on October 21, 2024 at 21:38.