remove duplicate 340b params #454

Merged 1 commit into NVIDIA:main on Jan 7, 2025

malay-nagda
Contributor

No description provided.

Signed-off-by: Malay Nagda <[email protected]>
@ericharper requested a review from dimapihtar on January 6, 2025 21:58
@@ -135,7 +135,6 @@ model:
   defer_embedding_wgrad_compute: True
   wgrad_deferral_limit: 22
   cross_entropy_loss_fusion: True
-  enable_vboost: True
Collaborator

Where is enable_vboost defined if this is removed?

Contributor Author

It is defined in launcher_scripts/conf/config.yaml...

The one defined above is under the model key, which is redundant because vboost is used outside the scope of model.
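
To illustrate why the copy under model is redundant, here is a minimal sketch using plain Python dicts. The config layout, the values, and the helper name vboost_enabled are assumptions made for illustration; the real launcher code and the contents of launcher_scripts/conf/config.yaml may differ.

```python
# Hypothetical, simplified view of the merged config. The layout and values
# are assumed for illustration and are not copied from the repository.
cfg = {
    "enable_vboost": False,             # top-level key (assumed value)
    "model": {"enable_vboost": True},   # redundant copy under `model`
}


def vboost_enabled(config):
    """Read the flag from the top level only, ignoring anything under `model`."""
    return bool(config.get("enable_vboost", False))


# The copy under `model` has no effect on a top-level lookup.
before = vboost_enabled(cfg)
del cfg["model"]["enable_vboost"]
after = vboost_enabled(cfg)
print(before == after)
```

Under these assumptions, removing the nested key does not change what a top-level reader sees, which is the sense in which it is redundant.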

Contributor Author

The main issue was the duplicated ub_tp_comm_overlap key, which caused jobs to crash.
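
A duplicated key like this can be caught before a job is launched. The following is a minimal, standard-library-only sketch of such a check, not part of this PR. The function name find_duplicate_keys and the sample fragment are hypothetical, and the scanner only understands simple block-style mappings indented with spaces (no flow style, lists of mappings, or multi-line scalars).

```python
import re
from collections import defaultdict

# Matches "key:" at the start of a line and captures the leading indentation.
KEY_RE = re.compile(r"^(\s*)([A-Za-z_][\w.-]*)\s*:(?:\s|$)")


def find_duplicate_keys(yaml_text):
    """Return [(dotted_path, [line numbers])] for keys repeated under one parent."""
    stack = []                 # (indent, key) of the enclosing mappings
    seen = defaultdict(list)   # key path -> line numbers where it appears
    for lineno, line in enumerate(yaml_text.splitlines(), start=1):
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        match = KEY_RE.match(line)
        if not match:
            continue
        indent, key = len(match.group(1)), match.group(2)
        # Leave any mappings that are at the same or a deeper indentation.
        while stack and stack[-1][0] >= indent:
            stack.pop()
        path = tuple(k for _, k in stack) + (key,)
        seen[path].append(lineno)
        stack.append((indent, key))
    return [(".".join(p), lines) for p, lines in seen.items() if len(lines) > 1]


# Hypothetical fragment; the values are made up and not taken from the 340b config.
sample = """\
enable_vboost: False
model:
  ub_tp_comm_overlap: True
  wgrad_deferral_limit: 22
  ub_tp_comm_overlap: False
"""

duplicates = find_duplicate_keys(sample)
print(duplicates)
```

How a duplicate is handled depends on the YAML loader in use: some reject it, while others silently keep the last value, so a check of this kind makes the problem visible either way.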

Collaborator

@erhoo82 left a comment

LGTM

@erhoo82 merged commit 855b40e into NVIDIA:main on Jan 7, 2025
3 checks passed
2 participants