[Examples] Update nemo gpt examples #3743

Merged 5 commits on Jul 18, 2024

romilbhardwaj
Collaborator

Updates NeMo GPT examples.

  • Uses the NVCR NeMo container to make startup faster (~5 min to pull and start vs. 1.5 hr of setup with the previous YAML)
  • Uses the newer McoreGPTModel instead of the deprecated GPTModel
  • Adds notes and customization options for the checkpointing destination
  • Fixes some dependency issues

Tested on Kubernetes and GCP.

Collaborator

@Michaelvll Michaelvll left a comment

Thanks for updating the nemo example @romilbhardwaj! Looks good to me.

resources:
  cpus: 8+
  memory: 64+
  accelerators: A100-80GB:1
Collaborator

Should we have a test with multi-node multi-gpu as well, i.e. each node having multiple GPUs?

Collaborator Author

Confirmed it works with A100-80GB:4 on 2 nodes!
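
For reference, the multi-node, multi-GPU setup confirmed above could be requested in a SkyPilot task YAML roughly as follows. This is only a sketch, not the file from this PR: it assumes the `cpus` and `memory` values from the reviewed snippet carry over unchanged, and that the node count is set through SkyPilot's `num_nodes` field.

```yaml
# Hypothetical sketch; not the YAML from this PR.
num_nodes: 2  # two nodes, as in the confirmation above

resources:
  cpus: 8+
  memory: 64+
  accelerators: A100-80GB:4  # four GPUs on each node
```

With `num_nodes` set, SkyPilot provisions each node with the same `resources`, so this would request 8 GPUs in total.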

@romilbhardwaj added this pull request to the merge queue on Jul 18, 2024
Merged via the queue into master with commit c0246ab on Jul 18, 2024
20 checks passed
@romilbhardwaj deleted the nemo_gpt_distcheckpoint branch on July 18, 2024, 21:58