
Change post training run.yaml inference config #710

Merged
merged 1 commit into from
Jan 3, 2025

Conversation

@SLR722 (Contributor) commented Jan 2, 2025

Context

Colab notebooks provide limited free access to a T4 GPU.

Making the post training template work end to end on a Colab T4 is critical for early adoption of the stack's post training APIs. However, we found that the existing LlamaModelParallelGenerator (https://github.com/meta-llama/llama-stack/blob/main/llama_stack/providers/inline/inference/meta_reference/inference.py#L82) in the meta-reference inference implementation isn't compatible with a T4 machine.

In this PR, we disable create_distributed_process_group for the inference API in the post training run.yaml config and set up the distributed environment variables in the notebook, which makes meta-reference inference compatible with the free T4 machine.

[Screenshot: notebook cell setting the distributed environment variables, 2025-01-02]
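The run.yaml change might look like the following sketch. Only the create_distributed_process_group flag comes from this PR; the surrounding key names and the provider id are assumptions about the template layout, not the exact file contents:

```yaml
providers:
  inference:
    - provider_id: meta-reference-inference   # illustrative id
      provider_type: inline::meta-reference
      config:
        model: ${env.INFERENCE_MODEL}          # assumed key
        # Run inference in-process instead of spawning a model-parallel
        # process group, which the free Colab T4 cannot support.
        create_distributed_process_group: false
```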

Test

Tested with the WIP post training showcase Colab notebook: https://colab.research.google.com/drive/1K4Q2wZq232_Bpy2ud4zL9aRxvCWAwyQs?usp=sharing
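With create_distributed_process_group disabled, the notebook has to supply the distributed environment itself. A minimal sketch of what that setup cell typically looks like, assuming the standard torch.distributed environment variables for a single-process, single-GPU run (the actual notebook cell may differ):

```python
import os

# Single-process "distributed" setup: rank 0 of a world of size 1.
# These are the standard torch.distributed environment variables;
# the concrete values used in the notebook are assumptions.
os.environ["MASTER_ADDR"] = "localhost"
os.environ["MASTER_PORT"] = "29500"
os.environ["RANK"] = "0"
os.environ["WORLD_SIZE"] = "1"
os.environ["LOCAL_RANK"] = "0"
```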

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Meta Open Source bot. label Jan 2, 2025
@SLR722 SLR722 marked this pull request as ready for review January 3, 2025 04:24
@ashwinb ashwinb merged commit f450a0f into main Jan 3, 2025
2 checks passed
@ashwinb ashwinb deleted the change_post_training_run_config branch January 3, 2025 16:37