Change post training run.yaml inference config #710
Merged
Context
Colab notebooks provide limited free access to a T4 GPU.
Making the post-training template work end to end on a Colab T4 is critical for early adoption of the stack's post-training APIs. However, we found that the existing LlamaModelParallelGenerator (https://github.com/meta-llama/llama-stack/blob/main/llama_stack/providers/inline/inference/meta_reference/inference.py#L82) in the meta-reference inference implementation isn't compatible with a T4 machine.
In this PR, we disable create_distributed_process_group for the inference API in the post-training run.yaml config and set up the distributed environment variables in the notebook, which makes meta-reference inference compatible with the free T4 machine.
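
For reference, a minimal sketch of the run.yaml change. The create_distributed_process_group flag comes from the meta-reference inference config; the surrounding provider fields and the model name here are illustrative, not the exact contents of the post-training template:

```yaml
providers:
  inference:
    - provider_id: meta-reference-inference   # illustrative provider id
      provider_type: inline::meta-reference
      config:
        model: Llama3.2-3B-Instruct           # illustrative model choice
        # Skip spawning the model-parallel process group, which does not
        # work on a single free-tier T4.
        create_distributed_process_group: false
```

The distributed environment variables can then be set in the notebook before starting the stack. This is a sketch using the standard torch.distributed variables for a single-process, single-GPU run; the exact set the notebook exports may differ:

```python
import os

# Standard torch.distributed env vars for a single-process, single-GPU setup,
# assumed to be needed because the single-process inference path still reads
# them even with create_distributed_process_group disabled.
os.environ["MASTER_ADDR"] = "localhost"
os.environ["MASTER_PORT"] = "29500"  # any free port
os.environ["RANK"] = "0"
os.environ["WORLD_SIZE"] = "1"
os.environ["LOCAL_RANK"] = "0"
```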
Test
Tested with the WIP post-training showcase Colab notebook: https://colab.research.google.com/drive/1K4Q2wZq232_Bpy2ud4zL9aRxvCWAwyQs?usp=sharing