feat(providers): support non-llama models for inference providers #1200
+103
−74
This PR begins the process of supporting non-llama models within Llama Stack. We start simple by adding this support to a few existing inference providers: fireworks, together, and ollama.
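As a rough illustration of what this enables, the sketch below shows a non-llama model id being used through one of these providers via the llama-stack client. The base URL, model id, and response attribute access are placeholders/assumptions for illustration, not code from this PR.

```python
# Hypothetical sketch: calling a non-llama model through an existing provider
# (e.g. ollama) once this change lands. Model id and base URL are placeholders.
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:5001")

# Previously these providers only accepted Llama model ids; after this PR a
# non-llama model served by the provider should also be usable for inference.
response = client.inference.chat_completion(
    model_id="mistral-7b-instruct",  # placeholder non-llama model id
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.completion_message.content)
```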
Test Plan
^ this passes most of the tests but, as expected, fails the tool-calling tests, since those are specific to Llama models.
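The exact command referenced by "^" above is not shown in this section. For context only, a representative shape of a provider inference test run might look like the following; the test path and flags here are assumptions for illustration, not the command actually used in this Test Plan.

```shell
# Illustrative only: run the provider text-inference tests against a
# non-llama model id (path and flag names are assumptions).
pytest -s -v llama_stack/providers/tests/inference/test_text_inference.py \
  --inference-model "mistral-7b-instruct"
```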