feat(api): Add options for supporting various embedding models #1192

ashwinb · 2025-02-21T00:32:19Z

We need to support:

asymmetric embedding models (support asymmetric embedding models #934)
truncation policies (support server side truncation policies #933)
varying dimensional output (support for Matryoshka embedding models #932)

Test Plan

$ cd llama_stack/providers/tests/inference
$ pytest -s -v -k fireworks test_embeddings.py \
   --inference-model nomic-ai/nomic-embed-text-v1.5 --env EMBEDDING_DIMENSION=784
$ pytest -s -v -k together test_embeddings.py \
   --inference-model togethercomputer/m2-bert-80M-8k-retrieval --env EMBEDDING_DIMENSION=784
$ pytest -s -v -k ollama test_embeddings.py \
   --inference-model all-minilm:latest --env EMBEDDING_DIMENSION=784

ashwinb · 2025-02-21T00:34:38Z

llama_stack/apis/inference/inference.py

+
+class EmbeddingOptions(BaseModel):
+    dimensions: Optional[int] = None
+    text_truncation: Optional[TextTruncation] = None


No image truncation option is provided here: images are typically pre-processed (resized or cropped) according to each model's settings. If a specific need arises to control this pre-processing, we can add an option here. cc @mattf

@ehhuang the default behavior should be to throw. Is that communicated clearly by the value "None", or should it be an explicit enum value such as throw (or some other name)?
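
To make the "default should be to throw" question concrete, here is a minimal sketch of how a truncation policy could be applied to over-long input. It is an illustration, not the code in this PR: the `start` member, the `apply_truncation` helper, and the reading of `start`/`end` as "which side gets cut" are all assumptions; only `TextTruncation.none` and `end = "end"` are visible in the diff.

```python
from enum import Enum
from typing import List


class TextTruncation(Enum):
    # "none" and "end" appear in the diff; "start" is assumed for symmetry.
    none = "none"
    start = "start"
    end = "end"


def apply_truncation(
    tokens: List[int], max_tokens: int, policy: TextTruncation
) -> List[int]:
    """Hypothetical helper: fit a token list into a model's context limit."""
    if len(tokens) <= max_tokens:
        return tokens
    if policy is TextTruncation.none:
        # Explicit "none" means no truncation is allowed, so over-long input fails.
        raise ValueError(
            f"input has {len(tokens)} tokens, limit is {max_tokens}, "
            "and no truncation policy was requested"
        )
    if policy is TextTruncation.start:
        # Assumed meaning: cut from the start, keep the tail.
        return tokens[len(tokens) - max_tokens:]
    # Assumed meaning of "end": cut from the end, keep the head.
    return tokens[:max_tokens]
```

With an explicit `none` member, the throwing behavior is a named choice rather than something a caller has to infer from a missing value.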

ashwinb · 2025-02-21T00:35:00Z

llama_stack/apis/inference/inference.py

+    end = "end"
+
+
+class EmbeddingContext(Enum):


Is "Context" a good enough name? Any better suggestions?
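
For readers unfamiliar with asymmetric embedding models, the sketch below shows one way such an enum might be used: the same text is prepared differently depending on whether it is a search query or a document being indexed. The enum members, the prefix strings, and `prepare_inputs` are hypothetical; the diff excerpt shows only the class name `EmbeddingContext`, and each real model defines its own convention.

```python
from enum import Enum
from typing import Dict, List


class EmbeddingContext(Enum):
    # Hypothetical members; the excerpt does not show the enum body.
    query = "query"
    document = "document"


# Illustrative prefixes only. Some models use instruction prefixes like these,
# others take a separate request parameter instead.
PREFIXES: Dict[EmbeddingContext, str] = {
    EmbeddingContext.query: "search_query: ",
    EmbeddingContext.document: "search_document: ",
}


def prepare_inputs(texts: List[str], context: EmbeddingContext) -> List[str]:
    """Hypothetical helper: tag each text with the role it plays in retrieval."""
    return [PREFIXES[context] + text for text in texts]
```

Under this reading the enum names the role of the text (query versus document), which may be worth weighing when choosing between "Context" and alternatives.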

ashwinb · 2025-02-21T05:42:30Z

llama_stack/apis/inference/inference.py

@@ -482,11 +506,17 @@ async def embeddings(
        self,
        model_id: str,
        contents: List[InterleavedContent],
+        text_truncation: Optional[TextTruncation] = TextTruncation.none,
+        dimensions: Optional[int] = None,


nitpicking: should this be output_dimensions? dimensionality?
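
As background for the naming question, this is roughly what a reduced output dimension means for a Matryoshka-style model: keep the leading components of the full vector and re-normalize. The sketch is an assumption about the general technique, not a description of what any provider in this PR does; `reduce_dimensions` and its parameter name are hypothetical.

```python
import math
from typing import List, Optional


def reduce_dimensions(
    vector: List[float], output_dimension: Optional[int] = None
) -> List[float]:
    """Hypothetical helper: shorten an embedding to its leading components."""
    if output_dimension is None:
        # No reduction requested: return the full vector unchanged.
        return list(vector)
    if not 0 < output_dimension <= len(vector):
        raise ValueError(
            f"output_dimension must be in 1..{len(vector)}, got {output_dimension}"
        )
    head = vector[:output_dimension]
    # Re-normalize so cosine and dot-product similarity stay comparable.
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        return head
    return [x / norm for x in head]
```

Because the parameter describes the size of the returned vector rather than a property of the input, a name that says "output" may read more clearly than a bare `dimensions`.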

ehhuang

LG, minor comments on naming

llama_stack/apis/inference/inference.py

Hoist options into the method args directly. Make TextTruncation.none explicit. Make text_truncation optional; rename.

ashwinb requested review from yanxi0830, hardikjshah, dltn, raghotham, dineshyv, vladimirivic, sixianyi0721, ehhuang and terrytangyuan as code owners February 21, 2025 00:32

facebook-github-bot added the CLA Signed label Feb 21, 2025

This was linked to issues Feb 21, 2025

support server side truncation policies #933

Closed

support asymmetric embedding models #934

Closed

support for Matryoshka embedding models #932

Closed

ashwinb changed the title ~~rfc: Add options for supporting various embedding models~~ feat(api): Add options for supporting various embedding models Feb 21, 2025

ehhuang reviewed Feb 21, 2025

ashwinb mentioned this pull request Feb 21, 2025

feat(providers): add NVIDIA Inference embedding provider and tests #935

Merged

5 tasks

meta-llama deleted a comment from ehhuang Feb 21, 2025

ehhuang approved these changes Feb 21, 2025

rfc: Add options for supporting various embedding models

e011491

Hoist options into the method args directly. Make TextTruncation.none explicit. Make text_truncation optional; rename.

ashwinb force-pushed the embed_api branch from 08dbe6b to e011491 Compare February 21, 2025 05:55

ashwinb added 2 commits February 20, 2025 22:21

Update embeddings signatures for all providers

2c1e8b5

Update the router

e9fd837

ashwinb merged commit 81ce39a into main Feb 21, 2025
5 checks passed

ashwinb deleted the embed_api branch February 21, 2025 06:27

mattf mentioned this pull request Feb 21, 2025

support for task_type, output_dimension, text_truncation meta-llama/llama-stack-client-python#162

Closed
