fix: Update Qdrant support post-refactor #1022

Merged: 1 commit merged into meta-llama:main on Feb 10, 2025

Conversation

jwm4 (Contributor) commented on Feb 9, 2025

What does this PR do?

I tried running the Qdrant provider and found some bugs. See #1021 for details. @terrytangyuan wrote there:

> Please feel free to submit your changes in a PR. I fixed similar issues for pgvector provider. This might be an issue introduced from a refactoring.

So I am submitting this PR.

Closes #1021

Test Plan

Here are the highlights of what I did to test this:

References:
- https://llama-stack.readthedocs.io/en/latest/getting_started/index.html
- https://github.com/meta-llama/llama-stack-apps/blob/main/examples/agents/rag_with_vector_db.py
- https://github.com/meta-llama/llama-stack/blob/main/docs/zero_to_hero_guide/README.md#build-configure-and-run-llama-stack

Install and run Qdrant server:

podman pull qdrant/qdrant
mkdir qdrant-data
podman run -p 6333:6333 -v $(pwd)/qdrant-data:/qdrant/storage qdrant/qdrant
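As a quick sanity check (not part of the PR; it assumes the default REST port is in use), you can confirm the server responds before wiring it into Llama Stack:

```python
# Hypothetical sanity check: hit Qdrant's REST API on the default port (6333).
# A fresh server should report an empty list of collections.
import json
import urllib.request

with urllib.request.urlopen("http://localhost:6333/collections") as resp:
    print(json.load(resp))  # e.g. {"result": {"collections": []}, "status": "ok", ...}
```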

Install and run Llama Stack from the venv-support PR (mainly because I didn't want to install conda):

brew install cmake # Should just need this once

git clone https://github.com/meta-llama/llama-models.git
gh repo clone cdoern/llama-stack
cd llama-stack
gh pr checkout 1018 # This is the checkout that introduces venv support for build/run.  Otherwise you have to use conda.  Eventually this will be part of main, hopefully.

uv sync --extra dev
uv pip install -e .
source .venv/bin/activate
uv pip install qdrant_client

LLAMA_STACK_DIR=$(pwd) LLAMA_MODELS_DIR=../llama-models llama stack build --template ollama --image-type venv
edit llama_stack/templates/ollama/run.yaml

in that editor under:

  vector_io:

add:

  - provider_id: qdrant
    provider_type: remote::qdrant
    config: {}

See https://github.com/meta-llama/llama-stack/blob/main/llama_stack/providers/remote/vector_io/qdrant/config.py#L14 for config options (I didn't need any).

LLAMA_STACK_DIR=$(pwd) LLAMA_MODELS_DIR=../llama-models llama stack run ollama --image-type venv \
   --port $LLAMA_STACK_PORT \
   --env INFERENCE_MODEL=$INFERENCE_MODEL \
   --env SAFETY_MODEL=$SAFETY_MODEL \
   --env OLLAMA_URL=$OLLAMA_URL

Then I tested it out in a notebook. Key highlights included:

qdrant_provider = None
for provider in client.providers.list():
    if provider.api == "vector_io" and provider.provider_id == "qdrant":
        qdrant_provider = provider
qdrant_provider
assert qdrant_provider is not None, "QDrant is not a provider.  You need to edit the run yaml file you use in your `llama stack run` call"

vector_db_id = f"test-vector-db-{uuid.uuid4().hex}"
client.vector_dbs.register(
    vector_db_id=vector_db_id,
    embedding_model="all-MiniLM-L6-v2",
    embedding_dimension=384,
    provider_id=qdrant_provider.provider_id,
)
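As an optional follow-up check (not from the PR; it assumes the client exposes `vector_dbs.list()` as in the getting-started guide and that the listed entries carry an `identifier` field), you can confirm the registration took effect:

```python
# Hypothetical verification that the vector DB registered above is now visible.
# The `identifier` attribute name is an assumption about the returned objects.
registered_ids = [db.identifier for db in client.vector_dbs.list()]
assert vector_db_id in registered_ids, f"{vector_db_id} was not registered with the qdrant provider"
```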

Other than that, I just followed what was in https://llama-stack.readthedocs.io/en/latest/getting_started/index.html

It would be good to have automated tests for this in the future, but that would be a big undertaking.

terrytangyuan (Collaborator) left a comment:

Thank you! Can you run tests/vector_io/test_vector_io.py and paste the results? See the top of the file for an example command (you'll need to modify the provider name and env var though).

jwm4 (Contributor, Author) commented on Feb 10, 2025

I am looking at https://github.com/jwm4/llama-stack/blob/main/tests/client-sdk/vector_io/test_vector_io.py and I don't see a command at the top of the file. However, this seems to work:

I replace all occurrences of "faiss" in "tests/client-sdk/vector_io/test_vector_io.py" with "qdrant". I still have Qdrant running and I relaunch the Llama Stack server. Then I run:

LLAMA_STACK_BASE_URL=http://localhost:5001 INFERENCE_MODEL=llama3.2:3b-instruct-fp16 pytest tests/client-sdk/vector_io/test_vector_io.py

And I get:

/Users/bmurdock/llamastack/cdoern-venv-supprt/llama-stack/.venv/lib/python3.13/site-packages/pytest_asyncio/plugin.py:207: PytestDeprecationWarning: The configuration option "asyncio_default_fixture_loop_scope" is unset.
The event loop scope for asynchronous fixtures will default to the fixture caching scope. Future versions of pytest-asyncio will default the loop scope for asynchronous fixtures to function scope. Set the default fixture loop scope explicitly in order to avoid unexpected behavior in the future. Valid fixture loop scopes are: "function", "class", "module", "package", "session"

  warnings.warn(PytestDeprecationWarning(_DEFAULT_FIXTURE_LOOP_SCOPE_UNSET))
========================================================================================================================================== test session starts ==========================================================================================================================================
platform darwin -- Python 3.13.1, pytest-8.3.4, pluggy-1.5.0
rootdir: /Users/bmurdock/llamastack/cdoern-venv-supprt/llama-stack
configfile: pyproject.toml
plugins: asyncio-0.25.3, anyio-4.8.0, nbval-0.11.0
asyncio: mode=Mode.STRICT, asyncio_default_fixture_loop_scope=None
collected 4 items

tests/client-sdk/vector_io/test_vector_io.py ....                                                                                                                                                                                                                                                 [100%]

===================================================================================================================================== 4 passed, 1 warning in 0.06s ======================================================================================================================================

Note the `4 passed, 1 warning in 0.06s` at the end.

terrytangyuan (Collaborator) commented.
jwm4 (Contributor, Author) commented on Feb 10, 2025

Ah, I had not seen that there was another test directory! Anyway, I run:

uv pip install pytest_html
pytest llama_stack/providers/tests/vector_io/test_vector_io.py -m "qdrant" --env EMBEDDING_DIMENSION=384 -v -s --tb=short --disable-warnings

and I get a bunch of errors of the form:

fixture 'vector_io_qdrant' not found

I will work on adding this fixture to llama_stack/providers/tests/vector_io/fixtures.py.

terrytangyuan (Collaborator) commented:
Sounds good. Thanks!

jwm4 (Contributor, Author) commented on Feb 10, 2025

I haven't had much success so far. I will keep hacking away at it, but I wanted to post an update here.

I add the following fixture to llama_stack/providers/tests/vector_io/fixtures.py:

@pytest.fixture(scope="session")
def vector_io_qdrant() -> ProviderFixture:
    url = os.getenv("QDRANT_URL")
    if url:
        config = QdrantConfig(url=url)
        provider_type = "remote::qdrant"
    else:
        raise ValueError("QDRANT_URL must be set")
    return ProviderFixture(
        providers=[
            Provider(
                provider_id="qdrant",
                provider_type=provider_type,
                config=config.model_dump(),
            )
        ]
    )
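For the qdrant mark to resolve to this fixture, the provider name presumably also has to be registered in the module's fixture list. A minimal sketch, assuming fixtures.py follows the same pattern as the other vector_io providers (the VECTOR_IO_FIXTURES name and its existing entries are assumptions, not confirmed from the file):

```python
# Hypothetical: register the new fixture name alongside the existing ones so that
# conftest.py can construct the "vector_io_qdrant" fixture from the provider name.
VECTOR_IO_FIXTURES = ["faiss", "pgvector", "weaviate", "chroma", "qdrant"]
```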

I run the tests and get the following error:

.venv/lib/python3.13/site-packages/pytest_asyncio/plugin.py:375: in _async_fixture_wrapper
    result = event_loop.run_until_complete(setup_task)
../../../.local/share/uv/python/cpython-3.13.1-macos-aarch64-none/lib/python3.13/asyncio/base_events.py:720: in run_until_complete
    return future.result()
.venv/lib/python3.13/site-packages/pytest_asyncio/plugin.py:370: in setup
    res = await func(**_add_kwargs(func, kwargs, event_loop, request))
llama_stack/providers/tests/vector_io/fixtures.py:149: in vector_io_stack
    test_stack = await construct_stack_for_test(
llama_stack/providers/tests/resolver.py:67: in construct_stack_for_test
    impls = await construct_stack(run_config, get_provider_registry())
llama_stack/distribution/stack.py:202: in construct_stack
    impls = await resolve_impls(run_config, provider_registry or get_provider_registry(), dist_registry)
llama_stack/distribution/resolver.py:230: in resolve_impls
    impl = await instantiate_provider(
llama_stack/distribution/resolver.py:317: in instantiate_provider
    impl = await fn(*args)
llama_stack/providers/remote/inference/bedrock/__init__.py:14: in get_adapter_impl
    impl = BedrockInferenceAdapter(config)
llama_stack/providers/remote/inference/bedrock/bedrock.py:71: in __init__
    self._client = create_bedrock_client(config)
llama_stack/providers/utils/bedrock/client.py:72: in create_bedrock_client
    .refreshable_session()
llama_stack/providers/utils/bedrock/refreshable_boto_session.py:103: in refreshable_session
    metadata=self.__get_session_credentials(),
llama_stack/providers/utils/bedrock/refreshable_boto_session.py:85: in __get_session_credentials
    session_credentials = session.get_credentials().get_frozen_credentials()
E   AttributeError: 'NoneType' object has no attribute 'get_frozen_credentials'

So why do I get an error in Bedrock when the other vector DB tests do not? That seems to be because DEFAULT_PROVIDER_COMBINATIONS in conftest.py has:

    pytest.param(
        {
            "inference": "bedrock",
            "vector_io": "qdrant",
        },
        id="qdrant",
        marks=pytest.mark.qdrant,
    ),

None of the other vector DBs are paired with Bedrock. FWIW, I tried replacing this with ollama, sentence_transformers, or fireworks, which are used by the other vector DBs, and none of those worked either (see the sketch at the end of this comment). ollama fails with:

ValueError: Model 'all-minilm:l6-v2' is not available in Ollama. Available models:

sentence_transformers fails with:

fixture 'inference_sentence_transformers' not found

and fireworks fails with:

E   Missing FIREWORKS_API_KEY in environment. Please set it using one of these methods:
E   1. Export in shell: export FIREWORKS_API_KEY=your-key
E   2. Create .env file in project root with: FIREWORKS_API_KEY=your-key
E   3. Pass directly to pytest: pytest --env FIREWORKS_API_KEY=your-key

I think I will try adding a fixture for sentence_transformers since it seems the closest to being viable, but I really don't understand the fundamentals of what these tests are doing (e.g., why were these pairs of inference and vector DBs chosen?). So I feel like I may be just thrashing around here.
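For reference, the ollama substitution mentioned above amounts to swapping the inference key in that DEFAULT_PROVIDER_COMBINATIONS entry, roughly as follows (a sketch of the attempted edit, not merged code):

```python
# Hypothetical variant of the conftest.py entry: pair the qdrant vector_io provider
# with ollama for inference instead of bedrock. As described above, this pairing
# initially failed with the all-minilm model error.
pytest.param(
    {
        "inference": "ollama",
        "vector_io": "qdrant",
    },
    id="qdrant",
    marks=pytest.mark.qdrant,
),
```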

jwm4 (Contributor, Author) commented on Feb 10, 2025

OK, I couldn't get anywhere with inference_sentence_transformers. I added the fixture, but it just fails with ValueError: Provider 'inline::sentence_transformers' is not available for API 'Api.inference', which makes sense: why would we expect sentence_transformers to do inference anyway? I don't understand what the existing sentence_transformers/chroma pairing is intended to do, so I went back to ollama as the inference provider and tried:

ollama pull all-minilm:l6-v2
curl http://localhost:11434/api/embeddings -d '{"model": "all-minilm", "prompt": "Hello world"}'

Now I get:

ValueError: Model 'all-minilm:l6-v2' is not available in Ollama. Available models: llama3.2:3b-instruct-fp16, all-minilm:latest

This seems like a defect since all-minilm:l6-v2 and all-minilm:latest seem to be synonyms. But I can work around it by running:

EMBEDDING_DIMENSION=384 QDRANT_URL=http://localhost pytest llama_stack/providers/tests/vector_io/test_vector_io.py -m "qdrant" -v -s --tb=short --embedding-model all-minilm:latest --disable-warnings

That gives me:

/Users/bmurdock/llamastack/cdoern-venv-supprt/llama-stack/.venv/lib/python3.13/site-packages/pytest_asyncio/plugin.py:207: PytestDeprecationWarning: The configuration option "asyncio_default_fixture_loop_scope" is unset.
The event loop scope for asynchronous fixtures will default to the fixture caching scope. Future versions of pytest-asyncio will default the loop scope for asynchronous fixtures to function scope. Set the default fixture loop scope explicitly in order to avoid unexpected behavior in the future. Valid fixture loop scopes are: "function", "class", "module", "package", "session"

  warnings.warn(PytestDeprecationWarning(_DEFAULT_FIXTURE_LOOP_SCOPE_UNSET))
==================================================================================================================================================================== test session starts ====================================================================================================================================================================
platform darwin -- Python 3.13.1, pytest-8.3.4, pluggy-1.5.0 -- /Users/bmurdock/llamastack/cdoern-venv-supprt/llama-stack/.venv/bin/python
cachedir: .pytest_cache
metadata: {'Python': '3.13.1', 'Platform': 'macOS-15.3-arm64-arm-64bit-Mach-O', 'Packages': {'pytest': '8.3.4', 'pluggy': '1.5.0'}, 'Plugins': {'html': '4.1.1', 'metadata': '3.1.1', 'asyncio': '0.25.3', 'anyio': '4.8.0', 'nbval': '0.11.0'}}
rootdir: /Users/bmurdock/llamastack/cdoern-venv-supprt/llama-stack
configfile: pyproject.toml
plugins: html-4.1.1, metadata-3.1.1, asyncio-0.25.3, anyio-4.8.0, nbval-0.11.0
asyncio: mode=Mode.STRICT, asyncio_default_fixture_loop_scope=None
collected 18 items / 15 deselected / 3 selected

llama_stack/providers/tests/vector_io/test_vector_io.py::TestVectorIO::test_banks_list[-qdrant] key_fixture_dict[key] inference_ollama
key_fixture_dict[key] vector_io_qdrant
*** providers: {'inference': [Provider(provider_id='ollama', provider_type='remote::ollama', config={'url': 'http://localhost:11434'})], 'vector_io': [Provider(provider_id='qdrant', provider_type='remote::qdrant', config={'location': None, 'url': 'http://localhost', 'port': 6333, 'grpc_port': 6334, 'prefer_grpc': False, 'https': None, 'api_key': None, 'prefix': None, 'timeout': None, 'host': None, 'path': None})]}
*** provider_data: {}
PASSED
llama_stack/providers/tests/vector_io/test_vector_io.py::TestVectorIO::test_banks_register[-qdrant] PASSED
llama_stack/providers/tests/vector_io/test_vector_io.py::TestVectorIO::test_query_documents[-qdrant] The scores are: [0.25060147, 0.23422563, 0.22863364]
PASSED

======================================================================================================================================================= 3 passed, 15 deselected, 2 warnings in 1.02s ========================================================================================================================================================

So 3 tests pass and 15 are deselected. @terrytangyuan, is that what is intended here?

terrytangyuan (Collaborator) left a comment:

This should be good for now. Thanks! Feel free to start a separate issue to track issues with running the tests.

terrytangyuan merged commit 3856927 into meta-llama:main on Feb 10, 2025 (6 checks passed).
kaushik-himself pushed a commit to fiddlecube/llama-stack that referenced this pull request Feb 10, 2025
hardikjshah pushed a commit that referenced this pull request Feb 13, 2025
# What does this PR do?

This is a follow-on to #1022. It includes the changes I needed to be
able to test the Qdrant support as requested by @terrytangyuan.

I uncovered a lot of bigger, more systemic issues with the vector DB
testing and I will open a new issue for those. For now, I am just
delivering the work I already did on that.

## Test Plan

As discussed on #1022:

```
podman pull qdrant/qdrant
mkdir qdrant-data
podman run -p 6333:6333 -v $(pwd)/qdrant-data:/qdrant/storage qdrant/qdrant
```


```
ollama pull all-minilm:l6-v2
curl http://localhost:11434/api/embeddings -d '{"model": "all-minilm", "prompt": "Hello world"}'
```

```
EMBEDDING_DIMENSION=384 QDRANT_URL=http://localhost pytest llama_stack/providers/tests/vector_io/test_vector_io.py -m "qdrant" -v -s --tb=short --embedding-model all-minilm:latest --disable-warnings
```

These show 3 tests passing and 15 deselected, which is presumably working
as intended.

---------

Signed-off-by: Bill Murdock <[email protected]>
franciscojavierarceo pushed a commit to franciscojavierarceo/llama-stack that referenced this pull request Feb 14, 2025
Labels: CLA Signed (managed by the Meta Open Source bot)
Linked issue: Qdrant vector db support seems to be broken (#1021)