Multimodal Embedding Microservice #555

Closed
wants to merge 8 commits into from

@tileintel tileintel commented Aug 22, 2024

Description

This PR introduces a multimodal embedding microservice that uses the BridgeTower model as its embedding model. This microservice is required for the Multimodal RAG on Videos application.

  • We have added several dataclasses to comps/cores/proto/docarray.py that are required by the proposed microservices.
  • We have provided a custom implementation of BridgeTower, adapted from the one on Hugging Face, that can compute the embedding of text and the joint embedding of an image-text pair.
  • We have employed the BridgeTower model for a Multimodal Embedding Inference Endpoint (MMEI) running on both CPU and HPU.
  • We have implemented the multimodal embedding microservice with a local multimodal embedding model (local BridgeTower running on CPU).
  • We have implemented the multimodal embedding microservice with MMEI_EMBEDDING_ENDPOINT.
  • We have provided a README file and tests.
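
The last two implementation modes differ only in where inference runs. As a minimal sketch, assuming the choice is driven by whether the MMEI_EMBEDDING_ENDPOINT environment variable is set, a service might pick its backend as follows. The function name select_backend and the returned tuple shape are hypothetical and are not taken from this PR:

```python
import os
from typing import Mapping, Optional, Tuple


def select_backend(env: Optional[Mapping[str, str]] = None) -> Tuple[str, Optional[str]]:
    """Return ("remote", url) when an endpoint is configured, else ("local", None).

    Hypothetical helper: the real microservice may decide differently.
    """
    # Default to the process environment; accept a mapping to ease testing.
    env = os.environ if env is None else env
    endpoint = env.get("MMEI_EMBEDDING_ENDPOINT", "").strip()
    if endpoint:
        # An endpoint is set: delegate inference to the MMEI service.
        return ("remote", endpoint)
    # No endpoint: fall back to the local BridgeTower model on CPU.
    return ("local", None)
```

Passing the environment as a mapping keeps the decision testable without mutating os.environ.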

Issues

RFC: https://github.com/opea-project/docs/pull/49/files
Issue: opea-project/GenAIExamples#358

Type of change

List the type of change like below. Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)
  • [x] New feature (non-breaking change which adds new functionality)
  • Breaking change (fix or feature that would break existing design and interface)
  • Others (enhancement, documentation, validation, etc.)

Dependencies

docarray[full]
fastapi
huggingface_hub
langchain
langsmith
opentelemetry-api
opentelemetry-exporter-otlp
opentelemetry-sdk
prometheus-fastapi-instrumentator
transformers
shortuuid
uvicorn
torch
torchvision
pydantic==2.8.2
BridgeTower

Tests

We have provided 2 tests for this microservice:

  • tests/test_multimodal_embeddings_langchain_cpu.sh: tests the microservice with MMEI running on CPU.
  • tests/test_multimodal_embeddings_langchain_hpu.sh: tests the microservice with MMEI running on HPU.

@tileintel tileintel mentioned this pull request Aug 29, 2024
3 tasks
@hshen14 hshen14 requested review from XuhuiRen and lkk12014402 and removed request for XuhuiRen August 29, 2024 01:45
@kevinintel kevinintel added this to the v1.0 milestone Aug 29, 2024

codecov bot commented Aug 29, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Files with missing lines Coverage Δ
comps/cores/proto/docarray.py 99.09% <100.00%> (+0.08%) ⬆️

... and 1 file with indirect coverage changes

Signed-off-by: Tiep Le <[email protected]>
@tileintel
Contributor Author

Hi @lvliang-intel. Thank you for your feedback about langsmith. We have removed langsmith from all places that you pointed out above. Would you please re-review this?

Collaborator

I recommend changing the name of this folder. The name is too broad for the functionality it covers: the use case for this code is a fairly narrow domain, where a text must be paired with an image to obtain the embedding.

Contributor Author

Thanks for your feedback. We would like to keep the name of this folder as it is, for the following reasons:

  1. Currently, it supports not only image-text pairs but also text alone (see the methods embed_query and embed_documents, which are inherited from the langchain Embeddings interface).
  2. This BridgeTowerEmbedding can be extended to embed images (and other modalities) in the future. Our current implementation does not provide such methods because they are not needed for our proposed application, Multimodal RAG on Videos.
  3. Although BridgeTowerEmbedding provides implementations of embed_documents, embed_query, and embed_image_text_pairs, it can also be considered an interface for MultimodalEmbedding that covers other modalities in future development.
    We would appreciate it if you could take these comments into account and reconsider. Please let us know if any of this does not make sense or if you still prefer to change the folder name.
    Thanks a lot.
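
To make the three points concrete, here is a minimal sketch of what such an interface could look like. It is not the BridgeTowerEmbedding code from this PR: the class names MultimodalEmbeddings and ToyEmbeddings, the argument types, and the hash-based vectors are all assumptions chosen so that the example runs with the standard library only. Only the three method names come from the discussion above.

```python
import hashlib
from abc import ABC, abstractmethod
from typing import List


class MultimodalEmbeddings(ABC):
    """Hypothetical interface using the method names mentioned above."""

    @abstractmethod
    def embed_documents(self, texts: List[str]) -> List[List[float]]: ...

    @abstractmethod
    def embed_query(self, text: str) -> List[float]: ...

    @abstractmethod
    def embed_image_text_pairs(
        self, texts: List[str], images: List[bytes]
    ) -> List[List[float]]: ...


class ToyEmbeddings(MultimodalEmbeddings):
    """Stand-in that hashes inputs into small vectors. NOT BridgeTower."""

    def __init__(self, dim: int = 8):
        # SHA-256 yields 32 bytes, so dim should not exceed 32.
        self.dim = dim

    def _vec(self, payload: bytes) -> List[float]:
        digest = hashlib.sha256(payload).digest()
        return [b / 255.0 for b in digest[: self.dim]]

    def embed_documents(self, texts: List[str]) -> List[List[float]]:
        return [self._vec(t.encode("utf-8")) for t in texts]

    def embed_query(self, text: str) -> List[float]:
        # Text-only path: same embedding as a single document.
        return self.embed_documents([text])[0]

    def embed_image_text_pairs(
        self, texts: List[str], images: List[bytes]
    ) -> List[List[float]]:
        if len(texts) != len(images):
            raise ValueError("texts and images must have the same length")
        # Joint path: each vector depends on both the text and the image bytes.
        return [
            self._vec(t.encode("utf-8") + b"\x00" + img)
            for t, img in zip(texts, images)
        ]
```

Under this shape, a future image-only method would be one more abstract method on the same interface, which is the extension path described in point 2.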

Collaborator

As this PR has been merged into #575, please close this PR.

@tileintel
Contributor Author

tileintel commented Aug 29, 2024

This PR #555 has been merged into PR #575. Closing this one.
Thanks @XuhuiRen and @lvliang-intel

@tileintel tileintel closed this Aug 29, 2024
lkk12014402 pushed a commit that referenced this pull request Sep 19, 2024

4 participants