
[FEATURE] OpenSearch Feature Brief - Multimodal Search Support for Neural Search #473

Open
dylan-tong-aws opened this issue Oct 12, 2023 · 4 comments

@dylan-tong-aws

What are you proposing?

We’re adding multimodal (text and image) search support to our Neural Search experience. This capability will enable users to add multimodal search capabilities to OpenSearch-powered applications without having to build and manage custom middleware to integrate multimodal models into OpenSearch workflows.

Text and image multimodal search enables users to search on image and text pairs, such as product catalog items (product image and description), based on visual and semantic similarity. This enables new search experiences that can deliver more relevant results. For instance, users can search for “white blouse” to retrieve product images—the machine learning (ML) model that powers this experience is able to associate semantics and visual characteristics. Unlike traditional methods, there is no requirement to manually manage and index metadata to enable comparable search capabilities. Furthermore, users can also search by image to retrieve visually similar products. Lastly, users can search using both text and image, such as finding the products most similar to a particular product catalog item based on semantic and visual similarity.

We want to enable this capability via the Neural Search experience so that OpenSearch users can infuse multimodal search capabilities—like they can for semantic search—into applications with less effort to accelerate their rate of innovation.

Which users have asked for this feature?

This feature was driven by AWS customer demand.

What problems are you trying to solve?

Text and image multimodal search will help our customers improve image search relevancy. Traditional image search is text-based search in disguise: it requires labor to create metadata describing images, a process that is hard to scale due to the speed and cost of that labor. Thus, traditional image search performance and freshness are limited by economics and the ability to maintain high-quality metadata.

Multimodal search leverages multimodal embedding models that are trained to understand semantic and visual similarity, enabling the search experiences described above without requiring users to produce and maintain image metadata. Furthermore, users can perform visual similarity search. It’s not always easy to describe an image in words; this feature gives users the option to match images by visual similarity, helping them discover more relevant images when visual characteristics are hard to describe.

What is the developer experience going to be?

The developer experience will be the same as the neural search experience for semantic search, except we’re adding enhancements to allow users to provide image data via the query, index, and ingest processor APIs. Initially, the feature will be powered by an AI connector to Amazon Bedrock’s multimodal API. New connectors can be added based on user demand and community contributions.
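To make the ingest-side enhancement concrete, here is one possible shape for an ingest pipeline, modeled on the existing `text_embedding` processor used for semantic search. The processor name (`text_image_embedding`), field names (`product_description`, `product_image`, `vector_embedding`), and the placeholder model ID are illustrative, not final API:

```json
PUT /_ingest/pipeline/multimodal-ingest-pipeline
{
  "description": "Generate multimodal embeddings from text/image pairs at ingest time",
  "processors": [
    {
      "text_image_embedding": {
        "model_id": "<multimodal-model-id>",
        "embedding": "vector_embedding",
        "field_map": {
          "text": "product_description",
          "image": "product_image"
        }
      }
    }
  ]
}
```

Documents indexed through this pipeline would have their text and image fields embedded into a single vector field, which the neural query can then search against.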

Are there any security considerations?

We’re building this feature on the existing security controls created for semantic search. We’ll support the same granular security controls.

Are there any breaking changes to the API?

No.

What is the user experience going to be?

The end user experience will be the same as what we’ve provided for semantic search via the neural search experience. Multimodal search is powered by our vector search engine (k-NN), but users won’t have to run vector search queries. Instead, they can run queries using text, an image (binary type), or a text and image pair.
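A sketch of what such a query could look like, extending the existing `neural` query clause from semantic search. The field name `vector_embedding`, the base64 image placeholder, and the model ID are illustrative assumptions; either `query_text` or `query_image` could be supplied alone, or both together:

```json
GET /products/_search
{
  "query": {
    "neural": {
      "vector_embedding": {
        "query_text": "white blouse",
        "query_image": "<base64-encoded image>",
        "model_id": "<multimodal-model-id>",
        "k": 10
      }
    }
  }
}
```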

Are there breaking changes to the User Experience?

No.

Why should it be built? Any reason not to?

Refer to the first response to the what/why question.

Any reason why we shouldn’t build this? Some developers will want full flexibility, and they might choose to build their multimodal search (vector search) application directly on our core vector search engine (k-NN). We’ll continue to support users with this option while working on improving our framework so that we provide users with a simpler solution with minimal constraints.

What will it take to execute?

We’ll be enhancing the neural search plugin APIs and creating a new AI connector for Amazon Bedrock multimodal APIs.
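For the connector piece, a rough sketch of what a Bedrock connector could look like, following the existing ml-commons connector creation API. The region, model (Titan multimodal embeddings is used here as an example), request-body parameter names, and credentials are all assumptions; the actual connector blueprint may differ:

```json
POST /_plugins/_ml/connectors/_create
{
  "name": "Amazon Bedrock multimodal embedding connector",
  "description": "Example connector to a Bedrock multimodal embedding model",
  "version": 1,
  "protocol": "aws_sigv4",
  "parameters": {
    "region": "us-east-1",
    "service_name": "bedrock"
  },
  "credential": {
    "access_key": "<access-key>",
    "secret_key": "<secret-key>"
  },
  "actions": [
    {
      "action_type": "predict",
      "method": "POST",
      "url": "https://bedrock-runtime.us-east-1.amazonaws.com/model/amazon.titan-embed-image-v1/invoke",
      "headers": { "content-type": "application/json" },
      "request_body": "{ \"inputText\": \"${parameters.inputText}\", \"inputImage\": \"${parameters.inputImage}\" }"
    }
  ]
}
```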

Any remaining open questions?

Community feedback is welcome.

@noCharger

@dylan-tong-aws do you think this could go into specific plugins like ml-commons?


msfroh commented Oct 25, 2023

@opensearch-project/admin -- Can we transfer this to the neural-search repo?

@msfroh msfroh removed the untriaged label Oct 25, 2023
@gaiksaya gaiksaya transferred this issue from opensearch-project/OpenSearch Oct 25, 2023
@navneet1v

The developer experience will be the same as the neural search experience for semantic search except we’re adding enhancements to allow users to provide image data via the query, index and ingest processor APIs

@dylan-tong-aws can you elaborate on this more? I am not sure what we want here.

@vamshin vamshin removed the untriaged label Oct 30, 2023
@getsaurabh02 getsaurabh02 moved this from 🆕 New to Later (6 months plus) in Search Project Board Aug 15, 2024

martin-gaievski commented Nov 14, 2024

The feature of using an image as part of semantic search together with text queries was added under #359. Is this a meta issue for other multimodal enhancements, or can we close it @dylan-tong-aws?
