[FEATURE] OpenSearch Feature Brief - Multimodal Search Support for Neural Search #473
Comments
@dylan-tong-aws do you think this could go into specific plugins like ml-commons?
@opensearch-project/admin -- Can we transfer this to the neural-search repo?
@dylan-tong-aws can you elaborate on this more? I am not sure what we want here.
Added the feature of using an image as part of semantic search together with text queries under #359. Is this a meta issue for other multimodal enhancements, or can we close it, @dylan-tong-aws?
What are you proposing?
We’re adding multimodal (text and image) search support to our Neural Search experience. This capability will enable users to add multimodal search capabilities to OpenSearch-powered applications without having to build and manage custom middleware to integrate multimodal models into OpenSearch workflows.
Text and image multimodal search enables users to search image and text pairs, such as product catalog items (product image and description), based on visual and semantic similarity. This enables new search experiences that can deliver more relevant results. For instance, a user can search for “white blouse” to retrieve product images, because the machine learning (ML) model that powers this experience can associate semantics with visual characteristics. Unlike traditional methods, there is no need to manually manage and index metadata to enable comparable search capabilities. Users can also search by image to retrieve visually similar products, or search with both text and image, for example to find the products most similar to a particular catalog item based on semantic and visual similarity.
We want to enable this capability via the Neural Search experience so that OpenSearch users can infuse multimodal search capabilities—like they can for semantic search—into applications with less effort to accelerate their rate of innovation.
Which users have asked for this feature?
This feature was driven by AWS customer demand.
What problems are you trying to solve?
Text and image multimodal search will help our customers improve image search relevancy. Traditional image search is text-based search in disguise: it requires manual effort to create metadata describing images, a process that is hard to scale due to the speed and cost of labor. As a result, the performance and freshness of traditional image search are limited by economics and the ability to maintain high-quality metadata.
Multimodal search leverages multimodal embedding models that are trained to understand semantics and visual similarity, enabling the aforementioned search experiences without having to produce and maintain image metadata. Furthermore, users can perform visual similarity search. It’s not always easy to describe an image in words, and this feature gives users the option to match images by visual similarity, empowering them to discover more relevant images when visual characteristics are hard to describe.
What is the developer experience going to be?
The developer experience will be the same as the neural search experience for semantic search, except that we’re adding enhancements to allow users to provide image data via the query, index, and ingest processor APIs. Initially, the feature will be powered by an AI connector to Amazon Bedrock’s multimodal API. New connectors can be added based on user demand and community contributions.
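To make this concrete, here is a rough, non-authoritative sketch of what ingestion could look like, assuming an ingest processor that maps a text field and a binary image field to a single vector field. The processor name (`text_image_embedding`), field names, model ID, and vector dimension are illustrative assumptions, not the final API.

```python
# Illustrative sketch only: processor name, field names, model ID, and
# dimension are assumptions, not a finalized API. Assumes a local OpenSearch
# cluster with the neural-search plugin and a deployed multimodal model.
import requests

OPENSEARCH = "https://localhost:9200"
AUTH = ("admin", "admin")            # replace with real credentials
MODEL_ID = "<multimodal-model-id>"   # ID of the deployed multimodal embedding model

# 1. Ingest pipeline that turns a (text, image) pair into one embedding.
pipeline = {
    "description": "Multimodal (text + image) embedding pipeline",
    "processors": [{
        "text_image_embedding": {              # hypothetical processor name
            "model_id": MODEL_ID,
            "embedding": "vector_embedding",   # destination k-NN vector field
            "field_map": {
                "text": "product_description",
                "image": "product_image"       # base64-encoded image bytes
            }
        }
    }]
}
requests.put(f"{OPENSEARCH}/_ingest/pipeline/multimodal-pipeline",
             json=pipeline, auth=AUTH, verify=False)

# 2. Index whose mapping pairs the source fields with a k-NN vector field
#    and applies the pipeline by default.
index_body = {
    "settings": {"index.knn": True, "default_pipeline": "multimodal-pipeline"},
    "mappings": {"properties": {
        "product_description": {"type": "text"},
        "product_image": {"type": "binary"},
        "vector_embedding": {"type": "knn_vector", "dimension": 1024}
    }}
}
requests.put(f"{OPENSEARCH}/products", json=index_body, auth=AUTH, verify=False)
```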
Are there any security considerations?
We’re building this feature on the existing security controls created for semantic search. We’ll support the same granular security controls.
Are there any breaking changes to the API?
No.
What is the user experience going to be?
The end user experience will be the same as what we’ve provided for semantic search via the neural search experience. Multimodal search is powered by our vector search engine (k-NN), but users won’t have to run vector search queries. Instead, they can run queries using text, an image (binary type), or a text and image pair.
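As a rough sketch of the query side, assuming the existing `neural` query clause is extended with an optional image input (the `query_text` and `query_image` parameter names here are assumptions), a text, image, or text-and-image query could look like this:

```python
# Illustrative sketch only: the neural clause parameters (query_text,
# query_image) are assumed extensions of the existing semantic-search syntax.
import base64
import requests

OPENSEARCH = "https://localhost:9200"
AUTH = ("admin", "admin")
MODEL_ID = "<multimodal-model-id>"

# Encode the query image as base64, matching the binary field type.
with open("white_blouse.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

query = {
    "query": {
        "neural": {
            "vector_embedding": {              # vector field produced at ingest
                "query_text": "white blouse",  # omit for image-only search
                "query_image": image_b64,      # omit for text-only search
                "model_id": MODEL_ID,
                "k": 10
            }
        }
    }
}
resp = requests.post(f"{OPENSEARCH}/products/_search",
                     json=query, auth=AUTH, verify=False)
for hit in resp.json()["hits"]["hits"]:
    print(hit["_score"], hit["_source"]["product_description"])
```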
Are there breaking changes to the User Experience?
No.
Why should it be built? Any reason not to?
Refer to the responses to “What are you proposing?” and “What problems are you trying to solve?” above.
Any reason why we shouldn’t build this? Some developers will want full flexibility, and they might choose to build their multimodal search (vector search) application directly on our core vector search engine (k-NN). We’ll continue to support users with this option while working on improving our framework so that we can offer a simpler solution with minimal constraints.
What will it take to execute?
We’ll be enhancing the neural search plugin APIs and creating a new AI connector for Amazon Bedrock multimodal APIs.
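For reference, here is a hedged sketch of what registering such a connector might look like through the existing ml-commons connector API. The Bedrock endpoint, model name, and request-body template are assumptions based on current connector blueprints, not a committed design.

```python
# Illustrative sketch only: the Bedrock model name, endpoint, and request-body
# template are assumptions; the final connector blueprint may differ.
import requests

OPENSEARCH = "https://localhost:9200"
AUTH = ("admin", "admin")

connector = {
    "name": "Amazon Bedrock multimodal embedding connector",
    "description": "Connector to a Bedrock text + image embedding model",
    "version": 1,
    "protocol": "aws_sigv4",
    "parameters": {"region": "us-east-1", "service_name": "bedrock"},
    "credential": {                       # use real AWS credentials or a role
        "access_key": "<aws-access-key>",
        "secret_key": "<aws-secret-key>"
    },
    "actions": [{
        "action_type": "predict",
        "method": "POST",
        "headers": {"content-type": "application/json"},
        # Hypothetical Bedrock multimodal embedding model endpoint.
        "url": "https://bedrock-runtime.us-east-1.amazonaws.com/model/amazon.titan-embed-image-v1/invoke",
        # Pass the text and/or image inputs through to the model.
        "request_body": "{\"inputText\": \"${parameters.inputText:-null}\", "
                        "\"inputImage\": \"${parameters.inputImage:-null}\"}"
    }]
}
resp = requests.post(f"{OPENSEARCH}/_plugins/_ml/connectors/_create",
                     json=connector, auth=AUTH, verify=False)
print(resp.json())  # returns the new connector_id on success
```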
Any remaining open questions?
Community feedback is welcome.