[PROPOSAL] Search Semantic Chaining Mechanisms #12
Comments
Can you provide some examples of the problems this would solve at a high level in the summary? Some examples for what is described above the first horizontal line would help in attracting the right people to comment on this.
In query stage 2.1, it says the user entered "rambo," but "rambo" is not mentioned again. For this comment "// this section is generated for the chain if not given by user," when would the chain be given by the user other than the initial query? How does this all compare to how search works today as OpenSearch passes through analyzers? "We currently don't support paging in the chaining termination step and therefore this step does not allow paging of the results." Can you provide a reference to what is doing this today?
I'm a bit lost as I'm picking this back up again. I've now seen multiple examples of chaining in both query rewriting and ranking. So, I've recanted some of my earlier complaints. With that said, I would like to understand where we are in staging the work here, so that we can push items out incrementally.
I'm not sure what the previous complaints were so I may be missing some context. This is not yet scheduled for development. We're working on the roadmap for search relevance now and could use help from the community in prioritization. One piece of the chain that could be useful sooner rather than later would be to allow the owners of the search application to pass the original user query without any rewriting through to OpenSearch. This could feed logging and inform internal search analytics (top queries, zero results queries, etc.). We think working on that as a first piece along with the remote ranker plug-in would be good progress. Are you considering working on any of this/looking for a breakdown to pick up something?
I was chatting w/ @mahitamahesh about what this might look like in terms of transforming both requests and results (which I think is the appropriate generalization of rewriters/rerankers), and how we might incorporate an idea of "stored, named chains" to simplify e.g. A/B testing between two chains before making one the index default chain. Here are some example calls that we discussed:
Hi @msfroh, search configurations seem like they could be a very useful generalization. I am wondering how general search_configurations would be, or if they are meant to specifically store information for chaining only. Specifically, I am working on opensearch-project/neural-search#70 for the neural search plugin where we want to associate model_id's with fields so that users do not have to pass in the model ids for each search request - rather, the information is associated with the index instead. In other words, I want to store a map like this with the index to be used at search time:
I thought about storing this with the That being said, it seems like a search configuration might be a good place to store a mapping like this and associate it with an index via index setting. Would it make sense to make search configurations extensible to store information other than chains that could be used at different stages throughout search phases?
This seems to be good extensibility that can be used for other plugins like Neural search. +1 to Jack's comment. @msfroh, can we make it extensible so that it can be used outside this plugin?
Closing as Search Pipelines has gone GA. Thanks, @YANG-DB!
Relevancy rewriters and rankers mechanism
The purpose of this mechanism is to provide a concise, standard way of defining search relevancy processing on both
the query rewrite side and the results ranking side.
This proposal is the collaboration of the
The capability of chaining multiple search relevancy rewriters and possibly results rerankers would allow the following :
Chain Components
Chain operators
Each chain element is an operator which transforms the query content and passes it on to the next operator - we will
call these operators Transformers.
A transformer is expected to have no side effects apart from the query transformation.
Chain payload
The chain's payload is the query itself. Each transformer is expected to transform the query in a way that the next
transformer can process.
Chain termination step
The chain ends with a terminal step, which no longer emits the query to further components of the chain.
This termination step is most likely the actual execution of the query against the underlying search engine.
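The three concepts above (operators, payload, termination step) can be sketched as follows. This is a minimal illustration in Python rather than plugin code; the names `Transformer`, `run_chain`, `lowercase`, `add_synonym` and `fake_search`, as well as the dict-shaped query, are assumptions made for the example and are not part of the proposal.

```python
from typing import Callable, Dict, List

# Assumption: the payload (the query) is modelled as a plain dict.
Query = Dict[str, object]
Transformer = Callable[[Query], Query]    # query in, new query out, no side effects
Terminal = Callable[[Query], List[str]]   # consumes the query, emits no query


def lowercase(query: Query) -> Query:
    """Hypothetical rewriter: normalises the query text."""
    return {**query, "text": str(query["text"]).lower()}


def add_synonym(query: Query) -> Query:
    """Hypothetical rewriter: appends one hard-coded alternative term."""
    return {**query, "text": f'{query["text"]} OR film'}


def run_chain(query: Query, transformers: List[Transformer], terminal: Terminal) -> List[str]:
    """Pass the query through each transformer in order, then terminate."""
    for transform in transformers:
        query = transform(query)  # each output must be processable by the next step
    return terminal(query)        # terminal step: would execute against the engine


def fake_search(query: Query) -> List[str]:
    """Stand-in for real query execution; it only echoes the final query."""
    return [f'hit for: {query["text"]}']


original = {"text": "Movie"}
results = run_chain(original, [lowercase, add_synonym], fake_search)
```

Because each transformer returns a new dict instead of mutating its input, `original` is left untouched, which mirrors the "no side effects" expectation.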
Chain footsteps
While a chain is executing, each transformer that runs leaves a trail in the form of step-specific trail info.
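One way such a trail could be recorded is sketched below. The shape of a trail entry (`step`, `input`, `output`) and the names `run_with_trail` and `strip_text` are assumptions for illustration; the proposal does not specify them.

```python
from typing import Callable, Dict, List, Tuple

Query = Dict[str, object]
Step = Tuple[str, Callable[[Query], Query]]  # (step name, transformer)


def run_with_trail(query: Query, steps: List[Step]) -> Tuple[Query, List[dict]]:
    """Apply each step in order and record one trail entry per step."""
    trail: List[dict] = []
    for name, transform in steps:
        rewritten = transform(query)
        # Hypothetical entry layout: which step ran, what it saw, what it produced.
        trail.append({"step": name, "input": query, "output": rewritten})
        query = rewritten
    return query, trail


def strip_text(query: Query) -> Query:
    """Hypothetical rewriter: trims surrounding whitespace."""
    return {**query, "text": str(query["text"]).strip()}


final_query, trail = run_with_trail({"text": "  rambo "}, [("strip", strip_text)])
```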
Chain execution
The chain order is defined as part of the query extension. If no such definition is found under the query
extension, the fallback is the rewriter definition in the index mapping of the queried index
(under the mapping's metadata).
Rewriter Transformations
The chain mechanism is a composition of query interceptors. The purpose of these interceptors is to chain the
individual query rewriter plugins one after the other in a sequential manner.
Rankers Transformations
The chain mechanism is terminated once a termination step is called. This termination step is the ranker operator.
The ranker operator takes the query as input, performs the actual query against the database and ranks the results
according to its own internal reasoning.
We currently don't support paging in the chaining termination step and therefore this step does not allow paging of
the results.
Configuration
Each transformation/operator may use the following levels of configuration:
Plugin level configuration
This level of configuration is supported by the Plugin API of OpenSearch and may be used for static
configuration of the component.
Implementation of this capability can make use of the BaseRestHandler endpoint extension mechanism.
For example, Querqy uses such an endpoint for its rewrite rules definition:
PUT /_plugins/_querqy/rewriter/common_rules
Index level configuration
This level of configuration is supported by using the index mapping _meta DSL, which is an existing part of the
mapping DSL.
Example usage of the index mapping configuration:
New chain mapping DSL
For backwards compatibility we will use the index mapping **_meta** field to preserve the configuration information
related to both the rewriters and the rankers.
The chain parts will reside under the generic concepts:
- **rankers** - list of ranker plugin configurations
- **rewriters** - list of rewriter plugin configurations
Metadata under my_index/_mapping
The order of the rankers/rewriters is explicit and the chain will dispatch accordingly (unless another directive appears
under the query chain-directive).
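As a rough illustration of the mapping-level layout described above, the sketch below builds a `_meta` body with ordered `rewriters` and `rankers` lists. Only `_meta`, `rewriters` and `rankers` come from the proposal; the per-entry keys (`name`, `rewriter`) and their values are assumptions chosen for the example.

```python
import json

mapping_body = {
    "_meta": {
        # List order is the dispatch order of the chain.
        "rewriters": [
            {"name": "querqy", "rewriter": "common_rules"},  # hypothetical keys
        ],
        "rankers": [
            {"name": "kendra"},  # hypothetical entry
        ],
    }
}

# The body would be sent with: PUT my_index/_mapping
payload = json.dumps(mapping_body, indent=2)
```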
Query level configuration
This level of configuration is supported by using the query extension DSL. This section will have a new chain DSL
structure. In a similar manner to the "_meta" section of the mapping DSL, the "ext" section will contain the rankers and
rewriters lists.
Extension under _search
The order of the rankers/rewriters is explicit and the chain will dispatch accordingly (unless another directive appears
under the query chain-directive).
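A query-level definition could look roughly like the sketch below, where the `ext` section of the search body carries the same two lists. The `title` field, the per-entry keys and the helper `chain_names` are assumptions for illustration only.

```python
from typing import Dict, List

search_body = {
    "query": {"match": {"title": "rambo"}},  # "title" is a hypothetical field
    "ext": {
        "rewriters": [{"name": "querqy", "rewriter": "common_rules"}],  # hypothetical keys
        "rankers": [{"name": "kendra"}],
    },
}


def chain_names(body: Dict) -> List[str]:
    """Return step names in dispatch order: all rewriters first, then rankers."""
    ext = body.get("ext", {})
    steps = ext.get("rewriters", []) + ext.get("rankers", [])
    return [step["name"] for step in steps]


order = chain_names(search_body)
```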
This is a flow chart visualization of the chain steps:
Chain Context
Search Relevancy Context Information
For the rewriter and ranker chain to track and be informed of all the modifications each step
performs, an execution context is needed.
This context will have the following fields, which can be used by any future plugin that needs to perform rewrites or
ranking:
- params: an input to each and every ranker and rewriter, which it may use for its own needs
- execution: execution-related content that is generated throughout the pipeline
The execution section may have additional internal fields which are related to the execution flow itself and are
subject to future changes.
This context will be attached to the query DSL under the ext section.
POST my_index/_search
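A possible shape for this context is sketched below. The `params` and `execution` sections and the `exclude` list are named in this proposal; the `trail` field, the `original_query` parameter, the `title` field and the class names are assumptions for the example.

```python
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, List


@dataclass
class Execution:
    exclude: List[str] = field(default_factory=list)           # steps to skip for this query
    trail: List[Dict[str, Any]] = field(default_factory=list)  # hypothetical per-step records


@dataclass
class RelevancyContext:
    params: Dict[str, Any] = field(default_factory=dict)       # input visible to every step
    execution: Execution = field(default_factory=Execution)    # filled in while the chain runs


context = RelevancyContext(params={"original_query": "rambo"})

# Body for POST my_index/_search, with the context placed under "ext".
search_body = {
    "query": {"match": {"title": "rambo"}},  # "title" is a hypothetical field
    "ext": asdict(context),
}
```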
Activating Query rewriter / rerankers
During the lifetime of the index, whenever a query runs against it, the following steps will occur:
verify whether search-relevancy is activated for the index
create the search-relevancy context information (or reuse the existing one if it was already created)
for each rewrite step in the rewriters list :
for each semantic-ranker step in the rankers list:
If a rewriter/ranker doesn't appear in the query ext section but does appear in the relevant index
mapping section,
its configuration details will be copied from the index mapping section into the relevant query ext section.
To prevent a rewriter/ranker from being activated on a query when the index mapping indicates it is part of
the chain,
add its name to the exclude list under the execution section.
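The copy-from-mapping and exclude rules above can be sketched as a single merge function. This is an illustration under assumptions: the entry key `name`, the function name `prepare_ext`, and the choice to append mapping-only steps after the steps given in the query are not specified by the proposal.

```python
import copy
from typing import Dict


def prepare_ext(query_ext: Dict, mapping_meta: Dict) -> Dict:
    """Build the effective ext section for one query without mutating the inputs."""
    ext = copy.deepcopy(query_ext)
    excluded = set(ext.get("execution", {}).get("exclude", []))
    for section in ("rewriters", "rankers"):
        merged = list(ext.get(section, []))
        present = {step["name"] for step in merged}
        # Copy configuration for steps known only to the index mapping.
        for step in mapping_meta.get(section, []):
            if step["name"] not in present:
                merged.append(copy.deepcopy(step))
        # Drop any step the query explicitly excluded.
        ext[section] = [step for step in merged if step["name"] not in excluded]
    return ext


mapping_meta = {
    "rewriters": [{"name": "querqy", "rewriter": "common_rules"}],  # hypothetical keys
    "rankers": [{"name": "kendra"}],
}
query_ext = {"execution": {"exclude": ["kendra"]}}
effective = prepare_ext(query_ext, mapping_meta)
```

Here the query names no steps itself, so the rewriter is copied in from the mapping, while the ranker is dropped because it appears in the exclude list.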
Example
Configuration Stage
Step 0: Create plugins configuration settings
PUT /_plugins/_querqy/rewriter
PUT /_plugins/_kendra
Step 1: Create mapping for index my_index
PUT my_index/_mapping
Query Stage
Step 2: original request from user : “rambo”
Step 2.1: Structured query from application coming to OpenSearch (this is done by the customer’s application)
POST my_index/_search
The chain flow control intercepts the index search request and dispatches the request to each query rewriter in turn.
Step 3: First rewriter (Querqy) is dispatched and generates the new query (query rewrite)
Step 3: chain flow control has no additional rewrites to dispatch, so it will dispatch to the rankers. The first ranker in the chain will review the context params and take the necessary information.
After it completes its action, the results will be ranked according to its internal reasoning.
Response Stage
Step 4: Reranking work after the rewrite chain is completed - returning the results to the original calling service
ranker search results json
The response DSL doesn't currently contain such an ext part - this RFC suggests adding such a section to the results.