[RFC] Introduce pluggable QueryCollectorContexts and CollectorManagers #13978
TL;DR
I am proposing a way to separate out query preprocessing from query path execution in the existing `SearchPlugin`'s `QueryPhaseSearcher` extension points. This will allow multiple `SearchPlugin`s to provide `Collector`/`CollectorManager` related customization, which is a limitation of the `QueryPhaseSearcher` today.

Background Information
Today the only extension point for plugging search customization into the query phase is the `QueryPhaseSearcher` interface defined in the `SearchPlugin`. Any plugin that wants to customize the query phase, or even just introduce additional `Collector`s, must implement an entire new `QueryPhaseSearcher`, and only one plugin's `QueryPhaseSearcher` can be used at a time.

Specifically, the main entry point is the `searchWith` method (OpenSearch/server/src/main/java/org/opensearch/search/query/QueryPhaseSearcher.java, lines 39 to 46 in 293905a), which allows an implementation to [1] customize how the search path is executed using the `ContextIndexSearcher` (for example, the `ConcurrentQueryPhaseSearcher` is used to enable concurrent search) as well as [2] perform any preprocessing or modifications on the `Query`, `SearchContext`, or even the `QueryCollectorContext`s linked list. However, because of the broad range of possible implementations here, today we are limited to a single `QueryPhaseSearcher` implementation per cluster. In the past this limitation created a conflict between the `ConcurrentQueryPhaseSearcher` and the `HybridQueryPhaseSearcher`, which had to be resolved by making the latter extend (the wrapper class of) the former: opensearch-project/neural-search#356

In the first case [1], where a `QueryPhaseSearcher` is used to modify the search execution path, there is not much we can do to support multiple `QueryPhaseSearcher`s, as ultimately we need to decide on a single execution path. For example, we can’t use the `DefaultQueryPhaseSearcher` and the `ConcurrentQueryPhaseSearcher` together because we can’t perform both concurrent and non-concurrent search (or at least it doesn’t make sense to).

However, for the second case [2], such as the `HybridQueryPhaseSearcher`, where the `searchWith` extension point is not used to modify the search execution path but instead to modify the `Query` on which to perform search, multiple preprocessing implementations should be composable.

Additionally, the `aggregationProcessor` method (OpenSearch/server/src/main/java/org/opensearch/search/query/QueryPhaseSearcher.java, lines 53 to 55 in 293905a) is used to set up and post-process aggregation-related collectors; however, today it is also the only place that can be used to introduce custom `CollectorManager` implementations. For example, the `HybridAggregationProcessor` uses this extension point to introduce the `HybridCollectorManager` without making any modifications to the default `AggregationProcessor`. Since the `AggregationProcessor` is introduced in the same `QueryPhaseSearcher` interface, the same limitation exists today: only a single `AggregationProcessor` can be used per cluster. However, in cases like this, where a plugin strictly wants to add additional `CollectorManager`s to the `queryCollectorManagers` map, there are no conflicts across plugin implementations, so we should be able to support an additional extension point here that allows `SearchPlugin`s to provide custom `CollectorManager`s.

Describe the solution you'd like
At a high level, I am proposing to decouple the query preprocessing logic (performing any processing or modifications with `Query`, `SearchContext`, and `QueryCollectorContext`s) from the search execution path logic. Each of the `Query`, `SearchContext`, and `QueryCollectorContext`s could have its own distinct pluggable extension point that allows multiple plugins to provide implementations. Each of these components deserves its own separate RFC to discuss how best to proceed, so this issue will focus on making `QueryCollectorContext` and the associated `CollectorManager`s pluggable.

Today a `LinkedList` of `QueryCollectorContext`s is created in `QueryPhase::executeInternal` with various default `QueryCollectorContext`s based on the query parameters (such as `terminate_after`, `min_score`, etc.). It is subsequently passed to the previously mentioned `QueryPhaseSearcher::searchWith` method, where any plugin customization can happen. This is also where it is usually transformed from the `QueryCollectorContexts` into the `Collector` tree and passed to `IndexSearcher::search` to perform the query phase.

I am proposing to introduce a `QueryCollectorContextProvider` construct which can take pluggable `QueryCollectorContext` constructors to override default contexts.

I see 2 use cases for this today:
- `TopDocsCollector`. This was done by adding an additional `SearchWithCollector` method, which is pretty fragile and still tightly coupled to the `QueryPhaseSearcher`. This could be done instead by overriding the default `TopDocsQueryCollectorContext` with an empty context based on whatever condition checks need to be performed. Ref: Refactor implementations of query phase searcher, add empty QueryCollectorContext #13481.
- `Collector` tree. This could involve taking the `Aggregator` tree and wrapping it in a `PrefetchableCollector` type of construct to capture the doc IDs on which to perform prefetch related actions. This could be accomplished by overriding the default `search_multi` `QueryCollectorContext` via this new extension point.

Moreover, I am also proposing to add a plugin hook to allow `SearchPlugin`s to add implementations directly to the `queryCollectorManagers` map. Although custom `CollectorManager`s could be plugged in with the above proposal of overriding the `QueryCollectorContext` implementation, this would still have the same limitation of only 1 plugin implementation at a time. `CollectorManager`s do not need to be bound by this restriction, and we can open up an extension point to allow as many pluggable `CollectorManager`s as needed.

I also see 2 use cases for this today:
- `HybridAggregationProcessor` simply adds a `HybridCollectorManager` to the `queryCollectorManagers` map; however, today it has to implement a whole `AggregationProcessor` just to do so, despite not making any changes to the default `AggregationCollectorManagers`.
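
The two extension points proposed in this section could look roughly like the sketch below. It is only an illustration under stated assumptions: contexts and collector managers are reduced to plain strings, the keys `top_docs` and `hybrid` are illustrative, and the classes `ContextProviderSketch` and `CollectorManagerRegistrySketch` (with all of their methods) are hypothetical names, not existing OpenSearch APIs. The point is the difference in cardinality: at most one plugin may override a given default context, while any number of plugins may contribute collector managers.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical sketch: contexts are modelled as strings, not real QueryCollectorContexts.
final class ContextProviderSketch {
    // Insertion-ordered so the order of the default contexts is preserved.
    private final Map<String, Supplier<String>> constructors = new LinkedHashMap<>();
    private final Map<String, String> overriddenBy = new HashMap<>();

    void registerDefault(String name, Supplier<String> constructor) {
        constructors.put(name, constructor);
    }

    // A plugin replaces the constructor of one default context; a second override is rejected.
    void override(String plugin, String name, Supplier<String> constructor) {
        if (!constructors.containsKey(name)) {
            throw new IllegalArgumentException("no default context named " + name);
        }
        String previous = overriddenBy.putIfAbsent(name, plugin);
        if (previous != null) {
            throw new IllegalStateException(name + " is already overridden by " + previous);
        }
        constructors.put(name, constructor);
    }

    List<String> buildContexts() {
        List<String> contexts = new ArrayList<>();
        for (Supplier<String> constructor : constructors.values()) {
            contexts.add(constructor.get());
        }
        return contexts;
    }
}

// Hypothetical sketch of the additive hook for the queryCollectorManagers map.
final class CollectorManagerRegistrySketch {
    private final Map<String, String> queryCollectorManagers = new LinkedHashMap<>();

    // Any number of plugins may add managers; only a key collision is an error.
    void add(String key, String manager) {
        if (queryCollectorManagers.putIfAbsent(key, manager) != null) {
            throw new IllegalStateException("duplicate collector manager key " + key);
        }
    }

    Map<String, String> managers() {
        return queryCollectorManagers;
    }
}

final class ExtensionPointDemo {
    public static void main(String[] args) {
        ContextProviderSketch provider = new ContextProviderSketch();
        provider.registerDefault("top_docs", () -> "default-top-docs");
        provider.registerDefault("search_multi", () -> "default-multi");
        provider.override("plugin-a", "top_docs", () -> "empty");
        System.out.println(provider.buildContexts());

        CollectorManagerRegistrySketch registry = new CollectorManagerRegistrySketch();
        registry.add("hybrid", "HybridCollectorManager");
        System.out.println(registry.managers());
    }
}
```

Rejecting a second override is one possible conflict policy; the RFC text only states that overriding is limited to one plugin implementation at a time, so how a conflict is surfaced is an open design choice.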

Describe alternatives you've considered
Instead of an extension point for overriding default `QueryCollectorContexts` as well as an extension point for the `queryCollectorManagers` map, we could introduce a notion of priority to `QueryCollectorContext` and allow plugins to provide their own `QueryCollectorContext` implementations with priority, similar to how `ActionFilters` work today. All `QueryCollectorContext` implementations would then be chained together based on their priority when transforming the `QueryCollectorContext` into the `Collector` tree. Rough POC: main...jed326:OpenSearch:ordered-context

This allows plugins to provide an uncapped number of `QueryCollectorContext` implementations. However, since the order of the contexts determines the structure of the `Collector` tree, the context priority will be tightly coupled with its functionality, and each `QueryCollectorContext` would likely need many `instanceof` checks to ensure it is performing its intended function, similar to how `BucketCollectorProcessor` is implemented today (OpenSearch/server/src/main/java/org/opensearch/search/aggregations/BucketCollectorProcessor.java, lines 60 to 86 in 618782d).
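
To make the coupling between priority and tree structure concrete, here is a minimal sketch of the priority-based alternative. It is not the linked POC: the `PrioritizedContext` class, the priority values, the `plugin_ctx` name, and the string-based "tree" are all assumptions made for illustration. Contexts are sorted by priority and each one wraps the result of the previous one, so the priority a plugin picks decides which collectors its context ends up wrapping.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for a QueryCollectorContext that carries a priority.
final class PrioritizedContext {
    final int priority;
    final String name;

    PrioritizedContext(int priority, String name) {
        this.priority = priority;
        this.name = name;
    }

    // Wraps the collector built so far; the innermost context has nothing to wrap.
    String wrap(String inner) {
        return inner == null ? name : name + "(" + inner + ")";
    }
}

final class OrderedContextSketch {
    // Assumption of this sketch: the lowest priority value becomes the innermost collector.
    static String buildTree(List<PrioritizedContext> contexts) {
        List<PrioritizedContext> sorted = new ArrayList<>(contexts);
        sorted.sort(Comparator.comparingInt((PrioritizedContext c) -> c.priority));
        String tree = null;
        for (PrioritizedContext context : sorted) {
            tree = context.wrap(tree);
        }
        return tree;
    }

    public static void main(String[] args) {
        List<PrioritizedContext> contexts = new ArrayList<>();
        contexts.add(new PrioritizedContext(30, "terminate_after"));
        contexts.add(new PrioritizedContext(10, "top_docs"));
        contexts.add(new PrioritizedContext(20, "min_score"));
        System.out.println(buildTree(contexts));

        // A plugin context at priority 15 wraps top_docs and is itself wrapped by min_score.
        contexts.add(new PrioritizedContext(15, "plugin_ctx"));
        System.out.println(buildTree(contexts));
    }
}
```

In this toy model a plugin author has to know the priorities of every other context to land in the right place in the tree, which is the tight coupling described above.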