
Reimplement functionality similar to PinnedQueryBuilder #72

Open
harold-wang opened this issue Feb 9, 2021 · 9 comments
Labels
enhancement Enhancement or improvement to existing feature or request Search Search query, autocomplete ...etc

Comments

@harold-wang
Contributor

harold-wang commented Feb 9, 2021

PinnedQueryBuilder was removed as part of the removal of X-Pack. This is a tracking ticket to develop similar functionality in OpenSearch.

https://github.com/elastic/elasticsearch/blob/v7.10.2/client/rest-high-level/src/test/java/org/elasticsearch/client/SearchIT.java#L1385

@harold-wang harold-wang added documentation Improvements or additions to documentation enhancement Enhancement or improvement to existing feature or request :Feature/Datastream Issues related to data streams labels Feb 9, 2021
@tlfeng tlfeng mentioned this issue May 5, 2021
@Hronom
Contributor

Hronom commented May 6, 2021

Is your feature request related to a problem? Please describe.
We need to promote a list of documents, for example 240.
At the same time, all filters should still be applied, so we can be sure that these documents are searchable and match the criteria.

What we need is already implemented in X-Pack: https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-pinned-query.html#query-dsl-pinned-query

It would be cool to have it back.

@dblock dblock changed the title Backup ticket to recover PinnedQueryBuilder Reimplement PinnedQueryBuilder Jul 16, 2021
@dblock dblock changed the title Reimplement PinnedQueryBuilder Reimplement functionality similar to PinnedQueryBuilder Jul 16, 2021
@anasalkouz anasalkouz added Indexing & Search and removed documentation Improvements or additions to documentation labels Nov 16, 2021
@anasalkouz
Member

Pending more votes from the community to prioritize it.

@anasalkouz anasalkouz removed the :Feature/Datastream Issues related to data streams label Nov 16, 2021
@asknacho

asknacho commented Jun 20, 2022

It's starting to get some traction! C'mon! This works like a charm when mixing ES with a recommendation engine. 🚀 Like AWS Personalize 😉

@asknacho

Hello @anasalkouz! Quick question: is there any vote threshold needed to start prioritizing this feature? Is there anything that we can do to help move this forward? Thanks in advance!

@dblock
Member

dblock commented Jul 27, 2022

@asknacho @Hronom want to try to contribute this feature?

We cannot take non-APLv2 code, but happy to merge a PR if someone wants to reimplement this functionality, confirming that they aren't copying/looking at non-APLv2 code.

@tlfeng
Collaborator

tlfeng commented Feb 2, 2023

A contributor, @Aero-Blue, wrote a design document for the feature and some code that aims to implement it as a plugin. If anyone is interested in implementing the feature, please feel free to refer to that work.

Code (July 2022): https://github.com/Aero-Blue/opensearch-pinned-query

Design Document (June 2022):

Search Result Pinning - Design

1. Overview

Project Title:

Implement pinning selected documents in query results of OpenSearch

Project Description:

The project is to resolve GitHub issue #72, which is a feature request for OpenSearch from the community. Completing the project involves figuring out what users specifically need from this feature, looking for possible solutions, and finally delivering a solution that satisfies the demand. The solution can contribute new code to either opensearch-project or other open-source projects that can be used with OpenSearch, such as Querqy.

In the above GitHub issue, several OpenSearch users asked for the non-open-source Elasticsearch feature “Pinned Query” to be implemented in OpenSearch. The feature is a kind of search query that promotes selected documents to rank higher than those matching the given query, so that the selected documents are always displayed above organic search results.

2. Problem Statement/ Use cases

Problem Statement

There is currently no way to pin certain search results to make sure they are always at the top of the list of results returned by OpenSearch.

Use Cases

  • Promotional content (e.g. ads)
  • Informative content (info that may be pertinent to the customer for a specific query)
  • Mark of quality (score correction, or boosting good results)

3. Goals/ Requirements and scope

Must have requirements

  • Display pinned search results at the top, above all natural or “organic” results.
  • This should happen even with all filters applied in order to make the documents searchable and to match search criteria.
  • If the primary sort order is not “relevance” then pinned results should not be shown to the customer.
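
The third requirement can be checked from the request body alone. Below is a minimal Python sketch, assuming the request follows the usual search DSL, where a missing `sort` means relevance ordering and `_score` names the relevance sort. The helper name `is_relevance_sorted` is hypothetical and not part of OpenSearch.

```python
from typing import Any, Dict


def is_relevance_sorted(request_body: Dict[str, Any]) -> bool:
    """Return True when the primary sort of a search request is relevance.

    Hypothetical helper for illustration only. Assumes that a missing or
    empty "sort" means the default relevance (_score) ordering.
    """
    sort = request_body.get("sort")
    if not sort:
        return True  # no explicit sort: results are ordered by _score
    # "sort" may be a single clause or a list; the first clause is the primary one.
    primary = sort[0] if isinstance(sort, list) else sort
    if isinstance(primary, str):
        return primary == "_score"
    if isinstance(primary, dict):
        # e.g. {"_score": {"order": "desc"}} or {"price": "asc"}
        return next(iter(primary), None) == "_score"
    return False
```

Pinning would then be applied only when this check passes; a request sorted primarily by another field, such as price, would return its results unmodified.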

Simply “pinning” some results to the top may work for some use cases (e.g. general ads or promotions), but the user could benefit from more control over the appearance and description of the pinned elements.

Nice to have requirements

Here are two other possible needs the user might have and why:

  • Specific ordering of promoted content, i.e. the user may want one pinned document listed above another (think of companies being able to pay more to be listed higher). This ordering may be specific or could simply be “random”, in the sense that each pinned document has the same chance of being in each position.
  • Slightly different labeling of content. Sometimes the document may be an “ad”, other times it may be “higher quality content”, or it could be some kind of “notice” or disambiguation about the query itself. Think of these as different “categories” that offer insight into why the document is pinned in the first place. Otherwise the customer could be confused about the motivation for pinning the content: was there some monetary reason, or is it simply informational?

With these options implemented the user would be able to order pinned results, either randomly or with a specific order, and would be able to label the pinnings to give a clearer idea of their purpose.
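
The two nice-to-have options could be modeled roughly as follows. This is only a sketch under stated assumptions: the `Pin` record, its `label` and `position` fields, and the `order_pins` function are all hypothetical names introduced for illustration and do not come from the design or the plugin code.

```python
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Pin:
    doc_id: str
    label: str                      # hypothetical category, e.g. "ad" or "notice"
    position: Optional[int] = None  # explicit rank; None means no preference


def order_pins(pins: List[Pin], rng: Optional[random.Random] = None) -> List[Pin]:
    """Place explicitly positioned pins first, then the rest in random order."""
    rng = rng or random.Random()
    fixed = sorted((p for p in pins if p.position is not None),
                   key=lambda p: p.position)
    free = [p for p in pins if p.position is None]
    rng.shuffle(free)  # each unpositioned pin is equally likely to land in each slot
    return fixed + free
```

Passing a seeded `random.Random` makes the "random" ordering reproducible in tests, while the default gives a fresh ordering per request.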

4. Glossary/Terminology

  • index - A named collection of documents in OpenSearch.
  • document - A document is a JSON object that contains data in OpenSearch.
  • query - A query is a request made by the user to OpenSearch for documents that match given parameters.
  • coordinator node - The node that receives the search request, forwards it to the shards that hold the data, and merges the per-shard results.

5. Proposed solution

Score modifications

First we identify the pinned hits, then we use some solution to sort them.

  • pinning ids: ['1a', '2b', '3c'] - modify the scores of the 3 hits so that they are at the top
  • If the user sends a query that does not contain an attribute pinned_results then we don’t need to modify anything, otherwise the QueryResultPinner class will be called to handle it.
  • The QueryResultPinner will take in the global list of results in form of (DocID, score) as input from the coordinator node after all other nodes in the cluster have completed their tasks.
  • It will also take the pinned_results list in from the original request.
  • Essentially it will be initialized like QueryResultPinner(pinned_results, global_results_list).
  • The class is expected to return a modified results list with the same form as the global results list.
  • First we reverse pinned_results so we can maintain original ordering while adding to the top.
  • Now we have to locate the results with the ids specified in pinned_results which we do with a simple loop.
  • We have a variable called highscore which contains the score of global_results_list[0].
  • A variable max_score is initially set to highscore and is incremented by 1 each time a pinned result is found.
  • We pop the result, pinned_result = Result(DocID, score), and set its score (pinned_result[1]) to the current max_score. We have a list, new_results, which is the list we plan to return. The pinned result is inserted there and removed from global_results_list.
  • That is the logic for finding each result, modifying its score and adding it to the new_results list.
  • Eventually we merge the two lists, so that new_results holds the pinned results followed by the remaining entries of global_results_list.
  • We return the max_score and the new_results list which the coordinator node will use to finish the query phase and complete the fetch phase before returning to the user.
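
The steps above can be sketched in Python as follows. This is an illustration of the described logic, not the plugin's actual code. It assumes that global_results_list is already sorted by descending score, that pinned ids which did not match the query are skipped, and that an empty result list yields a max_score of 0.0; the method name `pin` is also an assumption.

```python
from typing import List, Tuple

Result = Tuple[str, float]  # (doc_id, score)


class QueryResultPinner:
    """Sketch of the class described above; names beyond the design doc are hypothetical."""

    def __init__(self, pinned_results: List[str], global_results_list: List[Result]):
        self.pinned_results = pinned_results
        self.global_results_list = global_results_list

    def pin(self) -> Tuple[float, List[Result]]:
        if not self.global_results_list:
            return 0.0, []  # assumption: nothing matched, so nothing to pin
        matched = {doc_id for doc_id, _ in self.global_results_list}
        max_score = self.global_results_list[0][1]  # highscore of the organic list
        new_results: List[Result] = []
        seen = set()
        # Reverse so the first pinned id is processed last and gets the top score.
        for doc_id in reversed(self.pinned_results):
            if doc_id not in matched or doc_id in seen:
                continue  # assumption: skip unmatched ids and duplicates
            seen.add(doc_id)
            max_score += 1.0
            new_results.insert(0, (doc_id, max_score))
        # Keep the remaining organic results in their original order.
        organic = [r for r in self.global_results_list if r[0] not in seen]
        return max_score, new_results + organic
```

Because every pinned score is strictly greater than the original top score, the returned list stays sorted by descending score, which is what lets the rest of the query phase proceed unchanged.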

5.2 Sequence Diagram (optional)

[Sequence diagram (draw.io image, not reproduced here)]

Here is what a GET request would look like with the pinned results being passed in the body of the request:

GET _search
{
    "query": {
        "match": {
            "title": "wind"
        },
        "pinned_results": {
            "ids": ["P2XAEFA7", "K350VDCV", "M30M0R68"]
        }
    }
}
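
Before such a request is executed, the pinned ids would need to be separated from the ordinary query. Here is a minimal Python sketch, assuming the `pinned_results` attribute is nested inside `query` as in the request above; the function name `split_pinned` is hypothetical.

```python
import copy
from typing import Any, Dict, List, Tuple


def split_pinned(request_body: Dict[str, Any]) -> Tuple[Dict[str, Any], List[str]]:
    """Return (request without pinned_results, list of pinned ids).

    Hypothetical helper: if the request has no pinned_results attribute,
    the body comes back unchanged and the id list is empty.
    """
    body = copy.deepcopy(request_body)  # leave the caller's dict untouched
    pinned = body.get("query", {}).pop("pinned_results", None)
    ids = list(pinned.get("ids", [])) if pinned else []
    return body, ids
```

An empty id list signals that nothing needs to be modified, which matches the rule that requests without pinned_results are handled as usual.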

Here is what a standard GET request response looks like. It will include the new, modified max_score and the correct ordering of the pinned and organic results.

{
  "took" : 5,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [...]
  }
}

5.3.1 Pros

  • Simple solution, only need to modify score and the rest is already done by OpenSearch.
  • Native solution, don’t need to use any extra packages to solve the problem.

5.3.2 Cons

  • Have to modify GET request attributes which might be difficult.
  • Modifying scores could result in unforeseen issues.

6. Other Approaches

6.1 Using Querqy

From the Querqy docs:

Querqy is a query rewriting framework for Java-based search engines. It is probably best-known for its rule-based query rewriting, which applies synonyms, query dependent filters, boostings and demoting of documents. This rule-based query rewriting is implemented in the ‘Common Rules Rewriter’, but Querqy’s capabilities go far beyond this rewriter.

Note the features of boosting and demoting documents. The Product Manager said that there would have to be modifications made to Querqy’s source code in order to make it work.
A solution using Querqy would most likely involve using its rewriters feature to promote results marked as pinned after the query was invoked.

6.1.1 Pros

  • The tool is already built and includes the features that this project requires.
  • Would only have to implement a custom rewriter class and the rest is taken care of by Querqy.
  • This would be easy to extend in the future.

6.1.2 Cons

  • Currently Querqy only supports Elasticsearch and Solr, so we would likely have to modify Querqy’s source code.
  • Requires another dependency / plugin which adds more complexity and possibility of errors.
  • Would have to manage versions and updates in the future which might modify necessary functionality and introduce unintentional side effects.

6.2 Modifying document attributes

The search results we receive after a query are a series of objects known as “documents”. These documents contain the information about the result.
The basic document looks like this:

"_index": 0,
"_id": 0,
"_score": 0,
"_source": {
    "attr1": "value1",
    ...
},
"_pinned": True,

Note that I have added the attribute “_pinned” set to True indicating that it should be pinned whenever the document is queried.
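
With such an attribute in place, promoting the pinned documents reduces to a stable partition of the hits. Below is a minimal Python sketch, assuming each hit is a dict that may carry the proposed `_pinned` flag and that the hits arrive already ordered by score; the function name `pinned_first` is hypothetical.

```python
from typing import Any, Dict, List


def pinned_first(hits: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Move hits flagged with "_pinned" to the front without touching scores."""
    # sorted() is stable, so hits keep their relative order within each group.
    # The key is False for pinned hits, and False sorts before True.
    return sorted(hits, key=lambda hit: not hit.get("_pinned", False))
```

Note that this only decides whether a hit is pinned, not the order among pinned hits, which reflects the ordering difficulty of this approach.
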
6.2.1 Pros

  • Would avoid modifying current attributes such as score, which might introduce unintended side effects.
  • Possible to add more functionality with more attributes such as specifying the query it should be pinned with.

6.2.2 Cons

  • Would have to modify all current documents to include the new attributes.
  • Likely difficult to order the pinned results.
  • Would essentially have to “whitelist” queries for when to display the pinned document.

7. Cost Analysis

Not applicable to this project as there are no outside services used that would require payment.

8. Backwards compatibility

Adding this feature would not be a breaking change as it is adding functionality and would not require removing anything. Therefore it can be released in a minor version and be backwards compatible.

9. Appendix

Appendix A: FAQ

  • Why modify the score?
    • One of the requirements is to only display pinned documents when sorting by “relevance”, and relevance essentially means sorting by score. Modifying the score therefore makes the most sense, because it gives a consistent list in which pinned documents have higher scores than regular ones.

@Aero-Blue

Thanks @tlfeng. I am the original contributor for this project. It is incomplete because I unfortunately had to leave due to medical issues. I didn’t have time to update the design document, but after further research it was decided that it would be best implemented as a plugin. I was able to figure out how to modify the requests to include 2 attributes: pinned_query, which indicates whether the result is pinned, and a score, which indicates its position in the pinned results. What I still needed to figure out was how to intercept the response to a user search and rearrange the results using the attributes mentioned above before returning the response to the user. It should be fairly simple to add to the current project. I’d love it if this project could be continued, as I very much enjoyed my internship and time at Amazon and hope I have contributed at least something to the progress of OpenSearch.

@macrakis

Querqy has been ported to OpenSearch and works with OpenSearch 2.3.

@anasalkouz
Member

@macrakis Shall we close this issue then?

Projects
Status: Later (6 months plus)
Development

No branches or pull requests

9 participants