[Discussion] bm25 semantic search / QueryExpander #573

Closed
flozi00 opened this issue Nov 10, 2020 · 7 comments

flozi00 commented Nov 10, 2020

In an internal system, we did not pass the raw question to Elasticsearch.
We wanted the speed of BM25 but also the intuitiveness of vector search, so we applied some manipulations to the Elasticsearch input.

  1. Filter keywords
  2. Generate synonyms and acronyms
  3. Rebuild the search query like this: raw query + keywords (3 times) + synonyms (once or twice, depending on data quality and size)

After that we passed the data to DPR and those results to the QA model.

Would such an integration make sense for this project?

Member

tholor commented Nov 11, 2020

Hey @flozi00,

Yes, this sounds interesting and I could see an integration via creating a new "QueryExpander" class.
With the planned Pipeline (#544) we could add this as a task/node between the incoming query and a retriever.

> After that we passed the data to DPR and those results to the QA model.

Do you really mean DPR here? I thought you wanted to pass your expanded query to ES to get the "speed of bm25"?

> 1. Filter keywords

Can you give an example of this step? I thought you would filter out keywords here, but later you say "+ keywords (3 times)", so I guess I am not understanding your step here ...

@tholor tholor self-assigned this Nov 11, 2020
Author

flozi00 commented Nov 11, 2020

> Do you really mean DPR here? I thought you wanted to pass your expanded query to ES to get the "speed of bm25"?

Yeah, we let DPR rerank the results, but it is not mandatory.
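
A rough sketch of that optional second stage, assuming the BM25 hits arrive as plain strings. The `rerank` helper and the toy `term_overlap_score` function are hypothetical names made up for this example; in a real setup the scoring function would be replaced by the similarity between DPR query and passage embeddings.

```python
from collections import Counter
from math import sqrt


def term_overlap_score(query, passage):
    """Toy stand-in for a dense scorer: cosine similarity of term counts."""
    q = Counter(query.lower().split())
    p = Counter(passage.lower().split())
    dot = sum(q[t] * p[t] for t in q)
    norm = sqrt(sum(v * v for v in q.values())) * sqrt(sum(v * v for v in p.values()))
    return dot / norm if norm else 0.0


def rerank(query, candidates, score_fn=term_overlap_score, top_k=None):
    """Re-order first-stage (e.g. BM25) candidates by a second scoring function."""
    ranked = sorted(candidates, key=lambda c: score_fn(query, c), reverse=True)
    return ranked if top_k is None else ranked[:top_k]


# Pretend these came back from the BM25 retriever, in BM25 order.
bm25_hits = [
    "Berlin is the capital of Germany",
    "The weather in Berlin is rainy today",
    "Weather forecasts use wind speed and cloud cover",
]
reranked = rerank("weather in berlin", bm25_hits, top_k=2)
print(reranked)
```

Because the scorer is passed in as `score_fn`, the reranking step stays optional and can be swapped out or dropped without touching the query expansion.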

> Can you give an example of this step? I thought you would filter out keywords here, but later you say "+ keywords (3 times)", so I guess I am not understanding your step here ...

Raw query: Please tell me whats the weather in berlin

  1. weather berlin (extracted keywords, removed stopwords)
  2. sun rain cloud-cover wind-speed ... (generated synonyms)
  3. query = Please tell me whats the weather in berlin + weather berlin + weather berlin + weather berlin + sun rain cloud-cover wind-speed ... (generated new query, which can be passed to ES)
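
The three steps above could be sketched as follows. This is only an illustration under stated assumptions: the stopword set, the synonym table, and the function names (`extract_keywords`, `generate_synonyms`, `expand_query`) are invented for the example and are not taken from the internal system described here.

```python
# Hypothetical stopword list and synonym table, for illustration only.
STOPWORDS = {"please", "tell", "me", "whats", "the", "in"}
SYNONYMS = {"weather": ["sun", "rain", "cloud-cover", "wind-speed"]}


def extract_keywords(query):
    """Step 1: keep the tokens that are not stopwords."""
    return [tok for tok in query.lower().split() if tok not in STOPWORDS]


def generate_synonyms(keywords):
    """Step 2: look up related terms for each keyword."""
    related = []
    for kw in keywords:
        related.extend(SYNONYMS.get(kw, []))
    return related


def expand_query(query, keyword_repeats=3, synonym_repeats=1):
    """Step 3: raw query + repeated keywords + repeated synonyms."""
    keywords = extract_keywords(query)
    synonyms = generate_synonyms(keywords)
    parts = [query]
    parts += [" ".join(keywords)] * keyword_repeats
    parts += [" ".join(synonyms)] * synonym_repeats
    # Skip empty parts so a query without keywords or synonyms stays clean.
    return " ".join(p for p in parts if p)


expanded = expand_query("Please tell me whats the weather in berlin")
print(expanded)
```

The repeat counts are plain parameters, so the weighting of keywords against synonyms can be tuned per dataset.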

Member

tholor commented Nov 11, 2020

Ok got it. I totally see the value of such a new QueryExpansion class.
It could roughly look like this:

  • input: query
  • output: modified query
  • parameters to configure the desired expansion tactics (e.g. keyword filters, synonym generation, ...)
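
A minimal sketch of what such an interface might look like. Everything here is an assumption for discussion purposes: the class name `QueryExpander`, the `expand` method, and the constructor parameters are hypothetical and do not describe an existing API in this project.

```python
from typing import Callable, List, Optional

KeywordExtractor = Callable[[str], List[str]]
SynonymGenerator = Callable[[List[str]], List[str]]


class QueryExpander:
    """Hypothetical interface: takes a query string, returns a modified one."""

    def __init__(
        self,
        keyword_extractor: Optional[KeywordExtractor] = None,
        synonym_generator: Optional[SynonymGenerator] = None,
        keyword_repeats: int = 3,
        synonym_repeats: int = 1,
    ):
        self.keyword_extractor = keyword_extractor
        self.synonym_generator = synonym_generator
        self.keyword_repeats = keyword_repeats
        self.synonym_repeats = synonym_repeats

    def expand(self, query: str) -> str:
        parts = [query]
        # Without a keyword extractor, the raw tokens feed the synonym step.
        terms = query.split()
        if self.keyword_extractor is not None:
            terms = self.keyword_extractor(query)
            parts += [" ".join(terms)] * self.keyword_repeats
        if self.synonym_generator is not None:
            synonyms = self.synonym_generator(terms)
            parts += [" ".join(synonyms)] * self.synonym_repeats
        return " ".join(p for p in parts if p)


# Example: keyword tactic only, with a toy stopword filter.
expander = QueryExpander(
    keyword_extractor=lambda q: [t for t in q.lower().split() if t not in {"whats", "the", "in"}],
    keyword_repeats=2,
)
result = expander.expand("whats the weather in berlin")
print(result)
```

Each tactic is an optional callable, so an expander with no tactics configured returns the query unchanged, which would make it safe to place in front of any retriever.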

Would you be interested in raising a PR? We would then take care of integrating it into the new, upcoming Pipeline object.

Author

flozi00 commented Nov 11, 2020

Yeah, I can do so.
Hope to find time some night ;-)

Member

tholor commented Nov 11, 2020

Great! Very much appreciated 👍

@flozi00 flozi00 mentioned this issue Nov 11, 2020
3 tasks
@tholor tholor changed the title from "[Discussion] bm25 semantic search" to "[Discussion] bm25 semantic search / QueryExpander" on Nov 30, 2020
@tholor tholor added this to the #7 milestone Jan 6, 2021

stale bot commented May 6, 2021

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 21 days if no further activity occurs.

@stale stale bot added the stale label May 6, 2021
@stale stale bot closed this as completed May 28, 2021
Contributor

amotl commented Dec 9, 2021

Dear @tholor,

we just arrived here via [1] and would like to salute you and the other authors of this framework for their efforts.

In the context of what @flozi00 was asking for above, like »Raw query: Please tell me what's the weather in Berlin«, we would like to share our little experimental project [2] with you. Andrew Wigmore (@visualcrossing) might be interested in this topic as well [3].

With that in mind, I would humbly ask you to reopen this issue, to keep it as a note on a topic people are actively interested in, or perhaps just move it to the "Discussions" section. I believe it could fit better there.

Keep up the spirit and with kind regards,
Andreas.

[1] https://news.ycombinator.com/item?id=29501045
[2] https://github.com/earthobservations/weather-nlp
[3] https://github.com/earthobservations/weather-nlp#other-implementations
