To have Lucene query DSL compliant Search API #429

lalitpagaria · 2020-09-24T11:30:54Z

Is your feature request related to a problem? Please describe.
I am trying to integrate Haystack with Confluence (and later extend the work to other Atlassian products). Confluence search works poorly for us (at my workplace) when looking for the right page (specifically RFCs), which leads to duplicate work. I am writing a Confluence plugin to route these full-text search queries to Haystack instead of the default Lucene-based system.

Describe the solution you'd like
To have a Lucene query DSL compliant /query endpoint (I am not asking for support of each and every feature). It would make integration with other systems more straightforward. It would also make Haystack a drop-in replacement (in a limited capacity) for any ES or Lucene-based system.

Describe alternatives you've considered
I am currently thinking of modifying luqum (LUcene QUery Manipolator) to translate Lucene query DSL queries into the format supported by the /doc-qa endpoint.

Additional context
None


lalitpagaria · 2020-10-01T10:49:40Z

@tanaysoni @tholor just checking whether this suggested enhancement aligns with your product roadmap.
If yes, I will wait; otherwise I will write a somewhat hackish solution to complete the integration with Confluence.

tanaysoni · 2020-10-02T13:50:08Z

Hi @lalitpagaria, thank you for the feature request. I think integrations to other systems would be useful for the community!

I like the idea of translating Lucene queries into the format of the /doc-qa endpoint. The query string can be converted to a question, along with any additional metadata filters we can extract from the Lucene query.
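As a rough illustration of that idea, the sketch below splits a small subset of an Elasticsearch-style `bool` query into a question string and a dictionary of metadata filters. It is only a sketch under stated assumptions: the function name `dsl_to_question_and_filters`, the supported clause types (`match`, `multi_match`, `term`, `terms`), and the `{field: [values]}` filter shape are hypothetical choices made for this example, not the behaviour of the /doc-qa endpoint or of the linked PR.

```python
from typing import Any, Dict, List, Tuple


def _as_list(clauses: Any) -> List[Dict[str, Any]]:
    """Elasticsearch accepts a single clause or a list of clauses; normalise to a list."""
    if clauses is None:
        return []
    return clauses if isinstance(clauses, list) else [clauses]


def dsl_to_question_and_filters(dsl: Dict[str, Any]) -> Tuple[str, Dict[str, List[str]]]:
    """Split a bool query into (question, filters). Hypothetical helper for illustration."""
    bool_query = dsl.get("query", {}).get("bool", {})
    question_parts: List[str] = []
    filters: Dict[str, List[str]] = {}

    # Free text: collect the text of match / multi_match clauses under must and should.
    for clause in _as_list(bool_query.get("must")) + _as_list(bool_query.get("should")):
        if "multi_match" in clause:
            question_parts.append(str(clause["multi_match"]["query"]))
        elif "match" in clause:
            for value in clause["match"].values():
                # A match clause is either {"field": "text"} or {"field": {"query": "text"}}.
                question_parts.append(str(value["query"] if isinstance(value, dict) else value))

    # Metadata: collect exact-value term / terms clauses under filter.
    for clause in _as_list(bool_query.get("filter")):
        if "term" in clause:
            for field, value in clause["term"].items():
                # A term clause is either {"field": "v"} or {"field": {"value": "v"}}.
                exact = value["value"] if isinstance(value, dict) else value
                filters.setdefault(field, []).append(str(exact))
        elif "terms" in clause:
            for field, values in clause["terms"].items():
                if isinstance(values, list):  # skip options such as "boost"
                    filters.setdefault(field, []).extend(str(v) for v in values)

    return " ".join(question_parts), filters
```

Anything outside this subset (ranges, nested bool queries, `must_not`) is silently ignored here; a real translation layer would need to decide whether to reject or approximate such clauses.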

I am curious about how you're planning the ingestion of documents from Confluence to a Haystack Document Store?

lalitpagaria · 2020-10-02T23:15:35Z

@tanaysoni Currently I am using a very hackish solution: a Confluence crawler fetches the pages and writes them to txt files (cleaning out the HTML tags is still a work in progress), and then I call /file-upload to upload each file to Haystack.

But I see many issues with this approach (using Haystack as a service), so I will use Haystack as a library for easier customisation of the Document Store. I am also planning to keep a mapping of page_id -> haystack_doc_id, so that updates and deletions of pages are easy to handle. My design is still very raw; it will take some time to evolve.
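The page_id -> haystack_doc_id mapping could look roughly like the sketch below. It is a minimal in-memory version under assumptions of my own: the class name `PageIndex` and its methods are hypothetical, one page may map to several document ids, and the actual write and delete calls against the Document Store are left to the caller.

```python
from typing import Dict, Iterable, List, Set


class PageIndex:
    """Tracks which Haystack document ids were created for each Confluence page."""

    def __init__(self) -> None:
        self._doc_ids: Dict[str, List[str]] = {}

    def record(self, page_id: str, doc_ids: Iterable[str]) -> List[str]:
        """Store the new ids for a page and return the stale ids it replaces.

        On a page update the caller deletes the returned ids from the document store.
        """
        stale = self._doc_ids.get(page_id, [])
        self._doc_ids[page_id] = list(doc_ids)
        return stale

    def remove(self, page_id: str) -> List[str]:
        """Forget a deleted page and return the ids that should be deleted with it."""
        return self._doc_ids.pop(page_id, [])

    def removed_pages(self, live_page_ids: Iterable[str]) -> Set[str]:
        """Pages that are indexed but no longer returned by the crawler."""
        return set(self._doc_ids) - set(live_page_ids)
```

Usage would be one `record` call per crawled page, followed by `removed_pages` over the crawler's current page list to find deletions. A persistent version would keep the same mapping in a database table rather than in memory.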

tanaysoni · 2020-10-05T15:42:36Z

Hi @lalitpagaria, thank you for sharing the details. I look forward to knowing how the end-to-end pipeline works out!

For the Lucene query part, what do you think of a new endpoint in the REST API that accepts Lucene DSL & converts to the /doc-qa format like you earlier proposed? Would that work for your use-case?

lalitpagaria · 2020-10-05T22:41:45Z

> For the Lucene query part, what do you think of a new endpoint in the REST API that accepts Lucene DSL & converts to the /doc-qa format like you earlier proposed? Would that work for your use-case?

Yes, it will work. For the DSL, I found elasticsearch-dsl, which is better supported and is maintained by Elastic. I can raise a PR if that is fine.

> I look forward to knowing how the end-to-end pipeline works out!

For me, cleaning is not working well. Confluence returns data in HTML format, and I am trying to clean it with Tika, but for 50% of the docs it either fails or is not able to clean them. BTW, I found a better lib to fetch documents from Confluence/Jira, as it supports OAuth as well. Also, Atlassian is deprecating XML-RPC calls and promoting REST APIs.
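For readers hitting the same cleaning problem, one possible fallback is to strip the tags with Python's standard-library `html.parser` instead of Tika. The sketch below is not what was used in this thread; the names `html_to_text` and `_TextExtractor` and the list of block-level tags are assumptions made for the example, and Confluence-specific macros are not handled.

```python
from html.parser import HTMLParser
from typing import List


class _TextExtractor(HTMLParser):
    """Collects visible text, dropping script/style content and breaking lines at block tags."""

    _SKIP = {"script", "style"}
    _BLOCK = {"p", "div", "br", "li", "tr", "table", "h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)  # entities such as &amp; arrive decoded
        self._chunks: List[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self._SKIP:
            self._skip_depth += 1
        elif tag in self._BLOCK:
            self._chunks.append("\n")

    def handle_endtag(self, tag):
        if tag in self._SKIP:
            self._skip_depth = max(0, self._skip_depth - 1)
        elif tag in self._BLOCK:
            self._chunks.append("\n")

    def handle_data(self, data):
        if self._skip_depth == 0:
            self._chunks.append(data)

    def text(self) -> str:
        # Collapse runs of whitespace inside each line and drop empty lines.
        lines = (" ".join(line.split()) for line in "".join(self._chunks).splitlines())
        return "\n".join(line for line in lines if line)


def html_to_text(html: str) -> str:
    """Return the visible text of an HTML fragment, one block element per line."""
    extractor = _TextExtractor()
    extractor.feed(html)
    extractor.close()
    return extractor.text()
```

`html.parser` is lenient with malformed markup, so it tends to produce some text where a stricter converter gives up, at the cost of losing structure such as tables and links.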

lalitpagaria · 2020-10-06T16:55:04Z

You can assign this to me. I will raise a WIP PR to get initial feedback, and then we can go from there.

lalitpagaria · 2020-10-16T11:30:50Z

Completed by #471

lalitpagaria added the type:feature New feature or request label Sep 24, 2020

tholor assigned tanaysoni Sep 29, 2020

tholor added this to the #2 milestone Oct 6, 2020

tanaysoni assigned lalitpagaria and unassigned tanaysoni Oct 6, 2020

lalitpagaria mentioned this issue Oct 6, 2020

Add Elasticsearch Query DSL compliant Query API #471

Merged

lalitpagaria closed this as completed Oct 16, 2020
