To have Lucene query DSL compliant Search API #429
Comments
@tanaysoni @tholor just checking whether this suggested enhancement aligns with your product roadmap?
Hi @lalitpagaria, thank you for the feature request. I think integrations with other systems would be useful for the community! I like the idea of translating Lucene queries to adapt as per the I am curious about how you're planning the ingestion of documents from Confluence to a Haystack Document Store?
@tanaysoni Currently I am using a very hackish solution: a Confluence crawler fetches the pages and creates txt files (cleaning the HTML tags is still a work in progress). And then calling But I see many issues with this approach (using Haystack as a service), hence I will use Haystack as a library for easier customisation of the Document Store. Also I am planning to keep mapping of
Hi @lalitpagaria, thank you for sharing the details. I look forward to knowing how the end-to-end pipeline works out! For the Lucene query part, what do you think of a new endpoint in the REST API that accepts Lucene DSL & converts to the
Yes, it will work. For DSL, I found elasticsearch-dsl, which is better supported and maintained by Elastic. I can raise a PR if that is fine?
For me, cleaning is not working well. Confluence gives data in HTML format, and I am trying to clean it via Tika, but for 50% of the docs it fails or is not able to clean them. BTW I found a better lib to fetch documents from Confluence/Jira, as it supports OAuth as well. Also, Atlassian is deprecating XML-RPC calls and promoting REST APIs.
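As a point of comparison for the HTML clean-up problem described above, here is a minimal sketch that strips tags using only Python's standard-library `html.parser`. It is not the Tika-based approach mentioned in the comment, and the names `TextExtractor` and `html_to_text` are hypothetical helpers made up for this illustration. It only handles ordinary HTML and does not attempt to deal with Confluence-specific macros.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text and ignores the contents of <script>/<style>."""

    SKIP = {"script", "style"}

    def __init__(self):
        # convert_charrefs=True turns entities such as &amp; into plain characters.
        super().__init__(convert_charrefs=True)
        self._chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self._chunks.append(data)

    def text(self):
        # Join chunks with a space, then collapse runs of whitespace.
        # Simplification: text split by inline tags mid-word gains a space.
        return " ".join(" ".join(self._chunks).split())


def html_to_text(html: str) -> str:
    """Return the visible text of an HTML fragment as a single line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()
```

For example, `html_to_text("<p>Use a <b>cache</b></p>")` yields `Use a cache`. The resulting string could then be written to the txt files that are fed to the Document Store.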
You can assign this to me. I will raise a WIP PR to get initial feedback and then we can go from there.
Completed by #471
Is your feature request related to a problem? Please describe.
I am trying to integrate Haystack with Confluence (and later extend the work to other Atlassian products). Confluence search does a poor job for us (at my workplace) of finding the right page (specifically RFCs), which leads to duplicate work. I am writing a Confluence plugin to route these full-text search queries to Haystack instead of the default Lucene-based system.
Describe the solution you'd like
To have a Lucene query DSL compliant /query endpoint (I am not asking to support each and every functionality). It will make integration with other systems more straightforward. Also, this will make Haystack a drop-in replacement (in limited capacity) for any ES or Lucene based system.
Describe alternatives you've considered
I am currently thinking of modifying LUcene QUery Manipolator to translate Lucene query DSL queries to the format supported by the /doc-qa endpoint.
Additional context
None
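To illustrate the kind of translation discussed in this issue, here is a rough sketch that maps a very small subset of Lucene query syntax onto a question-plus-filters request body. It uses only the Python standard library rather than LUcene QUery Manipolator or elasticsearch-dsl. The function name `lucene_to_doc_qa` and the `questions` and `filters` keys are assumptions made for this example and are not taken from the Haystack REST API as confirmed by this thread.

```python
import re

# One token is an optional `field:` prefix followed by a "quoted phrase" or a bare word.
TOKEN = re.compile(r'(?:(\w+):)?(?:"([^"]*)"|(\S+))')
OPERATORS = {"AND", "OR", "&&", "||"}


def lucene_to_doc_qa(query: str) -> dict:
    """Translate a tiny Lucene subset into a (hypothetical) request body.

    Supported: bare terms, "quoted phrases", field:value, field:"quoted value".
    AND/OR are dropped; NOT and range queries raise ValueError.
    """
    filters = {}
    terms = []
    for field, quoted, bare in TOKEN.findall(query):
        is_quoted = bare == ""
        value = quoted if is_quoted else bare
        if not is_quoted:
            if value == "NOT":
                raise ValueError("NOT is not supported in this sketch")
            if value[0] in "[{":
                raise ValueError("range queries are not supported in this sketch")
            if not field and value in OPERATORS:
                continue  # boolean operators carry no meaning for the question text
        if field:
            # Fielded terms become metadata filters.
            filters.setdefault(field, []).append(value)
        else:
            # Everything else is treated as free text for the question.
            terms.append(value)
    # Assumed body shape: a list of questions plus a filter dictionary.
    return {"questions": [" ".join(terms)], "filters": filters}
```

With this sketch, `lucene_to_doc_qa('space:ENG AND title:"design doc" caching strategy')` puts `caching strategy` into the question and turns `space` and `title` into filters. Anything beyond this subset (boosts, wildcards, nested groups) would need a real Lucene grammar.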