Combination of extractive QA and FAQ-style #280
Comments
Hi @mongoose54,
We will refactor the FAQ-Style QA part in Haystack quite significantly soon to make it more generic to find "most similar documents". Regarding the combination: I would see the combination happening on a rather high level. For example, you could have two
Working on the training / fine-tuning option for DPR in #273. If you really want to experiment on your raw corpus, you could try training your encoders via an Inverse Cloze Task (see https://arxiv.org/pdf/1906.00300.pdf). This type of retriever performs a bit worse than DPR, but its training data requirements are less demanding. Hope this helps!
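For anyone wanting to try the Inverse Cloze Task idea on a raw corpus, here is a minimal sketch of how the training pairs could be built. It is not Haystack code: the function name `make_ict_pairs`, its parameters, and the assumption that each passage is already split into a list of sentences are all hypothetical and only for illustration. One sentence is picked at random as a pseudo-query and the remaining sentences form its positive passage. The linked paper removes the sentence from the passage in 90% of examples, which is what `remove_prob` models here.

```python
import random
from typing import List, Tuple


def make_ict_pairs(
    passages: List[List[str]],
    remove_prob: float = 0.9,
    seed: int = 0,
) -> List[Tuple[str, str]]:
    """Build (pseudo-query, positive passage) pairs from unlabeled passages.

    Hypothetical helper, not part of Haystack. Each passage is assumed to be
    a list of sentences that was split beforehand.
    """
    rng = random.Random(seed)
    pairs = []
    for sentences in passages:
        # A passage needs at least two sentences: one query, one for context.
        if len(sentences) < 2:
            continue
        idx = rng.randrange(len(sentences))
        query = sentences[idx]
        if rng.random() < remove_prob:
            # Usual case: the query sentence is removed from its context.
            context = sentences[:idx] + sentences[idx + 1:]
        else:
            # Occasionally keep it, so the encoders also see exact overlap.
            context = list(sentences)
        pairs.append((query, " ".join(context)))
    return pairs


if __name__ == "__main__":
    corpus = [
        ["Dense retrievers embed text.", "They use two encoders.", "Scores are dot products."],
        ["A single sentence is skipped."],
    ]
    for query, passage in make_ict_pairs(corpus):
        print("query:  ", query)
        print("passage:", passage)
```

The resulting pairs would still need to be converted into whatever training format the retriever expects, and negatives (for example, other passages in the same batch) have to come from the training loop itself.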
Thank you for the wonderful explanation and description. I have a follow-up question:
Can you elaborate on the "query" and "positive passage"? Is the "query" also a question? Also, is there an example of how to do this with the Haystack API?
Yes, in our case (i.e. QA):
Inference: Yes, have a look at this tutorial that uses the DensePassageRetriever (= "DPR")
Hello @tholor,
Hey @guillim, do you mean you get an error when running Tutorial 4 locally in a Docker container? If so, can you please create a new issue with the error and environment details?
I think I just found out where the problem comes from. I will create an issue, and a PR associated to it. It comes down to the ElasticsearchDocumentStore initialisation, more precisely on |
In your tutorial FAQ_Style QA, you mention that a combination of extractive QA and FAQ-style QA is a very interesting idea, and I totally agree. Do you plan to have a tutorial on this?
In the meantime, I want to experiment with the following: I have 1) a corpus dataset (i.e. research papers) without QA annotations, and 2) a small QA dataset from the same domain, and I want to make use of both.
I see how to fine-tune the reader model following the fine-tuning tutorial, but I want to use DensePassageRetriever. How can I fine-tune it with my corpus dataset?