Add support for Multi-Hop Dense Retrieval #2571
Conversation
Hello @deutschmn, great to see this PR so quickly! 🤩 I'm going to have a look at the diff soon. Is there anything specific you have doubts about, or that we should focus our attention on?
Not yet, thanks! I'm not sure whether using …
Training, saving, and loading work now. I'll soon start a proper training run with more data to check whether I get reasonable results 😊
@@ -485,3 +485,67 @@ def convert_from_transformers(
        bi_adaptive_model.connect_heads_with_processor(processor.tasks)  # type: ignore

        return bi_adaptive_model


class BiAdaptiveSharedModel(BiAdaptiveModel):
As far as I understand this, you are using the same language model as an encoder. So instead of the original DPR architecture, where we have a question and a document encoder, you are using the same encoder for both.
I'd argue that this is the same architecture as used by sentence transformers (https://sbert.net/docs/training/overview.html), and maybe we should consider whether it makes sense to re-implement all of the functionality already provided there, or to use sentence transformers under the hood.
> As far as I understand this, you are using the same language model as an encoder. So instead of the original DPR architecture, where we have a question and a document encoder, you are using the same encoder for both.

Exactly. But, most importantly, MDR also uses a different retrieval mechanism that iteratively adds context documents (see https://ar5iv.labs.arxiv.org/html/2009.12756#S2.SS2.SSS0.Px1).

> I'd argue that this is the same architecture as used by sentence transformers (https://sbert.net/docs/training/overview.html), and maybe we should consider whether it makes sense to re-implement all of the functionality already provided there, or to use sentence transformers under the hood.

Yes, I agree that it's quite similar to sentence transformers, except for the multi-hop part, which they don't support AFAIK.

For now, I based my implementation on Haystack's `DensePassageRetriever`, which uses Hugging Face's DPR and then calls the document store's `query_by_embedding`. It took only a few changes to adapt this for MDR. I'm not sure there is much to be gained from using sentence transformers; what you see in my changes now is all there was to do. But please let me know if you find parts that can be simplified by using sentence transformers 😊
I think the `BiAdaptiveSharedModel` would not be needed if an `EmbeddingRetriever` were used inside the MDR: the `EmbeddingRetriever` would only create the embeddings, and the retrieval can then be custom, as required by MDR. I'd also argue that having the `train` method on the MDR is not intuitive, because there isn't anything specific to MDR in the training code; it is just the same training as used in sentence transformers.
I think this would reduce the code that is needed quite a bit. At the same time, maybe someone from the team could chime in (@julian-risch?) on whether it makes sense to replicate sentence-transformers within the FARM architecture.
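To make the proposal concrete, a sketch of that design (an assumption about how it could look, not the merged code; the model name is a placeholder and `document_store` is assumed to exist):

```python
from haystack.nodes import EmbeddingRetriever

# EmbeddingRetriever only produces embeddings; the hop logic stays custom.
retriever = EmbeddingRetriever(
    document_store=document_store,  # assumed to exist and contain documents
    embedding_model="sentence-transformers/multi-qa-mpnet-base-dot-v1",  # placeholder
)

query = "When was the director of the film Inception born?"
query_emb = retriever.embed_queries([query])[0]               # hop 1: bare query
first_hop = document_store.query_by_embedding(query_emb, top_k=10)

# hop 2: re-embed query + best first-hop passage and query the store again
hop2_emb = retriever.embed_queries([query + " " + first_hop[0].content])[0]
second_hop = document_store.query_by_embedding(hop2_emb, top_k=10)
```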
Oh, that's a good point, thanks. Frankly, I didn't look at `EmbeddingRetriever`, since I thought the closest thing to MDR would be DPR. Therefore, I duplicated and adapted its code, Hugging Face style, and built `BiAdaptiveSharedModel` so that MDR doesn't differ from DPR too much.
However, if the goal is to have less duplicated code, deriving from `EmbeddingRetriever` and overriding `retrieve`/`retrieve_batch` might be a good idea. Where do you propose we then implement the training? It would probably make sense to have that not only for MDR but for every `EmbeddingRetriever`.
Short update on the training: The current implementation does DPR-style training, which, as @mathislucka correctly remarked, is equivalent to what sentence transformers do if the encoder is shared. However, in the MDR paper and reference implementation, the authors use a different approach that accounts for the multi-hop retrieval method by incorporating both hops into the loss function: https://github.com/facebookresearch/multihop_dense_retrieval/blob/62eb2427e36a648a927c6e39bb4c748796f7b366/mdr/retrieval/criterions.py#L114. I've checked the results of the current method, and they seem suboptimal; I believe the retriever focuses too much on the first context passage if it isn't explicitly taught not to. Apart from that, if we integrate MDR training into Haystack, it should be what the original authors presented, right? My next steps will be to a) try to use Facebook's checkpoints or b) train a model with the reference implementation, push the weights to Hugging Face, and load it into Haystack. If this is successful, we can move on to adding training support. What do you think? 😊
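For context, a hedged sketch of what the two-hop objective in the linked `criterions.py` roughly boils down to. This is a simplification (the reference implementation additionally masks certain in-batch negatives), and all names here are mine:

```python
import torch
import torch.nn.functional as F

def two_hop_loss(q1, q2, p1, p2):
    """
    q1, q2: query embeddings before hop 1 / hop 2, shape (batch, dim)
    p1, p2: gold passage embeddings for hop 1 / hop 2, shape (batch, dim)
    """
    all_passages = torch.cat([p1, p2], dim=0)   # in-batch negatives from both hops
    scores_hop1 = q1 @ all_passages.t()         # (batch, 2 * batch)
    scores_hop2 = q2 @ all_passages.t()
    targets_hop1 = torch.arange(q1.size(0))     # gold hop-1 passage for query i is row i
    targets_hop2 = torch.arange(q2.size(0)) + p1.size(0)  # hop-2 golds follow the p1 block
    # Summing both NLL terms trains the retriever on the second hop as well,
    # so it cannot focus only on the first context passage.
    return F.cross_entropy(scores_hop1, targets_hop1) + F.cross_entropy(scores_hop2, targets_hop2)
```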
I've now successfully loaded one of Facebook's checkpoints into Haystack, pushed it to the HF Hub, and made it the default model for MDR. I've removed the DPR-style training code and decided to wait for your feedback before continuing; maybe it would be a good idea to add training in another PR, since it's likely not completely trivial to implement. That's it for my first draft of this PR. It would be great if some of the maintainers could look into what I've done so far (@ZanSara, @julian-risch, @bogdankostic) 😊 Maybe you could also weigh in on the discussion regarding …
Hey @deutschmn, sorry for the late reply! 🙃 I will look at the details of your PR and provide some feedback tomorrow.
Hi @deutschmn, I am really interested in this PR and want to test it as well. Could you please briefly describe which steps are necessary after merging the PR locally? Kind regards
Thanks for this PR, looks very clean!
Not sure whether we should introduce the `BiAdaptiveSharedModel` or try using the `EmbeddingRetriever` instead (as @mathislucka proposed). What do you think, @julian-risch?
haystack/nodes/retriever/dense.py (outdated)

    def retrieve_batch(
        self,
        queries: Union[str, List[str]],
In #2575, we changed the type of the `queries` param in the batch methods to `List[str]`. I would prefer re-using …
Thanks for your feedback, @bogdankostic and @julian-risch! I'll look into it in the next couple of days. @JulianGerhard21 Happy to hear that you're interested! Here's a notebook showing how you can use it.
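Since the notebook link isn't reproduced here, a minimal usage sketch reflecting the PR's final state (the default model name is my assumption about the converted Facebook checkpoint mentioned above):

```python
from haystack.document_stores import InMemoryDocumentStore
from haystack.nodes import MultihopEmbeddingRetriever

document_store = InMemoryDocumentStore(similarity="dot_product")
# ... write documents into the store first, e.g. document_store.write_documents(docs)

retriever = MultihopEmbeddingRetriever(
    document_store=document_store,
    embedding_model="deutschmann/mdr_roberta_q_encoder",  # assumed: converted MDR checkpoint
)
document_store.update_embeddings(retriever)

results = retriever.retrieve(query="Who directed the film that ...?", top_k=5)
```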
Hi @deutschmn, how is it going? 🙂 We are planning a Haystack release in about ten days (first week of July). It would be great to have Multi-Hop Dense Retrieval in there as a feature! Do you think you can incorporate the change requests by then so that we can merge your PR?
Hey @julian-risch! Sorry, I was pretty busy the last couple of days. I'll start incorporating your feedback today or tomorrow so the feature can make the cut for the release 😊
Alright, I also …
Looking already pretty good to me! I just left a few comments regarding naming, docstrings, and filters, and one question regarding the changed signature of the `forward` method.
haystack/nodes/retriever/dense.py (outdated)

@@ -1898,3 +1898,303 @@ def save(self, save_dir: Union[Path, str]) -> None:
        :type save_dir: Union[Path, str]
        """
        self.embedding_encoder.save(save_dir=save_dir)


class MultihopDenseRetriever(EmbeddingRetriever):
I'd propose renaming this to `MultihopEmbeddingRetriever`; this might make it clearer that it is an extension of the `EmbeddingRetriever`.
Done
haystack/nodes/retriever/dense.py (outdated)

        embed_meta_fields: List[str] = [],
    ):
        """
        Same parameters as `EmbeddingRetriever` except
I'd like to see descriptions for all the parameters that can be set in the init method. Feel free to just copy them from the `EmbeddingRetriever`.
Done
haystack/nodes/retriever/dense.py (outdated)

                " as queries or a single filter that will be applied to each query."
            )
        else:
            filters = [{}] * len(queries)
This needs to be changed. If the user provides a single filter dict, it will be overwritten by an empty filter.
Fixed
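For reference, the fixed normalization presumably looks something like this (a sketch, not a verbatim copy of the merged code):

```python
from haystack.errors import HaystackError

# Normalize `filters` for retrieve_batch; `queries` is a List[str].
if filters is None:
    filters = [{}] * len(queries)        # no filtering for any query
elif isinstance(filters, dict):
    filters = [filters] * len(queries)   # broadcast a single filter, don't discard it
elif len(filters) != len(queries):
    raise HaystackError(
        "Number of filters does not match number of queries. Please provide as many filters"
        " as queries or a single filter that will be applied to each query."
    )
```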
@@ -971,7 +971,9 @@ def get_similarity_function(self):
            f"The similarity function can only be 'dot_product' or 'cosine', not '{self.similarity_function}'"
        )

-    def forward(self, query_vectors: torch.Tensor, passage_vectors: torch.Tensor) -> Tuple[torch.Tensor, torch.Tensor]:
+    def forward(
+        self, query_vectors: torch.Tensor, passage_vectors: Optional[torch.Tensor] = None
Can you maybe explain why this change is needed?
I made this change because I discovered that if a `BiAdaptiveModel` is instantiated with a `TextSimilarityHead` as one of its `prediction_heads`, the head can be called with `None` as the second forward parameter here:

    embedding1, embedding2 = head(output1, output2)

I added the default parameter (`= None`) because it used to be necessary for my `BiAdaptiveSharedModel`. However, as it doesn't really have anything to do with this PR anymore, I've reverted the change now 😊
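For context, the reverted change boiled down to this pattern (illustrative only; the class below is a simplified stand-in for the real `TextSimilarityHead`, and the pass-through return is my assumption):

```python
from typing import Optional, Tuple
import torch

class TextSimilarityHeadSketch:
    """Simplified stand-in for TextSimilarityHead, for illustration only."""

    def forward(
        self, query_vectors: torch.Tensor, passage_vectors: Optional[torch.Tensor] = None
    ) -> Tuple[torch.Tensor, Optional[torch.Tensor]]:
        # With a shared encoder, queries and passages may be embedded in separate
        # passes, so the second argument can legitimately be None.
        return query_vectors, passage_vectors
```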
Thanks for implementing the requested changes so fast! Could you merge the current master into your branch? Over the last few days, we made a few fixes regarding our CI and deprecations from other libraries we use in Haystack.
@bogdankostic Done 😊
Looking good to me! Thank you so much for your contribution to Haystack :)
* Implement MDR
* Adapt conftest to new MDR signature
* Update Documentation & Code Style
* Change signature of queries param in batch methods of MDR like in deepset-ai#2575
* Update Documentation & Code Style
* Rename MultihopDenseRetriever to MultihopEmbeddingRetriever
* Fix filters in retrieve_batch
* Add docstring for MultihopEmbeddingRetriever.__init__
* Update Documentation & Code Style
* Revert forward signature of TextSimilarityHead

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
@deutschmn @bogdankostic Hi! Is there any documentation I can refer to about the MultihopEmbeddingRetriever and how to use it? Thanks!
Hi @Ssingh1997! We have a short passage about the MultihopEmbeddingRetriever in our documentation: https://docs.haystack.deepset.ai/docs/retriever#multihop-embedding-retriever
@bogdankostic Is there documentation for the MultihopEmbeddingRetriever's parameters, and how do I set the number of iterations?
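For later readers, the number of hops is controlled by an init parameter. The example below assumes the `num_iterations` parameter and default model name from the Haystack source at the time of writing; both may have changed since:

```python
from haystack.nodes import MultihopEmbeddingRetriever

retriever = MultihopEmbeddingRetriever(
    document_store=document_store,  # assumed to exist and contain documents
    embedding_model="deutschmann/mdr_roberta_q_encoder",
    num_iterations=2,  # number of retrieval hops; 2 matches the paper's two-hop setup
)
```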
Proposed changes:

As discussed in #2555, this PR adds a new node for Multi-Hop Dense Retrieval (MDR): Answering Complex Open-Domain Questions with Multi-Hop Dense Retrieval.

Status / TODOs:

* `MultihopDenseRetriever` that can retrieve
* `BiAdaptiveSharedModel` that bridges the gap between `MultihopDenseRetriever` and `Roberta` (might need refactoring for training)
* `save` and `load`
* (Maybe) add an equivalent for `DPRQuestionEncoder`/`DPRContextEncoder`
* ~~Implement and test training~~ → no training support yet

@ZanSara @julian-risch @bogdankostic I'm still actively working on this, but if you already have some comments, I appreciate your feedback 😊