
Add support for Multi-Hop Dense Retrieval #2571

Merged
merged 11 commits into deepset-ai:master on Jul 5, 2022

Conversation

deutschmn
Contributor

@deutschmn deutschmn commented May 18, 2022

Proposed changes:
As discussed in #2555, this PR adds a new node for Multi-Hop Dense Retrieval (MDR), introduced in the paper Answering Complex Open-Domain Questions with Multi-Hop Dense Retrieval. (A minimal sketch of the retrieval idea follows the TODO list below.)

Status:

TODOs:

  • Add basic MultihopDenseRetriever that can retrieve.
  • Add BiAdaptiveSharedModel that bridges the gap between MultihopDenseRetriever and Roberta. (might need refactoring for training)
  • Add MDR to retriever test cases.
  • Add support for save and load.
  • (Maybe) add an equivalent for DPRQuestionEncoder/DPRContextEncoder.
  • Implement and test training. → no training support yet
  • Train a model, put a checkpoint on Hugging Face Hub and reference it as the default. → https://huggingface.co/deutschmann/mdr_roberta_q_encoder
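For readers new to the method, here is a minimal, illustrative sketch of the iterative retrieval idea; names such as `encode_query`, `corpus_embs`, and `corpus_texts` are placeholders, not the API this PR adds:

```python
import numpy as np

def multi_hop_retrieve(question, encode_query, corpus_embs, corpus_texts, top_k=5, hops=2):
    """Illustrative two-hop dense retrieval: after each hop, the top passage is
    folded into the query before re-encoding, as described in the MDR paper."""
    query, per_hop_results = question, []
    for _ in range(hops):
        q_emb = encode_query(query)                # MDR uses one shared RoBERTa encoder
        scores = corpus_embs @ q_emb               # dot-product similarity against all passages
        top_ids = np.argsort(-scores)[:top_k]
        per_hop_results.append([corpus_texts[i] for i in top_ids])
        # Condition the next hop on the retrieved context.
        query = f"{question} {corpus_texts[top_ids[0]]}"
    return per_hop_results
```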

@ZanSara @julian-risch @bogdankostic I'm still actively working on this, but if you already have some comments, I appreciate your feedback 😊

@CLAassistant

CLAassistant commented May 18, 2022

CLA assistant check
All committers have signed the CLA.

@ZanSara
Contributor

ZanSara commented May 18, 2022

Hello @deutschmn, great to see this PR so quickly! 🤩 I'm going to have a look at the diff soon. Is there anything specific you have doubts about, or that we should focus our attention on?

@ZanSara ZanSara added the type:feature New feature or request label May 18, 2022
@deutschmn
Contributor Author

> Hello @deutschmn, great to see this PR so quickly! 🤩 I'm going to have a look at the diff soon. Is there anything specific you have doubts about, or that we should focus our attention on?

Not yet, thanks! I'm not sure whether using BiAdaptiveSharedModel (a subclass of BiAdaptiveModel) is the best solution, but I'll look into it a bit more when implementing the training.

@deutschmn
Contributor Author

Training, saving, and loading work now. I'll start a proper training with more data soon to check out whether I get reasonable results 😊

@@ -485,3 +485,67 @@ def convert_from_transformers(
bi_adaptive_model.connect_heads_with_processor(processor.tasks) # type: ignore

return bi_adaptive_model


class BiAdaptiveSharedModel(BiAdaptiveModel):
@mathislucka
Member

As far as I understand this, you are using the same language model as an encoder. So instead of the original DPR architecture where we have a question and a document encoder, you are using the same encoder for both.

I'd argue that this is the same architecture as used by sentence transformers (https://sbert.net/docs/training/overview.html) and maybe we should consider if it makes sense to re-implement all of the functionality already provided there or if it makes sense to use sentence transformers under the hood.
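(For context: a minimal sketch of that shared-encoder setup using sentence-transformers; the model name is only an example.)

```python
from sentence_transformers import SentenceTransformer, util

# A single model encodes both queries and passages, i.e. a shared encoder.
model = SentenceTransformer("multi-qa-mpnet-base-dot-v1")

query_emb = model.encode("Who introduced multi-hop dense retrieval?")
passage_embs = model.encode(["First passage ...", "Second passage ..."])

# Dot-product similarity between the query and every passage.
scores = util.dot_score(query_emb, passage_embs)
```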

@deutschmn
Contributor Author

> As far as I understand this, you are using the same language model as an encoder. So instead of the original DPR architecture where we have a question and a document encoder, you are using the same encoder for both.

Exactly. But, most importantly, MDR also uses a different retrieval mechanism that iteratively adds context documents (see https://ar5iv.labs.arxiv.org/html/2009.12756#S2.SS2.SSS0.Px1).

> I'd argue that this is the same architecture as used by sentence transformers (https://sbert.net/docs/training/overview.html) and maybe we should consider if it makes sense to re-implement all of the functionality already provided there or if it makes sense to use sentence transformers under the hood.

Yes, I agree that it's quite similar to sentence transformers, except for the multi-hop part, which they don't support AFAIK.

For now, I based my implementation on Haystack's DensePassageRetriever, which uses Hugging Face's DPR, and then calls the document store's query_by_embedding. It took only a few changes to adapt this for MDR. I'm not sure if there is much to be gained when using sentence transformers. What you see in my changes now is all there was to do. But please let me know if you find parts that can be simplified by using sentence transformers 😊

@mathislucka
Member

I think the BiAdaptiveSharedModel would not be needed if an EmbeddingRetriever were used inside the MDR. The EmbeddingRetriever would just be used to create the embeddings, and the retrieval can stay custom, as MDR requires. I'd also argue that having the train method on the MDR is not intuitive, because there isn't anything specific to MDR in the training code; it is just the same training as used in sentence transformers.

I think this would reduce the code that is needed quite a bit. At the same time, maybe someone from the team (@julian-risch?) could chime in on whether it makes sense to replicate sentence-transformers within the FARM architecture.

@deutschmn
Contributor Author

Oh, that's a good point, thanks. Frankly, I didn't have a look at EmbeddingRetriever since I thought the closest to MDR would be DPR. Therefore, I duplicated and adapted its code - Hugging Face style. I built BiAdaptiveSharedModel so that MDR doesn't differ from DPR too much.

However, if the goal is to have less duplicated code, deriving from EmbeddingRetriever and overriding retrieve/retrieve_batch might be a good idea. Where do you propose we then implement the training? It would probably make sense to have that not only for MDR but every EmbeddingRetriever.

@deutschmn
Contributor Author

Short update on the training: The current implementation does DPR-style training, which is, as @mathislucka has correctly remarked, equivalent to what sentence transformers do if the encoder is shared. However, in the MDR paper and reference implementation, the authors use a different way that accounts for the multi-hop retrieval method by incorporating both hops into the loss function: https://github.com/facebookresearch/multihop_dense_retrieval/blob/62eb2427e36a648a927c6e39bb4c748796f7b366/mdr/retrieval/criterions.py#L114.
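(For illustration, a simplified sketch of such a two-hop loss with in-batch negatives only; the reference implementation linked above additionally uses explicit negative passages.)

```python
import torch
import torch.nn.functional as F

def two_hop_loss(q1_embs, q2_embs, pos1_embs, pos2_embs):
    """Simplified two-hop contrastive loss in the spirit of the MDR paper:
    one NLL term per hop, in-batch negatives, terms summed."""
    targets = torch.arange(q1_embs.size(0), device=q1_embs.device)
    # Row i scores query i against every positive passage in the batch;
    # the diagonal entry is the correct passage, the rest act as negatives.
    loss_hop1 = F.cross_entropy(q1_embs @ pos1_embs.T, targets)
    loss_hop2 = F.cross_entropy(q2_embs @ pos2_embs.T, targets)
    return loss_hop1 + loss_hop2
```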

I've checked the results of the current method, and they seem suboptimal. I believe the retriever focuses too much on the first context passage if it's not explicitly taught not to do that. Apart from that, if we integrate MDR training into Haystack, it should be what was presented by the original authors, right?

My next steps will be to a) try to use Facebook's checkpoints or b) train a model with the reference implementation, push the weights to Hugging Face and load it into Haystack. If this is successful, we can move to adding training support.

What do you think? 😊

@deutschmn deutschmn changed the title WIP: Add support for Multi-Hop Dense Retrieval Add support for Multi-Hop Dense Retrieval May 23, 2022
@deutschmn
Contributor Author

I've now successfully loaded one of Facebook's checkpoints into Haystack, pushed it to the HF Hub and made it the default model for MDR: deutschmann/mdr_roberta_q_encoder. Using the DPR checkpoint facebook/dpr-ctx_encoder-single-nq-base also works, but it hasn't been specifically trained for iterative retrieval. It might be interesting to run the benchmark tests on your side to see how the model performs in comparison to DPR 😊

I've removed the DPR-style training code and decided to wait for your feedback before continuing. Maybe it would be a good idea to add training in another PR since it's likely not completely trivial to implement.

That's it for my first draft of this PR. It would be great if some of the maintainers could look into what I've done so far (@ZanSara, @julian-risch, @bogdankostic) 😊 Maybe you could also weigh in on the discussion regarding BiAdaptiveSharedModel I'm having with @mathislucka. It might come down to whether MDR training should also be implemented here or not.

@bogdankostic
Contributor

Hey @deutschmn, sorry for the late reply! 🙃 I will look at the details of your PR and provide some feedback tomorrow.

@JulianGerhard21

Hi @deutschmn ,

I am really interested in this PR and want to test it as well. Could you please briefly describe which steps are necessary after merging the PR locally?

Kind regards
Julian

Contributor

@bogdankostic bogdankostic left a comment

Thanks for this PR, looks very clean!

Not sure whether we should introduce the BiAdaptiveSharedModel or try using the EmbeddingRetriever instead (as @mathislucka proposed). What do you think @julian-risch?


def retrieve_batch(
self,
queries: Union[str, List[str]],
@bogdankostic
Contributor

In #2575, we changed the type of the queries param in the batch methods to List[str].

@julian-risch
Member

I would prefer re-using EmbeddingRetriever instead of having the new BiAdaptiveSharedModel with some code duplication (what @mathislucka wrote). That will require a bit of additional work in this PR, but it will help us a lot in the long run when we maintain that code. I also think it will make the code a bit easier to understand for new users.

@deutschmn
Contributor Author

Thanks for your feedback, @bogdankostic and @julian-risch! I'll look into it in the next couple of days.

@JulianGerhard21 Happy to hear that you're interested! Here's a notebook showing how you can use MultihopDenseRetriever with a multi-hop example where it gives a better result than DensePassageRetriever: https://gist.github.com/deutschmn/40d3960c4f9a8dc6fa699460d3c56c05
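(For readers who just want the gist of the API: a hedged usage sketch against the Haystack v1 interface, using the class name the PR later settles on; details may differ from the notebook.)

```python
from haystack.document_stores import InMemoryDocumentStore
from haystack.nodes import MultihopEmbeddingRetriever

document_store = InMemoryDocumentStore(embedding_dim=768, similarity="dot_product")
# ... write Documents into the store here ...

retriever = MultihopEmbeddingRetriever(
    document_store=document_store,
    embedding_model="deutschmann/mdr_roberta_q_encoder",
)
document_store.update_embeddings(retriever)

docs = retriever.retrieve(query="Some complex multi-hop question?", top_k=5)
```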

@masci masci linked an issue Jun 22, 2022 that may be closed by this pull request
@julian-risch
Member

Hi @deutschmn how is it going? 🙂 We are planning a Haystack release in about ten days (first week of July). It would be great to have Multi-Hop Dense Retrieval in there as a feature! Do you think you can incorporate the change requests by then so that we can merge your PR?

@deutschmn
Contributor Author

Hey @julian-risch! Sorry, I was pretty busy the last couple of days. I'll start incorporating your feedback today or tomorrow so the feature can make the cut for the release 😊

@deutschmn
Contributor Author

deutschmn commented Jun 23, 2022

Alright, MultihopDenseRetriever now derives from EmbeddingRetriever and inherits most of its behaviour, except retrieve and retrieve_batch.

I also
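(Below, a hedged skeleton of that inheritance structure; simplified and not the PR's exact code — in particular, the real node's hop logic and method signatures are richer.)

```python
from typing import List, Optional

from haystack.nodes import EmbeddingRetriever
from haystack.schema import Document


class MultihopRetrieverSketch(EmbeddingRetriever):
    """Embedding models, save/load and batching are inherited from
    EmbeddingRetriever; only the retrieval loop is multi-hop specific."""

    def retrieve(self, query: str, top_k: Optional[int] = None, **kwargs) -> List[Document]:
        top_k = top_k or self.top_k
        docs: List[Document] = []
        expanded_query = query
        for _ in range(2):  # two hops; a real implementation would make this configurable
            docs = super().retrieve(query=expanded_query, top_k=top_k, **kwargs)
            if not docs:
                break
            # Fold retrieved context into the query before the next hop's embedding.
            expanded_query = f"{query} {docs[0].content}"
        return docs
```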

@julian-risch julian-risch requested a review from bogdankostic July 4, 2022 07:20
Contributor

@bogdankostic bogdankostic left a comment

Looking pretty good to me already; just left a few comments wrt naming, docstrings, and filters, and one question wrt changing the signature of the forward method.

@@ -1898,3 +1898,303 @@ def save(self, save_dir: Union[Path, str]) -> None:
:type save_dir: Union[Path, str]
"""
self.embedding_encoder.save(save_dir=save_dir)


class MultihopDenseRetriever(EmbeddingRetriever):
@bogdankostic
Contributor

I'd propose renaming this to MultihopEmbeddingRetriever, this might make it more clear that this is an extension of the EmbeddingRetriever.

@deutschmn
Contributor Author

Done

embed_meta_fields: List[str] = [],
):
"""
Same parameters as `EmbeddingRetriever` except
@bogdankostic
Contributor

I'd like to see descriptions for all the parameters that can be set in the init method. Feel free to just copy them from the EmbeddingRetriever.

@deutschmn
Contributor Author

Done

" as queries or a single filter that will be applied to each query."
)
else:
filters = [{}] * len(queries)
@bogdankostic
Contributor

This needs to be changed. If the user provides a single filter dict, it will be overwritten by an empty filter.

@deutschmn
Contributor Author

Fixed
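(A hedged sketch of what such a fix can look like; the PR's actual code may differ.)

```python
from typing import Dict, List, Optional, Union

def normalize_filters(filters: Optional[Union[Dict, List[Dict]]], queries: List[str]) -> List[Dict]:
    """Expand a single filter dict to one filter per query instead of
    silently replacing it with empty filters."""
    if filters is None:
        return [{}] * len(queries)
    if isinstance(filters, dict):
        return [filters] * len(queries)
    if len(filters) != len(queries):
        raise ValueError("Provide one filter per query or a single filter dict.")
    return list(filters)
```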

@@ -971,7 +971,9 @@ def get_similarity_function(self):
f"The similarity function can only be 'dot_product' or 'cosine', not '{self.similarity_function}'"
)

- def forward(self, query_vectors: torch.Tensor, passage_vectors: torch.Tensor) -> Tuple[torch.Tensor, torch.Tensor]:
+ def forward(
+     self, query_vectors: torch.Tensor, passage_vectors: Optional[torch.Tensor] = None
@bogdankostic
Contributor

Can you maybe explain why this change is needed?

@deutschmn
Contributor Author

I made this change because I discovered that if a BiAdaptiveModel is instantiated with a TextSimilarityHead as one of its prediction_heads, the head can be called with None as the second forward parameter here:

embedding1, embedding2 = head(output1, output2)

I added the default parameter (= None) because it used to be necessary for my BiAdaptiveSharedModel.

However, as it doesn't really have anything to do with this PR anymore, I reverted the change now 😊

@deutschmn deutschmn requested a review from bogdankostic July 5, 2022 06:53
@bogdankostic
Contributor

Thanks for implementing the requested changes so fast! Could you merge the current master into your branch? Over the last few days, we made a few fixes wrt our CI and deprecations from other libraries we use in Haystack.

@deutschmn
Contributor Author

@bogdankostic Done 😊

Contributor

@bogdankostic bogdankostic left a comment

Looking good to me! Thank you so much for your contribution to Haystack :)

@bogdankostic bogdankostic merged commit 1db3fd0 into deepset-ai:master Jul 5, 2022
Krak91 pushed a commit to Krak91/haystack that referenced this pull request Jul 26, 2022
* Implement MDR

* Adapt conftest to new MDR signature

* Update Documentation & Code Style

* Change signature of queries param in batch methods of MDR like in deepset-ai#2575

* Update Documentation & Code Style

* Rename MultihopDenseRetriever to MultihopEmbeddingRetriever

* Fix filters in retrieve_batch

* Add docstring for MultihopEmbeddingRetriever.__init__

* Update Documentation & Code Style

* Revert forward signature of TextSimilarityHead

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
@ss2342

ss2342 commented Aug 15, 2023

@deutschmn @bogdankostic hi! Is there any documentation I can refer to about the MultiHopDenseRetriever and how to use it? Thanks!

@bogdankostic
Contributor

Hi @Ssingh1997! We have a short passage about the MultihopEmbeddingRetriever in our documentation: https://docs.haystack.deepset.ai/docs/retriever#multihop-embedding-retriever

@chan2k

chan2k commented Oct 10, 2023

@bogdankostic Is there documentation for MultiHopDenseRetriever on the parameters for it and how to set number of iterations?
