
Update document scores based on ranker node #2048

Merged · 13 commits merged into master from add_score_to_ranker on Jun 27, 2022

Conversation

@mathislucka (Member) commented Jan 21, 2022

Proposed changes:
This adds a score to Documents coming from a Ranker.

Cross-encoders produce scores, and since a Document already has a score property, that property can be used safely.
The score can be useful to show in a UI or for other purposes.
As nothing in the sorting code is changed, this should be a non-breaking change.
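
As a minimal illustration (a sketch assuming haystack v1's SentenceTransformersRanker and Document APIs; not code from this PR's diff), the score set by the ranker can then be read directly from the returned documents:

```python
from haystack.nodes import SentenceTransformersRanker
from haystack.schema import Document

# Illustrative only: model name and call pattern are assumptions, not part of this PR.
ranker = SentenceTransformersRanker(model_name_or_path="cross-encoder/ms-marco-MiniLM-L-12-v2")

docs = [
    Document(content="Paris is the capital of France."),
    Document(content="Berlin is the capital of Germany."),
]

ranked_docs = ranker.predict(query="What is the capital of France?", documents=docs)
for doc in ranked_docs:
    # With this change, each returned Document carries the cross-encoder score.
    print(doc.score, doc.content)
```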

Status (please check what you already did):

  • First draft (up for discussions & feedback)
  • Final code
  • Added tests
  • Updated documentation

closes #2706

@tholor tholor requested a review from julian-risch February 2, 2022 13:13
@tholor tholor marked this pull request as ready for review February 2, 2022 13:14
@julian-risch (Member) left a comment

Let's add a test case for this new feature, at least one that checks that documents have scores between 0 and 1 in both cases (logits_dim == 1 and logits_dim > 1).
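
A minimal sketch of such a test (covering only the single-logit case; model name and assertions are illustrative, not the tests that were eventually merged; the logits_dim > 1 case is revisited further down the thread):

```python
from haystack.nodes import SentenceTransformersRanker
from haystack.schema import Document


def test_ranker_returns_probability_scores():
    # single-logit cross-encoder; with sigmoid scaling the scores should fall into [0, 1]
    ranker = SentenceTransformersRanker(model_name_or_path="cross-encoder/ms-marco-MiniLM-L-12-v2")
    docs = [
        Document(content="Berlin is the capital of Germany."),
        Document(content="The sun is a star."),
    ]
    results = ranker.predict(query="What is the capital of Germany?", documents=docs)
    for doc in results:
        assert doc.score is not None
        assert 0.0 <= doc.score <= 1.0
```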

@mathislucka (Member, Author)

Hey @julian-risch,

I thought about this a little more and I came up with a few issues:

  1. For logits_dim > 1, the score will not be between 0 and 1 because we are using torch.nn.Identity here. It is done the same way in sentence_transformers here:

https://github.com/UKPLab/sentence-transformers/blob/5f2c41821c722a3171e2c7c55620ed1f97050b16/sentence_transformers/cross_encoder/CrossEncoder.py#L36

And I think that it will actually break things if we apply sigmoid for multiple labels.

Thinking further about this, I'd still like to change this and add an activation_fct parameter like in the sentence_transformers example. For search, it is good when the re-ranker returns a score between 0 and 1, but if we want to use a re-ranker for data augmentation, we actually want the raw scores (at least, this was also used in the GPL paper). So I'd change the behaviour and function signature to match the functionality that we already know from sentence_transformers and add tests for that.
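
A rough sketch of that default, mirroring sentence_transformers' behaviour (the default_activation helper and its placement are hypothetical and only illustrate the idea, not the merged implementation):

```python
import torch
from torch import nn


def default_activation(num_labels: int) -> nn.Module:
    # sentence_transformers' default: sigmoid for a single logit (scores in [0, 1]),
    # identity otherwise (raw scores, e.g. for data augmentation as in GPL)
    return nn.Sigmoid() if num_labels == 1 else nn.Identity()


# hypothetical use on the ranker's logits:
logits = torch.tensor([[2.3], [-0.7]])                      # cross-encoder output, logits_dim == 1
scores = default_activation(num_labels=1)(logits)           # squashed into [0, 1]
raw = default_activation(num_labels=3)(torch.randn(2, 3))   # multi-label logits stay untouched
```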

What do you think?

@julian-risch (Member)

Hi @mathislucka, thanks for bringing this up again.

> And I think that it will actually break things if we apply sigmoid for multiple labels.
In that case, I agree that we can use the raw scores internally. Just keep in mind that, as you wrote, search applications need to be able to show a document relevance score (0 to 1) in the end. 👍

@tstadel (Member) commented Mar 30, 2022

@mathislucka I remember we had a brief discussion on this. Are you still working on that? What's the status?

@julian-risch (Member)

@mathislucka could you please check the status here? Thank you. 🙂

@mathislucka (Member, Author)

Hey @julian-risch and @tstadel ,

I'm not sure about the status. For things like GPL we will need the raw score. For an end user, using sigmoid activation will be the most comprehensible. I'd like to keep the current code as default behaviour but allow the user to pass in a custom callable that is called to transform the score. The problem is that, as @tstadel mentioned, passing in a Callable does not work well with loading nodes from YAML. So I'm not sure what to do here. Any ideas?

@tstadel (Member) commented Apr 28, 2022

@mathislucka @julian-risch how about having just a single activation function self.activation_function and inferring the default from the Transformers model during init by checking self.transformer_model.classifier.out_features? Additionally, we should introduce another init param scale_score_to_probability, analogous to the Retrievers (implemented in #2454), with which we can disable the default sigmoid scaling in the single-dim case.
For custom activation functions, you would be able to set a custom function on ranker.activation_function after calling init. Or, if you want or need to use YAMLs, you could set scale_score_to_probability to False to get the raw scores and pipe the results through a custom node that takes care of the scaling.
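
A sketch of how that could look (the infer_activation helper is hypothetical, the parameter name is taken from this comment rather than from the merged code, and the classifier.out_features access assumes a BERT-style head where classifier is a plain nn.Linear):

```python
from torch import nn
from transformers import AutoModelForSequenceClassification


def infer_activation(transformer_model, scale_score_to_probability: bool = True) -> nn.Module:
    out_features = transformer_model.classifier.out_features  # size of the classification head
    if out_features == 1 and scale_score_to_probability:
        return nn.Sigmoid()   # single-dim case: scale scores into [0, 1]
    return nn.Identity()      # multi-label case, or scaling disabled: raw scores


model = AutoModelForSequenceClassification.from_pretrained("cross-encoder/ms-marco-MiniLM-L-12-v2")
activation_function = infer_activation(model)

# for custom activation functions, overwrite the attribute after init, e.g.:
# ranker.activation_function = nn.Softmax(dim=-1)
```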
What do you think?

@julian-risch (Member)

@mathislucka What do you think of the suggestion by Thomas? Would you have time to work on implementing that or should I do that?

@mathislucka (Member, Author)

I like the idea and I could implement it, but I won't get to it today. So if you want to merge this earlier, I'd be glad if you could take over.

@julian-risch (Member)

@mathislucka it's not urgent, no need to tackle it today. We were just unsure in our sprint planning whether you can continue working on the issue or whether somebody from the core team needs to take over. Looking forward to your implementation of the idea then, and let me know if you need any support. 👍

@mathislucka (Member, Author) commented Jun 8, 2022

Just a quick question for @tstadel: when you say infer from self.transformer_model.classifier.out_features, do you mean something like self.transformer_model.num_labels? I can't find out_features in e.g. BertForSequenceClassification as implemented here: https://github.com/huggingface/transformers/blob/264128cb9dbd83b666666945fd2fea0662135911/src/transformers/models/bert/modeling_bert.py#L1513

@tstadel (Member) commented Jun 8, 2022

> Just a quick question for @tstadel: when you say infer from self.transformer_model.classifier.out_features, do you mean something like self.transformer_model.num_labels? I can't find out_features in e.g. BertForSequenceClassification as implemented here: https://github.com/huggingface/transformers/blob/264128cb9dbd83b666666945fd2fea0662135911/src/transformers/models/bert/modeling_bert.py#L1513

@mathislucka num_labels contains the same information. transformer_model.classifier is a PyTorch nn.Linear module, which in turn has the out_features field. The former seems to be a convenience field within the transformers model; the latter is the raw dimensionality of the classification head. They should both be safe to use.
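
A quick way to see that both fields carry the same value for a BERT-style cross-encoder (illustrative only, not from the PR):

```python
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained("cross-encoder/ms-marco-MiniLM-L-12-v2")
print(model.num_labels)                 # convenience field on the transformers model
print(model.classifier.out_features)    # raw output dimensionality of the nn.Linear classification head
assert model.num_labels == model.classifier.out_features
```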

@mathislucka (Member, Author)

I think it might work that way, but I am not sure how to fix this mypy issue.

@julian-risch (Member) left a comment

LGTM! 👍 I fixed the mypy issues and slightly adapted the tests. Let's wait for them to run through, and if all goes well I'll merge afterwards.

@julian-risch julian-risch changed the title ranker should return scores for later usage Update document scores based on ranker node Jun 27, 2022
@julian-risch julian-risch merged commit 8d65bc5 into master Jun 27, 2022
@julian-risch julian-risch deleted the add_score_to_ranker branch June 27, 2022 10:17
Krak91 pushed a commit to Krak91/haystack that referenced this pull request Jul 26, 2022
* ranker should return scores for later usage

* fix wrong tuple order

* adjust ranker scores; add tests

* Update Documentation & Code Style

* fix mypy

* Update Documentation & Code Style

* fix mypy

* Update Documentation & Code Style

* relax ranker test tolerance

* update ranker test score

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Julian Risch <[email protected]>
Labels: topic:metadata, type:feature (New feature or request)
Projects: None yet
Development: successfully merging this pull request may close these issues: Ranker scores shouldn't be discarded
4 participants