Scale dot product into probabilities #667

Merged: 9 commits merged on Dec 11, 2020

@brandenchan (Contributor) commented Dec 9, 2020

This PR improves and fixes the similarity functions used in embedding-based retrieval.

A warning is now logged when a dense retrieval model is paired with a similarity function that is not recommended for it.
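A minimal sketch of such a pairing check, assuming (hypothetically) that DPR-style models are meant to be used with `dot_product` and sentence-transformers-style models with `cosine`. The function name, the model-type strings and the warning text are illustrative only and are not Haystack's actual API.

```python
import logging

logger = logging.getLogger(__name__)

# Hypothetical table: the similarity function each model family is assumed to be trained with.
RECOMMENDED_SIMILARITY = {
    "dpr": "dot_product",
    "sentence_transformers": "cosine",
}


def check_similarity_pairing(model_type: str, similarity: str) -> bool:
    """Return True for a recommended (or unknown) pairing; otherwise log a warning and return False."""
    recommended = RECOMMENDED_SIMILARITY.get(model_type)
    if recommended is None or recommended == similarity:
        # Unknown model families are not flagged.
        return True
    logger.warning(
        "You are using a %s model with the %s similarity function. "
        "We recommend using %s instead.",
        model_type, similarity, recommended,
    )
    return False
```

For example, `check_similarity_pairing("dpr", "cosine")` logs a warning and returns `False`. Retrieval still runs with the requested similarity function; the check only warns.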

Scaling of dot product scores into probabilities is now also supported.
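Cosine similarity is bounded in [-1, 1], so it can be mapped linearly onto [0, 1]. A dot product is unbounded, so it needs a squashing function instead. The sketch below assumes a logistic sigmoid applied to the score divided by a scale factor. The default `scale=100` is an illustrative assumption and is not taken from this PR.

```python
import math


def cosine_to_probability(score: float) -> float:
    # Cosine similarity lies in [-1, 1]; shift and halve to land in [0, 1].
    return (score + 1) / 2


def dot_product_to_probability(score: float, scale: float = 100.0) -> float:
    # Dot products are unbounded, so squash them with a logistic sigmoid.
    # `scale` (an assumed value) controls how quickly scores saturate towards 0 or 1.
    x = score / scale
    # Numerically stable sigmoid: never call math.exp on a large positive argument.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)
```

Both mappings are monotonic, so they change the reported values without changing the ranking of documents within a single query.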

This addresses #661

TODO:

  • Raise warning with bad pairing (ES)
  • Scale dot product properly
  • Add similarity as an attribute of the DocumentStore base class
  • Implement both similarity functions in InMemoryDocumentStore
  • Implement dot product in FAISSDocumentStore
  • Implement neither in SQLDocumentStore (but raise a warning?)
  • Give each document store its own _create_document_field_map() method
  • Compare scores across document stores (to be addressed in #672, Align similarity functions across DocumentStores)
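Two TODO items can be sketched together: a `similarity` attribute on a base class, and an in-memory store that ranks by either similarity function. The class and method names below are hypothetical stand-ins, not Haystack's actual `BaseDocumentStore` or `InMemoryDocumentStore` API. Embeddings are plain Python lists so the example has no dependencies.

```python
import math
from typing import Dict, List, Tuple


class BaseStoreSketch:
    """Hypothetical base class that keeps the similarity function as an attribute."""

    SUPPORTED_SIMILARITIES = ("dot_product", "cosine")

    def __init__(self, similarity: str = "dot_product") -> None:
        if similarity not in self.SUPPORTED_SIMILARITIES:
            raise ValueError(f"Unsupported similarity function: {similarity}")
        self.similarity = similarity


class InMemoryStoreSketch(BaseStoreSketch):
    """Hypothetical in-memory store that ranks documents by either similarity function."""

    def __init__(self, similarity: str = "dot_product") -> None:
        super().__init__(similarity)
        self.embeddings: Dict[str, List[float]] = {}

    def write(self, doc_id: str, embedding: List[float]) -> None:
        self.embeddings[doc_id] = embedding

    def _score(self, query: List[float], doc: List[float]) -> float:
        dot = sum(q * d for q, d in zip(query, doc))
        if self.similarity == "dot_product":
            return dot
        # Cosine: the dot product divided by the product of the vector norms.
        norm = math.sqrt(sum(q * q for q in query)) * math.sqrt(sum(d * d for d in doc))
        return dot / norm if norm else 0.0

    def query_by_embedding(self, query: List[float], top_k: int = 10) -> List[Tuple[str, float]]:
        # Score every stored embedding and return the top_k highest-scoring ids.
        scored = [(doc_id, self._score(query, emb)) for doc_id, emb in self.embeddings.items()]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:top_k]
```

With the query `[1, 0]`, the dot product ranks the longer vector `[3, 3]` first, while cosine ranks the perfectly aligned `[1, 0]` first. The choice of similarity function therefore changes retrieval results, not just the score scale.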

@brandenchan brandenchan requested a review from tholor December 9, 2020 11:30
@brandenchan brandenchan self-assigned this Dec 9, 2020
@brandenchan brandenchan changed the title from "Scale probabilities properly" to "Scale dot product into probabilities" Dec 9, 2020
@tholor (Member) left a comment


Looking good! Just one comment on the docs/logging, and mypy is still failing.

Review thread on docs/_src/usage/usage/retriever.md (resolved)
@brandenchan (Contributor, Author) commented

Implemented support for each document store and similarity function, but the probabilities being returned are inconsistent across them. This will be addressed by #672.
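Here is one illustration of how such inconsistencies can arise. Assume, for the sake of the example, that cosine scores are mapped to [0, 1] with (s + 1)/2 and that dot-product scores go through a logistic sigmoid of s/100. Take two unit-length vectors whose similarity is 0.8. Their dot product and cosine similarity are identical, yet they receive very different "probabilities":

```latex
p_{\cos} = \frac{0.8 + 1}{2} = 0.9,
\qquad
p_{\text{dot}} = \sigma\!\left(\frac{0.8}{100}\right) = \frac{1}{1 + e^{-0.008}} \approx 0.502
```

Each mapping is monotonic, so rankings within one configuration are unaffected. Only the absolute probability values disagree between similarity functions.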

2 participants