Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Send batches of query-doc pairs to inference_from_objects #5125

Merged
merged 5 commits into from
Jun 26, 2023

Conversation

bogdankostic
Copy link
Contributor

@bogdankostic bogdankostic commented Jun 12, 2023

Related Issues

  • n/a

Proposed Changes:

This PR adds the parameter preprocessing_batch_size to FARMReader, which is used to limit the number of query-doc pairs passed to the QAInferencer's inference_from_objects method. This is needed because when passing all query-doc pairs to inference_from_objects, the QAInferencer will tokenize and featurize all inputs which might cause the RAM to go out of memory if the number of inputs is large.

How did you test it?

I added two unit tests.

Notes for the reviewer

I moved get_batches_from_generator from haystack.document_stores.base to haystack.utils.batching as this utility method is not only useful for document stores.

Checklist

@coveralls
Copy link
Collaborator

coveralls commented Jun 12, 2023

Pull Request Test Coverage Report for Build 5353938751

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • 199 unchanged lines in 2 files lost coverage.
  • Overall coverage increased (+0.5%) to 43.056%

Files with Coverage Reduction New Missed Lines %
document_stores/base.py 62 44.76%
nodes/reader/farm.py 137 39.9%
Totals Coverage Status
Change from base Build 5353925852: 0.5%
Covered Lines: 9688
Relevant Lines: 22501

💛 - Coveralls

@github-actions github-actions bot added the type:documentation Improvements on the docs label Jun 12, 2023
@bogdankostic bogdankostic marked this pull request as ready for review June 12, 2023 16:10
Copy link
Contributor

@masci masci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nothing to say, LGTM. Thanks for the "notes for the reviewer" helping me understand why we moved the method under utils.

@bogdankostic bogdankostic merged commit 82291b5 into main Jun 26, 2023
@bogdankostic bogdankostic deleted the reader_batching branch June 26, 2023 12:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants