fix: Send batches of query-doc pairs to inference_from_objects #5125
Related Issues
Proposed Changes:
This PR adds the parameter `preprocessing_batch_size` to `FARMReader`, which limits the number of query-doc pairs passed to the `QAInferencer`'s `inference_from_objects` method. This is needed because when all query-doc pairs are passed to `inference_from_objects` at once, the `QAInferencer` tokenizes and featurizes every input, which can exhaust RAM when the number of inputs is large.

How did you test it?
I added two unit tests.
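To illustrate the effect of `preprocessing_batch_size`, here is a minimal, hypothetical sketch of the batching pattern this PR applies. The names `infer_in_batches` and `infer_fn` are illustrative only, not part of Haystack's API; in the actual change the loop lives inside `FARMReader` and `infer_fn` corresponds to `QAInferencer.inference_from_objects`:

```python
from itertools import islice

def infer_in_batches(infer_fn, objects, batch_size):
    """Run infer_fn over successive slices of `objects`, so that only
    `batch_size` query-doc pairs are tokenized and featurized at a time
    instead of all of them at once."""
    it = iter(objects)
    results = []
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            break
        # In the real code this would call QAInferencer.inference_from_objects
        results.extend(infer_fn(batch))
    return results
```

With a bounded batch size, peak memory is limited to the features of one batch rather than of the full set of query-doc pairs.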
Notes for the reviewer
I moved `get_batches_from_generator` from `haystack.document_stores.base` to `haystack.utils.batching`, as this utility method is not only useful for document stores.

Checklist
The PR title starts with one of the allowed prefixes: `fix:`, `feat:`, `build:`, `chore:`, `ci:`, `docs:`, `style:`, `refactor:`, `perf:`, `test:`.
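For reference, the relocated `get_batches_from_generator` helper can be sketched as the common `itertools.islice` pattern below; this is a minimal version, and the actual implementation in `haystack.utils.batching` may differ:

```python
from itertools import islice

def get_batches_from_generator(iterable, n):
    """Lazily yield successive batches of up to n items from any iterable,
    without materializing the whole iterable in memory."""
    it = iter(iterable)
    batch = tuple(islice(it, n))
    while batch:
        yield batch
        batch = tuple(islice(it, n))
```

Because the batches are yielded lazily, a consumer such as a reader or document store can process one batch at a time while the rest of the input stays in the source generator.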