fix: Send batches of query-doc pairs to inference_from_objects #5125
Related Issues
Proposed Changes:
This PR adds the parameter `preprocessing_batch_size` to `FARMReader`, which limits the number of query-doc pairs passed to the `QAInferencer`'s `inference_from_objects` method. This is needed because when all query-doc pairs are passed to `inference_from_objects` at once, the `QAInferencer` tokenizes and featurizes every input, which can exhaust RAM when the number of inputs is large.

How did you test it?
I added two unit tests.
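To illustrate the effect of `preprocessing_batch_size`, here is a minimal, hypothetical sketch of the batching pattern this PR applies. The names `infer_in_batches` and `infer_fn` are illustrative only, not part of Haystack's API; in the actual change the loop lives inside `FARMReader` and `infer_fn` corresponds to `QAInferencer.inference_from_objects`:

```python
from itertools import islice

def infer_in_batches(infer_fn, objects, batch_size):
    """Run infer_fn over successive slices of `objects`, so that only
    `batch_size` query-doc pairs are tokenized and featurized at a time
    instead of all of them at once."""
    it = iter(objects)
    results = []
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            break
        # In the real code this would call QAInferencer.inference_from_objects
        results.extend(infer_fn(batch))
    return results
```

With a bounded batch size, peak memory is limited to the features of one batch rather than of the full set of query-doc pairs.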
Notes for the reviewer
I moved `get_batches_from_generator` from `haystack.document_stores.base` to `haystack.utils.batching`, as this utility method is not only useful for document stores.

Checklist
The PR title starts with one of the allowed prefixes: `fix:`, `feat:`, `build:`, `chore:`, `ci:`, `docs:`, `style:`, `refactor:`, `perf:`, `test:`.
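For reference, the relocated `get_batches_from_generator` helper can be sketched as the common `itertools.islice` pattern below; this is a minimal version, and the actual implementation in `haystack.utils.batching` may differ:

```python
from itertools import islice

def get_batches_from_generator(iterable, n):
    """Lazily yield successive batches of up to n items from any iterable,
    without materializing the whole iterable in memory."""
    it = iter(iterable)
    batch = tuple(islice(it, n))
    while batch:
        yield batch
        batch = tuple(islice(it, n))
```

Because the batches are yielded lazily, a consumer such as a reader or document store can process one batch at a time while the rest of the input stays in the source generator.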