fix: Ensure eval mode for farm and transformer models for predictions #3791
Conversation
…dict or predict_batch. Added test that failed previously.
… unit test for TextToSpeech.
@sjrl looks good at first glance. Are there any other pipelines we use in the codebase? Let's not switch those explicitly to eval, as it's already done automatically.
@sjrl wait, I don't agree that we should call the eval() switch on every model invocation. That's excessive; why do that? Take the example of an encoder: we need to encode millions of documents, and yet we'd call eval() on every call. I would respectfully disagree.
Hmm, well, considering that Sentence Transformers actually does this already for their sentence encoder models whenever you call the `encode` function:

```python
def encode(self, sentences: Union[str, List[str]],
           batch_size: int = 32,
           show_progress_bar: bool = None,
           output_value: str = 'sentence_embedding',
           convert_to_numpy: bool = True,
           convert_to_tensor: bool = False,
           device: str = None,
           normalize_embeddings: bool = False) -> Union[List[Tensor], ndarray, Tensor]:
    """
    Computes sentence embeddings

    :param sentences: the sentences to embed
    :param batch_size: the batch size used for the computation
    :param show_progress_bar: Output a progress bar when encoding sentences
    :param output_value: Default sentence_embedding, to get sentence embeddings. Can be set to token_embeddings to get wordpiece token embeddings. Set to None, to get all output values
    :param convert_to_numpy: If true, the output is a list of numpy vectors. Else, it is a list of pytorch tensors.
    :param convert_to_tensor: If true, you get one large tensor as return. Overwrites any setting from convert_to_numpy
    :param device: Which torch.device to use for the computation
    :param normalize_embeddings: If set to true, returned vectors will have length 1. In that case, the faster dot-product (util.dot_score) instead of cosine similarity can be used.

    :return:
        By default, a list of tensors is returned. If convert_to_tensor, a stacked tensor is returned. If convert_to_numpy, a numpy matrix is returned.
    """
    self.eval()
    ...
```
…which we call here in Haystack. So we are in fact already calling it at every model invocation for our EmbeddingRetriever node whenever we use Sentence Transformers (which we use most often in dC), and it has not seemed to impact our timings. I don't think looping through all the layers in a model takes that long, but I can collect some timings on some of the Flan models to double-check.
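For illustration, a minimal timing sketch of that claim (the checkpoint name here is just an example, not one from this discussion):

```python
import timeit
from transformers import AutoModel

# Any Hugging Face checkpoint works here; bert-base-uncased is just an example.
model = AutoModel.from_pretrained("bert-base-uncased")

# eval() only walks the module tree and flips a boolean flag on each
# submodule, so even thousands of calls should take well under a second.
elapsed = timeit.timeit(model.eval, number=1000)
print(f"1000 eval() calls: {elapsed:.4f}s")
```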
Ok ok @sjrl, I wouldn't want you to waste your time on such tests now. Let's just get a nod from other team members as well. I am convinced by your arguments, but HF also doesn't switch the pipeline to eval on every call; maybe that's an omission. Let's confirm this change with the other team members. cc @mayankjobanputra @julian-risch @bogdankostic
I don't have much experience with this, but I suppose that setting a model into eval mode shouldn't be a heavy operation, as it only affects a small fraction of layers, such as Dropout.
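That matches what PyTorch does internally: `Module.eval()` simply delegates to `Module.train(False)`, which walks the module tree and flips a boolean flag. Here is a paraphrased sketch of that logic (not the verbatim `torch.nn.Module` source):

```python
import torch.nn as nn

# Paraphrased from torch.nn.Module: eval() delegates to train(False), which
# recursively sets the boolean `training` flag on every submodule.
# No weights are touched, so the cost is a cheap tree traversal.
def train(module: nn.Module, mode: bool = True) -> nn.Module:
    module.training = mode
    for child in module.children():
        train(child, mode)
    return module

def eval_(module: nn.Module) -> nn.Module:
    return train(module, False)
```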
I echo Bogdan's comments. If we were a train-heavy framework, I'd say let's go for it, but we only train a few components! Are you concerned about some nefarious user actions, @sjrl? What motivates this change? Consistency? What else?
It looks to me as if …
I like this idea!
@sjrl what do you think about the ideas ☝️?
This sounds good to me! I'll go ahead and change the PR to do this instead. I'm fairly busy this week, so I might not be able to get to it until next week.
Hey @vblagoje, sorry this took me so long, but it is finished now! I've made the changes as discussed in this PR, and it is ready for another review.
LGTM @sjrl, let's resolve this conflict and integrate this one.
Related Issues
Proposed Changes:

I added `model.eval()` calls to make sure the models in:

- `FARMReader`
- `TransformerReader`
- `EmbeddingRetriever` (both with the `_RetribertEmbeddingEncoder` and the `_DefaultEmbeddingEncoder`)
- `PromptNode`
- `TransformersDocumentClassifier`
- `TransformersTranslator`
- `Text2SparqlRetriever`
- `Text2Speech`
- `EntityExtractor`

are set to eval mode when running an inference prediction. Otherwise, if the model is currently set to train mode, the predictions become random due to layers like dropout and batch normalization in the underlying model architecture (see the sketch below). This is something we already do for some of our nodes, like the `DensePassageRetriever`.
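For a concrete picture of that failure mode, here is a standalone sketch (not code from this PR) showing that a dropout layer makes repeated forward passes disagree in train mode but not in eval mode:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(8, 8), nn.Dropout(p=0.5))
x = torch.randn(1, 8)

model.train()  # dropout active: repeated passes give different outputs
print(torch.allclose(model(x), model(x)))  # False (with overwhelming probability)

model.eval()   # dropout disabled: outputs are deterministic
print(torch.allclose(model(x), model(x)))  # True
```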
How did you test it?

I added unit tests that cover:

- `FARMReader` and `TransformerReader`
- `EmbeddingRetriever`
- `Text2Speech`

to make sure the nodes provide correct results even if they are set to train mode before running a prediction. I confirmed that these unit tests fail without the new changes; a simplified version of the test pattern is sketched below.
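A minimal sketch of that pattern (the `make_reader` and `sample_docs` fixtures are illustrative placeholders, not the actual Haystack test helpers):

```python
# Illustrative only: `make_reader` and `sample_docs` stand in for whatever
# fixtures the real Haystack tests use; they are not actual Haystack APIs.
def test_predictions_unaffected_by_train_mode(make_reader, sample_docs):
    reader = make_reader()
    expected = reader.predict(query="Who lives in Berlin?", documents=sample_docs)

    # Force train mode, as if predict() were called right after training.
    reader.inferencer.model.train()

    # With the fix, predict() switches the model back to eval mode,
    # so the answers must match the eval-mode baseline.
    result = reader.predict(query="Who lives in Berlin?", documents=sample_docs)
    assert [a.answer for a in result["answers"]] == [a.answer for a in expected["answers"]]
```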
Notes for the reviewer

- This bug can be triggered, for example, by calling the `predict` function of the `FARMReader` right after training.
- No explicit `eval()` call was needed for the `SentenceTransformer` model, since the `SentenceTransformer.encode` function sets the underlying PyTorch model to eval mode within the function call.

Checklist