Allow setting batch size in SetFitModel.predict #443

Merged
merged 2 commits on Nov 24, 2023
2 changes: 2 additions & 0 deletions docs/source/en/_toctree.yml
@@ -29,6 +29,8 @@
title: Hyperparameter Optimization
- local: how_to/knowledge_distillation
title: Knowledge Distillation
- local: how_to/batch_sizes
title: Batch Sizes
- local: how_to/absa
title: Aspect Based Sentiment Analysis
- local: how_to/v1.0.0_migration_guide
21 changes: 21 additions & 0 deletions docs/source/en/how_to/batch_sizes.mdx
@@ -0,0 +1,21 @@

# Batch sizes
In this how-to guide, we will explore the effects of increasing the batch size used in [`SetFitModel.predict`].

## What are they?
When processing on a GPU, the data often does not all fit in the GPU's VRAM at once. As a result, the data is split up into **batches** of some pre-determined batch size. This is done both during training and during inference. In both scenarios, the chosen batch size has notable consequences for processing efficiency and VRAM usage, as transferring data to and from the GPU can be relatively slow.

For inference, a higher batch size is often recommended, as it usually results in notably quicker processing.

## In SetFit
The batch size for inference in SetFit defaults to 32, but it can be overridden by passing a `batch_size` argument to [`SetFitModel.predict`]. For example, on an RTX 3090 with a SetFit model based on the [paraphrase-mpnet-base-v2](https://huggingface.co/sentence-transformers/paraphrase-mpnet-base-v2) Sentence Transformer, the following throughputs are reached:

![setfit_speed_per_batch_size](https://github.com/huggingface/setfit/assets/37621491/c01d391b-aeba-4a4b-83f8-b09970a0d6e6)

<Tip>

Each sentence consists of 11 words in this experiment.

</Tip>

The default batch size of 32 does not result in the highest possible throughput on this hardware. Consider experimenting with the batch size to reach your highest possible throughput.
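
As a minimal sketch, overriding the batch size at inference time looks like this (the checkpoint name and sentences below are placeholders):

```python
from setfit import SetFitModel

# Load a trained SetFit model; this checkpoint name is a placeholder.
model = SetFitModel.from_pretrained("lewtun/my-awesome-setfit-model")

sentences = ["I loved the new movie!", "The plot was rather dull."]

# A larger batch size typically raises throughput at the cost of more VRAM.
predictions = model.predict(sentences, batch_size=128)
```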
29 changes: 22 additions & 7 deletions src/setfit/modeling.py
@@ -432,11 +432,15 @@ def _freeze_or_not(self, model: nn.Module, to_freeze: bool) -> None:
for param in model.parameters():
param.requires_grad = not to_freeze

def encode(self, inputs: List[str], show_progress_bar: Optional[bool] = None) -> Union[torch.Tensor, np.ndarray]:
def encode(
self, inputs: List[str], batch_size: int = 32, show_progress_bar: Optional[bool] = None
) -> Union[torch.Tensor, np.ndarray]:
"""Convert input sentences to embeddings using the `SentenceTransformer` body.

Args:
inputs (`List[str]`): The input sentences to embed.
batch_size (`int`, defaults to `32`): The batch size to use in encoding the sentences to embeddings.
Higher often means faster processing but higher memory usage.
show_progress_bar (`Optional[bool]`, defaults to `None`): Whether to show a progress bar while encoding.

Returns:
@@ -445,6 +449,7 @@ def encode(self, inputs: List[str], show_progress_bar: Optional[bool] = None) ->
"""
return self.model_body.encode(
inputs,
batch_size=batch_size,
normalize_embeddings=self.normalize_embeddings,
convert_to_tensor=self.has_differentiable_head,
show_progress_bar=show_progress_bar,
@@ -472,12 +477,14 @@ def _output_type_conversion(
return outputs

def predict(
self, inputs: List[str], as_numpy: bool = False, show_progress_bar: Optional[bool] = None
self, inputs: List[str], batch_size: int = 32, as_numpy: bool = False, show_progress_bar: Optional[bool] = None
) -> Union[torch.Tensor, np.ndarray]:
"""Predict the various classes.

Args:
inputs (`List[str]`): The input sentences to predict classes for.
batch_size (`int`, defaults to `32`): The batch size to use in encoding the sentences to embeddings.
Higher often means faster processing but higher memory usage.
as_numpy (`bool`, defaults to `False`): Whether to output as numpy array instead.
show_progress_bar (`Optional[bool]`, defaults to `None`): Whether to show a progress bar while encoding.

@@ -490,17 +497,19 @@ def predict(
`Union[torch.Tensor, np.ndarray]`: A vector with equal length to the inputs, denoting
to which class each input is predicted to belong.
"""
embeddings = self.encode(inputs, show_progress_bar=show_progress_bar)
embeddings = self.encode(inputs, batch_size=batch_size, show_progress_bar=show_progress_bar)
outputs = self.model_head.predict(embeddings)
return self._output_type_conversion(outputs, as_numpy=as_numpy)

def predict_proba(
self, inputs: List[str], as_numpy: bool = False, show_progress_bar: Optional[bool] = None
self, inputs: List[str], batch_size: int = 32, as_numpy: bool = False, show_progress_bar: Optional[bool] = None
) -> Union[torch.Tensor, np.ndarray]:
"""Predict the probabilities of the various classes.

Args:
inputs (`List[str]`): The input sentences to predict class probabilities for.
batch_size (`int`, defaults to `32`): The batch size to use in encoding the sentences to embeddings.
Higher often means faster processing but higher memory usage.
as_numpy (`bool`, defaults to `False`): Whether to output as numpy array instead.
show_progress_bar (`Optional[bool]`, defaults to `None`): Whether to show a progress bar while encoding.

@@ -515,7 +524,7 @@ def predict_proba(
`Union[torch.Tensor, np.ndarray]`: A matrix with shape [INPUT_LENGTH, NUM_CLASSES] denoting
probabilities of predicting an input as a class.
"""
embeddings = self.encode(inputs, show_progress_bar=show_progress_bar)
embeddings = self.encode(inputs, batch_size=batch_size, show_progress_bar=show_progress_bar)
outputs = self.model_head.predict_proba(embeddings)
return self._output_type_conversion(outputs, as_numpy=as_numpy)

@@ -574,11 +583,17 @@ def create_model_card(self, path: str, model_name: Optional[str] = "SetFit Model"
with open(os.path.join(path, "README.md"), "w", encoding="utf-8") as f:
f.write(model_card_content)

def __call__(self, inputs: List[str]) -> torch.Tensor:
def __call__(
self, inputs: List[str], batch_size: int = 32, as_numpy: bool = False, show_progress_bar: Optional[bool] = None
) -> Union[torch.Tensor, np.ndarray]:
"""Predict the various classes.

Args:
inputs (`List[str]`): The input sentences to predict classes for.
batch_size (`int`, defaults to `32`): The batch size to use in encoding the sentences to embeddings.
Higher often means faster processing but higher memory usage.
as_numpy (`bool`, defaults to `False`): Whether to output as numpy array instead.
show_progress_bar (`Optional[bool]`, defaults to `None`): Whether to show a progress bar while encoding.

Example:
>>> model = SetFitModel.from_pretrained(...)
@@ -589,7 +604,7 @@ def __call__(self, inputs: List[str]) -> torch.Tensor:
`torch.Tensor`: A vector with equal length to the inputs, denoting to which class each
input is predicted to belong.
"""
return self.predict(inputs)
return self.predict(inputs, batch_size=batch_size, as_numpy=as_numpy, show_progress_bar=show_progress_bar)

def _save_pretrained(self, save_directory: Union[Path, str]) -> None:
save_directory = str(save_directory)
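
Taken together, these changes thread `batch_size` from `SetFitModel.predict`, `SetFitModel.predict_proba`, and `SetFitModel.__call__` down into `encode`. A minimal sketch of the resulting call surface, assuming a trained model (the checkpoint name and sentences are placeholders):

```python
from setfit import SetFitModel

model = SetFitModel.from_pretrained("lewtun/my-awesome-setfit-model")
sentences = ["This product exceeded my expectations.", "Terrible customer service."]

# predict_proba returns a [len(sentences), num_classes] matrix of probabilities;
# as_numpy=True converts the output to a numpy array.
probabilities = model.predict_proba(sentences, batch_size=64, as_numpy=True)

# __call__ now forwards batch_size, as_numpy, and show_progress_bar to predict.
labels = model(sentences, batch_size=64, show_progress_bar=True)
```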