
Update bark doc (huggingface#25234)
* add mention to optimization in Bark docs

* add offload mention in docs

* Apply suggestions from code review

Co-authored-by: Sanchit Gandhi <[email protected]>

* Update bark docs.

* Update bark.md

---------

Co-authored-by: Sanchit Gandhi <[email protected]>
2 people authored and blbadger committed Nov 8, 2023
1 parent 1ae9a60 commit f50866d
docs/source/en/model_doc/bark.md (61 additions, 1 deletion)
Bark is made of 4 main models:

It should be noted that each of the first three modules can support conditional speaker embeddings to condition the output sound according to a specific predefined voice.

### Optimizing Bark

With just a few extra lines of code, Bark can be optimized to **significantly reduce its memory footprint** and **accelerate inference**.

#### Using half-precision

You can speed up inference and reduce the memory footprint by 50% simply by loading the model in half-precision.

```python
from transformers import BarkModel
import torch

# use the GPU when available, otherwise fall back to the CPU
device = "cuda" if torch.cuda.is_available() else "cpu"

# loading the weights in float16 halves the memory footprint
model = BarkModel.from_pretrained("suno/bark-small", torch_dtype=torch.float16).to(device)
```

#### Using 🤗 Better Transformer

Better Transformer is an 🤗 Optimum feature that performs kernel fusion under the hood. You can gain a 20% to 30% speed-up with zero performance degradation. Exporting the model to 🤗 Better Transformer takes only one line of code:

```python
model = model.to_bettertransformer()
```

Note that 🤗 Optimum must be installed before using this feature. [Here's how to install it.](https://huggingface.co/docs/optimum/installation)

#### Using CPU offload

As mentioned above, Bark is made up of 4 sub-models, which are called sequentially during audio generation. In other words, while one sub-model is in use, the other sub-models are idle.

If you're using a CUDA device, a simple way to benefit from an 80% reduction in memory footprint is to offload the sub-models from the GPU when they are idle. This operation is called CPU offloading, and it takes a single line of code:

```python
model.enable_cpu_offload()
```

Note that 🤗 Accelerate must be installed before using this feature. [Here's how to install it.](https://huggingface.co/docs/accelerate/basic_tutorials/install)

#### Combining optimization techniques

You can combine optimization techniques, using CPU offload, half-precision, and 🤗 Better Transformer all at once:

```python
from transformers import BarkModel
from optimum.bettertransformer import BetterTransformer
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

# load in fp16
model = BarkModel.from_pretrained("suno/bark-small", torch_dtype=torch.float16).to(device)

# convert to bettertransformer
model = BetterTransformer.transform(model, keep_original_model=False)

# enable CPU offload
model.enable_cpu_offload()
```
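
As a sanity check, here is a minimal generation sketch with the optimized model. This is a hedged example rather than part of the original snippet: the prompt text and the use of `AutoProcessor` are illustrative assumptions, and it relies on the offload hooks moving each sub-model (and its inputs) to the GPU on demand.

```python
from transformers import AutoProcessor

processor = AutoProcessor.from_pretrained("suno/bark-small")

# illustrative prompt; the inputs can stay on the CPU, assuming the
# offload hooks move each sub-model and its inputs to the GPU on demand
inputs = processor("Hello, this is Bark")

# generate a speech waveform and pull it back to the CPU as a NumPy array
audio_array = model.generate(**inputs)
audio_array = audio_array.cpu().numpy().squeeze()

# sampling rate of the generated audio
sample_rate = model.generation_config.sample_rate
```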

Find out more on inference optimization techniques [here](https://huggingface.co/docs/transformers/perf_infer_gpu_one).

### Tips

Suno offers a library of voice presets in a number of languages [here](https://suno-ai.notion.site/8b8e8749ed514b0cbf3f699013548683?v=bc67cff786b04b50b3ceb756fd05f68c).
These presets are also uploaded to the Hub [here](https://huggingface.co/suno/bark-small/tree/main/speaker_embeddings) or [here](https://huggingface.co/suno/bark/tree/main/speaker_embeddings).
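
For illustration, here is a minimal sketch of conditioning generation on one of these presets. It is a hedged example: `v2/en_speaker_6` is just one preset name from the English library, and the prompt text is an assumption.

```python
from transformers import AutoProcessor, BarkModel

processor = AutoProcessor.from_pretrained("suno/bark-small")
model = BarkModel.from_pretrained("suno/bark-small")

# pass a preset via `voice_preset`; "v2/en_speaker_6" is one of the
# English presets from the libraries linked above
inputs = processor("Hello, my dog is cute", voice_preset="v2/en_speaker_6")

audio_array = model.generate(**inputs)
```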
The original code can be found [here](https://github.com/suno-ai/bark).

[[autodoc]] BarkModel
- generate
- enable_cpu_offload

## BarkSemanticModel

