diff --git a/docs/source/en/accelerate.md b/docs/source/en/accelerate.md
index 17be77d677a5f7..b0f0e4efe64778 100644
--- a/docs/source/en/accelerate.md
+++ b/docs/source/en/accelerate.md
@@ -133,4 +133,4 @@ accelerate launch train.py
>>> notebook_launcher(training_function)
```
-For more information about 🤗 Accelerate and it's rich features, refer to the [documentation](https://huggingface.co/docs/accelerate).
\ No newline at end of file
+For more information about 🤗 Accelerate and its rich features, refer to the [documentation](https://huggingface.co/docs/accelerate).
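For illustration, a minimal, self-contained sketch of launching a function with `notebook_launcher`; the tiny `training_function` below is only a placeholder for the Accelerator-based training loop built earlier in this guide, and `num_processes=2` is an arbitrary example value:

```py
from accelerate import Accelerator, notebook_launcher

def training_function():
    # placeholder for the Accelerator-based training loop from this guide
    accelerator = Accelerator()
    print(f"Process {accelerator.process_index} of {accelerator.num_processes} is ready")

# spawn the function on 2 processes (e.g. 2 GPUs) directly from a notebook
notebook_launcher(training_function, num_processes=2)
```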
diff --git a/docs/source/en/add_new_model.md b/docs/source/en/add_new_model.md
index 4072be6f59ca9b..da60c0a829b097 100644
--- a/docs/source/en/add_new_model.md
+++ b/docs/source/en/add_new_model.md
@@ -361,7 +361,7 @@ We expect that every model added to 🤗 Transformers passes a couple of integra
model and the reimplemented version in 🤗 Transformers have to give the exact same output up to a precision of 0.001!
Since it is normal that the exact same model written in different libraries can give a slightly different output
depending on the library framework, we accept an error tolerance of 1e-3 (0.001). It is not enough if the model gives
-nearly the same output, they have to be the almost identical. Therefore, you will certainly compare the intermediate
+nearly the same output, they have to be almost identical. Therefore, you will certainly compare the intermediate
outputs of the 🤗 Transformers version multiple times against the intermediate outputs of the original implementation of
*brand_new_bert* in which case an **efficient** debugging environment of the original repository is absolutely
important. Here is some advice to make your debugging environment as efficient as possible.
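As a rough sketch of what such an integration check boils down to (the tensors below are placeholders standing in for the outputs of the original model and of the 🤗 Transformers port on the same input):

```py
import torch

torch.manual_seed(0)
original_output = torch.randn(1, 7, 768)                               # output of the original implementation (placeholder)
transformers_output = original_output + 1e-4 * torch.randn(1, 7, 768)  # output of the 🤗 Transformers port (placeholder)

# the comparison uses the 1e-3 tolerance mentioned above
print("max absolute difference:", (original_output - transformers_output).abs().max().item())
assert torch.allclose(original_output, transformers_output, atol=1e-3), "Outputs differ by more than 1e-3"
```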
diff --git a/docs/source/en/add_tensorflow_model.md b/docs/source/en/add_tensorflow_model.md
index 6efbdee1bf3767..aa885f9d2f2264 100644
--- a/docs/source/en/add_tensorflow_model.md
+++ b/docs/source/en/add_tensorflow_model.md
@@ -56,7 +56,7 @@ you might recall from our [general overview of 🤗 Transformers](add_new_model#
that we are an opinionated bunch - the ease of use of 🤗 Transformers relies on consistent design choices. From
experience, we can tell you a few important things about adding TensorFlow models:
-- Don't reinvent the wheel! More often that not, there are at least two reference implementations you should check: the
+- Don't reinvent the wheel! More often than not, there are at least two reference implementations you should check: the
PyTorch equivalent of the model you are implementing and other TensorFlow models for the same class of problems.
- Great model implementations survive the test of time. This doesn't happen because the code is pretty, but rather
because the code is clear, easy to debug and build upon. If you make the life of the maintainers easy with your
@@ -101,7 +101,7 @@ TensorFlow-related pull request.
**2. Prepare transformers dev environment**
-Having selected the model architecture, open an draft PR to signal your intention to work on it. Follow the
+Having selected the model architecture, open a draft PR to signal your intention to work on it. Follow the
instructions below to set up your environment and open a draft PR.
1. Fork the [repository](https://github.com/huggingface/transformers) by clicking on the 'Fork' button on the
@@ -328,7 +328,7 @@ That's it! 🎉
## Debugging mismatches across ML frameworks 🐛
At some point, when adding a new architecture or when creating TensorFlow weights for an existing architecture, you
-might come across errors compaining about mismatches between PyTorch and TensorFlow. You might even decide to open the
+might come across errors complaining about mismatches between PyTorch and TensorFlow. You might even decide to open the
model architecture code for the two frameworks, and find that they look identical. What's going on? 🤔
First of all, let's talk about why understanding these mismatches matters. Many community members will use 🤗
@@ -351,7 +351,7 @@ ingredient here is patience. Here is our suggested workflow for when you come ac
that you'll have to venture into the source implementation of said instruction. In some cases, you might find an
issue with a reference implementation - don't abstain from opening an issue in the upstream repository.
-In some cases, in dicussion with the 🤗 Transformers team, we might find that the fixing the mismatch is infeasible.
+In some cases, in discussion with the 🤗 Transformers team, we might find that fixing the mismatch is infeasible.
When the mismatch is very small in the output layers of the model (but potentially large in the hidden states), we
might decide to ignore it in favor of distributing the model. The `pt-to-tf` CLI mentioned above has a `--max-error`
flag to override the error message at weight conversion time.
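As a starting point, here is a small sketch of how such a numerical comparison can be run by hand (`bert-base-uncased` is used purely as an example checkpoint; replace it with the architecture under test):

```py
import numpy as np
import torch
from transformers import AutoTokenizer, AutoModel, TFAutoModel

checkpoint = "bert-base-uncased"  # example checkpoint
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
pt_model = AutoModel.from_pretrained(checkpoint)
tf_model = TFAutoModel.from_pretrained(checkpoint, from_pt=True)  # reuse the PyTorch weights

pt_inputs = tokenizer("Debugging cross-framework mismatches", return_tensors="pt")
tf_inputs = tokenizer("Debugging cross-framework mismatches", return_tensors="tf")

with torch.no_grad():
    pt_hidden = pt_model(**pt_inputs).last_hidden_state.numpy()
tf_hidden = tf_model(**tf_inputs).last_hidden_state.numpy()

# largest element-wise difference between the two frameworks
print(np.max(np.abs(pt_hidden - tf_hidden)))
```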
diff --git a/docs/source/en/big_models.md b/docs/source/en/big_models.md
index 4b35126f4d331a..9b57e433176094 100644
--- a/docs/source/en/big_models.md
+++ b/docs/source/en/big_models.md
@@ -23,11 +23,11 @@ from PyTorch is:
2. Load your pretrained weights.
3. Put those pretrained weights in your random model.
-Step 1 and 2 both require a full version of the model in memory, which is not a problem in most cases, but if your model starts weighing several GigaBytes, those two copies can make you got our of RAM. Even worse, if you are using `torch.distributed` to launch a distributed training, each process will load the pretrained model and store these two copies in RAM.
+Steps 1 and 2 both require a full version of the model in memory, which is not a problem in most cases, but if your model starts weighing several gigabytes, those two copies can make you run out of RAM. Even worse, if you are using `torch.distributed` to launch a distributed training, each process will load the pretrained model and store these two copies in RAM.
-Note that the randomly created model is initialized with "empty" tensors, which take the space in memory without filling it (thus the random values are whatever was in this chunk of memory at a given time). The random initialization following the appropriate distribution for the kind of model/parameters instatiated (like a normal distribution for instance) is only performed after step 3 on the non-initialized weights, to be as fast as possible!
+Note that the randomly created model is initialized with "empty" tensors, which take up space in memory without filling it (thus the random values are whatever was in this chunk of memory at the time). The random initialization following the appropriate distribution for the kind of model/parameters instantiated (a normal distribution, for instance) is only performed after step 3 on the non-initialized weights, to be as fast as possible!
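In plain PyTorch terms, the three steps above look roughly like this sketch (`MyModel` and `"pytorch_model.bin"` are placeholders):

```py
import torch
from torch import nn

class MyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(768, 768)

model = MyModel()                                                  # 1. random model, first full copy in memory
state_dict = torch.load("pytorch_model.bin", map_location="cpu")   # 2. pretrained weights, second full copy
model.load_state_dict(state_dict)                                  # 3. copy the pretrained weights into the model
```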
@@ -120,4 +120,4 @@ If you want to directly load such a sharded checkpoint inside a model without us
Sharded checkpoints reduce the memory usage during step 2 of the workflow mentioned above, but in order to use that model in a low memory setting, we recommend leveraging our tools based on the Accelerate library.
-Please read the following guide for more information: [Large model loading using Accelerate](./main_classes/model#large-model-loading)
\ No newline at end of file
+Please read the following guide for more information: [Large model loading using Accelerate](./main_classes/model#large-model-loading)
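For example, a sketch of saving a sharded checkpoint and reloading it through the Accelerate-backed low-memory path (`bert-base-cased` and the 200MB shard size are arbitrary example choices):

```py
from transformers import AutoModel

model = AutoModel.from_pretrained("bert-base-cased")
model.save_pretrained("local-bert", max_shard_size="200MB")  # writes several shards plus an index file

# from_pretrained picks up the shard index automatically; low_cpu_mem_usage=True
# relies on Accelerate to avoid keeping a second full copy of the weights in RAM
model = AutoModel.from_pretrained("local-bert", low_cpu_mem_usage=True)
```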
diff --git a/docs/source/en/community.md b/docs/source/en/community.md
index 74c577567ab6f5..3b652fced5acd3 100644
--- a/docs/source/en/community.md
+++ b/docs/source/en/community.md
@@ -10,7 +10,7 @@ This page regroups resources around 🤗 Transformers developed by the community
| Resource | Description | Author |
|:----------|:-------------|------:|
-| [Hugging Face Transformers Glossary Flashcards](https://www.darigovresearch.com/huggingface-transformers-glossary-flashcards) | A set of flashcards based on the [Transformers Docs Glossary](glossary) that has been put into a form which can be easily learnt/revised using [Anki ](https://apps.ankiweb.net/) an open source, cross platform app specifically designed for long term knowledge retention. See this [Introductory video on how to use the flashcards](https://www.youtube.com/watch?v=Dji_h7PILrw). | [Darigov Research](https://www.darigovresearch.com/) |
+| [Hugging Face Transformers Glossary Flashcards](https://www.darigovresearch.com/huggingface-transformers-glossary-flashcards) | A set of flashcards based on the [Transformers Docs Glossary](glossary) that has been put into a form which can be easily learned/revised using [Anki](https://apps.ankiweb.net/), an open source, cross-platform app specifically designed for long-term knowledge retention. See this [Introductory video on how to use the flashcards](https://www.youtube.com/watch?v=Dji_h7PILrw). | [Darigov Research](https://www.darigovresearch.com/) |
## Community notebooks:
diff --git a/docs/source/en/create_a_model.md b/docs/source/en/create_a_model.md
index 91dd99e7254491..ba384d437b80d9 100644
--- a/docs/source/en/create_a_model.md
+++ b/docs/source/en/create_a_model.md
@@ -209,7 +209,7 @@ Easily reuse this checkpoint for another task by switching to a different model
The last base class you need before using a model for textual data is a [tokenizer](main_classes/tokenizer) to convert raw text to tensors. There are two types of tokenizers you can use with 🤗 Transformers:
- [`PreTrainedTokenizer`]: a Python implementation of a tokenizer.
-- [`PreTrainedTokenizerFast`]: a tokenizer from our Rust-based [🤗 Tokenizer](https://huggingface.co/docs/tokenizers/python/latest/) library. This tokenizer type is significantly faster - especially during batch tokenization - due to it's Rust implementation. The fast tokenizer also offers additional methods like *offset mapping* which maps tokens to their original words or characters.
+- [`PreTrainedTokenizerFast`]: a tokenizer from our Rust-based [🤗 Tokenizer](https://huggingface.co/docs/tokenizers/python/latest/) library. This tokenizer type is significantly faster - especially during batch tokenization - due to its Rust implementation. The fast tokenizer also offers additional methods like *offset mapping* which maps tokens to their original words or characters.
Both tokenizers support common methods such as encoding and decoding, adding new tokens, and managing special tokens.
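For instance, a short sketch of the offset mapping feature that only fast tokenizers provide (using `distilbert-base-uncased` as an example checkpoint):

```py
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")  # returns a PreTrainedTokenizerFast when available
encoding = tokenizer("Hello world!", return_offsets_mapping=True)

print(encoding.tokens())           # subword tokens, e.g. ['[CLS]', 'hello', 'world', '!', '[SEP]']
print(encoding["offset_mapping"])  # character spans in the original text; only available with fast tokenizers
```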
diff --git a/docs/source/en/custom_models.md b/docs/source/en/custom_models.md
index 5caedb32ef9b6c..d709772eed065c 100644
--- a/docs/source/en/custom_models.md
+++ b/docs/source/en/custom_models.md
@@ -341,7 +341,7 @@ model. This is different from pushing the code to the Hub in the sense that user
get the custom models (contrarily to automatically downloading the model code from the Hub).
As long as your config has a `model_type` attribute that is different from existing model types, and that your model
-classes have the right `config_class` attributes, you can just add them to the auto classes likes this:
+classes have the right `config_class` attributes, you can just add them to the auto classes like this:
```py
from transformers import AutoConfig, AutoModel, AutoModelForImageClassification
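# A hedged sketch of the registration calls that typically follow the import above.
# ResnetConfig, ResnetModel and ResnetModelForImageClassification stand in for your own
# custom classes; they are illustrative names, not part of this diff.
AutoConfig.register("resnet", ResnetConfig)
AutoModel.register(ResnetConfig, ResnetModel)
AutoModelForImageClassification.register(ResnetConfig, ResnetModelForImageClassification)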
diff --git a/docs/source/en/generation_strategies.md b/docs/source/en/generation_strategies.md
index 519f7ffcd42632..5cdecfacad9dc6 100644
--- a/docs/source/en/generation_strategies.md
+++ b/docs/source/en/generation_strategies.md
@@ -92,7 +92,7 @@ sequences that start with a lower probability initial tokens and would've been i
- `do_sample`: if set to `True`, this parameter enables decoding strategies such as multinomial sampling, beam-search
multinomial sampling, Top-K sampling and Top-p sampling. All these strategies select the next token from the probability
distribution over the entire vocabulary with various strategy-specific adjustments.
-- `num_return_sequences`: the number of sequence candidates to return for each input. This options is only available for
+- `num_return_sequences`: the number of sequence candidates to return for each input. This option is only available for
the decoding strategies that support multiple sequence candidates, e.g. variations of beam search and sampling. Decoding
strategies like greedy search and contrastive search return a single output sequence.
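For illustration, a minimal sketch that combines `do_sample` and `num_return_sequences`, assuming the `gpt2` checkpoint as an example:

```py
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("The best thing about open source is", return_tensors="pt")
outputs = model.generate(
    **inputs,
    do_sample=True,           # switch from greedy search to sampling
    num_return_sequences=3,   # only meaningful for strategies that can return several candidates
    max_new_tokens=20,
)
for sequence in outputs:
    print(tokenizer.decode(sequence, skip_special_tokens=True))
```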
@@ -146,7 +146,7 @@ one for summarization with beam search). You must have the right Hub permissions
## Streaming
-The `generate()` supports streaming, through its `streamer` input. The `streamer` input is compatible any instance
+The `generate()` method supports streaming through its `streamer` input. The `streamer` input is compatible with any instance
from a class that has the following methods: `put()` and `end()`. Internally, `put()` is used to push new tokens and
`end()` is used to flag the end of text generation.
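As a short sketch, the ready-made `TextStreamer` class implements this interface and prints text to stdout as it is generated (again using `gpt2` as an example checkpoint):

```py
from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
streamer = TextStreamer(tokenizer)  # provides put() and end(), printing text as it arrives

inputs = tokenizer("An increasing sequence: one,", return_tensors="pt")
_ = model.generate(**inputs, streamer=streamer, max_new_tokens=20)
```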
@@ -301,7 +301,7 @@ the `num_beams` greater than 1, and set `do_sample=True` to use this decoding st
The diverse beam search decoding strategy is an extension of the beam search strategy that allows for generating a more diverse
set of beam sequences to choose from. To learn how it works, refer to [Diverse Beam Search: Decoding Diverse Solutions from Neural Sequence Models](https://arxiv.org/pdf/1610.02424.pdf).
This approach has three main parameters: `num_beams`, `num_beam_groups`, and `diversity_penalty`.
-The diversily penalty ensures the outputs are distinct across groups, and beam search is used within each group.
+The diversity penalty ensures the outputs are distinct across groups, and beam search is used within each group.
```python
@@ -367,7 +367,7 @@ To enable assisted decoding, set the `assistant_model` argument with a model.
['Alice and Bob are sitting in a bar. Alice is drinking a beer and Bob is drinking a']
```
-When using assisted decoding with sampling methods, you can use the `temperarure` argument to control the randomness
+When using assisted decoding with sampling methods, you can use the `temperature` argument to control the randomness
just like in multinomial sampling. However, in assisted decoding, reducing the temperature will help improve latency.
```python
diff --git a/docs/source/en/glossary.md b/docs/source/en/glossary.md
index 08d71ee18b5cdd..ea1e53ab598710 100644
--- a/docs/source/en/glossary.md
+++ b/docs/source/en/glossary.md
@@ -187,7 +187,7 @@ The model head refers to the last layer of a neural network that accepts the raw
### image patch
-Vision-based Transformers models split an image into smaller patches which are linearly embedded, and then passed as a sequence to the model. You can find the `patch_size` - or resolution - of the model in it's configuration.
+Vision-based Transformers models split an image into smaller patches which are linearly embedded, and then passed as a sequence to the model. You can find the `patch_size` - or resolution - of the model in its configuration.
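For example, a small sketch reading the patch size from a configuration (`google/vit-base-patch16-224` is used here purely as an example checkpoint):

```py
from transformers import ViTConfig

config = ViTConfig.from_pretrained("google/vit-base-patch16-224")
print(config.patch_size, config.image_size)  # 16, 224 for this checkpoint
```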
### inference
diff --git a/docs/source/en/llm_tutorial.md b/docs/source/en/llm_tutorial.md
index 31eef527fd54ac..0be71233a20a88 100644
--- a/docs/source/en/llm_tutorial.md
+++ b/docs/source/en/llm_tutorial.md
@@ -27,7 +27,7 @@ This tutorial will show you how to:
* Generate text with an LLM
* Avoid common pitfalls
-* Next steps to help you get the most out your LLM
+* Next steps to help you get the most out of your LLM
Before you begin, make sure you have all the necessary libraries installed:
diff --git a/docs/source/en/model_memory_anatomy.md b/docs/source/en/model_memory_anatomy.md
index 031bf5e66ef1ff..0a0d5bb5b8bf42 100644
--- a/docs/source/en/model_memory_anatomy.md
+++ b/docs/source/en/model_memory_anatomy.md
@@ -184,7 +184,7 @@ GPU memory occupied: 14949 MB.
We see that already a relatively small batch size almost fills up our GPU's entire memory. However, a larger batch size
can often result in faster model convergence or better end performance. So ideally we want to tune the batch size to our
model's needs and not to the GPU limitations. What's interesting is that we use much more memory than the size of the model.
-To understand a bit better why this is the case let's have look at a model's operations and memory needs.
+To understand a bit better why this is the case, let's have a look at a model's operations and memory needs.
## Anatomy of Model's Operations
diff --git a/docs/source/en/model_sharing.md b/docs/source/en/model_sharing.md
index 078bc29fe2caec..84d287570da192 100644
--- a/docs/source/en/model_sharing.md
+++ b/docs/source/en/model_sharing.md
@@ -95,7 +95,7 @@ Specify `from_pt=True` to convert a checkpoint from PyTorch to TensorFlow:
>>> tf_model = TFDistilBertForSequenceClassification.from_pretrained("path/to/awesome-name-you-picked", from_pt=True)
```
-Then you can save your new TensorFlow model with it's new checkpoint:
+Then you can save your new TensorFlow model with its new checkpoint:
```py
>>> tf_model.save_pretrained("path/to/awesome-name-you-picked")
@@ -201,7 +201,7 @@ Or perhaps you'd like to add the TensorFlow version of your fine-tuned PyTorch m
>>> tf_model.push_to_hub("my-awesome-model")
```
-Now when you navigate to the your Hugging Face profile, you should see your newly created model repository. Clicking on the **Files** tab will display all the files you've uploaded to the repository.
+Now when you navigate to your Hugging Face profile, you should see your newly created model repository. Clicking on the **Files** tab will display all the files you've uploaded to the repository.
For more details on how to create and upload files to a repository, refer to the Hub documentation [here](https://huggingface.co/docs/hub/how-to-upstream).
diff --git a/docs/source/fr/index.md b/docs/source/fr/index.md
index f18ad8e57c21db..9e3e6eb5c2362c 100644
--- a/docs/source/fr/index.md
+++ b/docs/source/fr/index.md
@@ -25,7 +25,7 @@ Apprentissage automatique de pointe pour [PyTorch](https://pytorch.org/), [Tenso
🗣️ **Audio**: reconnaissance automatique de la parole et classification audio.
🐙 **Multimodalité**: système de question-réponse avec des tableaux ou images, reconnaissance optique de caractères, extraction d'information depuis des documents scannés et classification de vidéo.
-🤗 Transformers prend en charge l'interopérabilité entre PyTorch, TensorFlow et JAX. Cela permet d'utiliser un framework différent à chaque étape de la vie d'un modèle, par example entraîner un modèle en trois lignes de code avec un framework, et le charger pour l'inférence avec un autre. Les modèles peuvent également être exportés dans un format comme ONNX et TorchScript pour être déployés dans des environnements de production.
+🤗 Transformers prend en charge l'interopérabilité entre PyTorch, TensorFlow et JAX. Cela permet d'utiliser un framework différent à chaque étape de la vie d'un modèle, par exemple entraîner un modèle en trois lignes de code avec un framework, et le charger pour l'inférence avec un autre. Les modèles peuvent également être exportés dans un format comme ONNX et TorchScript pour être déployés dans des environnements de production.
Rejoignez la communauté grandissante sur le [Hub](https://huggingface.co/models), le [forum](https://discuss.huggingface.co/) ou [Discord](https://discord.com/invite/JfAtkvEtRb) dès aujourd'hui !
@@ -407,4 +407,4 @@ Le tableau ci-dessous représente la prise en charge actuelle dans la bibliothè
| YOLOS | ❌ | ❌ | ✅ | ❌ | ❌ |
| YOSO | ❌ | ❌ | ✅ | ❌ | ❌ |
-
\ No newline at end of file
+
diff --git a/docs/source/fr/quicktour.md b/docs/source/fr/quicktour.md
index 7a972ec449beb9..666a931f825f1c 100644
--- a/docs/source/fr/quicktour.md
+++ b/docs/source/fr/quicktour.md
@@ -60,7 +60,7 @@ Le [`pipeline`] est le moyen le plus simple d'utiliser un modèle pré-entraîn
| Traduction | Traduit du texte d'un langage à un autre | Texte | pipeline(task="translation") |
| Classification d'image | Attribue une catégorie à une image | Image | pipeline(task="image-classification") |
| Segmentation d'image | Attribue une catégorie à chaque pixel d'une image (supporte la segmentation sémantique, panoptique et d'instance) | Image | pipeline(task="image-segmentation") |
-| Détection d'objects | Prédit les délimitations et catégories d'objects dans une image | Image | pipeline(task="object-detection") |
+| Détection d'objets | Prédit les délimitations et catégories d'objets dans une image | Image | pipeline(task="object-detection") |
| Classification d'audio | Attribue une catégorie à un fichier audio | Audio | pipeline(task="audio-classification") |
| Reconnaissance automatique de la parole | Extrait le discours d'un fichier audio en texte | Audio | pipeline(task="automatic-speech-recognition") |
| Question réponse visuels | Etant données une image et une question, répond correctement à une question sur l'image | Modalités multiples | pipeline(task="vqa") |
@@ -99,7 +99,7 @@ Le [`pipeline`] peut aussi itérer sur un jeu de données entier pour n'importe
>>> speech_recognizer = pipeline("automatic-speech-recognition", model="facebook/wav2vec2-base-960h")
```
-Chargez un jeu de données audio (voir le 🤗 Datasets [Quick Start](https://huggingface.co/docs/datasets/quickstart#audio) pour plus de détails) sur lequel vous souhaitez itérer. Pour cet example, nous chargons le jeu de données [MInDS-14](https://huggingface.co/datasets/PolyAI/minds14) :
+Chargez un jeu de données audio (voir le 🤗 Datasets [Quick Start](https://huggingface.co/docs/datasets/quickstart#audio) pour plus de détails) sur lequel vous souhaitez itérer. Pour cet exemple, nous chargeons le jeu de données [MInDS-14](https://huggingface.co/datasets/PolyAI/minds14) :
```py
>>> from datasets import load_dataset, Audio
@@ -155,7 +155,7 @@ Utilisez [`TFAutoModelForSequenceClassification`] et [`AutoTokenizer`] pour char
-Specifiez le modèle et le tokenizer dans le [`pipeline`], et utilisez le `classifier` sur le texte en français :
+Spécifiez le modèle et le tokenizer dans le [`pipeline`], et utilisez le `classifier` sur le texte en français :
```py
>>> classifier = pipeline("sentiment-analysis", model=model, tokenizer=tokenizer)
@@ -418,7 +418,7 @@ En fonction de votre tâche, vous passerez généralement les paramètres suivan
>>> model = AutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased")
```
-2. [`TrainingArguments`] contient les hyperparamètres du modèle que vous pouvez changer comme le taux d'apprentissage, la taille due l'échantillon, et le nombre d'époques pour s'entraîner. Les valeurs par défaut sont utilisées si vous ne spécifiez pas d'hyperparamètres d'apprentissage :
+2. [`TrainingArguments`] contient les hyperparamètres du modèle que vous pouvez changer comme le taux d'apprentissage, la taille de l'échantillon, et le nombre d'époques pour s'entraîner. Les valeurs par défaut sont utilisées si vous ne spécifiez pas d'hyperparamètres d'apprentissage :
```py
>>> from transformers import TrainingArguments
@@ -547,4 +547,4 @@ Tous les modèles sont des modèles standard [`tf.keras.Model`](https://www.tens
## Et après ?
-Maintenant que vous avez terminé la visite rapide de 🤗 Transformers, consultez nos guides et apprenez à faire des choses plus spécifiques comme créer un modèle personnalisé, finetuner un modèle pour une tâche, et comment entraîner un modèle avec un script. Si vous souhaitez en savoir plus sur les concepts fondamentaux de 🤗 Transformers, jetez un œil à nos guides conceptuels !
\ No newline at end of file
+Maintenant que vous avez terminé la visite rapide de 🤗 Transformers, consultez nos guides et apprenez à faire des choses plus spécifiques comme créer un modèle personnalisé, finetuner un modèle pour une tâche, et comment entraîner un modèle avec un script. Si vous souhaitez en savoir plus sur les concepts fondamentaux de 🤗 Transformers, jetez un œil à nos guides conceptuels !