
Fix typo (#141)
Remove a duplicate `the`
st3w4r authored Apr 25, 2022
1 parent c681cdd commit 94b71a4
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion in chapters/en/chapter1/4.mdx

@@ -156,7 +156,7 @@ The original Transformer architecture looked like this, with the encoder on the
<img class="hidden dark:block" src="https://huggingface.co/datasets/huggingface-course/documentation-images/resolve/main/en/chapter1/transformers-dark.svg" alt="Architecture of a Transformers models">
</div>

- Note that the the first attention layer in a decoder block pays attention to all (past) inputs to the decoder, but the second attention layer uses the output of the encoder. It can thus access the whole input sentence to best predict the current word. This is very useful as different languages can have grammatical rules that put the words in different orders, or some context provided later in the sentence may be helpful to determine the best translation of a given word.
+ Note that the first attention layer in a decoder block pays attention to all (past) inputs to the decoder, but the second attention layer uses the output of the encoder. It can thus access the whole input sentence to best predict the current word. This is very useful as different languages can have grammatical rules that put the words in different orders, or some context provided later in the sentence may be helpful to determine the best translation of a given word.

The *attention mask* can also be used in the encoder/decoder to prevent the model from paying attention to some special words -- for instance, the special padding word used to make all the inputs the same length when batching together sentences.

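For readers of the file being changed: the corrected paragraph describes the decoder's second attention layer (cross-attention), where queries come from the decoder and keys/values come from the encoder output. Below is a minimal illustrative sketch using PyTorch's `nn.MultiheadAttention`; the dimensions and random tensors are assumptions for demonstration only, not part of the course file or this commit.

```python
import torch
import torch.nn as nn

# Toy sizes, chosen only for illustration.
d_model, num_heads = 64, 4
batch, src_len, tgt_len = 1, 7, 5

cross_attention = nn.MultiheadAttention(d_model, num_heads, batch_first=True)

encoder_output = torch.randn(batch, src_len, d_model)  # the whole input sentence
decoder_states = torch.randn(batch, tgt_len, d_model)  # the (past) decoder inputs

# Queries come from the decoder, while keys and values come from the encoder
# output, so every target position can attend to the entire source sentence.
out, attn_weights = cross_attention(
    query=decoder_states, key=encoder_output, value=encoder_output
)
print(attn_weights.shape)  # torch.Size([1, 5, 7]): (batch, target length, source length)
```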
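Likewise, the *attention mask* mentioned in the surrounding context can be seen directly in the output of a 🤗 Transformers tokenizer when padding a batch. The sketch below uses the `bert-base-uncased` checkpoint and made-up sentences purely as assumptions for illustration.

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

batch = tokenizer(
    ["A short sentence.", "A noticeably longer sentence that needs many more tokens."],
    padding=True,          # pad the shorter sequence up to the longest one
    return_tensors="pt",
)

# 1s mark real tokens; 0s mark padding positions the model should not attend to.
print(batch["attention_mask"])
```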
