Fix typo (#532)
jybarnes authored Mar 17, 2023
1 parent d229ff7 commit 0bc7dc0
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion chapters/en/chapter6/3.mdx
@@ -109,7 +109,7 @@ We can see that the tokenizer's special tokens `[CLS]` and `[SEP]` are mapped to

<Tip>

-The notion of what a word is is complicated. For instance, does "I'll" (a contraction of "I will") count as one or two words? It actually depends on the tokenizer and the pre-tokenization operation it applies. Some tokenizers just split on spaces, so they will consider this as one word. Others use punctuation on top of spaces, so will consider it two words.
+The notion of what a word is complicated. For instance, does "I'll" (a contraction of "I will") count as one or two words? It actually depends on the tokenizer and the pre-tokenization operation it applies. Some tokenizers just split on spaces, so they will consider this as one word. Others use punctuation on top of spaces, so will consider it two words.

✏️ **Try it out!** Create a tokenizer from the `bert-base-cased` and `roberta-base` checkpoints and tokenize "81s" with them. What do you observe? What are the word IDs?
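
The difference the tip describes between splitting on spaces only and splitting on spaces plus punctuation can be illustrated with a minimal sketch. It uses plain regular expressions, not the pre-tokenizer of any real checkpoint, and the function names `split_on_spaces` and `split_on_spaces_and_punctuation` are hypothetical helpers introduced only for this example. Actual tokenizers may group the pieces into words differently.

```python
import re


def split_on_spaces(text):
    # Whitespace-only pre-tokenization: a contraction such as "I'll"
    # stays in one piece.
    return text.split()


def split_on_spaces_and_punctuation(text):
    # Assumed rule for illustration: each run of word characters is one
    # piece, and each punctuation character is a piece of its own.
    return re.findall(r"\w+|[^\w\s]", text)


sentence = "I'll be there"
print(split_on_spaces(sentence))
print(split_on_spaces_and_punctuation(sentence))
```

Under this simplified rule the apostrophe becomes a piece of its own, so "I'll" is no longer a single unit. How a given tokenizer then counts words for such pieces is exactly what the exercise above asks you to check.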
