Commit

Merge pull request #540 from vhch/main
Fix typo
merveenoyan authored Nov 22, 2023
2 parents 1447b5b + cfc456b commit b004f50
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion chapters/en/chapter6/5.mdx
@@ -97,7 +97,7 @@ Let's take the example we used during training, with the three merge rules learn
("h", "ug") -> "hug"
```

-The word `"bug"` will be tokenized as `["b", "ug"]`. `"mug"`, however, will be tokenized as `["[UNK]", "ug"]` since the letter `"m"` was not in the base vocabulary. Likewise, the word `"thug"` will be tokenized as `["[UNK]", "hug"]`: the letter `"t"` is not in the base vocabulary, and applying the merge rules results first in `"u"` and `"g"` being merged and then `"hu"` and `"g"` being merged.
+The word `"bug"` will be tokenized as `["b", "ug"]`. `"mug"`, however, will be tokenized as `["[UNK]", "ug"]` since the letter `"m"` was not in the base vocabulary. Likewise, the word `"thug"` will be tokenized as `["[UNK]", "hug"]`: the letter `"t"` is not in the base vocabulary, and applying the merge rules results first in `"u"` and `"g"` being merged and then `"h"` and `"ug"` being merged.
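
The corrected sentence can be checked with a minimal sketch of merge-rule application. This is not the chapter's own implementation. The hunk shows only the `("h", "ug")` rule, so the base vocabulary and the other two rules below are assumptions chosen to be consistent with the text, and `bpe_tokenize` is a hypothetical helper name.

```python
# Assumed base vocabulary: the hunk only implies that "m" and "t" are absent.
BASE_VOCAB = {"b", "g", "h", "n", "p", "s", "u"}

# Assumed merge rules, in the order they were learned. Only ("h", "ug") is
# shown in the hunk; ("u", "g") is implied by the prose; ("u", "n") is a guess.
MERGES = [("u", "g"), ("u", "n"), ("h", "ug")]


def bpe_tokenize(word):
    """Tokenize one word by replaying the merge rules in learned order."""
    # Split into characters; anything outside the base vocabulary is unknown.
    tokens = [ch if ch in BASE_VOCAB else "[UNK]" for ch in word]
    for left, right in MERGES:
        i = 0
        while i < len(tokens) - 1:
            if tokens[i] == left and tokens[i + 1] == right:
                # Replace the adjacent pair with the merged token.
                tokens[i : i + 2] = [left + right]
            else:
                i += 1
    return tokens


for w in ("bug", "mug", "thug"):
    print(w, bpe_tokenize(w))
```

For `"thug"`, the characters start as `["[UNK]", "h", "u", "g"]`. The first rule produces `"ug"`, and only then can the third rule match `"h"` followed by `"ug"`. No rule ever produces `"hu"`, which is why the old wording was a typo.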

<Tip>
