From cfc456b85dce4a7d9b81b31094754482a9aae4ff Mon Sep 17 00:00:00 2001
From: sj
Date: Tue, 28 Mar 2023 19:43:02 +0900
Subject: [PATCH] Fix typo

---
 chapters/en/chapter6/5.mdx | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/chapters/en/chapter6/5.mdx b/chapters/en/chapter6/5.mdx
index 291518720..e877653ef 100644
--- a/chapters/en/chapter6/5.mdx
+++ b/chapters/en/chapter6/5.mdx
@@ -97,7 +97,7 @@ Let's take the example we used during training, with the three merge rules learn
 ("h", "ug") -> "hug"
 ```
 
-The word `"bug"` will be tokenized as `["b", "ug"]`. `"mug"`, however, will be tokenized as `["[UNK]", "ug"]` since the letter `"m"` was not in the base vocabulary. Likewise, the word `"thug"` will be tokenized as `["[UNK]", "hug"]`: the letter `"t"` is not in the base vocabulary, and applying the merge rules results first in `"u"` and `"g"` being merged and then `"hu"` and `"g"` being merged.
+The word `"bug"` will be tokenized as `["b", "ug"]`. `"mug"`, however, will be tokenized as `["[UNK]", "ug"]` since the letter `"m"` was not in the base vocabulary. Likewise, the word `"thug"` will be tokenized as `["[UNK]", "hug"]`: the letter `"t"` is not in the base vocabulary, and applying the merge rules results first in `"u"` and `"g"` being merged and then `"h"` and `"ug"` being merged.
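
The corrected sentence can be checked with a minimal sketch of the tokenization step it describes. Only the merge rule `("h", "ug") -> "hug"` is visible in the hunk context; the base vocabulary and the other two merge rules used below are assumptions chosen to reproduce the `"bug"` / `"mug"` / `"thug"` example, not the course's own implementation.

```python
# Sketch of BPE tokenization with learned merge rules.
# base_vocab and the first two merges are assumed for illustration;
# only ("h", "ug") -> "hug" appears in the patch context.
base_vocab = ["b", "g", "h", "n", "p", "s", "u"]
merges = [("u", "g"), ("u", "n"), ("h", "ug")]  # applied in the order learned


def tokenize(word):
    # Split into characters, mapping anything outside the base vocabulary to [UNK]
    tokens = [c if c in base_vocab else "[UNK]" for c in word]
    # Apply each merge rule, in order, wherever the pair occurs
    for left, right in merges:
        i = 0
        while i < len(tokens) - 1:
            if tokens[i] == left and tokens[i + 1] == right:
                tokens[i : i + 2] = [left + right]
            else:
                i += 1
    return tokens


print(tokenize("bug"))   # ['b', 'ug']
print(tokenize("mug"))   # ['[UNK]', 'ug']
print(tokenize("thug"))  # ['[UNK]', 'hug']
```

For `"thug"`, the `"u"`/`"g"` merge fires first, and only then does `"h"`/`"ug"` apply, which is exactly the ordering the patched sentence now states.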