Skip to content
This repository has been archived by the owner on Jan 24, 2024. It is now read-only.

Commit

Permalink
refactor(decoder): tweak comments
Browse files Browse the repository at this point in the history
  • Loading branch information
peakji committed Mar 5, 2023
1 parent 581a34b commit 0291fa5
Showing 1 changed file with 2 additions and 2 deletions.
4 changes: 2 additions & 2 deletions basaran/decoder.py
Original file line number Diff line number Diff line change
Expand Up @@ -21,8 +21,8 @@ def decode(self, token):
# <unk>, <pad> and other special tokens will be decoded into ''.
text = self.tokenizer.decode(token, skip_special_tokens=True)

# Handle replacement characters caused by Unicode surrogates or
# multi-code-point graphemes like emojis.
# Handle replacement characters caused by multi-byte-pair-encoding or
# Unicode surrogates or multi-code-point graphemes like emojis.
if self.replacement in text:
n = -self.surrogates if self.surrogates > 0 else len(self.buffer)
tokens = self.buffer[n:] + [token]
Expand Down

0 comments on commit 0291fa5

Please sign in to comment.