It appears that the decode() method in the GPT4Tokenizer class does not handle special tokens. I submitted a pull request (#63) with some updated code, but also wanted to post the issue here. Here is the original code for reference:
def decode(self, ids):
    # we have to un-permute the bytes before we decode
    text_bytes = b"".join(self.vocab[idx] for idx in ids)
    text_bytes = bytes(self.inverse_byte_shuffle[b] for b in text_bytes)
    text = text_bytes.decode("utf-8", errors="replace")
    return text
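For illustration, here is a minimal sketch of one way decode() could account for special tokens. It is not necessarily what the pull request does. It assumes special-token ids are missing from self.vocab and that the tokenizer keeps an id-to-string mapping named inverse_special_tokens. ToyTokenizer, its shift-by-one byte permutation, and the single special token are hypothetical stand-ins so the example runs on its own.

```python
class ToyTokenizer:
    """Hypothetical stand-in for GPT4Tokenizer with only what decode() needs."""

    def __init__(self):
        # Toy byte permutation (assumption): shift every byte value by one.
        self.byte_shuffle = {b: (b + 1) % 256 for b in range(256)}
        self.inverse_byte_shuffle = {v: k for k, v in self.byte_shuffle.items()}
        # Base vocab holds only the 256 single-byte tokens in this sketch.
        self.vocab = {i: bytes([i]) for i in range(256)}
        # Illustrative special token; its id is deliberately not in self.vocab.
        self.special_tokens = {"<|endoftext|>": 100257}
        self.inverse_special_tokens = {v: k for k, v in self.special_tokens.items()}

    def decode(self, ids):
        parts = []
        for idx in ids:
            if idx in self.vocab:
                # Ordinary token: un-permute its bytes before decoding.
                parts.append(bytes(self.inverse_byte_shuffle[b] for b in self.vocab[idx]))
            elif idx in self.inverse_special_tokens:
                # Special token: emit its text as-is, with no un-permuting.
                parts.append(self.inverse_special_tokens[idx].encode("utf-8"))
            else:
                raise ValueError(f"invalid token id: {idx}")
        return b"".join(parts).decode("utf-8", errors="replace")


tok = ToyTokenizer()
# Build ids by hand: shuffled bytes of "hi" followed by the special token id.
ids = [tok.byte_shuffle[b] for b in "hi".encode("utf-8")] + [100257]
decoded = tok.decode(ids)
print(decoded)
```

Only ordinary tokens go through inverse_byte_shuffle here. Running the special token's bytes through the permutation would garble its text.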