decode() method in GPT4Tokenizer does not handle special tokens #64

Vakarva · 2024-04-07T21:57:14Z

It appears that the decode() method in the GPT4Tokenizer class does not handle special tokens: it looks up every id in self.vocab and never consults the special-token mapping. I submitted a pull request (#63) with a proposed fix, but also wanted to record the issue here. Here is the original code for reference:

def decode(self, ids):
  # we have to un-permute the bytes before we decode
  text_bytes = b"".join(self.vocab[idx] for idx in ids)
  text_bytes = bytes(self.inverse_byte_shuffle[b] for b in text_bytes)
  text = text_bytes.decode("utf-8", errors="replace")
  return text
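
For illustration, here is a minimal sketch of one way decode() could treat special tokens separately from ordinary ones. It is not the code from the pull request, and it does not use the real GPT4Tokenizer. ToyShuffledTokenizer, its toy byte permutation, and encode_ordinary are hypothetical stand-ins, and the attribute name inverse_special_tokens (id to string) is an assumption. The idea is that ordinary token bytes are un-permuted, while special tokens are emitted as their literal UTF-8 text and never pass through the byte shuffle.

```python
class ToyShuffledTokenizer:
    """Hypothetical stand-in for a byte-shuffled tokenizer (not the minbpe class)."""

    def __init__(self):
        # Toy permutation of byte values; any bijection on 0..255 would do here.
        self.byte_shuffle = {i: (i + 1) % 256 for i in range(256)}
        self.inverse_byte_shuffle = {v: k for k, v in self.byte_shuffle.items()}
        # Base vocab only: token id -> one byte in shuffled space (no merges in this toy).
        self.vocab = {i: bytes([i]) for i in range(256)}
        # Special tokens live outside self.vocab; the id is illustrative.
        self.special_tokens = {"<|endoftext|>": 100257}
        self.inverse_special_tokens = {v: k for k, v in self.special_tokens.items()}

    def encode_ordinary(self, text):
        # Shuffle each UTF-8 byte; in this toy the shuffled byte value is the token id.
        return [self.byte_shuffle[b] for b in text.encode("utf-8")]

    def decode(self, ids):
        parts = []
        for idx in ids:
            if idx in self.vocab:
                # Ordinary token: un-permute its bytes before decoding.
                parts.append(bytes(self.inverse_byte_shuffle[b] for b in self.vocab[idx]))
            elif idx in self.inverse_special_tokens:
                # Special token: emit its literal text, bypassing the byte shuffle.
                parts.append(self.inverse_special_tokens[idx].encode("utf-8"))
            else:
                raise ValueError(f"invalid token id: {idx}")
        return b"".join(parts).decode("utf-8", errors="replace")


tok = ToyShuffledTokenizer()
ids = tok.encode_ordinary("hi") + [tok.special_tokens["<|endoftext|>"]]
print(tok.decode(ids))
```

Un-permuting per token rather than over the joined byte string keeps the special-token text out of the inverse shuffle. Raising ValueError on an unknown id is a design choice of this sketch, not something the original code does.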
