Commit

Moved function
Pringled committed Oct 31, 2024
1 parent 62ac69c commit 9361e41
Showing 1 changed file with 5 additions and 0 deletions.
5 changes: 5 additions & 0 deletions scripts/export_to_onnx.py
@@ -145,6 +145,11 @@ def save_tokenizer(tokenizer: Tokenizer, save_directory: Path) -> None:
         sep_token="[SEP]",
         mask_token="[MASK]",
     )
+    vocab = tokenizer.get_vocab()
+    vocab_path = save_directory / "vocab.txt"
+    with open(vocab_path, "w", encoding="utf-8") as vocab_file:
+        for token in sorted(vocab, key=vocab.get):
+            vocab_file.write(f"{token}\n")
 
     # Save the tokenizer files
     fast_tokenizer.save_pretrained(str(save_directory))
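
The added lines write the tokenizer vocabulary to `vocab.txt`, one token per line, ordered by token id. A minimal standalone sketch of that step follows. It takes a plain `dict[str, int]` in place of the result of `tokenizer.get_vocab()`, and the helper names `write_vocab` and `read_vocab` are hypothetical, not part of `scripts/export_to_onnx.py`.

```python
import tempfile
from pathlib import Path


def write_vocab(vocab: dict[str, int], save_directory: Path) -> Path:
    """Write tokens to vocab.txt, one per line, ordered by token id.

    `vocab` stands in for the mapping returned by `tokenizer.get_vocab()`
    (token string -> integer id). This helper is illustrative only.
    """
    vocab_path = save_directory / "vocab.txt"
    with open(vocab_path, "w", encoding="utf-8") as vocab_file:
        # Sorting the keys by their id means the line order follows the ids.
        for token in sorted(vocab, key=vocab.get):
            vocab_file.write(f"{token}\n")
    return vocab_path


def read_vocab(vocab_path: Path) -> list[str]:
    """Read the file back as a list of tokens in line order."""
    with open(vocab_path, encoding="utf-8") as vocab_file:
        return [line.rstrip("\n") for line in vocab_file]


if __name__ == "__main__":
    example_vocab = {"[SEP]": 2, "[PAD]": 0, "hello": 3, "[UNK]": 1}
    with tempfile.TemporaryDirectory() as tmp:
        path = write_vocab(example_vocab, Path(tmp))
        print(read_vocab(path))
```

Note that a line's position in the file equals its token id only when the ids are contiguous and start at 0; the sketch, like the diff, does not check for gaps.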