Model Series
Qwen2.5
What are the models used?
Qwen/Qwen2.5-3B-Instruct
What is the scenario where the problem happened?
transformers
Is this badcase known and can it be solved using available techniques?
Information about environment
OS: Ubuntu 22.04
Python: Python 3.10
GPUs: 8 x NVIDIA A100
NVIDIA driver: 535 (from nvidia-smi)
CUDA compiler: 12.1 (from nvcc -V)
PyTorch: 2.2.1+cu121 (from python -c "import torch; print(torch.__version__)")
Description
When the input text starts with a space (e.g. " 那明天呢?"), decoding the token IDs back with tokenizer.decode drops that leading space, so the encode/decode round trip is not lossless.
Steps to reproduce
import torch
from transformers import AutoTokenizer

# Load the tokenizer
model_name_or_path = "Qwen/Qwen2.5-3B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name_or_path)

# Example text (note the leading space)
text = " 那明天呢?"

# Inspect each token
tokens = tokenizer.tokenize(text)
print("Tokenized Text:", tokens)

# Convert to token IDs
token_ids = tokenizer.convert_tokens_to_ids(tokens)
print("Token IDs:", token_ids)

# Decode back to text
decoded_text = tokenizer.decode(token_ids)
print("Decoded Text:", decoded_text)
Encoded Text: " 那明天呢?"
Tokenized Text: ['Ġé', 'Ĥ', '£', 'æĺİ天', 'åij¢', '?']
Token IDs: [18137, 224, 96, 104807, 101036, 30]
Decoded Text: 那明天呢?
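For completeness, a minimal round-trip check (a sketch, not part of the original report) makes the missing leading space easier to spot, since repr() shows whitespace explicitly:

from transformers import AutoTokenizer

# Sketch of a round-trip check for the reported badcase;
# model name and input text are taken from the report above.
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-3B-Instruct")

text = " 那明天呢?"
token_ids = tokenizer(text, add_special_tokens=False)["input_ids"]
decoded = tokenizer.decode(token_ids)

# repr() makes the dropped leading space visible in the comparison.
print("original:", repr(text))
print("decoded :", repr(decoded))
print("round-trip lossless:", decoded == text)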