Model outputs � in Korean/Chinese #284
Comments
Can you please provide me with reproduction code?
@giladgd this is my code:
import {getLlama, LlamaChatSession} from "node-llama-cpp";

const modelPath = "path/to/model";
const llama = await getLlama();
const model = await llama.loadModel({modelPath: modelPath});
const context = await model.createContext({contextSize: 8192});
const session = new LlamaChatSession({
    contextSequence: context.getSequence(),
    systemPrompt: "",
});
session.prompt(msg, {
    onTextChunk: (chunk: string) => {
        console.log(chunk);
    },
});

I've tried mradermacher's models. It still outputs �. I'm running
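To narrow down whether the � characters come from the library or only from the terminal rendering, a minimal check (a sketch assuming the same `session` and `msg` as in the snippet above) is to accumulate the streamed chunks and count U+FFFD code points in the text itself:

```ts
// Collect the streamed chunks and compare them with the awaited response.
// If the accumulated string already contains U+FFFD, the corruption happens
// before the text ever reaches the terminal.
let streamed = "";
const response = await session.prompt(msg, {
    onTextChunk(chunk: string) {
        streamed += chunk;
    }
});

const replacementCount = [...streamed].filter((ch) => ch === "\uFFFD").length;
console.log("U+FFFD characters in streamed text:", replacementCount);
console.log("awaited response equals concatenated chunks:", response === streamed);
```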
@bqhuyy Are you sure it's not due to terminal encoding? I've prompted mradermacher's model with
@giladgd I found the issue. It seems that the problem occurs when outputs
IMG_4139.mp4
🎉 This issue has been resolved in version 3.0.0-beta.42 🎉 The release is available on:
Your semantic-release bot 📦🚀
@giladgd hi, the problem still occurs when streaming in
@bqhuyy It seems that the fix I released only covered streaming to the console and not streaming to other destinations (probably because the console splices the split Unicode characters back together when they are printed sequentially).
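A small Node-only illustration of the mechanism described above (no node-llama-cpp involved): when a multi-byte UTF-8 character is split across two buffers, decoding each piece on its own yields replacement characters, while a streaming TextDecoder buffers the incomplete sequence until the rest of the bytes arrive.

```ts
// "한" is encoded as three bytes in UTF-8 (0xED 0x95 0x9C).
const bytes = Buffer.from("한", "utf8");
const firstPart = bytes.subarray(0, 2);  // incomplete sequence
const secondPart = bytes.subarray(2);    // lone continuation byte

// Decoding each piece independently produces replacement characters (e.g. "��"):
console.log(firstPart.toString("utf8") + secondPart.toString("utf8"));

// A streaming TextDecoder keeps the incomplete sequence buffered until the
// remaining bytes of the character arrive:
const decoder = new TextDecoder("utf-8");
const text = decoder.decode(firstPart, {stream: true}) + decoder.decode(secondPart);
console.log(text); // "한"
```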
🎉 This issue has been resolved in version 3.0.0-beta.43 🎉 The release is available on:
Your semantic-release bot 📦🚀
🎉 This PR is included in version 3.0.0 🎉 The release is available on:
Your semantic-release bot 📦🚀
Issue description
Model outputs � in Korean/Chinese
Expected Behavior
The model outputs the correct Unicode/UTF-8 characters.
Actual Behavior
Model outputs �
Steps to reproduce
This problem occurs when working with Chinese/Korean text. I'm using Llama 3.1 - Q4_K_M; it also occurs with Qwen2 models.
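For context on why Chinese/Korean text surfaces this so readily (a general UTF-8 note, not specific to these models): each Hangul or CJK character takes three bytes in UTF-8, so a boundary between streamed token buffers is far more likely to fall in the middle of a character than with one-byte ASCII text.

```ts
// UTF-8 byte lengths: ASCII is 1 byte per character, Hangul/CJK is 3.
console.log(Buffer.byteLength("hello", "utf8"));     // 5
console.log(Buffer.byteLength("안녕하세요", "utf8")); // 15
console.log(Buffer.byteLength("你好", "utf8"));       // 6
```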
My Environment
node-llama-cpp version
Additional Context
I've tried to use the onToken / onTextChunk functions. They still return the same result. I see some related issues: ggml-org/llama.cpp#11, ggml-org/llama.cpp#79
Relevant Features Used
Are you willing to resolve this issue by submitting a Pull Request?
Yes, I have the time, but I don't know how to start. I would need guidance.