LLMEval crashing due to force unwrapping of optional in token decode for Gemma #50
Comments
This looks a bit like:
What version (hash) of
74b9421
OK, that is the current head of main and includes that fix. Oh, interesting -- that particular prompt does reproduce it for me! Here is the token in question:
(lldb) p model.convertIdToToken(235345)
(String?) nil
though it looks like it is defined in the tokenizer.json:
The problem is that there are two (ish) #'s:
though one of them has some extra unicode, in particular a ZERO WIDTH NO-BREAK SPACE:
It looks like the strings map to the same value on read and the tokenizer model loses the entry for 235345.
@ViRo3 could you install the latest swift-transformers from GH and let us know if the issue persists? Btw, this might be a Hugging Face swift-transformers issue and not MLX.
I have checked and it persists. It is a swift-transformers issue, so I filed an issue there too.
I had a thought about how this might be fixed -- we might be able to make something along these lines:
struct CodePointString: Hashable {
    let value: String
    static func == (lhs: CodePointString, rhs: CodePointString) -> Bool {
        // compare raw UTF-16 code units instead of Unicode canonical equivalence
        lhs.value.utf16.elementsEqual(rhs.value.utf16)
    }
    func hash(into hasher: inout Hasher) {
        // hash the same code units that == compares
        for unit in value.utf16 { hasher.combine(unit) }
    }
    ...
}
Then we load the config as
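To make the idea concrete, here is a minimal, self-contained sketch of how a dictionary keyed by a code-unit-comparing wrapper keeps entries that a plain `String` key would merge. The type name `CodeUnitKey` and the token IDs 1 and 2 are hypothetical, chosen for illustration only; this is not the swift-transformers API. The example uses two spellings of "é" because Swift's `String` is known to treat them as equal; whether the ZERO WIDTH NO-BREAK SPACE token from this issue collapses through the same mechanism is an assumption that this sketch does not verify.

```swift
// Hypothetical key type: equality and hashing use raw UTF-16 code units,
// so strings that Swift's String considers equal can remain distinct keys.
struct CodeUnitKey: Hashable {
    let value: String

    static func == (lhs: CodeUnitKey, rhs: CodeUnitKey) -> Bool {
        lhs.value.utf16.elementsEqual(rhs.value.utf16)
    }

    func hash(into hasher: inout Hasher) {
        // Feed the same code units that == compares, keeping Hashable consistent.
        for unit in value.utf16 {
            hasher.combine(unit)
        }
    }
}

// Two spellings of "é": precomposed U+00E9, and "e" followed by U+0301.
// They are canonically equivalent, so String == reports them as equal.
let precomposed = "\u{E9}"
let decomposed = "e\u{301}"

// Keyed by String: the second assignment overwrites the first entry.
var byString: [String: Int] = [:]
byString[precomposed] = 1   // illustrative token ID
byString[decomposed] = 2    // illustrative token ID

// Keyed by the wrapper: both entries survive.
var byCodeUnits: [CodeUnitKey: Int] = [:]
byCodeUnits[CodeUnitKey(value: precomposed)] = 1
byCodeUnits[CodeUnitKey(value: decomposed)] = 2

print(byString.count)     // 1
print(byCodeUnits.count)  // 2
```

A token-to-ID table built with the wrapper would need the same wrapping applied on lookup, which is the trade-off of this approach: every call site that indexes the vocabulary has to construct the key type instead of passing a bare `String`.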
st.txt is the crash log.
Model : Gemma 2B Quantized
Prompt : "Write code to boot a raspberry pico"