# ERROR: Invalid token: 51865 #85
For a multilingual model, the timestamp token range ends at:

```typescript
timestampTokensEnd: 50364 + 1501 // exactly 51865
```

Then the validation is:

```typescript
isValidToken(token: number) {
	return token < this.tokenConfig.timestampTokensEnd
}
```

Meaning that I've never seen a token with a value of 51865. It's possible that my assumption wasn't correct here. I'm not sure. I've now relaxed the check to:

```typescript
isValidToken(token: number) {
	return token <= this.tokenConfig.timestampTokensEnd
}
```

I'd rather still error on larger values, since the error reporting can be helpful to find bugs and other odd edge cases.
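To make the boundary concrete, here's a small standalone sketch of the two checks. The names `isValidTokenStrict`/`isValidTokenRelaxed` are illustrative only; in Echogarden the actual method is `isValidToken` on the token config, and only the comparison operator changed.

```typescript
// Token layout discussed above, for the multilingual (pre-v3) vocabulary.
const timestampTokensStart = 50364   // id of <|0.00|>
const timestampTokenCount = 1501     // <|0.00|> .. <|30.00|>, in 20 ms steps
const timestampTokensEnd = timestampTokensStart + timestampTokenCount // 51865

// Original check: rejects 51865, even though that id turns out to exist
// in some models.
function isValidTokenStrict(token: number): boolean {
	return token < timestampTokensEnd
}

// Relaxed check after the fix: accepts token === timestampTokensEnd,
// but still errors on anything larger.
function isValidTokenRelaxed(token: number): boolean {
	return token <= timestampTokensEnd
}

console.log(isValidTokenStrict(51865))  // false — the reported error case
console.log(isValidTokenRelaxed(51865)) // true
console.log(isValidTokenRelaxed(51866)) // false — still flagged
```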
Amazing, thanks @rotemdan. I'll make a new Storyteller release and confirm that this fixes the issue. Erroring on larger values makes sense to me; I'll report back if we see that popping up.
The reference Python implementation bounds the special token range like this:

```python
specials = [
    "<|endoftext|>",
    "<|startoftranscript|>",
    *[f"<|{lang}|>" for lang in list(LANGUAGES.keys())[:num_languages]],
    "<|translate|>",
    "<|transcribe|>",
    "<|startoflm|>",
    "<|startofprev|>",
    "<|nospeech|>",
    "<|notimestamps|>",
    *[f"<|{i * 0.02:.2f}|>" for i in range(1501)],
]
```

In my implementation, this token is not in the range of valid tokens and should never be output. I verified the ONNX model returns 51865 elements in the logit vector (indexed 0 to 51864) for multilingual models and 51864 elements (indexed 0 to 51863) for English-only models. The out-of-range token seems to be received from the output of whisper.cpp.
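The reported logit lengths can be reproduced with some back-of-the-envelope accounting over the specials list above. This is a sketch, not code from either project, and the 99-language count for the pre-v3 multilingual models is an assumption inferred from the reported sizes:

```typescript
// Vocabulary accounting for the multilingual (pre-v3) Whisper models,
// following the specials list in the reference implementation.
const textTokens = 50257      // base BPE vocabulary (ids 0..50256)
const languageTokens = 99     // <|en|>, <|zh|>, ... (assumed for pre-v3 models)
const otherSpecials = 8       // endoftext, startoftranscript, translate,
                              // transcribe, startoflm, startofprev,
                              // nospeech, notimestamps
const timestampTokens = 1501  // <|0.00|> .. <|30.00|>

const vocabSize = textTokens + languageTokens + otherSpecials + timestampTokens
console.log(vocabSize) // 51865 — matches the multilingual logit vector length
```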
Happy to open an issue there (those folks are quite a lot less responsive than you are, unfortunately; I've still never heard back about the other token issue I opened months ago) and see if they have any thoughts!
See echogarden-project/echogarden#85 for details! Changelog: fixed
If I open the ONNX models using Netron (an ONNX model viewer), the output tensor has a fixed length for its last axis: 51865 for the multilingual models and 51864 for the English-only ones. However, I just realized that for the large-v3-turbo model, the last axis has a length of 51866, one element more. Very surprising. I'm not sure what that last token means. Since the turbo model was added to the reference Python implementation pretty silently, this fact may not be well documented. I'll need to go into the Python code to find an explanation for it.
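One possible explanation, consistent with the observed sizes but not confirmed anywhere in this thread: if large-v3 (from which turbo is derived) adds one extra language token, every later special shifts up by one id, the timestamp range becomes 50365..51865, and 51865 becomes a legitimate `<|30.00|>` token. A sketch of that speculative layout:

```typescript
// Speculative vocabulary layout for large-v3 / large-v3-turbo, assuming one
// extra language token (100 instead of 99). Inferred from the observed logit
// length of 51866; not confirmed by this thread.
const textTokens = 50257
const languageTokensV3 = 100
const otherSpecials = 8
const timestampTokens = 1501

const vocabSizeV3 = textTokens + languageTokensV3 + otherSpecials + timestampTokens
console.log(vocabSizeV3) // 51866

// Timestamps would then occupy ids 50365..51865, making 51865 = <|30.00|>.
const timestampStartV3 =
	textTokens + 1 /* endoftext */ + 1 /* startoftranscript */ +
	languageTokensV3 + 6 /* translate..notimestamps */
console.log(timestampStartV3)                       // 50365
console.log(timestampStartV3 + timestampTokens - 1) // 51865
```

Under this reading, the `<=` fix is correct for v3-family models because `timestampTokensEnd` computed from the pre-v3 layout lands exactly one id short of the real last timestamp.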
Ah, woah! That explains why folks are only seeing this on large-v3-turbo, I guess |
I also looked at the ONNX graph for the turbo model. I haven't seen any mention in the Python source code of what that extra token means, and web searches didn't turn up anything either. This discussion just confirmed what I said before.
🤷 well, the good news is that your patch worked, and Storyteller is working with the turbo model now! |
We've got a few Storyteller users reporting this error (sorry about the mangled stack trace; we recently had to start bundling parts of echogarden to reduce Storyteller's image size):
I believe all users reporting this error are using a CUDA (12.6) build of whisper.cpp with the large-v3-turbo model. The invalid token is always `51865`, which, as one user noted, is one token past the vocabulary size of Whisper.

A little more info from one user:
Let me know if it would be helpful to share the offending audio files!