
ERROR: Invalid token: 51865 #85

Closed
smoores-dev opened this issue Dec 24, 2024 · 8 comments


smoores-dev commented Dec 24, 2024

We've got a few Storyteller users reporting this error (sorry about the mangled stack trace, recently had to start bundling parts of echogarden to reduce Storyteller's image size):

ERROR: Invalid token: 51865
    err: {
      "type": "Error",
      "message": "Invalid token: 51865",
      "stack":
          Error: Invalid token: 51865
              at Whisper.assertIsValidToken (/app/.next/standalone/web/work-dist/worker.cjs:69550:17)
              at /app/.next/standalone/web/work-dist/worker.cjs:69507:41
              at Array.forEach (<anonymous>)
              at Whisper.tokensToText (/app/.next/standalone/web/work-dist/worker.cjs:69507:16)
              at Whisper.tokenToText (/app/.next/standalone/web/work-dist/worker.cjs:69504:21)
              at parseResultObject (/app/.next/standalone/web/work-dist/worker.cjs:67585:33)
              at async ChildProcess.<anonymous> (/app/.next/standalone/web/work-dist/worker.cjs:67533:38)
    }

I believe all users reporting this error are using a CUDA (12.6) build of whisper.cpp, and are using the large-v3-turbo model. The invalid token is always 51865, which one user noted 1 token past the vocabulary size of whisper.

A little bit more info from one user:

My workaround has always been to change the model from large-v3-turbo to large-v3-turbo-q_5, hit continue, and then after it transcribes a few more mp3s, it'll give the same error and I'll change it back and continue in this way until it finishes. I'm using CUDA 12.6

Let me know if it would be helpful to share the offending audio files!


rotemdan commented Dec 24, 2024

For a multilingual model like large-v3-turbo, the end of the valid token range (exclusive) is:

timestampTokensEnd: 50364 + 1501 // exactly 51865

Then the validation is:

	isValidToken(token: number) {
		return token < this.tokenConfig.timestampTokensEnd
	}

Meaning that 51865 is one above the highest accepted timestamp token, so it is rejected.

I've never seen a token with a value of 51865 before (and so had never seen this error). It doesn't look right, since it would decode past the 30.0s range: 1501 * 0.02 = 30.02s.

It's possible that my assumption wasn't correct here. I'm not sure.
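The range arithmetic can be sketched as follows (a minimal illustration with assumed constant names, not echogarden's actual config object):

```typescript
// Illustrative constants for a multilingual Whisper model (assumed layout).
const timestampTokensStart = 50364      // token id of <|0.00|>
const timestampTokensEnd = 50364 + 1501 // 51865, exclusive upper bound

// Exclusive-bound validation, as in the original code.
function isValidToken(token: number): boolean {
	return token < timestampTokensEnd
}

// A timestamp token advances in 0.02s steps from <|0.00|>.
function timestampTokenToSeconds(token: number): number {
	return (token - timestampTokensStart) * 0.02
}

console.log(isValidToken(51864)) // true  (<|30.00|>, the last defined timestamp)
console.log(isValidToken(51865)) // false (would decode to ~30.02s, past the 30s window)
```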

In v2.0.14 I changed the code to accept this particular unusual value, but to clamp the resulting timestamp to 30s when converting to seconds:

	isValidToken(token: number) {
		return token <= this.tokenConfig.timestampTokensEnd
	}
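A minimal sketch of the clamping behavior described above (constant names are assumptions mirroring the earlier snippet, not the actual implementation):

```typescript
// Illustrative constants (assumed layout for a multilingual model).
const timestampTokensStart = 50364
const timestampTokensEnd = 50364 + 1501 // 51865, now treated as inclusive

// The relaxed check from v2.0.14: token 51865 is accepted.
function isValidToken(token: number): boolean {
	return token <= timestampTokensEnd
}

// When converting to seconds, clamp the unusual 30.02s value back to 30s.
function timestampTokenToSeconds(token: number): number {
	const seconds = (token - timestampTokensStart) * 0.02
	return Math.min(seconds, 30.0)
}

console.log(timestampTokenToSeconds(51865)) // 30 (clamped from ~30.02)
```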

I'd rather still error on larger values, since the error reporting can be helpful to find bugs and other odd edge cases.

@smoores-dev (Author)

Amazing, thanks @rotemdan. I'll make a new Storyteller release and confirm that this fixes the issue. Erroring on larger values makes sense to me; I'll report back if we see that popping up.

@rotemdan (Member)

The reference Python implementation bounds the special token range like this:

    specials = [
        "<|endoftext|>",
        "<|startoftranscript|>",
        *[f"<|{lang}|>" for lang in list(LANGUAGES.keys())[:num_languages]],
        "<|translate|>",
        "<|transcribe|>",
        "<|startoflm|>",
        "<|startofprev|>",
        "<|nospeech|>",
        "<|notimestamps|>",
        *[f"<|{i * 0.02:.2f}|>" for i in range(1501)],
    ]

Since range(1501) yields indices 0 through 1500, index 1501 is excluded, so the timestamp token one past <|30.00|> never receives a string identifier.

In my implementation, this token is not in the range of valid tokens and should never be output. I verified that the ONNX model returns 51865 elements in the logit vector (indices 0 to 51864) for multilingual models and 51864 elements (indices 0 to 51863) for English-only models.
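The 51865 figure can be reconstructed from the token layout above (a sketch; the base vocabulary size and special-token counts are assumptions based on the reference tokenizer's `specials` list):

```typescript
// Assumed breakdown of the multilingual vocabulary.
const baseVocab = 50257      // BPE text tokens
const numLanguages = 99      // <|en|>, <|zh|>, ... language tags
const fixedSpecials = 8      // <|endoftext|>, <|startoftranscript|>, <|translate|>,
                             // <|transcribe|>, <|startoflm|>, <|startofprev|>,
                             // <|nospeech|>, <|notimestamps|>
const timestampTokens = 1501 // <|0.00|> through <|30.00|>, per range(1501)

const multilingualLogitLength = baseVocab + fixedSpecials + numLanguages + timestampTokens
console.log(multilingualLogitLength) // 51865, matching the observed logit vector length
```

Note that baseVocab + fixedSpecials + numLanguages = 50364, which is exactly the id of the first timestamp token, <|0.00|>.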

The out-of-range token seems to be received from the output of whisper.cpp, so it's possible it is an internal issue with whisper.cpp. I don't know.

@smoores-dev (Author)

Happy to open an issue there (those folks are quite a lot less responsive than you are, unfortunately; I've still never heard back about the other token issue I opened months ago) and see if they have any thoughts!

smoores-dev added a commit to smoores-dev/storyteller that referenced this issue Dec 24, 2024

rotemdan commented Dec 24, 2024

If I open the ONNX models in Netron (an ONNX model viewer), the output tensor has a fixed length for its last axis: 51865 for the tiny multilingual model:

[Screenshot: Netron showing a last-axis length of 51865 for the tiny multilingual model]

And 51864 for tiny.en:

[Screenshot: Netron showing a last-axis length of 51864 for tiny.en]

However, I just realized that the large-v3-turbo ONNX model does have a last-axis length of 51866!

[Screenshot: Netron showing a last-axis length of 51866 for large-v3-turbo]

Very surprising.

I'm not sure what the last token means. Since the turbo model was added to the reference Python implementation fairly quietly, this may not be well documented. I'll need to go into the Python code to find any explanation for it.

@smoores-dev (Author)

Ah, woah! That explains why folks are only seeing this on large-v3-turbo, I guess


rotemdan commented Dec 24, 2024

I also looked at the ONNX graph for whisper-large-v3, and it too has the extra token in the logit tensor.

I haven't seen any mention in the Python source code of what it means, and web searches didn't turn up anything. A discussion just confirmed what I said before:

[Screenshot: discussion thread confirming the extra token in the large models' logit tensor]

Even if 51865 is a valid token for the large models, I don't know why or when it should be output. It's a mystery for now, I guess.

@smoores-dev (Author)

🤷 well, the good news is that your patch worked, and Storyteller is working with the turbo model now!
