I noticed test-tokenizer-0-falcon seems to be deactivated. Is there a reason for this?
It's failing because there is no Unicode support yet, so it is disabled until we add it.
I'm trying to get something like test-tokenizer-1 for falcon working. Yeah, Unicode seems to be part of the problem.
@all concerned: if we want to do proper tokenization, please don't modify the original vocabulary, and add
GGML_ASSERT(vocab.id_to_token.size() == vocab.token_to_id.size());
to your vocab-loading code. Otherwise I don't see a chance of this working.
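To illustrate what that assertion can catch, here is a minimal sketch. It assumes that token_to_id is a map keyed by the token string and id_to_token is a list indexed by id; toy_vocab, build_vocab and vocab_is_bijective are hypothetical names for this example, not the actual llama.cpp types. If two ids end up with the same string (for example after a lossy Unicode conversion), the map collapses them and the two sizes no longer match.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical, simplified vocab; only the two field names come from the thread.
struct toy_vocab {
    std::map<std::string, int> token_to_id;
    std::vector<std::string>   id_to_token;
};

// Build both tables from the token strings, given in id order.
static toy_vocab build_vocab(const std::vector<std::string> & tokens) {
    toy_vocab vocab;
    vocab.id_to_token = tokens;
    for (size_t i = 0; i < tokens.size(); ++i) {
        // A duplicate string overwrites the earlier id, so the map stays smaller.
        vocab.token_to_id[tokens[i]] = (int) i;
    }
    return vocab;
}

// The same comparison as the GGML_ASSERT above, returned as a bool instead of aborting.
static bool vocab_is_bijective(const toy_vocab & vocab) {
    return vocab.id_to_token.size() == vocab.token_to_id.size();
}
```

With distinct tokens such as {"a", "b", "c"} the check passes; with {"a", "?", "?"}, where two different tokens were mangled into the same string, it fails, which is the situation the assertion is meant to surface at load time.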