Allow reusing results from llama_token_to_piece when sampling grammars #4213
Conversation
```diff
-const std::string piece = llama_token_to_piece(ctx, id);
+std::string piece;
+if (pieces != nullptr && pieces[id] != nullptr) {
```
The doc comment for this function doesn't clearly state that individual pieces may be nullptr.
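One possible wording for such a doc comment (purely illustrative, not the actual header text):

```cpp
/// @param pieces  Optional per-token piece cache, indexed by token id.
///                May be NULL; individual entries may also be NULL, in which
///                case the piece is computed via llama_token_to_piece.
```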
This is mostly playing defence, as std::string cannot handle nullptrs / empty C strings (from my understanding - I'm a complete C++ novice). The lack of docs allows swapping the behavior from calling token_to_piece to assuming a null entry is an empty string, but from my measurements the performance impact was negligible. If we want to document the current behavior, that's fine by me.
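For clarity, a sketch of the two behaviors being discussed, continuing the diff context above (the surrounding variables are assumed to be in scope):

```cpp
std::string piece;
// Current behavior: fall back to computing the piece when the cache
// entry is missing.
if (pieces != nullptr && pieces[id] != nullptr) {
    piece = pieces[id];
} else {
    piece = llama_token_to_piece(ctx, id);
}

// Alternative mentioned above: treat a null entry as an empty string.
// piece = (pieces != nullptr && pieces[id] != nullptr) ? pieces[id] : "";
```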
std::string can represent empty strings, but not null. If it were a C string, you would still have to do something with it eventually if it is null to prevent a segfault.
Since this code is apparently not used yet, it's hard to reason about performance or what the correct inputs should be.
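To illustrate the point about null C strings (the helper name here is made up):

```cpp
#include <string>

// Hypothetical helper: constructing std::string from a null const char*
// is undefined behavior, so a null check has to happen somewhere before
// the conversion.
static std::string piece_or_empty(const char * p) {
    return p != nullptr ? std::string(p) : std::string();
}
```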
Wouldn't it be better to pre-compute the pieces when the model is loaded and store them in the model? (Lines 1329 to 1332 in 3e73d31.) This way the user does not need to do it themselves.
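A rough sketch of that suggestion; the cache struct and field names are purely illustrative, not actual llama.cpp fields:

```cpp
#include <string>
#include <vector>

// Purely illustrative: a per-model piece cache filled once at load time.
struct llama_vocab_cache {
    std::vector<std::string> token_pieces; // one entry per vocab token
};

static void init_piece_cache(llama_context * ctx, int n_vocab,
                             llama_vocab_cache & cache) {
    cache.token_pieces.resize(n_vocab);
    for (int id = 0; id < n_vocab; ++id) {
        cache.token_pieces[id] = llama_token_to_piece(ctx, id);
    }
}
```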
Closing in favor of implementing #4213 (comment). Done in #4330.
This is an amendment to #4210 which allows passing in already-computed (ideally shared across calls) strings to avoid calling llama_token_to_piece in llama_sample_grammar. Behavior remains unchanged by passing in nullptr. See #4210 for the "before".
I'm marking this as a draft for now to allow #4210 to be merged in, and then I'll clean up the git history.
Also, I wanted to see if this is worth a breaking API change (if it is, I'll go fix up other spots in the code where the change can be taken advantage of).
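A minimal usage sketch under the proposed change. The extra `pieces` parameter and its exact type are assumptions based on the diff above, not a final API:

```cpp
#include <string>
#include <vector>
#include "common.h" // for llama_token_to_piece (returns std::string)
#include "llama.h"

// Sketch: build a shared piece cache once, then sample many times.
void sample_with_cached_pieces(llama_context * ctx,
                               llama_token_data_array & candidates,
                               const llama_grammar * grammar) {
    const int n_vocab = llama_n_vocab(llama_get_model(ctx));

    // Keep the std::string storage alive as long as `pieces` is in use.
    std::vector<std::string> storage(n_vocab);
    std::vector<const char *> pieces(n_vocab);
    for (int id = 0; id < n_vocab; ++id) {
        storage[id] = llama_token_to_piece(ctx, id);
        pieces[id]  = storage[id].c_str();
    }

    // nullptr keeps the old behavior (pieces computed per call)...
    llama_sample_grammar(ctx, &candidates, grammar, /*pieces=*/nullptr);

    // ...while the cache skips the per-call llama_token_to_piece work.
    llama_sample_grammar(ctx, &candidates, grammar, pieces.data());
}
```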