Reset token budget after every user intervention. #306

Closed
tjohnman wants to merge 2 commits from tjohnman:reset-token-budget
Conversation

tjohnman (Contributor)

In interactive mode, every time the model has to respond to user input it does so with an increasingly reduced token budget, eventually generating only a few words before stopping. The token budget in interactive mode should apply to every batch of tokens generated after a user intervention, not globally.
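
For context, a condensed sketch of the budget handling this describes (paraphrased from main.cpp's interactive loop at the time; simplified, not the verbatim source):

```cpp
// Sketch of the pre-patch behavior (paraphrased, not verbatim main.cpp).
int remaining_tokens = params.n_predict; // set once, before the main loop

while (remaining_tokens > 0) {
    // ... sample and emit one token ...
    --remaining_tokens;                  // each generated token spends the budget

    if (is_interacting) {
        // ... read user input into line_inp ...
        remaining_tokens -= line_inp.size(); // user input spends it too
        // The budget is never replenished, so each successive reply
        // gets shorter until generation stops after only a few words.
    }
}
```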
main.cpp (Outdated)

```diff
@@ -1054,11 +1054,11 @@ int main(int argc, char ** argv) {
                 embd_inp.insert(embd_inp.end(), inp_sfx.begin(), inp_sfx.end());
             }

-            remaining_tokens -= line_inp.size();
+            remaining_tokens = params.n_predict - line_inp.size();
```
Collaborator

This can get bigger than the remaining space in the context. And after https://github.com/ggerganov/llama.cpp/blob/da5303c1ea68aa19db829c634f1e10d08d409680/main.cpp#L850, remaining_tokens is actually all the space that is left, no?

Green-Sky (Collaborator) commented on Mar 19, 2023

Resetting remaining_tokens to params.n_predict would only make sense when we reset the memory, which we don't do right now. See #71.

tjohnman (Contributor, Author) commented on Mar 19, 2023

I see. Yes, going over the context size can be a problem. But remaining_tokens is usually smaller than the context size (because params.n_predict is), so it should still be reset after every interaction with the user, so that each series of tokens can be as long as the first one, or as long as the remaining context space allows. It should be clamped so it never goes over the remaining space in the context.

std::min(params.n_predict, model.hparams.n_ctx - (int) embd_inp.size()) is exactly the formula that should be used instead of the simple assignment I made, so that the reset doesn't overflow the context.

In fact, it should also be used when resetting the budget after running out of tokens. Good catch.
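
A minimal sketch of that clamped reset, wrapped in a helper for readability (the helper name and commented call site are illustrative, not part of the patch; the identifiers mirror the ones quoted above):

```cpp
#include <algorithm> // std::min

// Reset the per-intervention token budget to n_predict, but never let it
// exceed the space still available in the context window.
static int reset_token_budget(int n_predict, int n_ctx, int n_consumed) {
    return std::min(n_predict, n_ctx - n_consumed);
}

// Hypothetical call site, on each user intervention:
// remaining_tokens = reset_token_budget(params.n_predict,
//                                       (int) model.hparams.n_ctx,
//                                       (int) embd_inp.size());
```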

@gjmulder added the enhancement (New feature or request) label on Mar 20, 2023
tjohnman (Contributor, Author)

I'm closing this pull request because it doesn't make sense as things stand, given the lack of context shifting. Once that's figured out, this will have to be implemented differently.

tjohnman closed this on Mar 24, 2023
tjohnman deleted the reset-token-budget branch on March 24, 2023 at 15:25
Labels: enhancement (New feature or request)
3 participants