Reset token budget after every user intervention. #306
Conversation
In interactive mode, every time the model has to respond to user input it has an increasingly reduced token budget, eventually generating only a few words before stopping. The token budget in interactive mode should apply to every batch of tokens after a user intervention, not globally.
main.cpp (Outdated)
@@ -1054,11 +1054,11 @@ int main(int argc, char ** argv) {
             embd_inp.insert(embd_inp.end(), inp_sfx.begin(), inp_sfx.end());
         }

-        remaining_tokens -= line_inp.size();
+        remaining_tokens = params.n_predict - line_inp.size();
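To make the problem concrete, here is a small, self-contained sketch (illustration only, not llama.cpp code; the token counts and stand-in values are made up) of how the shared budget shrinks across interventions with the old behaviour:

// Illustration of the shrinking-budget arithmetic, using stand-in values
// for params.n_predict and the per-turn token counts.
#include <cstdio>

int main() {
    const int n_predict = 64;  // stand-in for params.n_predict
    const int line_len  = 10;  // tokens in each user input (line_inp.size())
    const int gen_len   = 20;  // tokens generated before the next user input

    int remaining_tokens = n_predict;
    for (int turn = 1; turn <= 3; ++turn) {
        remaining_tokens -= gen_len;   // generation consumes the budget
        remaining_tokens -= line_len;  // old behaviour: user input consumes it too
        std::printf("after turn %d: %d tokens left\n", turn, remaining_tokens);
    }
    // With this patch, remaining_tokens would instead be reset to
    // params.n_predict - line_inp.size() after each user input, so every
    // response starts with (roughly) the same budget.
    return 0;
}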
This can get bigger than the remaining space in the context.
And after https://github.com/ggerganov/llama.cpp/blob/da5303c1ea68aa19db829c634f1e10d08d409680/main.cpp#L850, remaining_tokens is actually all the space that is left, no?
Resetting remaining_tokens to params.n_predict would only make sense when we reset the memory, which we don't right now. See #71.
I see. Yes, going over the context size can be a problem. But remaining_tokens is usually smaller than the size of the context (because params.n_predict is), so it should still be reset after every interaction with the user, so that each series of tokens can be as long as the first one, or as long as the remaining context space allows. It should be clamped so it never exceeds the remaining space in the context.

std::min(params.n_predict, model.hparams.n_ctx - (int) embd_inp.size())

is exactly the formula that should be used instead of the simple assignment I did to reset it, so that it doesn't overflow the context. And in fact, it should also be used when resetting it due to running out of tokens. Good catch.
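As a rough sketch of that clamped reset (illustrative only; the values for n_ctx and the prompt length are assumed, and still subtracting the user's input afterwards is carried over from the original patch rather than stated in the discussion):

// Illustration of the clamped reset: the budget is capped by the space
// left in the context window, so it can never overflow the context.
#include <algorithm>
#include <cstdio>
#include <vector>

int main() {
    const int n_predict = 128;       // params.n_predict
    const int n_ctx     = 512;       // model.hparams.n_ctx (assumed value)
    std::vector<int> embd_inp(440);  // tokens already in the context (assumed)
    std::vector<int> line_inp(20);   // tokens from the latest user input

    // Naive reset from this PR: can exceed the remaining context space.
    int naive = n_predict - (int) line_inp.size();

    // Clamped reset discussed above: never larger than what is left in the
    // context; the user's input is still subtracted, as in the original patch.
    int clamped = std::min(n_predict, n_ctx - (int) embd_inp.size())
                  - (int) line_inp.size();

    std::printf("naive reset = %d, clamped reset = %d\n", naive, clamped);
    return 0;
}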
I'm closing this pull request because it doesn't make sense as things stand, given the lack of context shifting. Once that's figured out, this will have to be implemented differently.