Allow reusing results from llama_token_to_piece when sampling grammars #4213
Conversation
```diff
-const std::string piece = llama_token_to_piece(ctx, id);
+std::string piece;
+if (pieces != nullptr && pieces[id] != nullptr) {
```
The doc comment for this function doesn't clearly state that individual pieces may be nullptr.
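One possible wording for such a doc comment (purely illustrative, not the actual header text):

```cpp
/// @param pieces  Optional per-token piece cache, indexed by token id.
///                May be NULL; individual entries may also be NULL, in which
///                case the piece is computed via llama_token_to_piece.
```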
This is mostly playing defence, as std::string cannot handle nullptrs / empty C strings (from my understanding - I'm a complete C++ novice). The lack of docs allows swapping the behavior from calling token_to_piece to assuming a null entry is an empty string, but from my measurements the performance impact was negligible. If we want to document the current behavior, that's fine by me.
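For clarity, a sketch of the two behaviors being discussed, continuing the diff context above (the surrounding variables are assumed to be in scope):

```cpp
std::string piece;
// Current behavior: fall back to computing the piece when the cache
// entry is missing.
if (pieces != nullptr && pieces[id] != nullptr) {
    piece = pieces[id];
} else {
    piece = llama_token_to_piece(ctx, id);
}

// Alternative mentioned above: treat a null entry as an empty string.
// piece = (pieces != nullptr && pieces[id] != nullptr) ? pieces[id] : "";
```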
std::string can represent empty strings, but not null. If it were a C string, you would still have to do something with it eventually if it is null to prevent a segfault.
Since this code is apparently not used yet, it's hard to reason about performance or what the correct inputs should be.
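To illustrate the point about null C strings (the helper name here is made up):

```cpp
#include <string>

// Hypothetical helper: constructing std::string from a null const char*
// is undefined behavior, so a null check has to happen somewhere before
// the conversion.
static std::string piece_or_empty(const char * p) {
    return p != nullptr ? std::string(p) : std::string();
}
```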
Wouldn't it be better to pre-compute the pieces when the model is loaded and store them in the model? (Lines 1329 to 1332 in 3e73d31.) This way the user does not need to do it themselves.
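A rough sketch of that suggestion; the cache struct and field names are purely illustrative, not actual llama.cpp fields:

```cpp
#include <string>
#include <vector>

// Purely illustrative: a per-model piece cache filled once at load time.
struct llama_vocab_cache {
    std::vector<std::string> token_pieces; // one entry per vocab token
};

static void init_piece_cache(llama_context * ctx, int n_vocab,
                             llama_vocab_cache & cache) {
    cache.token_pieces.resize(n_vocab);
    for (int id = 0; id < n_vocab; ++id) {
        cache.token_pieces[id] = llama_token_to_piece(ctx, id);
    }
}
```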
Closing in favor of implementing #4213 (comment). Done in #4330.
This is an amendment to #4210 which allows passing in already-computed (ideally shared across calls) strings to avoid calling llama_token_to_piece in llama_sample_grammar. Behavior remains unchanged by passing in nullptr. See #4210 for the "before".
I'm marking this as a draft for now to allow #4210 to be merged in, and then I'll clean up the git history.
Also, I wanted to see if this is worth a breaking API change (if it is, I'll go fix up other spots in the code where the change can be taken advantage of).
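A minimal usage sketch under the proposed change. The extra `pieces` parameter and its exact type are assumptions based on the diff above, not a final API:

```cpp
#include <string>
#include <vector>
#include "common.h" // for llama_token_to_piece (returns std::string)
#include "llama.h"

// Sketch: build a shared piece cache once, then sample many times.
void sample_with_cached_pieces(llama_context * ctx,
                               llama_token_data_array & candidates,
                               const llama_grammar * grammar) {
    const int n_vocab = llama_n_vocab(llama_get_model(ctx));

    // Keep the std::string storage alive as long as `pieces` is in use.
    std::vector<std::string> storage(n_vocab);
    std::vector<const char *> pieces(n_vocab);
    for (int id = 0; id < n_vocab; ++id) {
        storage[id] = llama_token_to_piece(ctx, id);
        pieces[id]  = storage[id].c_str();
    }

    // nullptr keeps the old behavior (pieces computed per call)...
    llama_sample_grammar(ctx, &candidates, grammar, /*pieces=*/nullptr);

    // ...while the cache skips the per-call llama_token_to_piece work.
    llama_sample_grammar(ctx, &candidates, grammar, pieces.data());
}
```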