Add a CLI option in main.cpp in order to stop generation at newline token #6441
Comments
llama.cpp/examples/main/README.md Line 104 in f87f7b8
I think that @Jeximo has the best solution, but an alternate approach would be to utilize grammars.
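For example, something along these lines (a rough sketch; the model path and prompt are placeholders, and the grammar uses the GBNF syntax accepted by `--grammar`):

```bash
# Constrain sampling so that only non-newline characters can be produced.
./main -m models/your-model.gguf -p "Write a tagline for a coffee shop:" \
  --grammar 'root ::= [^\n]+'
```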
This should force the LLM to generate anything except for newline characters, keeping everything on a single line.
Thanks @Jeximo, but sadly that does not do exactly what I need: a single non-empty line of output. Thanks @HanClinto, but `--grammar '[^\n]+'` is not the desired feature either: I still want to allow the LLM to sample ('softmax') a newline character; I just want main to stop generating when the LLM emits a newline, not to continue.
For this, yes -- you can use grammars to require at least one non-newline character before generating a newline. This worked for me (a sketch is shown after this comment).
This grammar requires at least one non-newline character, followed by one or more of any other character. The reason why I used [...] Hopefully that helps you visualize what the chosen grammar is doing. I realize this isn't as clean as your suggested solution -- it's always a tough balance between too much clutter in the command-line options and making it easier to do common tasks. I think if enough other people found this to be a common use case, then the project might want to support it as a first-class command-line option.
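As a rough illustration of the approach described above (not the exact grammar from that comment; the model path and prompt are placeholders), a grammar that allows exactly one non-empty line could look like this:

```bash
# Require at least one non-newline character, then allow a single newline.
# Once the grammar is fully matched, the sampler should only be able to pick
# an end-of-generation token, so main stops after one non-empty line.
./main -m models/your-model.gguf -p "Write a tagline for a coffee shop:" \
  --grammar 'root ::= [^\n]+ "\n"'
```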
This issue was closed because it has been inactive for 14 days since being marked as stale.
Prerequisites
Please answer the following questions for yourself before submitting an issue.
Feature Description
Just add a CLI option in main.cpp to stop generation at the newline token.
Motivation
For some users like us, the goal is not to chat or have a discussion with an LLM, but simply to get a single line of output.
Possible Implementation
In main.cpp, for example (a sketch follows):
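A minimal sketch of the idea, assuming a hypothetical `stop_at_newline` flag added to the parameters and a helper that detokenizes the sampled token (the names are illustrative, not the actual llama.cpp code):

```cpp
#include <string>

// Hypothetical check: return true when the proposed --stop-at-newline flag
// is set and the newly sampled token's text contains a newline.
// `piece` is the detokenized text of the token that was just sampled
// (e.g. obtained via llama_token_to_piece).
static bool should_stop_at_newline(bool stop_at_newline, const std::string & piece) {
    return stop_at_newline && piece.find('\n') != std::string::npos;
}

// In main.cpp's generation loop (illustrative only):
//
//     const std::string piece = llama_token_to_piece(ctx, id);
//     if (should_stop_at_newline(params.stop_at_newline, piece)) {
//         break;  // stop as soon as the model emits a newline
//     }
```

Checking the detokenized text rather than a single newline token id would also cover tokenizers where the newline is merged into a larger token.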