[Bug] Server completions return a lot of colons #3575

dkogut1996 · 2023-10-11T00:25:35Z

Ever since #3228, completion requests to the server example occasionally return a good deal of consecutive colons before a readable response, and sometimes it's almost exclusively colons, for example:
{"content": "::::::::::::::::::::::::: Hello, I'm an AI created by ChatBot. How can I assist you today?"}
{"content": "::::::::::::::::?"}

I've tested on a range of models (Mythomax 13B, Mythomax Kimiko 13B, Luna 7B, MlewdBoros 13B, Synthia 7B) and get the same results. I can reproduce it by sending this body to the server continually:

{"n_predict":256,"prompt":"Text transcript of a never-ending conversation between User and Assistant.\n\n#User: hi there\n#Assistant:", "stop":["\n#","\nUser:","\nuser:","\n["]}

It does not happen on every response (about 1 in 5-10 responses show it), but often enough to be distracting and to make me wonder if I'm doing something wrong. I know the repeat_penalty and logit_bias fields should help here, but in my testing neither has any effect on the problem, and neither was needed before the aforementioned PR.

I'm running on an M1 Max chip and writing this as of commit 9f6ede1.

Does anyone have any insights into how I could fix this or if this is perhaps a bug in the server example?


spencekim · 2023-10-11T00:39:36Z

I am seeing this as well on GGUF models. Running M2 Max.

ggerganov · 2023-10-11T06:51:10Z

Likely some KV cache mismanagement - need to take a look. Let us know if you find any additional repro steps.

ggerganov · 2023-10-11T21:02:33Z

Can you guys give #3588 a try and see if it fixes the issue?
Also, it would be useful to post the commands that you are using to repro this.

dkogut1996 · 2023-10-11T21:40:00Z

That fixes it!

I compiled the bin with make server from root and ran ./server -m <my_model> and tested via Postman with the equivalent of this curl:

curl --location 'http://localhost:8080/completion' \
--header 'Content-Type: application/json' \
--data '{"n_predict":256,"prompt":"Text transcript of a never-ending conversation between User and Assistant.\n\n#User: hi there\n#Assistant:", "stop":["\n#","\nUser:","\nuser:","\n["]}'

It's not the easiest to reproduce on a consistent basis; sometimes it takes running the above a few dozen times to reproduce the colons.
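Because the failure is intermittent, repeating the request by hand is tedious. The following is a minimal Python sketch, not part of the original report, that sends the same body repeatedly and counts responses containing a run of colons. The helper names (`has_colon_run`, `count_bad_responses`), the threshold of five consecutive colons, and the default of 50 attempts are illustrative assumptions; the URL and request body are taken from the curl command above.

```python
import json
import re
import urllib.request

# Endpoint from the curl command above; assumes ./server is running locally.
URL = "http://localhost:8080/completion"

# Request body copied from the report.
BODY = {
    "n_predict": 256,
    "prompt": "Text transcript of a never-ending conversation between "
              "User and Assistant.\n\n#User: hi there\n#Assistant:",
    "stop": ["\n#", "\nUser:", "\nuser:", "\n["],
}


def has_colon_run(content, min_run=5):
    """Return True if content has at least min_run consecutive colons.

    The threshold of 5 is an arbitrary choice for illustration.
    """
    return re.search(":{%d,}" % min_run, content) is not None


def count_bad_responses(attempts=50, url=URL):
    """Send BODY to the server `attempts` times; count colon-heavy replies."""
    data = json.dumps(BODY).encode("utf-8")
    bad = 0
    for _ in range(attempts):
        req = urllib.request.Request(
            url, data=data, headers={"Content-Type": "application/json"}
        )
        with urllib.request.urlopen(req) as resp:
            content = json.load(resp).get("content", "")
        if has_colon_run(content):
            bad += 1
    return bad
```

With the server running, calling `count_bad_responses(50)` before and after applying #3588 would give a rough comparison of how often the colons appear.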

Thanks so much for the quick response and fix! This project is great and you guys do a wonderful job maintaining and updating it!

ggerganov added the bug label Oct 11, 2023

ggerganov mentioned this issue Oct 11, 2023

server : fix kv cache management #3588

Merged

dkogut1996 closed this as completed Oct 11, 2023
