longchat-13b-16k chat not working #14

Open · ahkimkoo opened this issue Jul 5, 2023 · 9 comments
ahkimkoo commented Jul 5, 2023

The model replies like this:

xxxxxxxxxx

DachengLi1 (Owner) commented

@ahkimkoo It has not been trained on Chinese data; please use only English for now.

ahkimkoo commented Jul 6, 2023

> @ahkimkoo it has not been trained in Chinese data, please use only English for now.

Thank you for your reply, but even when using English, it cannot reply normally.

DachengLi1 (Owner) commented

Can you give a screenshot of how you are loading the model and what inputs you give?

scuty2000 commented Jul 13, 2023

@DachengLi1 I would like to follow up on this. I'm having the same issue, running the same model through the FastChat openai-server implementation. I get the same outputs (sometimes the model "screams" "A A A A A A A A A A A A") while running the latest version with the monkey patch applied.

Here are the requests I send to the endpoint and the corresponding output:

curl http://localhost:8100/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer fattiicazzituoi" \
  -d '{
     "model": "longchat-13b-16k",
     "messages": [{"role": "user", "content": "Say this is a test."}],
     "temperature": 0.3, "max_tokens": 200
   }'

{"id":"chatcmpl-3tF6uZ7GXm54dmLwfGLQ3y","object":"chat.completion","created":1689243774,"model":"lmsys/longchat-13b-16k","choices":[{"index":0,"message":{"role":"assistant","content":"A A A A A A A A A A A A A A A A A A A A"},"finish_reason":"stop"}],"usage":{"prompt_tokens":45,"total_tokens":64,"completion_tokens":19}}

But if I use the "completions" (non-chat) endpoint, the model works "correctly" (or at least it does not scream at me):

curl http://localhost:8100/v1/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer aaaaaaaaaaaa" \
  -d '{
    "model": "lmsys/longchat-13b-16k",
    "prompt": "Say this is a test.",
    "max_tokens": 20,
    "temperature": 0.5
  }'

{"id":"cmpl-sBiuu78WYegWDnU3WDmFmF","object":"text_completion","created":1689245357,"model":"lmsys/longchat-13b-16k","choices":[{"index":0,"text":"\nYou are a test.\n\n\n\n\n\n\n\n\n\n\n\n\n\n","logprobs":null,"finish_reason":"length"}],"usage":{"prompt_tokens":7,"total_tokens":26,"completion_tokens":19}}
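The two curl calls above differ only in endpoint and payload shape: `/v1/chat/completions` takes a `messages` list (which the server wraps in the model's conversation template), while `/v1/completions` takes a raw `prompt` string. A minimal stdlib sketch (not from the thread; the URL, model names, and parameters are taken from the curl commands above) that builds both request bodies, so the client-side difference is explicit:

```python
import json

# Base URL as used in the curl commands above (assumes a local FastChat
# openai-server on port 8100).
BASE_URL = "http://localhost:8100/v1"

def chat_request(prompt: str, model: str = "longchat-13b-16k"):
    """Return (url, json_body) for the chat endpoint: the prompt is sent
    as a user message, and the server applies the conversation template."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.3,
        "max_tokens": 200,
    }
    return f"{BASE_URL}/chat/completions", json.dumps(body)

def completion_request(prompt: str, model: str = "lmsys/longchat-13b-16k"):
    """Return (url, json_body) for the plain completions endpoint: the
    prompt string is passed through as-is, with no chat template."""
    body = {
        "model": model,
        "prompt": prompt,
        "max_tokens": 20,
        "temperature": 0.5,
    }
    return f"{BASE_URL}/completions", json.dumps(body)
```

Since the raw-prompt path behaves while the templated path degenerates, the fault likely lies in what happens after this point on the server (template application plus quantized inference), not in the request itself.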

TL;DR: LongChat-13B-16K goes like this: [attached meme image]

DachengLi1 (Owner) commented

@scuty2000 Fun image lol.
@merrymercy do you have an idea on this? Is there a difference in loading for completions versus chat completions (e.g. load_8bit, patching)?

scuty2000 commented Jul 13, 2023

@DachengLi1 I don't know if this helps, but I suspect it is related to the int8 quantization. Using the non-quantized 7B version works pretty well.
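If int8 quantization is the suspect, one way to isolate it is to serve the same model unquantized and re-run the chat request. A hedged sketch using the standard FastChat serving commands (assumptions: FastChat is installed and the GPU has room for the 13B model in fp16; adding `--load-8bit` back to the worker reproduces the quantized setup):

```shell
# Start the controller, a model worker WITHOUT --load-8bit, and the
# OpenAI-compatible API server on the port used in the curl commands above.
python3 -m fastchat.serve.controller &
python3 -m fastchat.serve.model_worker --model-path lmsys/longchat-13b-16k &
python3 -m fastchat.serve.openai_api_server --host localhost --port 8100
```

If the "A A A A" output disappears in this configuration but returns with `--load-8bit`, that would confirm the quantization hypothesis rather than a problem with the chat template.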

DachengLi1 (Owner) commented

@scuty2000 Yes, I have also heard this elsewhere.

scuty2000 commented

Any update on this?
