
Bug: Errors caused by adding logprobs #1328

Closed
4 tasks done
devcxl opened this issue Apr 4, 2024 · 3 comments
Labels
bug Something isn't working

Comments


devcxl commented Apr 4, 2024

Prerequisites

Please answer the following questions for yourself before submitting an issue.

  • I am running the latest code. Development is very rapid so there are no tagged versions as of now.
  • I carefully followed the README.md.
  • I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
  • I reviewed the Discussions, and have a new bug or useful enhancement to share.

Expected Behavior

Function calls should return a valid chat completion response, as they did in v0.2.57. I hope this can be fixed as soon as possible.

Current Behavior

In #1311, the logprobs field was not handled correctly.

Environment and Context

  • Python 3.11.8
  • llama-cpp-python[all]==v0.2.59
  • GNU Make 4.4.1
  • g++ (GCC) 13.2.1 20230801

Failure Information (for bugs)

#1311 introduced the logprobs field, but no logprobs value is returned when using function calls, so function calls now fail response validation and the server returns an Internal Server Error.
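
For clarity, here is a minimal sketch of the mismatch. The Choice type below is a simplified stand-in for the server's response types, not the library's actual definition:

    # Simplified sketch: after #1311 the choice type carries a required,
    # nullable "logprobs" key, but the chatml-function-calling handler builds
    # choices without it, so response validation fails.
    from typing import Optional

    from pydantic import TypeAdapter, ValidationError
    from typing_extensions import TypedDict

    class Choice(TypedDict):
        index: int
        finish_reason: str
        logprobs: Optional[dict]  # required key; its value may still be None
        message: dict

    choice_from_handler = {  # shape produced for a tool_calls completion
        "index": 0,
        "finish_reason": "tool_calls",
        "message": {"role": "assistant", "content": None},
    }

    try:
        TypeAdapter(Choice).validate_python(choice_from_handler)
    except ValidationError as exc:
        print(exc)  # "Field required" at ('logprobs',), matching the error below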

Steps to Reproduce

  1. Install llama-cpp-python[all]==v0.2.57
  2. Edit config.json and start the server: python3 -m llama_cpp.server --config_file config.json
        {
            "model": "models/Qwen1.5/qwen1_5-4b-chat-q4_k_m.gguf",
            "model_alias": "qwen1_5-4b-chat-q4_k_m",
            "chat_format": "chatml-function-calling",
            "n_gpu_layers": -1,
            "offload_kqv": true,
            "n_threads": 12,
            "n_batch": 512,
            "n_ctx": 2048
        },
  3. Run the first function call example in this notebook (a hedged sketch of that request is shown after the traceback below).
  4. Execution succeeds.
  5. Install llama-cpp-python[all]==v0.2.59
  6. Run the first function call example in this notebook.
  7. Internal Server Error
Exception: 2 validation errors:
{'type': 'missing', 'loc': ('response', 'typed-dict', 'choices', 0, 'logprobs'), 'msg': 'Field required', 'input': {'finish_reason': 'tool_calls', 'index': 0, 'message': {'role': 'assistant', 'content': None, 'tool_calls': [{'id': 'call__0_get_current_weather_cmpl-58520529-a626-4a1e-8b4b-1fca9dd2d68a', 'type': 'function', 'function': {'name': 'get_current_weather', 'arguments': '{ "location": "San Francisco, Tokyo, Paris" , "unit": "fahrenheit"}'}}], 'function_call': {'name': 'get_current_weather:', 'arguments': '{ "location": "San Francisco, Tokyo, Paris" , "unit": "fahrenheit"}'}}}, 'url': 'https://errors.pydantic.dev/2.6/v/missing'}
{'type': 'string_type', 'loc': ('response', 'str'), 'msg': 'Input should be a valid string', 'input': {'id': 'chatcmpl-5ce5ae67-c028-427f-a8bb-fe3ff94eb934', 'object': 'chat.completion', 'created': 1712263433, 'model': 'qwen1_5-4b-chat-q4_k_m', 'choices': [{'finish_reason': 'tool_calls', 'index': 0, 'message': {'role': 'assistant', 'content': None, 'tool_calls': [{'id': 'call__0_get_current_weather_cmpl-58520529-a626-4a1e-8b4b-1fca9dd2d68a', 'type': 'function', 'function': {'name': 'get_current_weather', 'arguments': '{ "location": "San Francisco, Tokyo, Paris" , "unit": "fahrenheit"}'}}], 'function_call': {'name': 'get_current_weather:', 'arguments': '{ "location": "San Francisco, Tokyo, Paris" , "unit": "fahrenheit"}'}}}], 'usage': {'completion_tokens': 22, 'prompt_tokens': 31, 'total_tokens': 53}}, 'url': 'https://errors.pydantic.dev/2.6/v/string_type'}

Traceback (most recent call last):
  File "/home/devcxl/download/llama-server/.evm/lib/python3.11/site-packages/llama_cpp/server/errors.py", line 171, in custom_route_handler
    response = await original_route_handler(request)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/devcxl/download/llama-server/.evm/lib/python3.11/site-packages/fastapi/routing.py", line 296, in app
    content = await serialize_response(
              ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/devcxl/download/llama-server/.evm/lib/python3.11/site-packages/fastapi/routing.py", line 155, in serialize_response
    raise ResponseValidationError(
fastapi.exceptions.ResponseValidationError: 2 validation errors:
  {'type': 'missing', 'loc': ('response', 'typed-dict', 'choices', 0, 'logprobs'), 'msg': 'Field required', 'input': {'finish_reason': 'tool_calls', 'index': 0, 'message': {'role': 'assistant', 'content': None, 'tool_calls': [{'id': 'call__0_get_current_weather_cmpl-58520529-a626-4a1e-8b4b-1fca9dd2d68a', 'type': 'function', 'function': {'name': 'get_current_weather', 'arguments': '{ "location": "San Francisco, Tokyo, Paris" , "unit": "fahrenheit"}'}}], 'function_call': {'name': 'get_current_weather:', 'arguments': '{ "location": "San Francisco, Tokyo, Paris" , "unit": "fahrenheit"}'}}}, 'url': 'https://errors.pydantic.dev/2.6/v/missing'}
  {'type': 'string_type', 'loc': ('response', 'str'), 'msg': 'Input should be a valid string', 'input': {'id': 'chatcmpl-5ce5ae67-c028-427f-a8bb-fe3ff94eb934', 'object': 'chat.completion', 'created': 1712263433, 'model': 'qwen1_5-4b-chat-q4_k_m', 'choices': [{'finish_reason': 'tool_calls', 'index': 0, 'message': {'role': 'assistant', 'content': None, 'tool_calls': [{'id': 'call__0_get_current_weather_cmpl-58520529-a626-4a1e-8b4b-1fca9dd2d68a', 'type': 'function', 'function': {'name': 'get_current_weather', 'arguments': '{ "location": "San Francisco, Tokyo, Paris" , "unit": "fahrenheit"}'}}], 'function_call': {'name': 'get_current_weather:', 'arguments': '{ "location": "San Francisco, Tokyo, Paris" , "unit": "fahrenheit"}'}}}], 'usage': {'completion_tokens': 22, 'prompt_tokens': 31, 'total_tokens': 53}}, 'url': 'https://errors.pydantic.dev/2.6/v/string_type'}

INFO:     127.0.0.1:36362 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
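
For reference, here is a hedged sketch of the kind of request the notebook example sends; the prompt and tool schema are assumptions, not the notebook verbatim. On v0.2.57 it returns a tool call; on v0.2.59 it triggers the 500 above:

    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:8000/v1", api_key="sk-no-key-required")

    response = client.chat.completions.create(
        model="qwen1_5-4b-chat-q4_k_m",
        messages=[{"role": "user", "content": "What's the weather like in San Francisco, Tokyo and Paris?"}],
        tools=[{
            "type": "function",
            "function": {
                "name": "get_current_weather",
                "description": "Get the current weather in a given location",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "location": {"type": "string"},
                        "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
                    },
                    "required": ["location"],
                },
            },
        }],
        tool_choice="auto",
    )
    print(response.choices[0].message.tool_calls)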


devcxl commented Apr 4, 2024

Additionally, I think it would be advisable to run automated tests with GitHub Actions before merging branches, to ensure the stability of the project.
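
As a rough sketch, a regression test along these lines could run in CI against a server started with the config above; the endpoint URL and model alias here are assumptions based on this setup:

    # Hedged sketch of a CI regression test; assumes a server started with the
    # config above is listening on localhost:8000.
    import requests

    def test_function_call_response_includes_logprobs():
        payload = {
            "model": "qwen1_5-4b-chat-q4_k_m",
            "messages": [{"role": "user", "content": "What's the weather in Paris?"}],
            "tools": [{
                "type": "function",
                "function": {
                    "name": "get_current_weather",
                    "parameters": {
                        "type": "object",
                        "properties": {"location": {"type": "string"}},
                        "required": ["location"],
                    },
                },
            }],
            "tool_choice": "auto",
        }
        resp = requests.post("http://localhost:8000/v1/chat/completions", json=payload)
        assert resp.status_code == 200, resp.text  # v0.2.59 currently returns 500 here
        choice = resp.json()["choices"][0]
        assert "logprobs" in choice  # required by the OpenAI response schema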

@abetlen abetlen added the bug Something isn't working label Apr 5, 2024

abetlen commented Apr 5, 2024

@devcxl thanks for reporting. You're right, and this really could've been caught by static analysis as well, through type checking. For what it's worth, the PR is actually more correct with respect to the current OpenAI API spec; I'll resolve the type errors in the chat format and fix this!
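
To illustrate the static-analysis point, a simplified sketch (the TypedDict below is a stand-in, not the library's actual type): once logprobs is a required key, mypy or pyright flags any choice literal that omits it, and the straightforward fix is to set it to None whenever logprobs were not requested.

    from typing import Optional

    from typing_extensions import TypedDict

    class Choice(TypedDict):
        index: int
        finish_reason: str
        logprobs: Optional[dict]  # required per the current OpenAI spec; value may be None

    def broken_choice() -> Choice:
        # mypy/pyright: Missing key "logprobs" for TypedDict "Choice"
        return {"index": 0, "finish_reason": "tool_calls"}

    def fixed_choice() -> Choice:
        # always include the key, defaulting to None when logprobs are not requested
        return {"index": 0, "finish_reason": "tool_calls", "logprobs": None}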


devcxl commented Apr 5, 2024

OK, thanks.

@abetlen abetlen closed this as completed in 49bc66b Apr 5, 2024
abetlen added a commit that referenced this issue Apr 5, 2024
xhedit pushed a commit to xhedit/llama-cpp-conv that referenced this issue Apr 6, 2024
xhedit added a commit to xhedit/llama-cpp-conv that referenced this issue Apr 6, 2024
* feat: add support for KV cache quantization options (abetlen#1307)

* add KV cache quantization options

abetlen#1220
abetlen#1305

* Add ggml_type

* Use ggml_type instead of string for quantization

* Add server support

---------

Co-authored-by: Andrei Betlen <[email protected]>

* fix: Changed local API doc references to hosted (abetlen#1317)

* chore: Bump version

* fix: last tokens passing to sample_repetition_penalties function (abetlen#1295)

Co-authored-by: ymikhaylov <[email protected]>
Co-authored-by: Andrei <[email protected]>

* feat: Update llama.cpp

* fix: segfault when logits_all=False. Closes abetlen#1319

* feat: Binary wheels for CPU, CUDA (12.1 - 12.3), Metal (abetlen#1247)

* Generate binary wheel index on release

* Add total release downloads badge

* Update download label

* Use official cibuildwheel action

* Add workflows to build CUDA and Metal wheels

* Update generate index workflow

* Update workflow name

* feat: Update llama.cpp

* chore: Bump version

* fix(ci): use correct script name

* docs: LLAMA_CUBLAS -> LLAMA_CUDA

* docs: Add docs explaining how to install pre-built wheels.

* docs: Rename cuBLAS section to CUDA

* fix(docs): incorrect tool_choice example (abetlen#1330)

* feat: Update llama.cpp

* fix: missing logprobs in response, incorrect response type for functionary, minor type issues. Closes abetlen#1328 abetlen#1314

* fix: missing logprobs in response, incorrect response type for functionary, minor type issues. Closes abetlen#1328 Closes abetlen#1314

* feat: Update llama.cpp

* fix: Always embed metal library. Closes abetlen#1332

* feat: Update llama.cpp

* chore: Bump version

---------

Co-authored-by: Limour <[email protected]>
Co-authored-by: Andrei Betlen <[email protected]>
Co-authored-by: lawfordp2017 <[email protected]>
Co-authored-by: Yuri Mikhailov <[email protected]>
Co-authored-by: ymikhaylov <[email protected]>
Co-authored-by: Sigbjørn Skjæret <[email protected]>