Catch OpenAI parsing errors earlier #8083

Open
wants to merge 2 commits into base: main

Conversation

@cynic64 commented on Jan 29, 2025

Title

Catch OpenAI parsing errors earlier

Relevant issues

Helps with #7730

Type

🐛 Bug Fix (sort of)

Changes

First, a quick explanation of the problem. The .parse() method on the raw response objects returned by the openai module (LegacyAPIResponse) is designed to return the raw response text as a string if it cannot be parsed:

openai/_legacy_response.py:

    def _parse(self, *, to: type[_T] | None = None) -> R | _T:
    ...
            # If the API responds with content that isn't JSON then we just return
            # the (decoded) text without performing any parsing so that you can still
            # handle the response however you need to.
            return response.text  # type: ignore

However, in litellm, some functions assume that .parse() always returns BaseModel, for example here in litellm/llms/openai/openai.py:

    async def make_openai_chat_completion_request(
        self,
        openai_aclient: AsyncOpenAI,
        data: dict,
        timeout: Union[float, httpx.Timeout],
        logging_obj: LiteLLMLoggingObj,
    ) -> Tuple[dict, BaseModel]:
    ...
            response = raw_response.parse()
            return headers, response

As you can see from the type signature, the function is supposed to always return a BaseModel, but it never checks whether .parse() actually produced one.
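
As an aside (this snippet is just an illustration, not code from litellm or openai): Python return annotations aren't enforced at runtime, so the stray string only blows up later, at the first attribute access:

from pydantic import BaseModel

def pretend_parse() -> BaseModel:
    # The annotation promises a BaseModel, but nothing enforces it at
    # runtime, so a plain string slips through here without complaint.
    return "here's some invalid data"

response = pretend_parse()
response.model_dump()  # AttributeError: 'str' object has no attribute 'model_dump'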

Later on, this causes tricky errors like the one in #7730, where the real problem is a bad response from the server but the error message points at something unrelated. To demonstrate this, here's a non-functioning "OpenAI API server":

from flask import Flask, request, Response

app = Flask(__name__)

@app.route("/chat/completions", methods=["POST"])
def respond():
    # Always answer 200 OK with a body that isn't valid JSON
    return "here's some invalid data"

if __name__ == "__main__":
    app.run(port=5000)  # Flask's default port, matching the client below

Here's a sample client:

import litellm

print("Testing streaming...")
response = litellm.completion(
    model="openai/my-crappy-server",
    api_key="don't need it",
    api_base="http://127.0.0.1:5000",
    messages=[
            {
                "role": "user",
                "content": "Hey, how's it going?",
            }
    ],
    stream=True,
)

for part in response:
    print(part.choices[0].delta.content)

print("Testing one-shot...")
response = litellm.completion(
    model="openai/my-crappy-server",
    api_key="don't need it",
    api_base="http://127.0.0.1:5000",
    messages=[
            {
                "role": "user",
                "content": "Hey, how's it going?",
            }
    ],
)

print(response)

Currently, streaming appears to succeed but returns nothing, and the one-shot version fails with a cryptic error message. Here's what it looks like with the broken server running on localhost:

Testing streaming...
None
Testing one-shot...

Give Feedback / Get Help: https://github.com/BerriAI/litellm/issues/new
LiteLLM.Info: If you need to debug this error, use `litellm._turn_on_debug()'.

Traceback (most recent call last):
  File "/tmp/litellm-base/litellm/llms/openai/openai.py", line 707, in completion
    raise e
  File "/tmp/litellm-base/litellm/llms/openai/openai.py", line 643, in completion
    stringified_response = response.model_dump()
                           ^^^^^^^^^^^^^^^^^^^
AttributeError: 'str' object has no attribute 'model_dump'

... two copies of the same message ...

To improve this, I just added a type check in functions returning BaseModel that use raw_response.parse():

            if type(response) != BaseModel:
                raise OpenAIError(
                    status_code=422,
                    message = "Could not parse response",
                )

With my changes, both tests produce this clearer error message:

Traceback (most recent call last):
  File "/tmp/litellm/litellm/llms/openai/openai.py", line 452, in make_sync_openai_chat_completion_request
    raise OpenAIError(
litellm.llms.openai.common_utils.OpenAIError: Could not parse response

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/tmp/litellm/litellm/llms/openai/openai.py", line 720, in completion
    raise e
  File "/tmp/litellm/litellm/llms/openai/openai.py", line 604, in completion
    return self.streaming(
           ^^^^^^^^^^^^^^^
  File "/tmp/litellm/litellm/llms/openai/openai.py", line 873, in streaming
    headers, response = self.make_sync_openai_chat_completion_request(
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/litellm/litellm/litellm_core_utils/logging_utils.py", line 145, in sync_wrapper
    result = func(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/litellm/litellm/llms/openai/openai.py", line 460, in make_sync_openai_chat_completion_request
    raise Exception(
Exception: error - Could not parse response, Received response - <APIResponse [200 OK] type=<class 'openai.types.chat.chat_completion.ChatCompletion'>>, Type of response - <class 'openai._legacy_response.LegacyAPIResponse'>

Let me know if the message should be changed or anything; I'm not familiar with litellm's codebase.

[REQUIRED] Testing - Attach a screenshot of any new tests passing locally

Here's the test output on my machine for the base repo:

!!!!!!!!!!!!!!!!!!!!!!!!!! Interrupted: 62 errors during collection !!!!!!!!!!!!!!!!!!!!!!!!!!!
============================== 108 warnings, 62 errors in 24.95s ==============================

And here's the output with my changes:

!!!!!!!!!!!!!!!!!!!!!!!!!! Interrupted: 62 errors during collection !!!!!!!!!!!!!!!!!!!!!!!!!!!
============================== 108 warnings, 62 errors in 21.81s ==============================

So the change doesn't break any tests; the same 62 collection errors occur in both cases and appear to be due to missing API keys and the like.

@cynic64 (Author) commented on Jan 29, 2025

It turns out that functions like make_openai_chat_completion_request usually return a BaseModel, but return an openai.Stream when streaming is enabled, so my previous check was too restrictive. The second commit has this updated check:

            if isinstance(response, str):
                raise OpenAIError(
                    status_code=422,
                    message="Could not parse response",
                )

This makes streaming mode work again. Sorry about that...

Perhaps this signature should be updated, since both BaseModel and openai.Stream can be returned?

    async def make_openai_chat_completion_request(
        self,
        openai_aclient: AsyncOpenAI,
        data: dict,
        timeout: Union[float, httpx.Timeout],
        logging_obj: LiteLLMLoggingObj,
    ) -> Tuple[dict, BaseModel]:
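
Something along these lines might be more accurate (just a sketch, assuming the streaming case hands back openai's AsyncStream[ChatCompletionChunk]; the other imports stay the same):

    # would also need:
    # from openai import AsyncStream
    # from openai.types.chat import ChatCompletionChunk
    async def make_openai_chat_completion_request(
        self,
        openai_aclient: AsyncOpenAI,
        data: dict,
        timeout: Union[float, httpx.Timeout],
        logging_obj: LiteLLMLoggingObj,
    ) -> Tuple[dict, Union[BaseModel, AsyncStream[ChatCompletionChunk]]]:
        ...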

@krrishdholakia self-assigned this on Feb 1, 2025