Catch OpenAI parsing errors earlier #8083

Open
wants to merge 2 commits into base: main

Conversation

@cynic64 commented on Jan 29, 2025

Title

Catch OpenAI parsing errors earlier

Relevant issues

Helps with #7730

Type

🐛 Bug Fix (sort of)

Changes

First, a quick explanation of the problem. The .parse() method on the raw response objects returned by the openai module (LegacyAPIResponse) is designed to return the raw response text as a string if it cannot be parsed:

openai/_legacy_response.py:

    def _parse(self, *, to: type[_T] | None = None) -> R | _T:
    ...
            # If the API responds with content that isn't JSON then we just return
            # the (decoded) text without performing any parsing so that you can still
            # handle the response however you need to.
            return response.text  # type: ignore

However, in litellm, some functions assume that .parse() always returns BaseModel, for example here in litellm/llms/openai/openai.py:

    async def make_openai_chat_completion_request(
        self,
        openai_aclient: AsyncOpenAI,
        data: dict,
        timeout: Union[float, httpx.Timeout],
        logging_obj: LiteLLMLoggingObj,
    ) -> Tuple[dict, BaseModel]:
    ...
            response = raw_response.parse()
            return headers, response

As you can see from the type signature, the function is supposed to always return a BaseModel, but it never checks whether .parse() actually produced one.
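
As an aside (this snippet is just an illustration, not code from litellm or openai): Python return annotations aren't enforced at runtime, so the stray string only blows up later, at the first attribute access:

from pydantic import BaseModel

def pretend_parse() -> BaseModel:
    # The annotation promises a BaseModel, but nothing enforces it at
    # runtime, so a plain string slips through here without complaint.
    return "here's some invalid data"

response = pretend_parse()
response.model_dump()  # AttributeError: 'str' object has no attribute 'model_dump'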

Later on, this causes tricky errors like the one in #7730, where the real problem is a bad response from the server but the error message points at something unrelated. To demonstrate this, here's a non-functioning "OpenAI API server":

from flask import Flask, request, Response

app = Flask(__name__)

@app.route("/chat/completions", methods=["POST"])
def respond():
    # Always answer 200 OK with a body that isn't valid JSON
    return "here's some invalid data"

if __name__ == "__main__":
    app.run(port=5000)  # Flask's default port, matching the client below

Here's a sample client:

import litellm

print("Testing streaming...")
response = litellm.completion(
    model="openai/my-crappy-server",
    api_key="don't need it",
    api_base="http://127.0.0.1:5000",
    messages=[
            {
                "role": "user",
                "content": "Hey, how's it going?",
            }
    ],
    stream=True,
)

for part in response:
    print(part.choices[0].delta.content)

print("Testing one-shot...")
response = litellm.completion(
    model="openai/my-crappy-server",
    api_key="don't need it",
    api_base="http://127.0.0.1:5000",
    messages=[
            {
                "role": "user",
                "content": "Hey, how's it going?",
            }
    ],
)

print(response)

Currently, streaming appears to succeed but returns nothing, and the one-shot version fails with a cryptic error message. Here's what it looks like with the broken server running on localhost:

Testing streaming...
None
Testing one-shot...

Give Feedback / Get Help: https://github.com/BerriAI/litellm/issues/new
LiteLLM.Info: If you need to debug this error, use `litellm._turn_on_debug()'.

Traceback (most recent call last):
  File "/tmp/litellm-base/litellm/llms/openai/openai.py", line 707, in completion
    raise e
  File "/tmp/litellm-base/litellm/llms/openai/openai.py", line 643, in completion
    stringified_response = response.model_dump()
                           ^^^^^^^^^^^^^^^^^^^
AttributeError: 'str' object has no attribute 'model_dump'

... two copies of the same message ...

To improve this, I just added a type check in functions returning BaseModel that use raw_response.parse():

            if type(response) != BaseModel:
                raise OpenAIError(
                    status_code=422,
                    message = "Could not parse response",
                )

With my changes, both tests produce this clearer error message:

Traceback (most recent call last):
  File "/tmp/litellm/litellm/llms/openai/openai.py", line 452, in make_sync_openai_chat_completion_request
    raise OpenAIError(
litellm.llms.openai.common_utils.OpenAIError: Could not parse response

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/tmp/litellm/litellm/llms/openai/openai.py", line 720, in completion
    raise e
  File "/tmp/litellm/litellm/llms/openai/openai.py", line 604, in completion
    return self.streaming(
           ^^^^^^^^^^^^^^^
  File "/tmp/litellm/litellm/llms/openai/openai.py", line 873, in streaming
    headers, response = self.make_sync_openai_chat_completion_request(
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/litellm/litellm/litellm_core_utils/logging_utils.py", line 145, in sync_wrapper
    result = func(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/litellm/litellm/llms/openai/openai.py", line 460, in make_sync_openai_chat_completion_request
    raise Exception(
Exception: error - Could not parse response, Received response - <APIResponse [200 OK] type=<class 'openai.types.chat.chat_completion.ChatCompletion'>>, Type of response - <class 'openai._legacy_response.LegacyAPIResponse'>

Let me know if the message should be changed or anything; I'm not familiar with litellm's codebase.

[REQUIRED] Testing - Attach a screenshot of any new tests passing locally

Here's the test output on my machine for the base repo:

!!!!!!!!!!!!!!!!!!!!!!!!!! Interrupted: 62 errors during collection !!!!!!!!!!!!!!!!!!!!!!!!!!!
============================== 108 warnings, 62 errors in 24.95s ==============================

And here's the output with my changes:

!!!!!!!!!!!!!!!!!!!!!!!!!! Interrupted: 62 errors during collection !!!!!!!!!!!!!!!!!!!!!!!!!!!
============================== 108 warnings, 62 errors in 21.81s ==============================

So the change doesn't break any tests; the same 62 collection errors occur in both cases and appear to be due to missing API keys and the like.

@cynic64 (Author) commented on Jan 29, 2025

It turns out that functions like make_openai_chat_completion_request usually return a BaseModel, but return an openai.Stream when streaming is enabled, so my previous check was too restrictive. The second commit has this updated check:

            if isinstance(response, str):
                raise OpenAIError(
                    status_code=422,
                    message="Could not parse response",
                )

This makes streaming mode work again. Sorry about that...

Perhaps this signature should be updated, since both BaseModel and openai.Stream can be returned?

    async def make_openai_chat_completion_request(
        self,
        openai_aclient: AsyncOpenAI,
        data: dict,
        timeout: Union[float, httpx.Timeout],
        logging_obj: LiteLLMLoggingObj,
    ) -> Tuple[dict, BaseModel]:
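
Something along these lines might be more accurate (just a sketch, assuming the streaming case hands back openai's AsyncStream[ChatCompletionChunk]; the other imports stay the same):

    # would also need:
    # from openai import AsyncStream
    # from openai.types.chat import ChatCompletionChunk
    async def make_openai_chat_completion_request(
        self,
        openai_aclient: AsyncOpenAI,
        data: dict,
        timeout: Union[float, httpx.Timeout],
        logging_obj: LiteLLMLoggingObj,
    ) -> Tuple[dict, Union[BaseModel, AsyncStream[ChatCompletionChunk]]]:
        ...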

@krrishdholakia self-assigned this on Feb 1, 2025