
[BUG] OpenAI server stalled after processing an embedding request while serving a chat model #7502

Closed
QwertyJack opened this issue Aug 14, 2024 · 1 comment · Fixed by #7504
Labels
bug Something isn't working

Comments

@QwertyJack
Contributor

Your current environment

The output of `python collect_env.py`
Your output of `python collect_env.py` here

🐛 Describe the bug

The OpenAI-compatible server raises an error when it receives an embedding request while serving a chat model:

```
INFO:     127.0.0.1:46414 - "POST /v1/embeddings HTTP/1.1" 500 Internal Server Error
ERROR:    Exception in ASGI application
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/uvicorn/protocols/http/httptools_impl.py", line 399, in run_asgi
    result = await app(  # type: ignore[func-returns-value]
  File "/usr/local/lib/python3.10/dist-packages/uvicorn/middleware/proxy_headers.py", line 70, in __call__
    return await self.app(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/fastapi/applications.py", line 1054, in __call__
    await super().__call__(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/starlette/applications.py", line 123, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/starlette/middleware/errors.py", line 186, in __call__
    raise exc
  File "/usr/local/lib/python3.10/dist-packages/starlette/middleware/errors.py", line 164, in __call__
    await self.app(scope, receive, _send)
  File "/usr/local/lib/python3.10/dist-packages/starlette/middleware/cors.py", line 85, in __call__
    await self.app(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/starlette/middleware/exceptions.py", line 65, in __call__
    await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/starlette/_exception_handler.py", line 64, in wrapped_app
    raise exc
  File "/usr/local/lib/python3.10/dist-packages/starlette/_exception_handler.py", line 53, in wrapped_app
    await app(scope, receive, sender)
  File "/usr/local/lib/python3.10/dist-packages/starlette/routing.py", line 756, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/starlette/routing.py", line 776, in app
    await route.handle(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/starlette/routing.py", line 297, in handle
    await self.app(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/starlette/routing.py", line 77, in app
    await wrap_app_handling_exceptions(app, request)(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/starlette/_exception_handler.py", line 64, in wrapped_app
    raise exc
  File "/usr/local/lib/python3.10/dist-packages/starlette/_exception_handler.py", line 53, in wrapped_app
    await app(scope, receive, sender)
  File "/usr/local/lib/python3.10/dist-packages/starlette/routing.py", line 72, in app
    response = await func(request)
  File "/usr/local/lib/python3.10/dist-packages/fastapi/routing.py", line 278, in app
    raw_response = await run_endpoint_function(
  File "/usr/local/lib/python3.10/dist-packages/fastapi/routing.py", line 191, in run_endpoint_function
    return await dependant.call(**values)
  File "/usr/local/lib/python3.10/dist-packages/vllm/entrypoints/openai/api_server.py", line 132, in create_embedding
    generator = await openai_serving_embedding.create_embedding(
  File "/usr/local/lib/python3.10/dist-packages/vllm/entrypoints/openai/serving_embedding.py", line 124, in create_embedding
    async for i, res in result_generator:
  File "/usr/local/lib/python3.10/dist-packages/vllm/utils.py", line 250, in consumer
    raise e
  File "/usr/local/lib/python3.10/dist-packages/vllm/utils.py", line 241, in consumer
    raise item
  File "/usr/local/lib/python3.10/dist-packages/vllm/utils.py", line 225, in producer
    async for item in iterator:
  File "/usr/local/lib/python3.10/dist-packages/vllm/engine/async_llm_engine.py", line 747, in encode
    async for output in self._process_request(
  File "/usr/local/lib/python3.10/dist-packages/vllm/engine/async_llm_engine.py", line 780, in _process_request
    raise e
  File "/usr/local/lib/python3.10/dist-packages/vllm/engine/async_llm_engine.py", line 776, in _process_request
    async for request_output in stream:
  File "/usr/local/lib/python3.10/dist-packages/vllm/engine/async_llm_engine.py", line 89, in __anext__
    raise result
  File "/usr/local/lib/python3.10/dist-packages/vllm/engine/async_llm_engine.py", line 42, in _log_task_completion
    return_value = task.result()
  File "/usr/local/lib/python3.10/dist-packages/vllm/engine/async_llm_engine.py", line 532, in run_engine_loop
    has_requests_in_progress = await asyncio.wait_for(
  File "/usr/lib/python3.10/asyncio/tasks.py", line 445, in wait_for
    return fut.result()
  File "/usr/local/lib/python3.10/dist-packages/vllm/engine/async_llm_engine.py", line 506, in engine_step
    request_outputs = await self.engine.step_async()
  File "/usr/local/lib/python3.10/dist-packages/vllm/engine/async_llm_engine.py", line 235, in step_async
    output = await self.model_executor.execute_model_async(
  File "/usr/local/lib/python3.10/dist-packages/vllm/executor/gpu_executor.py", line 117, in execute_model_async
    output = await make_async(self.driver_worker.execute_model
  File "/usr/lib/python3.10/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/vllm/worker/worker.py", line 280, in execute_model
    output = self.model_runner.execute_model(seq_group_metadata_list,
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/vllm/worker/model_runner.py", line 735, in execute_model
    ) = self.prepare_input_tensors(seq_group_metadata_list)
  File "/usr/local/lib/python3.10/dist-packages/vllm/worker/model_runner.py", line 682, in prepare_input_tensors
    sampling_metadata = SamplingMetadata.prepare(
  File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/sampling_metadata.py", line 116, in prepare
    ) = _prepare_seq_groups(seq_group_metadata_list, seq_lens, query_lens,
  File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/sampling_metadata.py", line 208, in _prepare_seq_groups
    if sampling_params.seed is not None:
AttributeError: 'NoneType' object has no attribute 'seed'
```

After this, the server stalls and must be restarted.
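To reproduce, it should be enough to point an OpenAI client at the running server and send a single embedding request. The sketch below assumes the server is listening on `localhost:8000` and that `<chat-model>` is a placeholder for whatever chat model it was started with:

```python
# Minimal reproduction sketch (assumes a vLLM OpenAI-compatible server is
# already running on localhost:8000 and serving a chat model).
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

# Chat completions against the chat model work as expected.
client.chat.completions.create(
    model="<chat-model>",  # placeholder; use the model the server was started with
    messages=[{"role": "user", "content": "Hello"}],
)

# Sending an embedding request to the same server returns a 500 and kills the
# engine loop; every request after this one hangs until the server is restarted.
client.embeddings.create(model="<chat-model>", input="Hello")
```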

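For reference, the traceback shows the embedding request reaching the sampling path with `sampling_params=None`, so the unguarded `sampling_params.seed` access in `_prepare_seq_groups` raises `AttributeError` and takes the engine loop down with it. The stand-alone sketch below only illustrates that failure mode and a possible guard; it is not the actual change made in #7504:

```python
# Illustration of the failure mode only, with stand-in classes; not the fix in #7504.
from dataclasses import dataclass
from typing import Optional


@dataclass
class SamplingParams:  # stand-in for vllm.SamplingParams
    seed: Optional[int] = None


@dataclass
class SeqGroupMetadata:  # stand-in for the real sequence-group metadata
    sampling_params: Optional[SamplingParams] = None


def uses_seeded_sampling(meta: SeqGroupMetadata) -> bool:
    """Mirror of the failing check, with the missing None guard added."""
    params = meta.sampling_params
    if params is None:
        # Embedding/pooling requests carry no sampling params; skip them
        # instead of dereferencing None as the current code does.
        return False
    return params.seed is not None


print(uses_seeded_sampling(SeqGroupMetadata()))                           # False
print(uses_seeded_sampling(SeqGroupMetadata(SamplingParams(seed=1234))))  # True
```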