
Embeddings error: python -m vllm.entrypoints.openai.api_server --trust-remote-code --model gte_Qwen2-7B-instruct --seed 48 --max-model-len 1000 --tensor-parallel-size 2 --gpu-memory-utilization 1 --dtype float16 #6015

Closed
2679326161or opened this issue Jul 1, 2024 · 6 comments · Fixed by #10184
Labels: bug (Something isn't working), stale

Comments

@2679326161or

Your current environment

The output of `python collect_env.py`

🐛 Describe the bug

ERROR 07-01 08:12:10 async_llm_engine.py:52] Engine background task failed
ERROR 07-01 08:12:10 async_llm_engine.py:52] Traceback (most recent call last):
ERROR 07-01 08:12:10 async_llm_engine.py:52] File "/opt/conda/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 42, in _log_task_completion
ERROR 07-01 08:12:10 async_llm_engine.py:52] return_value = task.result()
ERROR 07-01 08:12:10 async_llm_engine.py:52] File "/opt/conda/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 532, in run_engine_loop
ERROR 07-01 08:12:10 async_llm_engine.py:52] has_requests_in_progress = await asyncio.wait_for(
ERROR 07-01 08:12:10 async_llm_engine.py:52] File "/opt/conda/lib/python3.10/asyncio/tasks.py", line 445, in wait_for
ERROR 07-01 08:12:10 async_llm_engine.py:52] return fut.result()
ERROR 07-01 08:12:10 async_llm_engine.py:52] File "/opt/conda/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 506, in engine_step
ERROR 07-01 08:12:10 async_llm_engine.py:52] request_outputs = await self.engine.step_async()
ERROR 07-01 08:12:10 async_llm_engine.py:52] File "/opt/conda/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 235, in step_async
ERROR 07-01 08:12:10 async_llm_engine.py:52] output = await self.model_executor.execute_model_async(
ERROR 07-01 08:12:10 async_llm_engine.py:52] File "/opt/conda/lib/python3.10/site-packages/vllm/executor/distributed_gpu_executor.py", line 166, in execute_model_async
ERROR 07-01 08:12:10 async_llm_engine.py:52] return await self._driver_execute_model_async(execute_model_req)
ERROR 07-01 08:12:10 async_llm_engine.py:52] File "/opt/conda/lib/python3.10/site-packages/vllm/executor/multiproc_gpu_executor.py", line 149, in _driver_execute_model_async
ERROR 07-01 08:12:10 async_llm_engine.py:52] return await self.driver_exec_model(execute_model_req)
ERROR 07-01 08:12:10 async_llm_engine.py:52] File "/opt/conda/lib/python3.10/concurrent/futures/thread.py", line 58, in run
ERROR 07-01 08:12:10 async_llm_engine.py:52] result = self.fn(*self.args, **self.kwargs)
ERROR 07-01 08:12:10 async_llm_engine.py:52] File "/opt/conda/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
ERROR 07-01 08:12:10 async_llm_engine.py:52] return func(*args, **kwargs)
ERROR 07-01 08:12:10 async_llm_engine.py:52] File "/opt/conda/lib/python3.10/site-packages/vllm/worker/worker.py", line 280, in execute_model
ERROR 07-01 08:12:10 async_llm_engine.py:52] output = self.model_runner.execute_model(seq_group_metadata_list,
ERROR 07-01 08:12:10 async_llm_engine.py:52] File "/opt/conda/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
ERROR 07-01 08:12:10 async_llm_engine.py:52] return func(*args, **kwargs)
ERROR 07-01 08:12:10 async_llm_engine.py:52] File "/opt/conda/lib/python3.10/site-packages/vllm/worker/model_runner.py", line 735, in execute_model
ERROR 07-01 08:12:10 async_llm_engine.py:52] ) = self.prepare_input_tensors(seq_group_metadata_list)
ERROR 07-01 08:12:10 async_llm_engine.py:52] File "/opt/conda/lib/python3.10/site-packages/vllm/worker/model_runner.py", line 682, in prepare_input_tensors
ERROR 07-01 08:12:10 async_llm_engine.py:52] sampling_metadata = SamplingMetadata.prepare(
ERROR 07-01 08:12:10 async_llm_engine.py:52] File "/opt/conda/lib/python3.10/site-packages/vllm/model_executor/sampling_metadata.py", line 116, in prepare
ERROR 07-01 08:12:10 async_llm_engine.py:52] ) = _prepare_seq_groups(seq_group_metadata_list, seq_lens, query_lens,
ERROR 07-01 08:12:10 async_llm_engine.py:52] File "/opt/conda/lib/python3.10/site-packages/vllm/model_executor/sampling_metadata.py", line 208, in _prepare_seq_groups
ERROR 07-01 08:12:10 async_llm_engine.py:52] if sampling_params.seed is not None:
ERROR 07-01 08:12:10 async_llm_engine.py:52] AttributeError: 'NoneType' object has no attribute 'seed'
Exception in callback functools.partial(<function _log_task_completion at 0x7f40f9a39630>, error_callback=<bound method AsyncLLMEngine._error_callback of <vllm.engine.async_llm_engine.AsyncLLMEngine object at 0x7f40b87a3160>>)
handle: <Handle functools.partial(<function _log_task_completion at 0x7f40f9a39630>, error_callback=<bound method AsyncLLMEngine._error_callback of <vllm.engine.async_llm_engine.AsyncLLMEngine object at 0x7f40b87a3160>>)>
Traceback (most recent call last):
File "/opt/conda/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 42, in _log_task_completion
return_value = task.result()
File "/opt/conda/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 532, in run_engine_loop
has_requests_in_progress = await asyncio.wait_for(
File "/opt/conda/lib/python3.10/asyncio/tasks.py", line 445, in wait_for
return fut.result()
File "/opt/conda/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 506, in engine_step
request_outputs = await self.engine.step_async()
File "/opt/conda/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 235, in step_async
output = await self.model_executor.execute_model_async(
File "/opt/conda/lib/python3.10/site-packages/vllm/executor/distributed_gpu_executor.py", line 166, in execute_model_async
return await self._driver_execute_model_async(execute_model_req)
File "/opt/conda/lib/python3.10/site-packages/vllm/executor/multiproc_gpu_executor.py", line 149, in _driver_execute_model_async
return await self.driver_exec_model(execute_model_req)
File "/opt/conda/lib/python3.10/concurrent/futures/thread.py", line 58, in run
result = self.fn(*self.args, **self.kwargs)
File "/opt/conda/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/vllm/worker/worker.py", line 280, in execute_model
output = self.model_runner.execute_model(seq_group_metadata_list,
File "/opt/conda/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/vllm/worker/model_runner.py", line 735, in execute_model
) = self.prepare_input_tensors(seq_group_metadata_list)
File "/opt/conda/lib/python3.10/site-packages/vllm/worker/model_runner.py", line 682, in prepare_input_tensors
sampling_metadata = SamplingMetadata.prepare(
File "/opt/conda/lib/python3.10/site-packages/vllm/model_executor/sampling_metadata.py", line 116, in prepare
) = _prepare_seq_groups(seq_group_metadata_list, seq_lens, query_lens,
File "/opt/conda/lib/python3.10/site-packages/vllm/model_executor/sampling_metadata.py", line 208, in _prepare_seq_groups
if sampling_params.seed is not None:
AttributeError: 'NoneType' object has no attribute 'seed'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "uvloop/cbhandles.pyx", line 63, in uvloop.loop.Handle._run
File "/opt/conda/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 54, in _log_task_completion
raise AsyncEngineDeadError(
vllm.engine.async_llm_engine.AsyncEngineDeadError: Task finished unexpectedly. This should never happen! Please open an issue on Github. See stack trace above for the actual cause.
INFO 07-01 08:12:10 async_llm_engine.py:167] Aborted request cmpl-13a5e1f614ab4afe99ca9ccc99097603-0.
INFO: 192.168.30.254:63180 - "POST /v1/embeddings HTTP/1.1" 500 Internal Server Error
ERROR: Exception in ASGI application
Traceback (most recent call last):
File "/opt/conda/lib/python3.10/site-packages/uvicorn/protocols/http/httptools_impl.py", line 399, in run_asgi
result = await app( # type: ignore[func-returns-value]
File "/opt/conda/lib/python3.10/site-packages/uvicorn/middleware/proxy_headers.py", line 70, in call
return await self.app(scope, receive, send)
File "/opt/conda/lib/python3.10/site-packages/fastapi/applications.py", line 1054, in call
await super().call(scope, receive, send)
File "/opt/conda/lib/python3.10/site-packages/starlette/applications.py", line 123, in call
await self.middleware_stack(scope, receive, send)
File "/opt/conda/lib/python3.10/site-packages/starlette/middleware/errors.py", line 186, in call
raise exc
File "/opt/conda/lib/python3.10/site-packages/starlette/middleware/errors.py", line 164, in call
await self.app(scope, receive, _send)
File "/opt/conda/lib/python3.10/site-packages/starlette/middleware/cors.py", line 85, in call
await self.app(scope, receive, send)
File "/opt/conda/lib/python3.10/site-packages/starlette/middleware/exceptions.py", line 65, in call
await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
File "/opt/conda/lib/python3.10/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
raise exc
File "/opt/conda/lib/python3.10/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
await app(scope, receive, sender)
File "/opt/conda/lib/python3.10/site-packages/starlette/routing.py", line 756, in call
await self.middleware_stack(scope, receive, send)
File "/opt/conda/lib/python3.10/site-packages/starlette/routing.py", line 776, in app
await route.handle(scope, receive, send)
File "/opt/conda/lib/python3.10/site-packages/starlette/routing.py", line 297, in handle
await self.app(scope, receive, send)
File "/opt/conda/lib/python3.10/site-packages/starlette/routing.py", line 77, in app
await wrap_app_handling_exceptions(app, request)(scope, receive, send)
File "/opt/conda/lib/python3.10/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
raise exc
File "/opt/conda/lib/python3.10/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
await app(scope, receive, sender)
File "/opt/conda/lib/python3.10/site-packages/starlette/routing.py", line 72, in app
response = await func(request)
File "/opt/conda/lib/python3.10/site-packages/fastapi/routing.py", line 278, in app
raw_response = await run_endpoint_function(
File "/opt/conda/lib/python3.10/site-packages/fastapi/routing.py", line 191, in run_endpoint_function
return await dependant.call(**values)
File "/opt/conda/lib/python3.10/site-packages/vllm/entrypoints/openai/api_server.py", line 132, in create_embedding
generator = await openai_serving_embedding.create_embedding(
File "/opt/conda/lib/python3.10/site-packages/vllm/entrypoints/openai/serving_embedding.py", line 124, in create_embedding
async for i, res in result_generator:
File "/opt/conda/lib/python3.10/site-packages/vllm/utils.py", line 250, in consumer
raise e
File "/opt/conda/lib/python3.10/site-packages/vllm/utils.py", line 241, in consumer
raise item
File "/opt/conda/lib/python3.10/site-packages/vllm/utils.py", line 225, in producer
async for item in iterator:
File "/opt/conda/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 747, in encode
async for output in self._process_request(
File "/opt/conda/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 780, in _process_request
raise e
File "/opt/conda/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 776, in _process_request
async for request_output in stream:
File "/opt/conda/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 89, in anext
raise result
File "/opt/conda/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 42, in _log_task_completion
return_value = task.result()
File "/opt/conda/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 532, in run_engine_loop
has_requests_in_progress = await asyncio.wait_for(
File "/opt/conda/lib/python3.10/asyncio/tasks.py", line 445, in wait_for
return fut.result()
File "/opt/conda/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 506, in engine_step
request_outputs = await self.engine.step_async()
File "/opt/conda/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 235, in step_async
output = await self.model_executor.execute_model_async(
File "/opt/conda/lib/python3.10/site-packages/vllm/executor/distributed_gpu_executor.py", line 166, in execute_model_async
return await self._driver_execute_model_async(execute_model_req)
File "/opt/conda/lib/python3.10/site-packages/vllm/executor/multiproc_gpu_executor.py", line 149, in _driver_execute_model_async
return await self.driver_exec_model(execute_model_req)
File "/opt/conda/lib/python3.10/concurrent/futures/thread.py", line 58, in run
result = self.fn(*self.args, **self.kwargs)
File "/opt/conda/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/vllm/worker/worker.py", line 280, in execute_model
output = self.model_runner.execute_model(seq_group_metadata_list,
File "/opt/conda/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/vllm/worker/model_runner.py", line 735, in execute_model
) = self.prepare_input_tensors(seq_group_metadata_list)
File "/opt/conda/lib/python3.10/site-packages/vllm/worker/model_runner.py", line 682, in prepare_input_tensors
sampling_metadata = SamplingMetadata.prepare(
File "/opt/conda/lib/python3.10/site-packages/vllm/model_executor/sampling_metadata.py", line 116, in prepare
) = _prepare_seq_groups(seq_group_metadata_list, seq_lens, query_lens,
File "/opt/conda/lib/python3.10/site-packages/vllm/model_executor/sampling_metadata.py", line 208, in _prepare_seq_groups
if sampling_params.seed is not None:
AttributeError: 'NoneType' object has no attribute 'seed'
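
For context: the crash happens in `_prepare_seq_groups`, which reads `sampling_params.seed`, but embedding requests carry pooling parameters and leave `sampling_params` as `None`. Below is a minimal, self-contained sketch of the failure mode and the kind of None-guard that avoids it (the class and function names are illustrative only, not the actual vLLM patch):

```python
# Hypothetical sketch of the bug: embedding-only sequence groups have
# sampling_params == None, so dereferencing .seed raises AttributeError.
from typing import List, Optional


class SamplingParams:
    def __init__(self, seed: Optional[int] = None):
        self.seed = seed


class SeqGroupMetadata:
    def __init__(self, sampling_params: Optional[SamplingParams]):
        self.sampling_params = sampling_params


def prepare_seq_groups(metadata_list: List[SeqGroupMetadata]) -> List[int]:
    seeded = []
    for meta in metadata_list:
        params = meta.sampling_params
        # The failing code did `if params.seed is not None:` directly and
        # crashed when params was None; checking for None first avoids it.
        if params is not None and params.seed is not None:
            seeded.append(params.seed)
    return seeded


# An embedding request (sampling_params=None) must not crash the loop.
print(prepare_seq_groups([SeqGroupMetadata(None),
                          SeqGroupMetadata(SamplingParams(seed=48))]))
```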

@2679326161or added the bug (Something isn't working) label on Jul 1, 2024
@youkaichao
Member

Please provide more details: follow the issue template to report your environment, and show how you are using vLLM.

@Junyi-99

Junyi-99 commented Jul 8, 2024

Same error here.

I triggered this exception by adding an OpenAI-API-compatible embedding model in Dify.


It turned out I was using the generation model Llama-3-8b instead of an embedding model.

The problem was solved when I switched to an embedding model.
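
For anyone reproducing this: once a genuine embedding model is being served, a request like the one below should return 200 instead of the 500 above. This is only a smoke-test sketch; the port and model name are placeholders for your own deployment:

```python
# Minimal client-side check against vLLM's OpenAI-compatible
# /v1/embeddings endpoint; adjust host, port, and model to your setup.
import requests

resp = requests.post(
    "http://localhost:8000/v1/embeddings",
    json={
        "model": "intfloat/e5-mistral-7b-instruct",  # must be an embedding model
        "input": "vLLM embeddings smoke test",
    },
    timeout=30,
)
resp.raise_for_status()  # raises if the server still returns 500
print(len(resp.json()["data"][0]["embedding"]))  # embedding dimensionality
```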

@Junyi-99

Junyi-99 commented Jul 8, 2024

@2679326161or If you want to use an embedding model, try: https://huggingface.co/intfloat/e5-mistral-7b-instruct.

Models such as Llama-3-8b and Mistral-7B-Instruct-v0.3 are generation models, not embedding models.
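
For comparison with the command in the issue title, serving the recommended embedding model would look roughly like this (flags carried over from the original report; --seed and --gpu-memory-utilization dropped, adjust for your hardware):

python -m vllm.entrypoints.openai.api_server --model intfloat/e5-mistral-7b-instruct --max-model-len 1000 --tensor-parallel-size 2 --dtype float16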

@LJLQ

LJLQ commented Jul 9, 2024

I ran into the same problem: the service crashed as soon as an embedding request came in, and afterwards even chat requests could no longer return normally.

@QwertyJack
Contributor

Related to #7502 and fixed by #7504.


This issue has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this issue should remain open. Thank you!
