[Bug]: Distributed inference fails on certain multimodal models #8983
Comments
Can you provide the full error logs, so that I can figure out which part is going wrong?
@Isotr0py here you go.

RuntimeError                              Traceback (most recent call last)
File ~/vllm_venv/lib/python3.12/site-packages/vllm/worker/model_runner.py:1590, in ModelRunner.execute_model(self, model_input, kv_caches, intermediate_tensors, num_steps)
File ~/vllm_venv/lib/python3.12/site-packages/torch/nn/modules/module.py:1553, in Module._wrapped_call_impl(self, *args, **kwargs)
File ~/vllm_venv/lib/python3.12/site-packages/torch/nn/modules/module.py:1562, in Module._call_impl(self, *args, **kwargs)
File ~/vllm_venv/lib/python3.12/site-packages/vllm/model_executor/models/fuyu.py:285, in FuyuForCausalLM.forward(self, input_ids, positions, kv_caches, attn_metadata, intermediate_tensors, **kwargs)
File ~/vllm_venv/lib/python3.12/site-packages/vllm/model_executor/models/utils.py:180, in merge_multimodal_embeddings(input_ids, inputs_embeds, multimodal_embeddings, placeholder_token_id)
RuntimeError: shape mismatch: value tensor of shape [16128, 1024] cannot be broadcast to indexing result of shape [16128, 4096]

The above exception was the direct cause of the following exception:

RuntimeError                              Traceback (most recent call last)
File ~/vllm_venv/lib/python3.12/site-packages/vllm/entrypoints/llm.py:214, in LLM.__init__(self, model, tokenizer, tokenizer_mode, skip_tokenizer_init, trust_remote_code, tensor_parallel_size, dtype, quantization, revision, tokenizer_revision, seed, gpu_memory_utilization, swap_space, cpu_offload_gb, enforce_eager, max_context_len_to_capture, max_seq_len_to_capture, disable_custom_all_reduce, disable_async_output_proc, mm_processor_kwargs, **kwargs)
File ~/vllm_venv/lib/python3.12/site-packages/vllm/engine/llm_engine.py:564, in LLMEngine.from_engine_args(cls, engine_args, usage_context, stat_loggers)
File ~/vllm_venv/lib/python3.12/site-packages/vllm/engine/llm_engine.py:339, in LLMEngine.__init__(self, model_config, cache_config, parallel_config, scheduler_config, device_config, load_config, lora_config, speculative_config, decoding_config, observability_config, prompt_adapter_config, executor_class, log_stats, usage_context, stat_loggers, input_registry, use_cached_outputs)
File ~/vllm_venv/lib/python3.12/site-packages/vllm/engine/llm_engine.py:474, in LLMEngine._initialize_kv_caches(self)
File ~/vllm_venv/lib/python3.12/site-packages/vllm/executor/distributed_gpu_executor.py:39, in DistributedGPUExecutor.determine_num_available_blocks(self)
File ~/vllm_venv/lib/python3.12/site-packages/vllm/executor/multiproc_gpu_executor.py:185, in MultiprocessingGPUExecutor._run_workers(self, method, async_run_tensor_parallel_workers_only, max_concurrent_workers, *args, **kwargs)
File ~/vllm_venv/lib/python3.12/site-packages/torch/utils/_contextlib.py:116, in context_decorator.<locals>.decorate_context(*args, **kwargs)
File ~/vllm_venv/lib/python3.12/site-packages/vllm/worker/worker.py:223, in Worker.determine_num_available_blocks(self)
File ~/vllm_venv/lib/python3.12/site-packages/torch/utils/_contextlib.py:116, in context_decorator.<locals>.decorate_context(*args, **kwargs)
File ~/vllm_venv/lib/python3.12/site-packages/vllm/worker/model_runner.py:1236, in GPUModelRunnerBase.profile_run(self)
File ~/vllm_venv/lib/python3.12/site-packages/torch/utils/_contextlib.py:116, in context_decorator.<locals>.decorate_context(*args, **kwargs)
File ~/vllm_venv/lib/python3.12/site-packages/vllm/worker/model_runner_base.py:152, in dump_input_when_exception.<locals>._inner.<locals>._wrapper(*args, **kwargs)
RuntimeError: Error in model execution (input dumped to /tmp/err_execute_model_input_20240930-231823.pkl): shape mismatch: value tensor of shape [16128, 1024] cannot be broadcast to indexing result of shape [16128, 4096]
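For what it's worth, the 1024 in the mismatch is exactly 4096 / 4, so one possible reading (an assumption on my side, not confirmed in this thread) is that the vision embeddings reach merge_multimodal_embeddings already sharded across the 4 tensor-parallel ranks, while inputs_embeds is still full-width. A minimal standalone PyTorch sketch of that shape mismatch, using hypothetical tensors with the shapes from the traceback:

import torch

hidden_size = 4096        # Fuyu-8B language-model hidden size
tp_size = 4               # tensor_parallel_size used in the repro
num_mm_tokens = 16128     # number of placeholder tokens seen in the traceback

# Full-width embeddings that the image placeholders are scattered into.
inputs_embeds = torch.zeros(num_mm_tokens, hidden_size)
# Assumption: the multimodal embeddings arrive split per rank,
# i.e. hidden_size // tp_size = 1024 columns instead of 4096.
mm_embeds = torch.zeros(num_mm_tokens, hidden_size // tp_size)

mask = torch.ones(num_mm_tokens, dtype=torch.bool)
# Raises: RuntimeError: shape mismatch: value tensor of shape [16128, 1024]
# cannot be broadcast to indexing result of shape [16128, 4096]
inputs_embeds[mask] = mm_embeds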
Your current environment
The output of `python collect_env.py`
Model Input Dumps
err_execute_model_input_20240930-213352.pkl.zip
🐛 Describe the bug
Sample code:
from vllm import LLM, SamplingParams
llm = LLM(model="adept/fuyu-8b", tensor_parallel_size=4, pipeline_parallel_size=1)
This sample code throws the following error on an instance with 4 A10G GPUs.
RuntimeError: Error in model execution (input dumped to /tmp/err_execute_model_input_20240930-231823.pkl): shape mismatch: value tensor of shape [16128, 1024] cannot be broadcast to indexing result of shape [16128, 4096]
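For comparison (not verified in this thread, but in line with the issue title, which points at distributed inference specifically), the same model without tensor parallelism, which skips the sharded-embedding path:

from vllm import LLM

# Hypothetical single-GPU comparison run: no tensor parallelism involved.
llm = LLM(model="adept/fuyu-8b", tensor_parallel_size=1)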