
[Performance]: qwen2vl very slow when preprocess large image #9238

Closed
zhjunqin opened this issue Oct 10, 2024 · 6 comments · Fixed by #9818
Labels
performance Performance-related issues

Comments

@zhjunqin

Proposal to improve performance

No response

Report of performance regression

Built with the latest vLLM code and started Qwen2-VL-7B-Instruct.


Preprocessing takes too long, which leads to a heartbeat timeout.

ERROR 10-10 01:14:54 client.py:250] RuntimeError('Engine loop has died')
ERROR 10-10 01:14:54 client.py:250] Traceback (most recent call last):
ERROR 10-10 01:14:54 client.py:250] File "/usr/local/lib/python3.12/dist-packages/vllm/engine/multiprocessing/client.py", line 150, in run_heartbeat_loop
ERROR 10-10 01:14:54 client.py:250] await self._check_success(
ERROR 10-10 01:14:54 client.py:250] File "/usr/local/lib/python3.12/dist-packages/vllm/engine/multiprocessing/client.py", line 314, in _check_success
ERROR 10-10 01:14:54 client.py:250] raise response
ERROR 10-10 01:14:54 client.py:250] RuntimeError: Engine loop has died

ERROR 10-10 01:25:08 client.py:250] TimeoutError('No heartbeat received from MQLLMEngine')
ERROR 10-10 01:25:08 client.py:250] NoneType: None
DEBUG 10-10 01:25:08 client.py:144] Shutting down MQLLMEngineClient check health loop due to timeout
DEBUG 10-10 01:25:14 client.py:170] Waiting for output from MQLLMEngine.
CRITICAL 10-10 01:25:14 launcher.py:99] MQLLMEngine is already dead, terminating server process

Any suggestions to help improve preprocessing performance?

Misc discussion on performance

No response

Your current environment (if you think it is necessary)

The output of `python collect_env.py`

Before submitting a new issue...

  • Make sure you have already searched for relevant issues and asked the chatbot at the bottom right corner of the documentation page, which can answer many frequently asked questions.
@zhjunqin zhjunqin added the performance Performance-related issues label Oct 10, 2024
@zhjunqin
Author

The relevant code in transformers:
https://github.com/huggingface/transformers/blob/main/src/transformers/image_transforms.py#L97

def rescale(
    image: np.ndarray,
    scale: float,
    data_format: Optional[ChannelDimension] = None,
    dtype: np.dtype = np.float32,
    input_data_format: Optional[Union[str, ChannelDimension]] = None,
) -> np.ndarray:
    """
    Rescales `image` by `scale`.

    Args:
        image (`np.ndarray`):
            The image to rescale.
        scale (`float`):
            The scale to use for rescaling the image.
        data_format (`ChannelDimension`, *optional*):
            The channel dimension format of the image. If not provided, it will be the same as the input image.
        dtype (`np.dtype`, *optional*, defaults to `np.float32`):
            The dtype of the output image. Defaults to `np.float32`. Used for backwards compatibility with feature
            extractors.
        input_data_format (`ChannelDimension`, *optional*):
            The channel dimension format of the input image. If not provided, it will be inferred from the input image.

    Returns:
        `np.ndarray`: The rescaled image.
    """
    if not isinstance(image, np.ndarray):
        raise TypeError(f"Input image must be of type np.ndarray, got {type(image)}")

    rescaled_image = image * scale  # takes a long time for large images
    if data_format is not None:
        rescaled_image = to_channel_dimension_format(rescaled_image, data_format, input_data_format)

    rescaled_image = rescaled_image.astype(dtype)

    return rescaled_image

rescaled_image = image * scale  # takes a long time for large images
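The slow multiply is likely explained by NumPy's type promotion: multiplying a `uint8` image by a Python `float` upcasts the whole array to `float64`, allocating a large 8-byte-per-element intermediate that is only cast down to `float32` afterwards. A minimal sketch of the effect (assuming NumPy's current NEP 50 promotion rules; the 1080p shape is just for illustration, and this is not the actual transformers fix):

```python
import numpy as np

# A 1080p RGB image as decoded from a JPEG (uint8), shape chosen for illustration.
image = np.zeros((1080, 1920, 3), dtype=np.uint8)
scale = 1 / 255

# uint8 * Python float promotes to float64: a full-size 8-byte-per-element
# intermediate is allocated, then cast down to float32 afterwards.
slow = (image * scale).astype(np.float32)

# Casting to float32 first keeps the multiply (and the intermediate) in float32.
fast = image.astype(np.float32) * np.float32(scale)

assert (image * scale).dtype == np.float64
assert fast.dtype == np.float32
assert np.allclose(slow, fast)
```

Doing the cast before the multiply roughly halves the memory traffic of the intermediate array, which is where most of the time goes for large images.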

@DarkLight1337
Member

DarkLight1337 commented Oct 10, 2024

It is recommended to use qwen_vl_utils (as shown here) to preprocess the images before passing them into vLLM.

@zhjunqin
Author

It is recommended to use qwen_vl_utils (as shown here) to preprocess the images before passing them into vLLM.

I'm not sure how it works. After preprocessing with qwen_vl_utils, does the image become a tensor that is then sent to vLLM?

@DarkLight1337
Member

It should resize the images to be suitable to be used by the model. So, if you input an image that is too large, it should be resized.
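For reference, the kind of resizing involved can be sketched as follows. This is a hypothetical reimplementation, not the actual `qwen_vl_utils` code; the patch-size factor of 28 and the pixel budgets are assumed defaults:

```python
import math

def smart_resize_sketch(height: int, width: int, factor: int = 28,
                        min_pixels: int = 56 * 56,
                        max_pixels: int = 14 * 14 * 4 * 1280) -> tuple[int, int]:
    """Round dimensions to multiples of `factor`, keeping the pixel count
    within [min_pixels, max_pixels] while roughly preserving aspect ratio."""
    h = max(factor, round(height / factor) * factor)
    w = max(factor, round(width / factor) * factor)
    if h * w > max_pixels:
        # Shrink both sides by the same ratio, then round down to the factor.
        beta = math.sqrt((height * width) / max_pixels)
        h = math.floor(height / beta / factor) * factor
        w = math.floor(width / beta / factor) * factor
    elif h * w < min_pixels:
        # Grow both sides by the same ratio, then round up to the factor.
        beta = math.sqrt(min_pixels / (height * width))
        h = math.ceil(height * beta / factor) * factor
        w = math.ceil(width * beta / factor) * factor
    return h, w

# A 6000 x 4000 photo is shrunk well under the pixel budget before the
# expensive per-pixel preprocessing ever runs.
print(smart_resize_sketch(4000, 6000))  # → (812, 1204)
```

Shrinking the image on the client first means the slow per-pixel rescale only ever sees the bounded number of pixels the model can actually use.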

@zhjunqin
Author

It should resize the images to be suitable to be used by the model. So, if you input an image that is too large, it should be resized.

Taking 1080p (1920 x 1080) as the upper limit on picture size, it still takes more than 10 s to preprocess.

@DarkLight1337
Member

Since the processing time is spent inside HuggingFace code, I guess it's not really our fault... even if our engine didn't time out, you would still get very poor performance. Perhaps open an issue on the HuggingFace side? cc @fyabc
