
Improve image processing time #33810

Open
yonigozlan opened this issue Sep 30, 2024 · 3 comments

Comments

@yonigozlan
Member

Feature request

Optimize Transformers' image processors to decrease image processing time and reduce inference latency for vision models and VLMs.

Motivation

The Transformers library relies on PIL (Pillow) for image preprocessing, which can become a major bottleneck during inference, especially with compiled models where the preprocessing time can dominate the overall inference time.

[Figures: inference time breakdown (preprocessing vs. model forward) for RT-DETR and DETR, in eager and compiled modes]

In the examples above, RT-DETR's preprocessing requires only resizing the image, while DETR's involves resize + normalize.
In eager mode, image preprocessing accounts for a large share of the total inference time for RT-DETR, but it is not the main bottleneck. With a compiled RT-DETR, however, image preprocessing takes up the majority of the inference time, underlining the need to optimize it. This is even clearer for DETR, where image preprocessing is already the main bottleneck in eager mode.
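
A rough way to reproduce this kind of breakdown is to time preprocessing and the forward pass separately. A minimal sketch (the checkpoint and image path are placeholders; swap in RT-DETR to compare):

```python
import time

import torch
from PIL import Image
from transformers import AutoImageProcessor, AutoModelForObjectDetection

# Placeholder checkpoint and image path, shown for illustration.
checkpoint = "facebook/detr-resnet-50"
processor = AutoImageProcessor.from_pretrained(checkpoint)
model = AutoModelForObjectDetection.from_pretrained(checkpoint).eval()

image = Image.open("example.jpg").convert("RGB")

# Time the PIL-based preprocessing alone.
start = time.perf_counter()
inputs = processor(images=image, return_tensors="pt")
preprocess_ms = (time.perf_counter() - start) * 1e3

# Time the model forward pass alone.
start = time.perf_counter()
with torch.no_grad():
    outputs = model(**inputs)
forward_ms = (time.perf_counter() - start) * 1e3

print(f"preprocess: {preprocess_ms:.1f} ms | forward: {forward_ms:.1f} ms")
```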

However, alternative libraries exist that leverage the available hardware more efficiently for faster image preprocessing; see the sketch below.
OptimVision uses such libraries to achieve much better results than Transformers.
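
As one example of what such a library enables, torchvision's `transforms.v2` can express the same resize + normalize pipeline as pure tensor ops, skipping PIL entirely and optionally running on the GPU. A minimal sketch (the size and normalization constants are illustrative defaults, not OptimVision's actual pipeline):

```python
import torch
from torchvision.io import read_image
from torchvision.transforms import v2

# DETR-style resize + normalize, expressed as tensor ops (no PIL).
transforms = v2.Compose([
    v2.Resize((800, 800), antialias=True),
    v2.ToDtype(torch.float32, scale=True),  # uint8 [0, 255] -> float [0, 1]
    v2.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

device = "cuda" if torch.cuda.is_available() else "cpu"
img = read_image("example.jpg").to(device)   # decode to a uint8 CHW tensor
pixel_values = transforms(img).unsqueeze(0)  # preprocessing runs on `device`
```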

Much more detail on OptimVision and a comparison of image processing methods are available on this Notion page.

Your contribution

OptimVision is an experimental playground for optimizing the different steps involved in inference and training with vision models.
The current fast image preprocessing in OptimVision is a proof of concept and is not yet ready to be merged into Transformers, but that is the ultimate goal :).

@LysandreJik
Member

Sounds like a good project indeed :)

@Gladiator07
Contributor

Hi, any updates on making Qwen2VLProcessorFast? We are experiencing a major bottleneck because of preprocessing time in offline mode.

@yonigozlan
Member Author

Answered here #34272 (comment)
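
For reference, fast (torchvision-backed) image processors are opted into with `use_fast=True` once a fast variant ships for a given model. A minimal sketch using DETR, whose fast variant already exists (Qwen2-VL's is the one requested above):

```python
from transformers import AutoImageProcessor

# use_fast=True selects the fast, torchvision-backed processor class.
# This assumes the installed transformers version ships a fast variant
# for the checkpoint.
processor = AutoImageProcessor.from_pretrained(
    "facebook/detr-resnet-50", use_fast=True
)
```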
