
Qwen2-VL: clean-up and add more tests #33354

Merged (12 commits) on Sep 12, 2024

Conversation


@zucchini-nlp zucchini-nlp commented Sep 6, 2024

What does this PR do?

This PR standardizes the Qwen2-VL processors and adds tests for processing and generation. One thing to note: the generation tests currently operate on the text-only modality. I will add support for multimodal tests very soon.

TODO:

  • add video processor tests


Collaborator

@amyeroberts amyeroberts left a comment

Thanks for adding this!

Maximum length of the returned list and optionally padding length (see above).
truncation (`bool`, *optional*):
Activates truncation to cut input sequences longer than `max_length` to `max_length`.
return_tensors (`str` or [`~utils.TensorType`], *optional*):
Collaborator

Let's keep the return tensors in the docstring for now. Although we're removing it as a defined kwarg, users need to set it to be able to use the processor properly for training.

tests/models/qwen2_vl/test_processing_qwen2_vl.py (outdated; resolved)
all_kwargs = {
"common_kwargs": {"return_tensors": "pt"},
"images_kwargs": {"size": {"height": 214, "width": 214}},
"videos_kwargs": {"size": {"height": 214, "width": 214}},
Collaborator

I'm guessing the processing doesn't work if the video and image have different image sizes set as we can't batch?

Could we test passing one and not the other?

Member Author

Yes, passing only one worked for me, but I'll add better tests as well.
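For context, the `all_kwargs` dict quoted above splits kwargs by modality. Here is a minimal sketch of how such a structure could collapse into per-modality call kwargs; this is an illustration, not the actual transformers implementation, and the helper name `merge_modality_kwargs` is made up:

```python
# Hypothetical helper illustrating the structured-kwargs idea from the test
# snippet above: common_kwargs apply to every modality, and modality-specific
# entries override or extend them.
def merge_modality_kwargs(all_kwargs: dict, modality: str) -> dict:
    merged = dict(all_kwargs.get("common_kwargs", {}))
    merged.update(all_kwargs.get(f"{modality}_kwargs", {}))
    return merged


all_kwargs = {
    "common_kwargs": {"return_tensors": "pt"},
    "images_kwargs": {"size": {"height": 214, "width": 214}},
    "videos_kwargs": {"size": {"height": 214, "width": 214}},
}

# images_kwargs now holds both the shared return_tensors and the image size
images_kwargs = merge_modality_kwargs(all_kwargs, "images")
```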

Member Author

OK, I found that the size wasn't being used at all; my bad for missing that when reviewing. Since Qwen2-VL can't resize to an arbitrary given size (it needs to follow some logic to smart-resize and keep as much resolution as possible), the size dict will be different from other models'. Added that to the docstring.

Also modified all tests to follow the new size format and actually use it if users pass it with the correct dict keys. WDYT? Should we also find a way to support tuple-size? I'll also update the model doc page soon.
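As an illustration of the "smart resize" constraint described above, here is a rough, self-contained sketch. This is not the exact Qwen2-VL implementation; the factor and pixel bounds are assumed defaults. The idea: keep the aspect ratio, round each side to a multiple of the patch factor, and pull the total pixel count back into [min_pixels, max_pixels].

```python
import math

# Illustrative sketch of "smart resize": round each side to a multiple of
# `factor` (the patch grid), then rescale so the total pixel count stays
# within [min_pixels, max_pixels] while roughly preserving the aspect ratio.
def smart_resize(height, width, factor=28, min_pixels=56 * 56, max_pixels=14 * 14 * 4 * 1280):
    h = round(height / factor) * factor
    w = round(width / factor) * factor
    if h * w > max_pixels:
        # Too many pixels: shrink both sides by the same scale, rounding down.
        scale = math.sqrt(height * width / max_pixels)
        h = math.floor(height / scale / factor) * factor
        w = math.floor(width / scale / factor) * factor
    elif h * w < min_pixels:
        # Too few pixels: grow both sides by the same scale, rounding up.
        scale = math.sqrt(min_pixels / (height * width))
        h = math.ceil(height * scale / factor) * factor
        w = math.ceil(width * scale / factor) * factor
    return h, w
```

With these assumed defaults, any input resolves to sides divisible by 28 with a pixel count inside the configured bounds.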

Member Author

Updated the docs and added a usage example showing how users can change the max/min resolution at the processor's call time.

Collaborator

Should we also find a way to support tuple-size?

No. Previously size was a tuple and this caused issues as:

  • some models had it stored as (width, height)
  • some models just stored an int. Within that set of models, some would then resize to (int, int), others to (int, int * height / width), etc.
  • some models used a tuple to denote the longest and shortest edges

There was a huge level of ambiguity and inconsistency across the configs which made image processor behaviour difficult to predict without having to dig into the code.

Technically, tuples are supported using get_size_dict, but this is for backwards compatibility rather than an encouraged way to pass in inputs.

Adding min_pixels and max_pixels isn't a light decision, as this is something which sets in stone how this will be expressed for all configs going forward. I think this change is OK but we should be cautious about just adding keys to size dict. You'll also need to update some other code which verifies the keys in the size dicts -- namely VALID_SIZE_DICT_KEYS
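To make this concrete, here is a toy sketch of the kind of key-set validation `VALID_SIZE_DICT_KEYS` enables. The real constant lives in `image_processing_utils.py`; the key sets and helper below are illustrative assumptions, not the library's code:

```python
# Illustrative stand-in for VALID_SIZE_DICT_KEYS: each entry is one allowed
# combination of keys for a `size` dict. The last entry is the new key set
# discussed in this thread.
VALID_SIZE_DICT_KEYS = (
    {"height", "width"},
    {"shortest_edge"},
    {"shortest_edge", "longest_edge"},
    {"min_pixels", "max_pixels"},
)


def is_valid_size_dict(size) -> bool:
    # A size dict is valid when its key set matches one allowed combination.
    return isinstance(size, dict) and set(size) in VALID_SIZE_DICT_KEYS
```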

Member Author

No. Previously size was a tuple and this caused issues as:

I see, that makes it easier in general to use max/min pixels now, because I didn't want to add the new possible keys to VALID_SIZE_DICT_KEYS. Qwen is, and will probably remain, the only model accepting such kwargs. But then we couldn't call get_size_dict to validate that the size dict is passed correctly.

OK, now I added it to VALID_SIZE_DICT_KEYS and added one more test with incorrect sizes passed as kwargs, in which case the defaults from self.min_pixels have to be used.

Ready for re-review

Collaborator

@amyeroberts amyeroberts left a comment

Left a comment re the size dict -- just some updates needed in image_processing_utils.py to reflect this addition

docs/source/en/model_doc/qwen2_vl.md (outdated; resolved)
docs/source/en/model_doc/qwen2_vl.md (outdated; resolved)

@amyeroberts amyeroberts self-requested a review September 12, 2024 14:52
@@ -192,13 +196,14 @@ def __init__(
self.patch_size = patch_size
self.temporal_patch_size = temporal_patch_size
self.merge_size = merge_size
- self.size = {"min_pixels": min_pixels, "max_pixels": max_pixels}
+ self.size = size if size is not None else {"min_pixels": min_pixels, "max_pixels": max_pixels}
Collaborator

@amyeroberts amyeroberts Sep 12, 2024

One last thing: if size can now be set in the input, i.e. the values of min_pixels and max_pixels in the size dict take precedence over the min_pixels and max_pixels input arguments, then we should pop max_pixels and min_pixels in the to_dict method so that only size is saved out into the config.

That is, if someone were to look at the config file, it wouldn't be clear which value takes precedence; you'd have to look at the code to understand, which we should avoid.
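A minimal sketch of the serialization behaviour suggested here, using a stand-in class (not the real image processor): the redundant attributes are popped so that only `size` reaches the saved config.

```python
# Simplified stand-in class illustrating the suggested to_dict behaviour:
# only `size` is serialized, so a reader of the config file sees a single
# source of truth for the pixel bounds.
class FakeImageProcessor:
    def __init__(self, min_pixels=56 * 56, max_pixels=28 * 28 * 1280, size=None):
        self.min_pixels = min_pixels
        self.max_pixels = max_pixels
        self.size = size if size is not None else {"min_pixels": min_pixels, "max_pixels": max_pixels}

    def to_dict(self):
        output = dict(self.__dict__)
        # Drop the attributes that duplicate what `size` already carries.
        output.pop("min_pixels", None)
        output.pop("max_pixels", None)
        return output
```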

Member Author

Hmm, that's a good point. But I think priority should be given to max_pixels and min_pixels rather than the size dict, since that is the default setting for Qwen2-VL.

Collaborator

OK, in that case what I'd recommend is just not accepting the size argument and sticking with min_pixels and max_pixels.

Comment on lines 280 to 281
min_pixels = size.get("min_pixels", self.min_pixels)
max_pixels = size.get("max_pixels", self.max_pixels)
Collaborator

This is funny: it implies size can be a dict without the min_pixels and max_pixels values, but that isn't compatible with this image processor, e.g. "height" and "width" from the size dict wouldn't be used. It would be better to raise an error if these keys are missing rather than default to self.min_pixels or self.max_pixels, which creates ambiguity in the behaviour and can mask improper initialization in the init.

Member Author

@zucchini-nlp zucchini-nlp Sep 12, 2024

Yeah, this was the first thing I thought of, but I didn't want to raise errors for calls that seemed to run smoothly earlier. We can raise an error and explain how to pass size; that is also the place where we can say that passing a tuple isn't good either (related to the comment below). Since we added min/max pixels to VALID_SIZE_DICT_KEYS, I think we did it to be able to run get_size_dict and that way validate that no weird keys are used.

I am pro checking the size dict, but maybe we can also check it within the Qwen2 code only and not add it to VALID_SIZE_DICT_KEYS.
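A sketch of the stricter check discussed in this thread: raise instead of silently defaulting when the expected keys are missing. The function name and error message are made up for illustration; they are not the code that was merged.

```python
# Hypothetical validator for Qwen2-VL-style size dicts: reject anything that
# does not carry the min_pixels / max_pixels keys this processor actually uses,
# instead of silently falling back to instance defaults.
def validate_qwen2_vl_size(size):
    if not isinstance(size, dict) or not {"min_pixels", "max_pixels"} <= set(size):
        raise ValueError(
            "Qwen2-VL expects `size` as a dict with `min_pixels` and `max_pixels` keys, "
            f"got {size!r}. Tuples and height/width dicts are not supported."
        )
    return size["min_pixels"], size["max_pixels"]
```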

@@ -382,6 +395,7 @@ def preprocess(
"""
do_resize = do_resize if do_resize is not None else self.do_resize
size = size if size is not None else self.size
size = get_size_dict(size)
Collaborator

This won't work: it will produce a dictionary with "height"/"width", "shortest_edge", or "shortest_edge"/"longest_edge" keys, which are not compatible with this image processor.

Collaborator

@amyeroberts amyeroberts left a comment

Thanks for iterating on this and the clean up!

@zucchini-nlp zucchini-nlp merged commit 2f611d3 into huggingface:main Sep 12, 2024
18 checks passed
BernardZach pushed a commit to BernardZach/transformers that referenced this pull request Dec 5, 2024
* clean-up on qwen2-vl and add generation tests

* add video tests

* Update tests/models/qwen2_vl/test_processing_qwen2_vl.py

Co-authored-by: amyeroberts <[email protected]>

* fix and add better tests

* Update src/transformers/models/qwen2_vl/image_processing_qwen2_vl.py

Co-authored-by: amyeroberts <[email protected]>

* update docs and address comments

* Update docs/source/en/model_doc/qwen2_vl.md

Co-authored-by: amyeroberts <[email protected]>

* Update docs/source/en/model_doc/qwen2_vl.md

Co-authored-by: amyeroberts <[email protected]>

* update

* remove size at all

---------

Co-authored-by: amyeroberts <[email protected]>

4 participants