
LLaVA OV: fix unpadding precision #34779

Merged: 3 commits into huggingface:main on Nov 20, 2024

Conversation

@zucchini-nlp (Member) commented on Nov 18, 2024

What does this PR do?

Fixes #34625. There was a small precision error in unpadding because the modeling code casts the size to a list, while the processing code works with tensors. This PR casts everything to a list so both paths run the same calculation.
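To make the failure mode concrete, here is a minimal sketch (hypothetical sizes and helper names, not the actual transformers code) of how the same unpadding arithmetic can round differently depending on whether it runs on float32 tensors or on Python floats:

```python
import torch

def unpadded_height_from_tensor(original_size: torch.Tensor, current_width: int) -> int:
    # tensor path: the aspect-ratio division runs in float32
    ratio = original_size[0].float() / original_size[1].float()  # height / width
    return int(current_width * ratio)

def unpadded_height_from_list(original_size: list, current_width: int) -> int:
    # list path (what this PR standardizes on): plain Python float64 arithmetic
    height, width = original_size
    return int(current_width * (height / width))

size = torch.tensor([363, 500])  # hypothetical (height, width) of the raw image
# For some sizes the float32 and float64 quotients land on opposite sides of an
# integer boundary, so the two paths disagree by one row of features.
print(unpadded_height_from_tensor(size, 1152))
print(unpadded_height_from_list(size.tolist(), 1152))
```

When the two paths disagree by even one row or column, the number of image features no longer matches the number of image placeholder tokens, which is the mismatch reported in #34625.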

@zucchini-nlp requested a review from @qubvel on November 18, 2024, 11:20
@HuggingFaceDocBuilderDev commented

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@qubvel (Member) left a comment


Thanks for the fix! Just a question re types

@zucchini-nlp (Member, Author) commented

Sorry @qubvel, wrong tag for the review.

@qubvel removed their request for review on November 18, 2024, 14:40
@ArthurZucker (Collaborator) left a comment


Thanks, does it fix #34625 entirely? (let's close it if so!)

@zucchini-nlp merged commit 145fbd4 into huggingface:main on Nov 20, 2024
10 checks passed
@Scyther-07 commented
Hey @zucchini-nlp, I am running the following code but it is throwing a TypeError:

```python
from transformers import AutoProcessor, LlavaForConditionalGeneration, BitsAndBytesConfig, LlavaNextProcessor

model_id = "llava-hf/llava-v1.6-mistral-7b-hf"
# AutoProcessor resolves this checkpoint to LlavaNextProcessor under the hood
processor = AutoProcessor.from_pretrained(model_id)
```

Error:
TypeError: LlavaNextProcessor.__init__() got an unexpected keyword argument 'image_token'

I looked at the commits in this PR and they look good to me. The issue might be with your earlier PR #33424. Kindly look into this.

@zucchini-nlp (Member, Author) commented

@Scyther-07 hey, which transformers version are you using?

@Scyther-07 commented

> @Scyther-07 hey, which transformers version are you using?

It's 4.39.3. I just noticed that the above code works fine on Google Colab but throws an error in a Kaggle notebook. I don't know what to make of it; I think I should switch to Colab.

@zucchini-nlp (Member, Author) commented

@Scyther-07 hmm, 4.39.3 would indeed throw that error; you need at least v4.43 to avoid it. In fact, we are currently changing the way inputs for VLMs are processed, so I'd recommend using the latest transformers once it's released. That will be v4.47, due in the next 1-2 weeks (not released yet) :)
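For anyone hitting the same thing, a quick way to confirm what a notebook environment is actually running (the v4.43 floor is taken from the comment above):

```python
import transformers

print(transformers.__version__)  # per the comment above, needs >= 4.43

# In a Colab/Kaggle notebook, upgrade with:
#   !pip install -U transformers
# then restart the kernel so the new version is picked up.
```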

@Scyther-07 commented

Yeah, it worked. Silly of me. Thanks for the help!

BernardZach pushed a commit to BernardZach/transformers that referenced this pull request Dec 5, 2024
BernardZach pushed a commit to innovationcore/transformers that referenced this pull request Dec 6, 2024
Successfully merging this pull request may close issue #34625: LLaVA-OneVision mismatch between image features and image tokens.