[Pipelines] Problems with an image-to-text fine-tuned model #21514
Comments
I'm not well versed with Pipelines are usually agnostic to actual models. As long as model X is If the architecture is different, we can discuss what's done and how to implement.
|
Fair enough! I guess the main reason the pipeline is acting weird could be that the model is loaded into |
Yes that's exactly it. In the absence of tags the hub will check the config and assign a pipeline based on architecture format |
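To illustrate the fallback described in this comment, here is a minimal sketch, not the Hub's actual implementation. The suffix table and the function name `infer_pipeline_tag` are assumptions made up for this example. It only shows the idea that an explicit task tag wins, and that otherwise the architecture name in the config decides.

```python
from typing import Optional, Sequence

# Hypothetical suffix table for illustration only; the Hub's real rules may differ.
ARCHITECTURE_SUFFIX_TO_TASK = {
    "ForCausalLM": "text-generation",
    "ForVision2Seq": "image-to-text",
}


def infer_pipeline_tag(config: dict, tags: Sequence[str] = ()) -> Optional[str]:
    """Prefer an explicit task tag, else fall back to the architecture name."""
    known_tasks = set(ARCHITECTURE_SUFFIX_TO_TASK.values())
    # An explicit task tag on the repo takes priority.
    for tag in tags:
        if tag in known_tasks:
            return tag
    # Otherwise match the architecture name against the known suffixes.
    for architecture in config.get("architectures", []):
        for suffix, task in ARCHITECTURE_SUFFIX_TO_TASK.items():
            if architecture.endswith(suffix):
                return task
    return None


# Without tags, a GitForCausalLM checkpoint falls back to text generation.
print(infer_pipeline_tag({"architectures": ["GitForCausalLM"]}))
# With a manually added task tag, the tag is used instead.
print(infer_pipeline_tag({"architectures": ["GitForCausalLM"]}, tags=["image-to-text"]))
```

Under these assumptions, a `GitForCausalLM` checkpoint with no tags would be treated as a text generation model, which matches what the original post reports.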
Do you have a sample script to make it work for captioning? |
If you check the Colab Notebook I linked above, you will see it at the end (the inference section). |
The pipeline currently only supports classes that are instances of
|
Seems to me that the colab does pretty much what the pipeline does: Any reason not to implement |
Yeah, that is my understanding as well. Maybe @NielsRogge can provide more on
|
The image-to-text pipeline currently only supports the But in practice, |
So make it |
I've added BLIP and BLIP-2 to the ForVision2Seq mapping, making them usable with the image-to-text pipeline: #21802. However, GIT can't be added out-of-the-box, due to |
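As a rough sketch of why membership in the mapping matters, assume the pipeline decides support by looking up the model class in a registry. The classes below are empty stand-ins, and `VISION_2_SEQ_MODEL_NAMES` and `is_supported_for_image_to_text` are hypothetical names for this example; the real check in transformers is more involved.

```python
# Empty stand-in classes: hypothetical placeholders, not the real transformers classes.
class BlipForConditionalGeneration: ...
class Blip2ForConditionalGeneration: ...
class GitForCausalLM: ...


# Hypothetical registry playing the role of the ForVision2Seq mapping.
VISION_2_SEQ_MODEL_NAMES = {
    "BlipForConditionalGeneration",
    "Blip2ForConditionalGeneration",
}


def is_supported_for_image_to_text(model: object) -> bool:
    """Return True if the model's class is registered in the (stand-in) mapping."""
    return type(model).__name__ in VISION_2_SEQ_MODEL_NAMES


print(is_supported_for_image_to_text(BlipForConditionalGeneration()))  # True
print(is_supported_for_image_to_text(GitForCausalLM()))  # False
```

In this simplified picture, adding a class to the registry is all it takes to make it usable, which is why BLIP and BLIP-2 could be added directly while GIT needs something more.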
What are those If they are |
GIT is a bit special in the sense that it can be viewed as a GPT-2 model, taking
|
Shouldn't we implement |
No |
It seems No? |
Yes, correct. But the problem is here. The inputs, prepared using the image processor, will be |
Oh, this code is already not looking pretty; there could be a way to make it better. But we could always add, for instance:
class GitForVision2Seq(GitForCausalLM):
    def forward(self, pixel_values, **kwargs):
        # Forward the image tensor (and any other arguments) to the causal LM.
        return super().forward(pixel_values=pixel_values, **kwargs) |
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. Please note that issues that do not follow the contributing guidelines are likely to be ignored. |
I have removed the Good first issue label as there is no clear plan explaining to a beginner what to do to solve this issue. Please add such a plan if you want to restore that label. |
If @Narsil agrees, we can add the |
To fix this issue, one can:
|
This was fixed by #23362 |
I have fine-tuned the microsoft/git-base model on image captioning (Colab Notebook).
I am trying to use the model with 🤗 Pipelines:
It only spits:
If you check the Colab Notebook, you will notice that it works okay when the inference is performed explicitly i.e., without pipelines. Is it because the architecture tagged with the model is GitForCausalLM?
Also note that on the model repo, there is a tag "Image To Text" WHICH I HAVE MANUALLY ADDED to see if that has any effect. By default, the model gets tagged as a text generation model.
@Narsil is it out of scope to support this model in an image to text generation pipeline?