add multi gpu image captioning #3144

dame-cell · 2024-10-08T14:13:53Z

What does this PR do?

This PR adds a distributed image captioning example using BLIP.

This was discussed in #3078.

I was able to generate image captions using two T4 GPUs in less than 20 minutes for a dataset of 50,000 rows.
If there are any concerns regarding the choice of model or dataset, I am open to making adjustments as needed.
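
For readers skimming the thread, here is a minimal sketch of the general idea behind multi-GPU captioning: each process takes its own slice of the dataset and captions it in batches. This is not the code from this PR. `shard_for_rank`, `caption_shard`, and `caption_fn` are hypothetical names used only for illustration, and `caption_fn` stands in for whatever model call (for example BLIP) produces the captions.

```python
from typing import Callable, List, Sequence


def shard_for_rank(items: Sequence[str], rank: int, world_size: int) -> List[str]:
    """Return the contiguous slice of `items` that process `rank` should handle.

    Any remainder is spread over the lowest ranks so that shard sizes
    differ by at most one item.
    """
    if not 0 <= rank < world_size:
        raise ValueError("rank must be in [0, world_size)")
    base, extra = divmod(len(items), world_size)
    start = rank * base + min(rank, extra)
    end = start + base + (1 if rank < extra else 0)
    return list(items[start:end])


def caption_shard(
    items: Sequence[str],
    rank: int,
    world_size: int,
    caption_fn: Callable[[List[str]], List[str]],  # hypothetical model wrapper
    batch_size: int = 8,
) -> List[str]:
    """Caption only this rank's shard, one batch at a time."""
    shard = shard_for_rank(items, rank, world_size)
    captions: List[str] = []
    for i in range(0, len(shard), batch_size):
        batch = shard[i : i + batch_size]
        captions.extend(caption_fn(batch))
    return captions


if __name__ == "__main__":
    # Placeholder captioner so the sketch runs without a model or GPU.
    def fake_caption_fn(batch: List[str]) -> List[str]:
        return [f"caption for {path}" for path in batch]

    images = [f"img_{i}.jpg" for i in range(5)]
    for rank in range(2):
        print(rank, caption_shard(images, rank, 2, fake_caption_fn, batch_size=2))
```

Under this scheme, 50,000 rows split across two processes gives each rank 25,000 rows. In a real run the rank and world size would come from the distributed launcher rather than a loop.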

sayakpaul · 2024-10-08T14:37:12Z

Thanks for your contributions! But we already have #3123. Does it not have an overlap in the sense that it also adds captioning?

dame-cell · 2024-10-08T14:41:17Z

Oh, my bad. I thought that one was for inference with Florence.

dame-cell · 2024-10-08T14:43:09Z

@sayakpaul I guess the only thing left is speech data generation. I will try contributing that if no one has done it.

sayakpaul · 2024-10-08T14:49:47Z

Sure, speech-data generation would be very nice.

dame-cell added 6 commits October 8, 2024 13:33

adding skeleton code

a7f71c7

print statements

67cef98

added logging

6715b16

added logging

e8ebf69

final commit

e3db099

remove some print statement and some comments

58da6c1

dame-cell closed this Oct 8, 2024

dame-cell reopened this Oct 8, 2024

dame-cell closed this Oct 8, 2024