
Does GroundingDINO support batched inference? #32206

Closed
royvelich opened this issue Jul 25, 2024 · 9 comments · May be fixed by #32490

@royvelich

It seems like grounding-dino states in the documentation that it can take a batch of images, but when I try to do so, I get an error, as specified here - https://discuss.huggingface.co/t/how-to-perform-batch-inference-on-groundingdino-model/90940.
Is it supposed to work?

@qubvel
Member

qubvel commented Jul 25, 2024

Hi @royvelich, thanks for the question! It would be nice to have a minimal reproducible example and your environment info 🙂

I was able to run a batched inference with the following env and code:

- `transformers` version: 4.44.0.dev0
- Platform: Linux-6.5.0-1020-aws-x86_64-with-glibc2.35
- Python version: 3.10.12
- PyTorch version (GPU?): 2.4.0+cu118 (True)
- GPU type: NVIDIA A10G
```python
import requests

import torch
from PIL import Image
from transformers import AutoProcessor, AutoModelForZeroShotObjectDetection

model_id = "IDEA-Research/grounding-dino-tiny"
device = "cuda"

processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForZeroShotObjectDetection.from_pretrained(model_id).to(device)

image_url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(image_url, stream=True).raw)
images = [image, image]
texts = [
    "a cat. a remote control.",
    "a cat. a remote control. a sofa.",
]

inputs = processor(images=images, text=texts, padding=True, return_tensors="pt").to(device)

with torch.no_grad():
    outputs = model(**inputs)

w, h = image.size
results = processor.post_process_grounded_object_detection(
    outputs,
    inputs.input_ids,
    box_threshold=0.4,
    text_threshold=0.3,
    target_sizes=[(h, w), (h, w)],
)
print(results)
```
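For reference, `post_process_grounded_object_detection` returns one dict per image (with `scores`, `labels`, and `boxes` entries, if I recall the current API correctly). A minimal sketch of unpacking the batch, using plain lists in place of the real tensors so it runs without downloading the model:

```python
# Illustrative structure only: mimics the per-image dicts returned by the
# post-processor (the values here are made up, and lists stand in for tensors).
results = [
    {"scores": [0.51, 0.46],
     "labels": ["a cat", "a remote control"],
     "boxes": [[12, 52, 320, 470], [40, 70, 180, 120]]},
    {"scores": [0.48],
     "labels": ["a sofa"],
     "boxes": [[0, 0, 640, 480]]},
]

detections = []
for image_idx, result in enumerate(results):
    # Each image in the batch gets its own dict; iterate its detections together.
    for score, label, box in zip(result["scores"], result["labels"], result["boxes"]):
        detections.append((image_idx, label, score, box))
        print(f"image {image_idx}: {label} ({score:.2f}) at {box}")
```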

@royvelich
Author

@qubvel When I supply a batch of images, should all the images have the same resolution?

@qubvel
Member

qubvel commented Aug 5, 2024

Hi @royvelich, the processor will take care of this. The example above works even if we pass images of different sizes to the processor:

```python
...
images = [image.resize((512, 256)), image.resize((256, 256))]
...
inputs = processor(images=images, text=texts, padding=True, return_tensors="pt").to(device)

with torch.no_grad():
    outputs = model(**inputs)
...
```
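Under the hood, the image processor pads every image in the batch to a common size and returns a mask marking the valid pixels. A toy sketch of that idea (pure Python lists, not the actual transformers implementation):

```python
# Conceptual sketch of batch padding: every "image" is padded to the largest
# height/width in the batch, and a mask records which pixels are real.
def pad_batch(images, pad_value=0):
    max_h = max(len(img) for img in images)
    max_w = max(len(row) for img in images for row in img)
    padded, masks = [], []
    for img in images:
        h, w = len(img), len(img[0])
        # Pad each row to max_w, then append all-padding rows up to max_h.
        padded.append([row + [pad_value] * (max_w - w) for row in img]
                      + [[pad_value] * max_w] * (max_h - h))
        # 1 marks a valid pixel, 0 marks padding (rows are read-only here).
        masks.append([[1] * w + [0] * (max_w - w)] * h
                     + [[0] * max_w] * (max_h - h))
    return padded, masks

imgs = [[[1, 2], [3, 4]], [[5, 6, 7]]]  # a 2x2 and a 1x3 "image"
padded, masks = pad_batch(imgs)
print(len(padded[0]), len(padded[0][0]))  # both images are now 2x3
```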

@royvelich
Author

@qubvel
Hi,
Thanks for your support. For some reason, when I run `outputs = model(**inputs)`, I get the following error:

TypeError: GroundingDinoForObjectDetection.forward() missing 1 required positional argument: 'input_ids'

I checked, and `inputs.input_ids` does not exist.

Do you have any idea what I should do?

Thanks!

@royvelich
Author


Wait, let me check something. It works in your example, but I get this error when I integrate it into my project.
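The usual cause of that error (an assumption, since the failing call isn't shown) is calling the processor with images only: `input_ids` comes from tokenizing the text prompt, so it is missing when no `text=` argument is passed. A schematic comparison, mocked with a plain function so it runs without the model:

```python
# Mock of the processor's behavior: input_ids appears in the output
# only when a text prompt is supplied (token ids here are fake).
def mock_processor(images=None, text=None):
    out = {}
    if images is not None:
        out["pixel_values"] = images
    if text is not None:
        out["input_ids"] = [[101, 102]] * len(text)
    return out

with_text = mock_processor(images=["img"], text=["a cat."])
without_text = mock_processor(images=["img"])
print("input_ids" in with_text, "input_ids" in without_text)
```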

@royvelich
Author

@qubvel
Ok, now it works. But I wonder - previously, I used a pipeline for running grounding-dino:

```python
detector = pipeline(model="IDEA-Research/grounding-dino-tiny", task="zero-shot-object-detection", device=device)
```

Can we work in batches there as well? Also, it looks like the boxes that the pipeline returns are different from the boxes that I get using your code (using the same images/labels/hyper-parameters). Is it just a different format?

@qubvel
Member

qubvel commented Aug 6, 2024

Hi @royvelich, indeed, there is a bug in the zero-shot object-detection pipeline; it was reported previously in this issue:

@royvelich
Author

This one should work for the pipeline:

```python
import requests
from PIL import Image
from transformers import pipeline

device = "cuda"

detector = pipeline(model="IDEA-Research/grounding-dino-tiny", task="zero-shot-object-detection", device=device)

image_url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(image_url, stream=True).raw)
images = [image, image]
texts = [
    "a cat. a remote control.",
    "a cat. a remote control. a sofa.",
]

data = [{'image': image, 'candidate_labels': text} for image, text in zip(images, texts)]

results = detector(data)

print(results)
```
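On the earlier question about differing box formats: as far as I know, the pipeline reports each box as a dict of corner coordinates (`xmin`/`ymin`/`xmax`/`ymax`), while the post-processing path returns `[x_min, y_min, x_max, y_max]` tensors, both in absolute pixels. If that holds, converting between the two is a one-liner; a hedged sketch:

```python
# Assumes the pipeline's dict-style box format; converts it to the
# [x_min, y_min, x_max, y_max] list layout used by the post-processor.
def box_dict_to_xyxy(box):
    return [box["xmin"], box["ymin"], box["xmax"], box["ymax"]]

print(box_dict_to_xyxy({"xmin": 12, "ymin": 52, "xmax": 320, "ymax": 470}))
```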


This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.
