Does GroundingDINO support batched inference? #32206
Comments
Hi @royvelich, thanks for the question. It would be nice to have a minimal reproducing example and your environment 🙂 I was able to run batched inference with the following env and code:
import requests
import torch
from PIL import Image
from transformers import AutoProcessor, AutoModelForZeroShotObjectDetection

model_id = "IDEA-Research/grounding-dino-tiny"
device = "cuda"

processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForZeroShotObjectDetection.from_pretrained(model_id).to(device)

image_url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(image_url, stream=True).raw)
images = [image, image]
texts = [
    "a cat. a remote control.",
    "a cat. a remote control. a sofa.",
]

inputs = processor(images=images, text=texts, padding=True, return_tensors="pt").to(device)
with torch.no_grad():
    outputs = model(**inputs)

w, h = image.size
results = processor.post_process_grounded_object_detection(
    outputs,
    inputs.input_ids,
    box_threshold=0.4,
    text_threshold=0.3,
    target_sizes=[(h, w), (h, w)],
)
print(results)
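For readers wondering what to do with `results` for a batch: `post_process_grounded_object_detection` returns one dictionary per input image, with `scores`, `labels`, and `boxes` entries (boxes as `xmin, ymin, xmax, ymax` in pixels when `target_sizes` is passed). Below is a minimal sketch of flattening that structure. The helper name `summarize_detections` and the sample values are made up for illustration, and the stand-in data uses plain lists so the snippet runs without downloading the model.

```python
from typing import Any, Dict, List, Tuple

Detection = Tuple[str, float, Tuple[float, ...]]


def _to_list(value: Any) -> list:
    """Convert a tensor-like object (anything with .tolist()) or a sequence to a list."""
    return value.tolist() if hasattr(value, "tolist") else list(value)


def summarize_detections(results: List[Dict[str, Any]]) -> List[List[Detection]]:
    """Hypothetical helper: one list of (label, score, box) tuples per image."""
    summary = []
    for per_image in results:
        scores = _to_list(per_image["scores"])
        labels = list(per_image["labels"])
        boxes = _to_list(per_image["boxes"])
        summary.append(
            [
                (label, round(float(score), 3), tuple(float(v) for v in box))
                for label, score, box in zip(labels, scores, boxes)
            ]
        )
    return summary


# Stand-in for the real `results`; the numbers are invented, not model output.
fake_results = [
    {
        "scores": [0.48, 0.44],
        "labels": ["a cat", "a remote control"],
        "boxes": [[344.7, 23.1, 637.2, 374.3], [38.6, 70.0, 176.8, 118.2]],
    },
    {"scores": [], "labels": [], "boxes": []},  # an image with no detections
]

for image_index, detections in enumerate(summarize_detections(fake_results)):
    print(f"image {image_index}: {len(detections)} detection(s)")
    for label, score, box in detections:
        print(f"  {label}: score={score}, box={box}")
```

With the real model output, the same function can be called as `summarize_detections(results)`, since the tensors expose `.tolist()`.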
@qubvel When I supply a batch of images, should all the images have the same resolution?
Hi @royvelich, the processor will take care of this. The above example works even if we provide images of different sizes to the processor.
...
images = [image.resize((512, 256)), image.resize((256, 256))]
...
inputs = processor(images=images, text=texts, padding=True, return_tensors="pt").to(device)
with torch.no_grad():
    outputs = model(**inputs)
...
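One detail to watch when the batch mixes resolutions: `target_sizes` in the post-processing call takes one `(height, width)` pair per image, while PIL's `image.size` is `(width, height)`. Here is a small sketch of building that list, assuming each image exposes a PIL-style `.size`. `target_sizes_for` is a hypothetical helper name, and `FakeImage` only stands in for a PIL image so the snippet runs on its own.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class FakeImage:
    """Stand-in for a PIL image; only the (width, height) `size` attribute matters here."""
    size: Tuple[int, int]


def target_sizes_for(images: Iterable) -> List[Tuple[int, int]]:
    """Hypothetical helper: swap PIL's (width, height) into (height, width) per image."""
    return [(img.size[1], img.size[0]) for img in images]


# Same shapes as image.resize((512, 256)) and image.resize((256, 256)) above.
images = [FakeImage(size=(512, 256)), FakeImage(size=(256, 256))]
print(target_sizes_for(images))  # [(256, 512), (256, 256)]
```

With real PIL images, the result can be passed as `target_sizes=target_sizes_for(images)` to `post_process_grounded_object_detection`.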
@qubvel
I checked it, and Do you have any idea what I should do? Thanks!
Wait, let me check something. It works in your example, but I get this error when I integrate it into my project.
@qubvel
Can we work in batches there as well? Also, it looks like the boxes that the pipeline returns are different from the boxes that I get using your code (using the same images/labels/hyper-parameters). Is it just a different format?
Hi @royvelich, indeed there is a bug in the
This one should work for pipeline:
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. Please note that issues that do not follow the contributing guidelines are likely to be ignored.
The GroundingDINO documentation seems to state that the model can take a batch of images, but when I try to do so, I get an error, as described here: https://discuss.huggingface.co/t/how-to-perform-batch-inference-on-groundingdino-model/90940.
Is it supposed to work?