Object Detection Pipeline only outputs first element when batching #31356

Open
4 tasks
simonschoenhofen opened this issue Jun 10, 2024 · 13 comments
Labels
Core: Pipeline Internals of the library; Pipeline. Good First Issue Vision

Comments

@simonschoenhofen

simonschoenhofen commented Jun 10, 2024

System Info

  • transformers version: 4.41.2
  • Platform: Linux-5.4.0-182-generic-x86_64-with-glibc2.31
  • Python version: 3.11.8
  • Huggingface_hub version: 0.23.0
  • Safetensors version: 0.4.2
  • Accelerate version: 0.30.1
  • Accelerate config: not found
  • PyTorch version (GPU?): 2.2.2+cu121 (True)
  • Tensorflow version (GPU?): not installed (NA)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using GPU in script?:
  • Using distributed or parallel set-up in script?:

When running the ObjectDetectionPipeline on a batch, the output contains only the bounding boxes of the first input image, because postprocessing in ObjectDetectionPipeline.py reads element [0] of the outputs:

raw_annotation = raw_annotations[0]

This always takes the first element instead of looping over all outputs.
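
To make the difference concrete, here is a minimal, library-independent sketch. The function names and the toy `raw_annotations` data are hypothetical and only mimic the shape of a batched result; this is not the pipeline's actual code.

```python
# Toy stand-in for batched detections: one list of detections per image.
raw_annotations = [
    [{"label": "cat", "score": 0.96}],  # image 0
    [{"label": "sofa", "score": 0.97}, {"label": "remote", "score": 0.95}],  # image 1
]


def postprocess_first_only(raw_annotations):
    """Mimics the reported behaviour: only index 0 is ever read."""
    return raw_annotations[0]


def postprocess_all(raw_annotations):
    """Iterates over every element so each image keeps its own detections."""
    return [list(annotation) for annotation in raw_annotations]


first_only = postprocess_first_only(raw_annotations)
all_images = postprocess_all(raw_annotations)
print(len(first_only))  # number of detections for image 0 only
print(len(all_images))  # number of images with results
```

With the first variant, the detections for image 1 are silently dropped; the second returns one list per input image.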

Who can help?

@Narsil

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

  1. Create an object detection pipeline:

pipe = pipeline("object-detection", model=model_name, image_processor=preprocessor_name, device=device)

  2. Run batch inference with batch_size > 2:

for out in tqdm(pipe(dataset, batch_size=batch_size)):
    print(out)  # one result per input image is expected

Expected behavior

Expected output: one element per input image (for example, 2 elements for 2 images), each containing that image's x items (bboxes).
Actual output: only the bboxes of the first input element.
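
The expectation above can be written as a small check. This is only a sketch: `check_batched_output` is a hypothetical helper, and the toy lists stand in for real pipeline output.

```python
def check_batched_output(num_inputs, outputs):
    """Return True if `outputs` holds one list of detections per input image."""
    outputs = list(outputs)  # also accepts an iterator of per-image results
    return len(outputs) == num_inputs and all(isinstance(o, list) for o in outputs)


# Toy data: two images, each with its own list of bounding boxes.
expected_shape = [[{"label": "cat"}], [{"label": "sofa"}, {"label": "remote"}]]
# Toy data mimicking the reported behaviour: only the first image's boxes.
reported_shape = [[{"label": "cat"}]]

print(check_batched_output(2, expected_shape))
print(check_batched_output(2, reported_shape))
```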

@simonschoenhofen
Author

It looks like the output of the _forward function is correct (a full batch).
However, the input to postprocess is no longer a batch; it is only the first element of the batch.
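
One possible explanation, stated here as an assumption rather than something confirmed in this thread: the pipeline machinery may split the batched `_forward` output back into single items before calling `postprocess`, in which case `postprocess` would see a batch of size 1 on every call and reading `[0]` would be harmless. The sketch below imitates that flow with plain Python lists; `forward_batch`, `unbatch`, and `postprocess_one` are hypothetical names, not library functions.

```python
def forward_batch(images):
    """Pretend model call: returns batched outputs as a dict of lists."""
    return {"boxes": [[f"box-for-{img}"] for img in images]}


def unbatch(batched):
    """Yield one dict per item, keeping a leading batch dimension of size 1."""
    batch_size = len(next(iter(batched.values())))
    for i in range(batch_size):
        yield {key: value[i : i + 1] for key, value in batched.items()}


def postprocess_one(item):
    """Reads element [0], which is safe here because each item has batch size 1."""
    return item["boxes"][0]


batched = forward_batch(["a.jpg", "b.jpg", "c.jpg"])
results = [postprocess_one(item) for item in unbatch(batched)]
print(results)
```

Under this assumption, every image still gets its own result even though `postprocess_one` only ever indexes `[0]`.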

@NielsRogge
Contributor

cc @qubvel

@amyeroberts
Collaborator

Hi @simonschoenhofen, thanks for reporting this!

Would you like to open a PR to address this?

@amyeroberts amyeroberts added Core: Pipeline Internals of the library; Pipeline. Vision labels Jun 10, 2024
@simonschoenhofen
Author

@amyeroberts Will do tomorrow

@amyeroberts
Collaborator

Adding a good first issue label in case anyone from the community wants to add this

@qubvel
Member

qubvel commented Aug 6, 2024

Hmm, I was not able to reproduce the bug; the following example works fine. @simonschoenhofen were you able to solve this issue?

from transformers import pipeline

url = 'http://images.cocodataset.org/val2017/000000039769.jpg' 
pipe = pipeline("object-detection", model="PekingU/rtdetr_r50vd", device="cuda")

results = pipe([url] * 4, batch_size=2)

for i, result in enumerate(results):
    print(f"Image {i}:\n{result}\n")

Output:

Image 0:
[{'score': 0.9704199433326721, 'label': 'sofa', 'box': {'xmin': 0, 'ymin': 0, 'xmax': 640, 'ymax': 476}}, {'score': 0.9599390625953674, 'label': 'cat', 'box': {'xmin': 343, 'ymin': 24, 'xmax': 640, 'ymax': 371}}, {'score': 0.9575842022895813, 'label': 'cat', 'box': {'xmin': 13, 'ymin': 54, 'xmax': 318, 'ymax': 472}}, {'score': 0.9506626129150391, 'label': 'remote', 'box': {'xmin': 40, 'ymin': 73, 'xmax': 175, 'ymax': 118}}, {'score': 0.9237849116325378, 'label': 'remote', 'box': {'xmin': 333, 'ymin': 76, 'xmax': 369, 'ymax': 186}}]

Image 1:
[{'score': 0.9704199433326721, 'label': 'sofa', 'box': {'xmin': 0, 'ymin': 0, 'xmax': 640, 'ymax': 476}}, {'score': 0.9599390625953674, 'label': 'cat', 'box': {'xmin': 343, 'ymin': 24, 'xmax': 640, 'ymax': 371}}, {'score': 0.9575842022895813, 'label': 'cat', 'box': {'xmin': 13, 'ymin': 54, 'xmax': 318, 'ymax': 472}}, {'score': 0.9506626129150391, 'label': 'remote', 'box': {'xmin': 40, 'ymin': 73, 'xmax': 175, 'ymax': 118}}, {'score': 0.9237849116325378, 'label': 'remote', 'box': {'xmin': 333, 'ymin': 76, 'xmax': 369, 'ymax': 186}}]

Image 2:
[{'score': 0.9704199433326721, 'label': 'sofa', 'box': {'xmin': 0, 'ymin': 0, 'xmax': 640, 'ymax': 476}}, {'score': 0.9599390625953674, 'label': 'cat', 'box': {'xmin': 343, 'ymin': 24, 'xmax': 640, 'ymax': 371}}, {'score': 0.9575842022895813, 'label': 'cat', 'box': {'xmin': 13, 'ymin': 54, 'xmax': 318, 'ymax': 472}}, {'score': 0.9506626129150391, 'label': 'remote', 'box': {'xmin': 40, 'ymin': 73, 'xmax': 175, 'ymax': 118}}, {'score': 0.9237849116325378, 'label': 'remote', 'box': {'xmin': 333, 'ymin': 76, 'xmax': 369, 'ymax': 186}}]

Image 3:
[{'score': 0.9704199433326721, 'label': 'sofa', 'box': {'xmin': 0, 'ymin': 0, 'xmax': 640, 'ymax': 476}}, {'score': 0.9599390625953674, 'label': 'cat', 'box': {'xmin': 343, 'ymin': 24, 'xmax': 640, 'ymax': 371}}, {'score': 0.9575842022895813, 'label': 'cat', 'box': {'xmin': 13, 'ymin': 54, 'xmax': 318, 'ymax': 472}}, {'score': 0.9506626129150391, 'label': 'remote', 'box': {'xmin': 40, 'ymin': 73, 'xmax': 175, 'ymax': 118}}, {'score': 0.9237849116325378, 'label': 'remote', 'box': {'xmin': 333, 'ymin': 76, 'xmax': 369, 'ymax': 186}}]
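
A quick way to confirm that every image in the batch produced its own detections is to summarise the labels per image. The sketch below uses a trimmed copy of the output above (scores rounded, boxes omitted) instead of calling the model; `summarize` is a hypothetical helper.

```python
from collections import Counter

# Trimmed copy of one image's detections from the output above.
one_image = [
    {"score": 0.970, "label": "sofa"},
    {"score": 0.960, "label": "cat"},
    {"score": 0.958, "label": "cat"},
    {"score": 0.951, "label": "remote"},
    {"score": 0.924, "label": "remote"},
]
results = [one_image] * 4  # four identical inputs, as in the example


def summarize(results):
    """Return one Counter of labels per image."""
    return [Counter(det["label"] for det in dets) for dets in results]


summary = summarize(results)
for i, counts in enumerate(summary):
    print(f"Image {i}: {dict(counts)}")
```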

@royvelich

@qubvel
Well, apparently, for grounding-dino, you have to do something like this in order to run batches in the pipeline:

import requests
from PIL import Image
from transformers import pipeline

device = "cuda"

detector = pipeline(model="IDEA-Research/grounding-dino-tiny", task="zero-shot-object-detection", device=device)

image_url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(image_url, stream=True).raw)
images = [image, image]
texts = [
    "a cat. a remote control.",
    "a cat. a remote control. a sofa.",
]

data = [{'image': image, 'candidate_labels': text} for image, text in zip(images, texts)]

results = detector(data)

print(results)

@qubvel
Member

qubvel commented Aug 6, 2024

@royvelich Yes, this example works, but the results are weird and do not match the results from running the model outside of the pipeline. I suspect that the "object-detection" pipeline doesn't have an issue, but it looks like the "zero-shot-object-detection" pipeline is not working properly with grounding dino.

@royvelich

@qubvel So, do you recommend using your example for now and avoiding the pipeline?

@qubvel
Member

qubvel commented Aug 6, 2024

@royvelich Yes, please use the grounding dino model directly, not the pipeline, while we are investigating the issue.

@shankram

@qubvel the postprocess function in the zero_shot_object_detection pipeline calls image_processor.post_process_object_detection:

def postprocess(self, model_outputs, threshold=0.1, top_k=None):
    results = []
    for model_output in model_outputs:
        label = model_output["candidate_label"]
        model_output = BaseModelOutput(model_output)
        outputs = self.image_processor.post_process_object_detection(
            outputs=model_output, threshold=threshold, target_sizes=model_output["target_size"]
        )[0]
    ...

whereas the grounding dino HF post says that post_process_grounded_object_detection should be used when the text has multiple classes (e.g. 'a cat. a remote.').

The post_process_grounded_object_detection method requires input_ids which aren't passed to the postprocess function in the pipeline. Could you please tell me how to fix this without breaking the pipeline for other models? I'd be happy to open a PR and make my first contribution 🤗
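
One way such a change could be structured, purely as a sketch and not the fix from any PR: carry `input_ids` alongside the model output and call the grounded post-processing method only when the processor provides it, falling back to the existing call otherwise. The two stub classes below are hypothetical stand-ins so the example runs without downloading a model. Only the method names `post_process_object_detection` and `post_process_grounded_object_detection` come from the discussion above; the stub signatures are simplified assumptions and may not match the library.

```python
class PlainProcessor:
    """Stub for a processor that only supports generic post-processing."""

    def post_process_object_detection(self, outputs, threshold, target_sizes):
        return [{"method": "generic", "threshold": threshold}]


class GroundedProcessor(PlainProcessor):
    """Stub for a processor that also supports grounded post-processing."""

    def post_process_grounded_object_detection(self, outputs, input_ids, threshold, target_sizes):
        return [{"method": "grounded", "num_tokens": len(input_ids)}]


def postprocess_one(processor, model_output, threshold=0.1):
    """Pick the post-processing method based on what the processor offers."""
    input_ids = model_output.get("input_ids")
    grounded = getattr(processor, "post_process_grounded_object_detection", None)
    if grounded is not None and input_ids is not None:
        return grounded(model_output, input_ids, threshold, model_output["target_size"])[0]
    # Fallback: same call shape as the existing postprocess code quoted above.
    return processor.post_process_object_detection(
        outputs=model_output, threshold=threshold, target_sizes=model_output["target_size"]
    )[0]


# Hypothetical model output that carries input_ids through to postprocessing.
output = {"input_ids": [101, 1037, 4937, 102], "target_size": [(480, 640)]}
grounded_result = postprocess_one(GroundedProcessor(), output)
generic_result = postprocess_one(PlainProcessor(), output)
print(grounded_result["method"], generic_result["method"])
```

The feature check keeps models without a grounded method on their current code path, which is the "without breaking other models" concern raised above.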

@qubvel
Member

qubvel commented Aug 29, 2024

Hi @shankram, thank you for investigating this! Indeed there is a problem with the pipeline for zero-shot object detection for some models.

I've prepared a PR fixing the pipeline; it's not yet merged, but it is already functional.

@tejuiceB

tejuiceB commented Jan 19, 2025

Hi! I would like to work on fixing this issue. I plan to modify the postprocessing step to ensure that all images in the batch are processed correctly.

I understand that the issue arises from the code accessing only the first element of the batch during postprocessing. To confirm, the expected behavior is to return bounding boxes for all images in the batch, not just the first one, correct? Any additional context or insights would be greatly appreciated!

My approach will be to loop through all elements in the batch within the postprocessing function to ensure that each image's bounding boxes are processed and returned. I will work on the changes and submit a PR once it's ready.

Thank you!


7 participants