Can I use VCD for POPE benchmark? #31

Tizzzzy · 2025-01-30T04:53:51Z

Hi,
I am currently working on applying the VCD mitigation method to the LLaVA model on the POPE benchmark. The questions in the POPE benchmark are all binary (Yes/No), for example:

{"question_id": 1, "image": "COCO_val2014_000000016631.jpg", "text": "Is there a person in the image?", "label": "yes"}
{"question_id": 2, "image": "COCO_val2014_000000016631.jpg", "text": "Is there a refrigerator in the image?", "label": "no"}
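
Records in this format can be read and scored roughly as follows. This is only a sketch: the helper names (`load_pope`, `to_yes_no`, `accuracy`) are made up for illustration, and the yes/no matching rule is an assumption, not POPE's official evaluation script.

```python
import json

def load_pope(lines):
    """Parse POPE-style JSONL lines into a list of dicts."""
    return [json.loads(line) for line in lines if line.strip()]

def to_yes_no(answer_text):
    """Map a free-form model answer to "yes", "no", or None (assumed heuristic)."""
    words = answer_text.lower().replace(".", " ").replace(",", " ").split()
    if "no" in words or "not" in words:
        return "no"
    if "yes" in words:
        return "yes"
    return None  # answer could not be interpreted

def accuracy(records, predictions):
    """predictions maps question_id -> raw answer text from the model."""
    correct = sum(
        to_yes_no(predictions[r["question_id"]]) == r["label"] for r in records
    )
    return correct / len(records)

sample = [
    '{"question_id": 1, "image": "COCO_val2014_000000016631.jpg", "text": "Is there a person in the image?", "label": "yes"}',
    '{"question_id": 2, "image": "COCO_val2014_000000016631.jpg", "text": "Is there a refrigerator in the image?", "label": "no"}',
]
records = load_pope(sample)
# Hypothetical model outputs, used only to exercise the scoring code.
predictions = {1: "Yes, there is a person.", 2: "Yes"}
print(accuracy(records, predictions))
```

Returning `None` for unparseable answers counts them as wrong instead of silently defaulting to one class.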

Is it possible to use VCD to further improve LLaVA's performance on these types of binary questions?
If so, could you provide guidance on how to implement it?

Below is the current implementation I am using for LLaVA to generate answers from an image:

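import torch
from transformers import AutoProcessor, LlavaForConditionalGeneration

# Load LLaVA-1.5 7B in half precision on GPU 0.
model_id = "llava-hf/llava-1.5-7b-hf"
model = LlavaForConditionalGeneration.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    low_cpu_mem_usage=True,
).to(0)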

processor = AutoProcessor.from_pretrained(model_id)

conversation = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": question_text},
            {"type": "image"},
        ],
    },
]
prompt = processor.apply_chat_template(conversation, add_generation_prompt=True)

inputs = processor(images=raw_image, text=prompt, return_tensors='pt').to(0, torch.float16)

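# Greedy decoding. generate() returns the prompt tokens followed by the new tokens,
# so slice off the prompt to keep only the model's answer.
output = model.generate(**inputs, max_new_tokens=20, do_sample=False)
prompt_len = inputs["input_ids"].shape[1]
answer_text = processor.decode(output[0][prompt_len:], skip_special_tokens=True)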
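
For context, my understanding of the core VCD step is that at each decoding position the logits from the original image are contrasted with the logits from a noise-distorted copy of the same image, and tokens the original distribution finds implausible are masked out. The sketch below shows that step on plain lists of logits. It is not the repository's code: the function name `vcd_next_token_probs`, the toy logits, and the default `alpha` and `beta` values are assumptions for illustration. In practice the two logit vectors would come from two forward passes per step.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats (-inf maps to 0)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def vcd_next_token_probs(logits_clean, logits_noisy, alpha=1.0, beta=0.1):
    """Contrast clean-image logits against distorted-image logits.

    alpha scales the contrast strength; beta (in [0, 1]) sets the plausibility
    cutoff relative to the most likely token under the clean image.
    """
    p_clean = softmax(logits_clean)
    cutoff = beta * max(p_clean)
    contrast = [
        # Keep only tokens that are plausible under the clean image.
        (1 + alpha) * lc - alpha * ln if pc >= cutoff else float("-inf")
        for lc, ln, pc in zip(logits_clean, logits_noisy, p_clean)
    ]
    return softmax(contrast)

# Toy 3-token vocabulary: token 0 is favoured even more once the image is
# distorted, which suggests it is driven by priors and not by the image.
logits_clean = [2.0, 1.0, -5.0]
logits_noisy = [2.5, 0.0, -5.0]
probs = vcd_next_token_probs(logits_clean, logits_noisy)
print(probs)
```

For a Yes/No benchmark such as POPE, the first generated token usually decides the answer, so the contrast matters most at that first step.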
