Hi,
I am currently applying the VCD (Visual Contrastive Decoding) hallucination-mitigation method to the LLaVA model on the POPE benchmark. All POPE questions are binary (Yes/No), for example:
```json
{"question_id": 1, "image": "COCO_val2014_000000016631.jpg", "text": "Is there a person in the image?", "label": "yes"}
{"question_id": 2, "image": "COCO_val2014_000000016631.jpg", "text": "Is there a refrigerator in the image?", "label": "no"}
```
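For context, a minimal sketch of how records in this format could be read and scored. It assumes free-form replies are mapped to "yes"/"no" with a simple keyword heuristic (the `normalize_answer` helper is hypothetical, not POPE's official script), and it computes the accuracy, precision, recall, F1 and yes-ratio figures POPE results are usually reported with, treating "yes" as the positive class.

```python
import json
from typing import Dict, List


def normalize_answer(text: str) -> str:
    """Hypothetical heuristic: any 'no'/'not' word means 'no', otherwise 'yes'."""
    words = text.strip().lower().replace(".", " ").replace(",", " ").split()
    return "no" if ("no" in words or "not" in words) else "yes"


def pope_metrics(labels: List[str], preds: List[str]) -> Dict[str, float]:
    """Binary metrics with 'yes' as the positive class."""
    pairs = list(zip(labels, preds))
    tp = sum(l == "yes" and p == "yes" for l, p in pairs)
    fp = sum(l == "no" and p == "yes" for l, p in pairs)
    tn = sum(l == "no" and p == "no" for l, p in pairs)
    fn = sum(l == "yes" and p == "no" for l, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    n = len(pairs)
    return {
        "accuracy": (tp + tn) / n,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "yes_ratio": (tp + fp) / n,  # fraction of questions answered "yes"
    }


if __name__ == "__main__":
    lines = [
        '{"question_id": 1, "image": "COCO_val2014_000000016631.jpg", "text": "Is there a person in the image?", "label": "yes"}',
        '{"question_id": 2, "image": "COCO_val2014_000000016631.jpg", "text": "Is there a refrigerator in the image?", "label": "no"}',
    ]
    labels = [json.loads(line)["label"] for line in lines]
    replies = ["Yes, there is a person.", "No, there is not."]  # made-up model replies
    print(pope_metrics(labels, [normalize_answer(r) for r in replies]))
```

A yes-ratio far above the share of "yes" labels is a quick signal that the model is over-answering "yes", which is the kind of bias a decoding-time method would be expected to reduce.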
Is it possible to use VCD to further improve LLaVA's performance on these types of binary questions?
If so, could you provide guidance on how to implement it?
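As a reference point, the contrastive step described in the VCD paper can be sketched for a single yes/no decision. The sketch assumes you already have next-token logits for the "Yes" and "No" tokens from two LLaVA forward passes over the same prompt: one with the clean image and one with a distorted copy. Extracting those logits (and the exact token ids LLaVA's tokenizer uses for "Yes"/"No") is not shown. `forward_diffuse`, `vcd_logits` and `vcd_yes_no` are hypothetical names; the DDPM-style linear noise schedule, `alpha=1.0` and `beta=0.1` are illustrative assumptions; and the plausibility constraint is computed over the two candidates only, a simplification of the full-vocabulary version.

```python
import math
import random
from typing import Dict, List


def forward_diffuse(pixels: List[float], t: int, num_steps: int = 1000,
                    beta_start: float = 1e-4, beta_end: float = 0.02,
                    seed: int = 0) -> List[float]:
    """Distort a flattened image with forward-diffusion noise (assumed linear schedule).

    x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps; larger t removes more detail.
    """
    rng = random.Random(seed)
    alpha_bar = 1.0
    for i in range(t + 1):  # accumulate steps 0..t inclusive
        beta_i = beta_start + (beta_end - beta_start) * i / (num_steps - 1)
        alpha_bar *= 1.0 - beta_i
    a, b = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [a * x + b * rng.gauss(0.0, 1.0) for x in pixels]


def softmax(logits: Dict[str, float]) -> Dict[str, float]:
    m = max(logits.values())  # subtract the max for numerical stability
    exps = {k: math.exp(v - m) for k, v in logits.items()}
    z = sum(exps.values())
    return {k: e / z for k, e in exps.items()}


def vcd_logits(clean: Dict[str, float], distorted: Dict[str, float],
               alpha: float = 1.0, beta: float = 0.1) -> Dict[str, float]:
    """Contrastive logits (1 + alpha) * clean - alpha * distorted.

    Candidates whose clean-image probability falls below beta * max probability
    (adaptive plausibility constraint) are masked to -inf.
    """
    probs = softmax(clean)
    cutoff = beta * max(probs.values())
    return {
        tok: (1 + alpha) * clean[tok] - alpha * distorted[tok]
        if probs[tok] >= cutoff else float("-inf")
        for tok in clean
    }


def vcd_yes_no(clean: Dict[str, float], distorted: Dict[str, float],
               alpha: float = 1.0, beta: float = 0.1) -> str:
    """Pick the candidate with the highest contrastive score."""
    scores = vcd_logits(clean, distorted, alpha, beta)
    return max(scores, key=scores.get)


if __name__ == "__main__":
    # Made-up first-token logits; with the distorted image the model leans even
    # harder toward "Yes", suggesting that "Yes" comes from a language prior.
    clean = {"Yes": 2.0, "No": 1.5}
    distorted = {"Yes": 2.5, "No": 0.5}
    print(vcd_yes_no(clean, distorted))  # -> No
```

Because a POPE answer is essentially decided by the first generated token, applying the contrast at that single step is the simplest place to start; for longer free-form replies the same adjustment would be applied at every decoding step. Setting `alpha=0` recovers ordinary greedy decoding, which makes a convenient baseline.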
Below is the current implementation I am using for LLaVA to generate answers from an image: