Modality-Aware Integration with Large Language Models for Knowledge-based Visual Question Answering
In this part, we:
- Transform images into captions, and then into Scene Graphs;
- Extract the topic entities in the questions;
- Construct the Concept Graphs from the outputs of the two previous steps.
For implementation details, see Localized MiniGPT-4.
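
To make the data flow between the three steps concrete, here is a minimal, self-contained sketch. It is not the code used in this repository. The caption parser, the entity matcher, the `TOY_KB` dictionary, and all function names are hypothetical stand-ins chosen for illustration. In the real pipeline, captions come from a vision-language model, and the Concept Graph would presumably draw on an external knowledge source rather than a hard-coded dictionary.

```python
import string
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]  # (head, relation, tail)

# Hypothetical toy knowledge base; a stand-in for an external knowledge graph.
TOY_KB: Dict[str, List[Tuple[str, str]]] = {
    "dog": [("IsA", "animal"), ("CapableOf", "bark")],
    "frisbee": [("UsedFor", "playing")],
}


def caption_to_scene_graph(caption: str) -> List[Triple]:
    """Toy parser: each comma-separated clause is read as
    '<subject> <relation words...> <object>'."""
    triples: List[Triple] = []
    for clause in caption.lower().split(","):
        words = clause.split()
        if len(words) >= 3:
            triples.append((words[0], " ".join(words[1:-1]), words[-1]))
    return triples


def extract_topic_entities(question: str, vocabulary: Set[str]) -> List[str]:
    """Toy matcher: keep question words that appear in a known vocabulary,
    in order of first appearance."""
    entities: List[str] = []
    for token in question.lower().split():
        word = token.strip(string.punctuation)
        if word in vocabulary and word not in entities:
            entities.append(word)
    return entities


def build_concept_graph(entities: List[str],
                        kb: Dict[str, List[Tuple[str, str]]]) -> List[Triple]:
    """Collect the knowledge-base facts attached to each topic entity."""
    return [(e, rel, tail) for e in entities for rel, tail in kb.get(e, [])]


def run_pipeline(caption: str, question: str,
                 kb: Dict[str, List[Tuple[str, str]]] = TOY_KB):
    """Chain the three steps: caption -> scene graph -> entities -> concept graph."""
    scene_graph = caption_to_scene_graph(caption)
    nodes = {h for h, _, _ in scene_graph} | {t for _, _, t in scene_graph}
    entities = extract_topic_entities(question, nodes | set(kb))
    concept_graph = build_concept_graph(entities, kb)
    return scene_graph, entities, concept_graph


if __name__ == "__main__":
    sg, ents, cg = run_pipeline("dog catching frisbee, dog on grass",
                                "What is the dog catching?")
    print(sg)
    print(ents)
    print(cg)
```

The sketch keeps each step as a separate function so that any one of them can be swapped for a real component (a captioning model, a scene-graph parser, an entity linker) without changing the others.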