This repository contains two knowledge-intensive audio question answering datasets, Single Audio Question Answering (sAQA) and Multi-Audio Question Answering (mAQA), introduced in our paper 'Audiopedia: Audio Question Answering with Knowledge', which was accepted at ICASSP 2025.
- Task: Answer knowledge-intensive questions about a named entity mentioned in a single audio clip.
- Answer type: Open-ended.
- Format: Single audio input per question.
- Task: Answer questions requiring reasoning over named entities across multiple audio clips.
- Answer type: Binary (Yes/No).
- Format: A list of two audio inputs per question.
{
  "audio_file": "sAQA/aud_files/sentence_id.wav",
  "question": "question",
  "id": "ques_id",
  "answer": "answer"
}
{
  "audio_files": [
    "mAQA/aud_files/sentence_id_0.wav",
    "mAQA/aud_files/sentence_id_1.wav"
  ],
  "question": "question",
  "id": "ques_id",
  "answer": "answer"
}
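The two record shapes above differ only in the audio field (`audio_file` versus `audio_files`), so they can be normalized into one structure when loading. The following is a minimal sketch, not part of the released code: `QARecord`, `parse_record`, and `load_records` are hypothetical names, and the loader assumes that each release JSON file holds a top-level list of such records, which the README does not state.

```python
import json
from dataclasses import dataclass
from typing import Any


@dataclass
class QARecord:
    """Common shape for sAQA and mAQA records (hypothetical helper type)."""
    id: str
    question: str
    answer: str
    audio_paths: list[str]


def parse_record(raw: dict[str, Any]) -> QARecord:
    """Normalize one sAQA or mAQA record into a QARecord."""
    if "audio_files" in raw:      # mAQA: a list of clips
        paths = list(raw["audio_files"])
    elif "audio_file" in raw:     # sAQA: a single clip
        paths = [raw["audio_file"]]
    else:
        raise KeyError("record has neither 'audio_file' nor 'audio_files'")
    return QARecord(
        id=str(raw["id"]),
        question=raw["question"],
        answer=raw["answer"],
        audio_paths=paths,
    )


def load_records(json_path: str) -> list[QARecord]:
    """Load all records from a release file.

    Assumption: the top level of the file is a JSON list of records.
    """
    with open(json_path, encoding="utf-8") as f:
        data = json.load(f)
    return [parse_record(r) for r in data]
```

With this shape, downstream code can iterate over `audio_paths` without caring which of the two datasets a record came from.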
audiopedia/
├── sAQA_release_qa.json
├── mAQA_release_qa.json
├── sAQA/
│   └── aud_files/
│       └── *.wav
└── mAQA/
    └── aud_files/
        └── *.wav
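Because the audio paths in the JSON records are relative to the repository root shown above, it can be useful to check that the downloaded audio landed in the expected folders. The sketch below is illustrative only; `missing_audio` is a hypothetical helper, and it assumes the audio archives were extracted so that `sAQA/aud_files/` and `mAQA/aud_files/` sit directly under the root.

```python
from pathlib import Path


def missing_audio(root: str, relative_paths: list[str]) -> list[str]:
    """Return the relative paths that do not exist as files under root."""
    base = Path(root)
    return [p for p in relative_paths if not (base / p).is_file()]
```

For example, `missing_audio("audiopedia", ["sAQA/aud_files/sentence_id.wav"])` returns an empty list when that file is present and the one-element list otherwise.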
- sAQA audio files: Link.
- mAQA audio files: Link.
- Format: WAV
- Generation: Tacotron 2 text-to-speech model.
- Each audio sample contains only one named entity mention.
- All mentioned entities are in English.
- Fewer than 5% of samples may contain noise.
- Dataset is synthetic in nature.
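Since the clips are distributed as WAV files, their basic properties can be inspected with Python's standard `wave` module. This is a small sketch under the assumption that the files are uncompressed PCM WAV, which is the only encoding `wave` reads; the README does not specify the sample rate or encoding, and `wav_info` is a hypothetical helper name.

```python
import wave


def wav_info(path: str) -> dict[str, float]:
    """Return channel count, sample rate, and duration of a PCM WAV file."""
    with wave.open(path, "rb") as w:
        frames = w.getnframes()
        rate = w.getframerate()
        return {
            "channels": w.getnchannels(),
            "sample_rate": rate,
            "duration_s": frames / rate,
        }
```

Running this over a handful of clips is a quick way to confirm the sample rate before feeding the audio to a model that expects a fixed rate.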
@article{penamakuri2024audiopedia,
author = {Abhirama Subramanyam Penamakuri and Kiran Chhatre and Akshat Jain},
title = {Audiopedia: Audio QA with Knowledge},
journal = {ICASSP},
year = {2025},
}
- Abhirama Subramanyam Penamakuri ([email protected])
- Kiran Chhatre ([email protected])
- Akshat Jain ([email protected])
The data is released under the MIT license.
The knowledge base used for dataset creation is from TextKVQA.