Abhiram4572/Audiopedia

Audiopedia: Audio Question Answering with Knowledge (ICASSP 2025)

This repository contains two knowledge-intensive audio question answering datasets, Single Audio Question Answering (sAQA) and Multi-Audio Question Answering (mAQA), introduced in our paper 'Audiopedia: Audio Question Answering with Knowledge', which was accepted at ICASSP 2025.

Dataset Overview

sAQA (Single Audio Question Answering)

  • Task: Answer knowledge-intensive questions about a named entity mentioned in a single audio clip.
  • Answer type: Open-ended.
  • Format: Single audio input per question.

mAQA (Multi-Audio Question Answering)

  • Task: Answer questions requiring reasoning over named entities across multiple audio clips.
  • Answer type: Binary (Yes/No).
  • Format: A list of two audio inputs per question.

Data Format

sAQA Format

{
    "audio_file": "sAQA/aud_files/sentence_id.wav",
    "question": "question",
    "id": "ques_id",
    "answer": "answer"
}
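As a minimal sketch of how records in this format might be read, the snippet below parses JSON text and checks that each record carries the four keys shown above. The function name `parse_saqa_records` is hypothetical, and the assumption that the release file holds either a single record or a list of records is not confirmed by this README.

```python
import json

# The four keys shown in the sAQA format above.
SAQA_KEYS = {"audio_file", "question", "id", "answer"}


def parse_saqa_records(text):
    """Parse JSON text into a list of sAQA records and check required keys.

    Assumption: the JSON holds either one record or a list of records.
    """
    data = json.loads(text)
    records = data if isinstance(data, list) else [data]
    for i, rec in enumerate(records):
        missing = SAQA_KEYS - set(rec)
        if missing:
            raise ValueError(f"record {i} is missing keys: {sorted(missing)}")
    return records


# Placeholder record copied from the format description above.
sample = """[
  {
    "audio_file": "sAQA/aud_files/sentence_id.wav",
    "question": "question",
    "id": "ques_id",
    "answer": "answer"
  }
]"""

records = parse_saqa_records(sample)
print(records[0]["audio_file"])
```

To read the released file, the same function could be given the contents of `sAQA_release_qa.json`, provided the file follows the layout assumed here.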

mAQA Format

{
    "audio_files": [
        "mAQA/aud_files/sentence_id_0.wav",
        "mAQA/aud_files/sentence_id_1.wav"
    ],
    "question": "question",
    "id": "ques_id",
    "answer": "answer"
}
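Because mAQA answers are binary, it can help to check each record before evaluation. The sketch below verifies that a record lists exactly two audio files and maps the answer to a boolean. Both function names are hypothetical, and the assumption that answers are stored as the strings "yes" or "no" (in any casing) is not confirmed by this README.

```python
def normalize_yes_no(answer):
    """Map a yes/no answer string to a bool, ignoring case and a trailing period.

    Assumption: answers are stored as "yes" or "no" in some casing.
    """
    text = answer.strip().lower().rstrip(".")
    if text == "yes":
        return True
    if text == "no":
        return False
    raise ValueError(f"not a binary answer: {answer!r}")


def check_maqa_record(rec):
    """Check that a record has exactly two audio files; return its answer as a bool."""
    files = rec["audio_files"]
    if not isinstance(files, list) or len(files) != 2:
        raise ValueError(f"expected a list of 2 audio files, got: {files!r}")
    return normalize_yes_no(rec["answer"])


# Illustrative record; the answer value is made up for the example.
example = {
    "audio_files": [
        "mAQA/aud_files/sentence_id_0.wav",
        "mAQA/aud_files/sentence_id_1.wav",
    ],
    "question": "question",
    "id": "ques_id",
    "answer": "Yes",
}

print(check_maqa_record(example))  # True
```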

Directory Structure

audiopedia/
├── sAQA_release_qa.json
├── mAQA_release_qa.json
├── sAQA/
│   └── aud_files/
│       └── *.wav
└── mAQA/
    └── aud_files/
        └── *.wav
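The audio paths in the QA records are relative to the dataset root in the layout above. The following sketch resolves them against a root directory and reports any files that are not on disk, which may be useful after downloading the audio archives. The helper names `audio_paths` and `find_missing_audio` are hypothetical.

```python
from pathlib import Path


def audio_paths(record, root):
    """Return the audio paths of an sAQA or mAQA record, resolved against root."""
    if "audio_files" in record:  # mAQA: a list of clips
        relative = record["audio_files"]
    else:  # sAQA: a single clip
        relative = [record["audio_file"]]
    return [Path(root) / p for p in relative]


def find_missing_audio(records, root):
    """List every referenced audio path that does not exist under root."""
    return [
        path
        for rec in records
        for path in audio_paths(rec, root)
        if not path.is_file()
    ]
```

For example, `find_missing_audio(records, "audiopedia")` would return an empty list when every clip referenced by `records` is present under `audiopedia/`.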

Audio Files

  • sAQA audio files: Link.
  • mAQA audio files: Link.
  • Format: WAV
  • Generation: Tacotron 2 text-to-speech model.
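Since the clips are WAV files, their basic properties can be inspected with Python's standard `wave` module. This is a minimal sketch: the function name `wav_info` is hypothetical, and it assumes the files are uncompressed PCM WAV, which this README does not state.

```python
import wave


def wav_info(path):
    """Return channel count, sample rate, and duration of a PCM WAV file."""
    with wave.open(str(path), "rb") as wf:
        frames = wf.getnframes()
        rate = wf.getframerate()
        return {
            "channels": wf.getnchannels(),
            "sample_rate": rate,
            "duration_s": frames / rate,
        }
```

Calling `wav_info` on a downloaded clip returns a small dictionary that can be used to confirm the sample rate expected by a downstream model.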

Limitations

  • Each audio sample contains only one named entity mention.
  • All mentioned entities are in English.
  • Fewer than 5% of samples may contain noise.
  • The dataset is synthetic.

Citation

@article{penamakuri2024audiopedia,
  author    = {Abhirama Subramanyam Penamakuri and Kiran Chhate and Akshat Jain},
  title     = {Audiopedia: Audio QA with Knowledge},
  journal   = {ICASSP},
  year      = {2025},
}

Contact

License

The data is released under the MIT license.

Acknowledgments

The knowledge base used for dataset creation is from TextKVQA.
