Audio Language Model

Papers

  • Audio-CoT: Exploring Chain-of-Thought Reasoning in Large Audio Language Model, arXiv, 2501.07246, arxiv, pdf, citation: -1

    Ziyang Ma, Zhuo Chen, Yuping Wang, ..., Eng Siong Chng, Xie Chen

  • State-Space Large Audio Language Models, arXiv, 2411.15685, arxiv, pdf, citation: -1

    Saurabhchand Bhati, Yuan Gong, Leonid Karlinsky, ..., Rogerio Feris, James Glass

  • 🌟 Scaling Speech-Text Pre-training with Synthetic Interleaved Data, arXiv, 2411.17607, arxiv, pdf, citation: -1

    Aohan Zeng, Zhengxiao Du, Mingdao Liu, ..., Yuxiao Dong, Jie Tang

  • 🌟 A Comparative Study of Discrete Speech Tokens for Semantic-Related Tasks with Large Language Models, arXiv, 2411.08742, arxiv, pdf, citation: -1

    Dingdong Wang, Mingyu Cui, Dongchao Yang, ..., Xueyuan Chen, Helen Meng

  • Roadmap towards Superhuman Speech Understanding using Large Language Models, arXiv, 2410.13268, arxiv, pdf, citation: -1

    Fan Bu, Yuhao Zhang, Xidong Wang, ..., Qun Liu, Haizhou Li

Survey

  • Towards Controllable Speech Synthesis in the Era of Large Language Models: A Survey, arXiv, 2412.06602, arxiv, pdf, citation: -1

    Tianxin Xie, Yan Rong, Pengfei Zhang, ..., Li Liu

  • Dynamic-SUPERB Phase-2: A Collaboratively Expanding Benchmark for Measuring the Capabilities of Spoken Language Models with 180 Tasks, arXiv, 2411.05361, arxiv, pdf, citation: -1

    Chien-yu Huang, Wei-Chih Chen, Shu-wen Yang, ..., Shinji Watanabe, Hung-yi Lee

Evaluation

  • What Do Speech Foundation Models Not Learn About Speech?, arXiv, 2410.12948, arxiv, pdf, citation: -1

    Abdul Waheed, Hanin Atwany, Bhiksha Raj, ..., Rita Singh

  • Audio Is the Achilles' Heel: Red Teaming Audio Large Multimodal Models, arXiv, 2410.23861, arxiv, pdf, citation: -1

    Hao Yang, Lizhen Qu, Ehsan Shareghi, ..., Gholamreza Haffari

  • MMAU: A Massive Multi-Task Audio Understanding and Reasoning Benchmark, arXiv, 2410.19168, arxiv, pdf, citation: -1

    S Sakshi, Utkarsh Tyagi, Sonal Kumar, ..., Sreyan Ghosh, Dinesh Manocha · (sakshi113.github)

Projects

Toolkits

Misc