awesome-speaker-recognition

A curated list of notable research on speaker recognition, identification, and verification.

Review/survey papers

Pre-Deep learning

Speaker Verification Using Adapted Gaussian Mixture Models, Reynolds et al. 2000 (http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.117.338&rep=rep1&type=pdf)
Front-end factor analysis for speaker verification, Dehak et al. 2010 (https://ieeexplore.ieee.org/document/5545402)
Channel robust speaker verification via feature mapping, Reynolds 2003, ICASSP (https://ieeexplore.ieee.org/abstract/document/1202292/)

Speech features

Multi-Channel Speaker Verification for Single and Multi-talker Speech, Kataria et al. 2021 (https://arxiv.org/abs/2010.12692)

Front-end

Speaker recognition from raw waveform with SincNet, Ravanelli et al. 2018 (https://arxiv.org/abs/1808.00158)

Back-end

Graph Attention Networks for Speaker Verification, Jung et al. 2020 (https://arxiv.org/abs/2010.11543)
Ferrer, Luciana, Mitchell McLaren, and Niko Brummer. "A Speaker Verification Backend with Robust Performance across Conditions." arXiv preprint arXiv:2102.01760 (2021). (https://arxiv.org/abs/2102.01760)
Scoring of Large-Margin Embeddings for Speaker Verification: Cosine or PLDA?, Wang et al. 2022, Interspeech 2022 (https://www.isca-speech.org/archive/interspeech_2022/wang22r_interspeech.html)

Architectures

AutoSpeech: Neural Architecture Search for Speaker Recognition, Ding et al. 2020 (https://arxiv.org/abs/2005.03215)
Pushing the limits of raw waveform speaker recognition, Jung et al. 2022 (https://arxiv.org/abs/2203.08488)

Pooling

Exploring the encoding layer and loss function in end-to-end speaker and language recognition system, Cai et al. 2018 (https://arxiv.org/abs/1804.05160)
Margin Matters: Towards More Discriminative Deep Neural Network Embeddings for Speaker Recognition, Xiang et al. 2019 (https://arxiv.org/abs/1906.07317)

(Towards?) End-to-end

Garcia-Romero, Daniel, Gregory Sell, and Alan McCree. "Magneto: X-vector magnitude estimation network plus offset for improved speaker recognition." Proc. Odyssey 2020 The Speaker and Language Recognition Workshop. 2020. (https://www.isca-speech.org/archive/Odyssey_2020/pdfs/65.pdf)

Representation learning

Deep Speaker: an End-to-End Neural Speaker Embedding System, Li et al. 2017 (https://arxiv.org/abs/1705.02304)

With self-supervised learning

Learning Speaker Embedding with Momentum Contrast, Ding et al. 2020 (https://arxiv.org/abs/2001.01986)
WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing, Chen et al. 2021 (https://arxiv.org/abs/2110.13900)

With speaker diarization

With speech enhancement

VoiceID Loss: Speech Enhancement for Speaker Verification, Shon et al. 2019 (https://arxiv.org/abs/1904.03601)
Feature enhancement with deep feature losses for speaker verification, Kataria et al. 2019 (https://arxiv.org/abs/1910.11905)

With domain adaptation

Cycle-GANs for domain adaptation of acoustic features for speaker recognition, Nidadavolu et al. 2019 (https://ieeexplore.ieee.org/document/8683055)

Joint learning

Multi-modal

A Study of Multimodal Person Verification Using Audio-Visual-Thermal Data, Abdrakhmanova et al. 2021 (https://arxiv.org/abs/2110.12136)
Face-Mic: inferring live speech and speaker identity via subtle facial dynamics captured by AR/VR motion sensors, Shi et al. 2021 (https://dl.acm.org/doi/abs/10.1145/3447993.3483272)

Metrics

The BOSARIS Toolkit: Theory, Algorithms and Code for Surviving the New DCF, Brummer et al., 2013 (https://arxiv.org/abs/1304.2865)

System Descriptions

Beijing ZKJ-NPU Speaker Verification System for VoxCeleb Speaker Recognition Challenge 2021, Zhang et al., 2021 (https://arxiv.org/abs/2109.03568)

Miscellaneous

Datasets

Fan, Yue, et al. "CN-CELEB: a challenging Chinese speaker recognition dataset." ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2020. (https://ieeexplore.ieee.org/abstract/document/9054017)

Theses

Villalba, J. Advances on speaker recognition in non collaborative environments. Ph.D. dissertation, University of Zaragoza, 2014.
Brummer, Niko. Measuring, refining and calibrating speaker and language information extracted from speech. Diss. Stellenbosch: University of Stellenbosch, 2010. (http://scholar.sun.ac.za/handle/10019.1/5139)

Books

Mak, Man-Wai, and Jen-Tzung Chien. Machine learning for speaker recognition. Cambridge University Press, 2020. (http://www.eie.polyu.edu.hk/~mwmak/papers/spkver-book_toc.pdf)

Software

Hyperion, Villalba et al., 2019 (https://github.com/jsalt2019-diadet/hyperion/tree/14a11436d62f3c15cd9b1f70bcce3eafbea2f753)
SpeechBrain, Ravanelli et al., 2021 (https://github.com/speechbrain/speechbrain)
Angular Prototypical Loss, Chung et al. 2020 (https://arxiv.org/abs/2003.11982)
BOSARIS, multiple versions (https://github.com/bsxfan/PYLLR, https://projets-lium.univ-lemans.fr/sidekit/api/bosaris/index.html, https://gitlab.eurecom.fr/nautsch/pybosaris)
