sarulab-speech

24 repositories

audio-foundation-model-dataset
Public
Apache License 2.0
•0•38•0•0•Updated Jan 8, 2025
UTMOSv2
Public
UTokyo-SaruLab MOS Prediction System
Python
•
MIT License
•12•129•0•0•Updated Dec 9, 2024
ensemble_svs_with_interactions
Public
Python
•
Other
•0•1•0•0•Updated Dec 2, 2024
yodas-transcription
Public
Modified transcriptions of YODAS dataset
0•4•0•0•Updated Oct 26, 2024
SaSLaW
Public
Dialogue Speech Corpus with Audio-visual Egocentric Information, "So, what are you Speaking, Listening, and Watching?"
Python
•0•7•0•0•Updated Aug 13, 2024
spatial_voice_conversion
Public
Spatial Voice Conversion: Voice Conversion Preserving Spatial Information and Non-target Signals
Python
•1•14•0•0•Updated Aug 8, 2024
VMC2024-sarulab-data
Public
MIT License
•0•7•0•0•Updated Jun 17, 2024
Coco-Nut
Public
Coco-Nut (Corpus of connecting NIHONGO utterance and text) corpus
0•21•0•0•Updated Jun 12, 2024
UTMOS22
Public
UT-Sarulab MOS prediction system using SSL models
Python
•
MIT License
•14•202•1•0•Updated Apr 11, 2024
Mid-Attribute-Speaker-Generation
Public
Python
•
MIT License
•1•4•0•0•Updated Mar 28, 2024
visual-onoma-to-wave
Public
Visual onoma-to-wave official implementation
Python
•
MIT License
•0•5•0•0•Updated Mar 11, 2024
ml-audiocaps
Public
Multi-lingual AudioCaps
Apache License 2.0
•0•8•0•0•Updated Nov 20, 2023
jtubespeech
Public
Python
•
Apache License 2.0
•46•215•6•1•Updated Nov 13, 2023
xvector_jtubespeech
Public
xvector model on jtubespeech
Python
•
MIT License
•4•43•0•0•Updated Nov 5, 2023
demo_ChatGPT_EDSS
Public
ChatGPT-EDSS: Empathetic Dialogue Speech Synthesis Trained from ChatGPT-derived Context Word Embeddings (INTERSPEECH2023)
HTML
•
Apache License 2.0
•0•0•0•0•Updated May 24, 2023
demo_CALLS_corpus
Public
CALLS: Japanese Empathetic Dialogue Speech Corpus of Complaint Handling and Attentive Listening in Customer Center (INTERSPEECH2023)
HTML
•
Apache License 2.0
•0•0•0•0•Updated May 24, 2023
whisper-asr-finetune
Public
Python
•
MIT License
•8•32•5•0•Updated Dec 4, 2022
pseudo_speech_decryption
Public
Python
•0•1•0•0•Updated Jun 16, 2022
lightweight_spkr_anon
Public
Lightweight speaker anonymization [IEEE SLT2021]
Python
•
MIT License
•11•26•0•1•Updated Jun 6, 2022
fairseq
Public
Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
Python
•
MIT License
•6.4k•0•0•0•Updated Apr 4, 2022
jsut-label
Public
context labels and pronunciation data for JSUT corpus
Other
•9•68•0•0•Updated Sep 2, 2021
tdmelodic_openjtalk
Public
tdmelodic for open-jtalk
1•22•0•0•Updated Aug 30, 2021
bert-japanese
Public
BERT models for Japanese text.
Python
•
Apache License 2.0
•55•0•0•0•Updated May 1, 2021
multi-speaker-dgp
Public
Official implementation of DGP-based multi-speaker speech synthesis with PyTorch
Python
•
MIT License
•2•24•0•0•Updated Mar 23, 2021