🏠Home

Audio

Compression

  • EnCodec SOTA deep learning based audio codec supporting both mono 24 kHz audio and stereo 48 kHz audio

Multiple Tasks

  • audio-webui A web-based UI for various audio-related neural networks with features like text-to-audio, voice cloning, and automatic speech recognition using Bark, AudioLDM, AudioCraft, RVC, coqui-ai and Whisper
  • tts-generation-webui for all things TTS, currently supports Bark v2, MusicGen, Tortoise, Vocos
  • Speechbrain A PyTorch-based Speech Toolkit for TTS, STT, etc
  • Nvidia NeMo TTS, LLM, Audio Synthesis framework
  • speech-rest-api for Speech-To-Text and Text-To-Speech with Whisper and Speechbrain
  • LangHelper language learning through text-to-speech + ChatGPT + speech-to-text to practise speaking assessments, memorizing words and listening tests
  • Silero-models pre-trained speech-to-text, text-to-speech and text-enhancement for ONNX, PyTorch, TensorFlow, SSML
  • AI-Waifu-Vtuber an AI virtual streamer (VTuber). Supports multiple languages and uses VoiceVox, DeepL, Whisper, Silero TTS, and VTube Studio, and now also supports Twitch streaming.
  • Voicebox large-scale text-guided generative speech model using non-autoregressive flow-matching, paper, demo, pytorch implementation, implementation
  • Auto-Synced-Translated-Dubs Automatic YouTube video speech to text, translation, text to speech in order to dub a whole video
  • SeamlessM4T Foundational Models for SOTA Speech and Text Translation

Speech Recognition

TextToSpeech

Voice Conversion

Video Voice Dubbing

  • weeablind dub multilingual media using modern AI speech synthesis, diarization, and language identification
  • Auto-synced-translated-dubs YouTube audio translation and dubbing pipeline using Whisper speech-to-text, Google/DeepL Translate, Azure/Google TTS
  • videodubber dub video using GCP TTS, Translate, Whisper, spaCy tokenization and syllable counting
  • TranslatorYouTuber Takes a YouTube video, clones the voice and re-creates that video in a different language
  • global-video-dubbing Using Google Cloud Video Intelligence API with Cloud Translation API and Cloud Text to Speech API to generate voice dubbing and translations in many languages automatically
  • wav2lip Lip Syncing from audio
  • Wav2Lip-GFPGAN High quality Lip sync with wav2lip + Tencent GFPGAN

Music Generation

  • audiocraft library for audio processing and generation with deep learning using EnCodec compressor / tokenizer and MusicGen support
    • audiocraft-infinity-webui webui supporting generation longer than 30 seconds, song continuation, seed option, loading local models from chavinlo's training repo, macOS/Linux support, running on CPU/GPU
    • musicgen_trainer simple trainer for musicgen/audiocraft
    • audiocraft-webui basic webui with support for long audio, segmented audio and processing queue
    • audiocraft-webui another basic webui, unknown feature set
    • MusicGeneration a Streamlit GUI for audiocraft and musicgen
    • audiocraftgui GUI built with wxPython, supporting continuous generation by using chunks and overlaps
    • MusicGen a simple and controllable model for music generation using a Transformer model, examples, colab, colab collection
    • audiocraft-infinity-webui generation length over 30 seconds, ability to continue songs, seeds, loading of local models
    • AudioCraft Plus an all-in-one WebUI for the original AudioCraft, adding multiband diffusion, continuation, custom model support, mono to stereo and more
  • AudioLDM Generate speech, sound effects, music and beyond, with text, code, paper, HF demo

Audio Source Separation

Research

  • Vocos Closing the gap between time-domain and Fourier-based neural vocoders for high-quality audio synthesis
  • WavJourney Compositional Audio Creation with LLMs, github
  • PromptingWhisper Audio-Visual Speech Recognition, Code-Switched Speech Recognition, and Zero-Shot Speech Translation for Whisper