leonardoboulitreau/whispervits-svc

Setup Environment

(Tested on a Quadro RTX 5000 with NVIDIA driver version 535.104.05 and CUDA 12.2, on Ubuntu 22.04.)

  1. Build Dockerfile

     docker build -t whispervits-svc .
  2. Enter Docker container

  3. Download the timbre encoder (Speaker-Encoder by @mueller91) and put best_model.pth.tar into speaker_pretrain/.

  4. Download the whisper-large-v2 Whisper model. Make sure to download large-v2.pt, and put it into whisper_pretrain/.

  5. Download the hubert_soft model and put hubert-soft-0d54a1f4.pt into hubert_pretrain/.

  6. Download the crepe full pitch extractor and put full.pth into crepe/assets.

    Note: the crepe full.pth file is 84.9 MB, not 6 KB. If yours is only a few KB, the download is incomplete.

  7. Download the trained model lesd5_100.pretrain.pth and put it into vits_pretrain/.

  8. Make sure you have downloaded the wav_spk_1 folder from the Benchmarking-SGDD repository. Then, run the script.

     python convert-TWH-spk1.py /path/to/wav_spk_1

The output is a folder containing all conversions used in the evaluation, the same ones found on this Google Drive.
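
Before running the conversion script, it can help to confirm that the files from steps 3 to 7 are where the setup expects them. The following is a minimal sketch, not part of the repository: the function name `check_pretrained_files` and the size thresholds are assumptions made for illustration. Only the paths and the 84.9 MB size of the crepe file come from the steps above.

```python
from pathlib import Path

# Files named in steps 3-7, relative to the repository root.
# The values are minimum sizes in bytes. They are rough assumptions,
# used only to catch empty or placeholder downloads. The crepe threshold
# reflects the note that full.pth is 84.9 MB, not a few KB.
REQUIRED_FILES = {
    "speaker_pretrain/best_model.pth.tar": 1,
    "whisper_pretrain/large-v2.pt": 1,
    "hubert_pretrain/hubert-soft-0d54a1f4.pt": 1,
    "crepe/assets/full.pth": 1_000_000,
    "vits_pretrain/lesd5_100.pretrain.pth": 1,
}


def check_pretrained_files(repo_root="."):
    """Return a list of human-readable problems; empty if all files look fine."""
    problems = []
    root = Path(repo_root)
    for rel_path, min_bytes in REQUIRED_FILES.items():
        path = root / rel_path
        if not path.is_file():
            problems.append(f"missing: {rel_path}")
        elif path.stat().st_size < min_bytes:
            size = path.stat().st_size
            problems.append(f"too small ({size} bytes): {rel_path}")
    return problems


if __name__ == "__main__":
    issues = check_pretrained_files()
    for issue in issues:
        print(issue)
    if not issues:
        print("All pretrained files are in place.")
```

Run it from the repository root inside the container. Any line it prints points to a step above that needs to be repeated.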

About

Inference of whisper-vits-svc on the ESD+LJ dataset.
