https://github.com/magenta/ddsp
https://github.com/YatingMusic/ddsp-singing-vocoders
We recommend first installing PyTorch from the official website, then running:
pip install -r requirements.txt
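For example, a generic install might look like the line below; the exact command depends on your platform and CUDA version, so prefer the selector on pytorch.org:
pip install torch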
Put all the training data (audio clips in .wav format) in the following directory:
data/train/audio
Put all the validation data (audio clips in .wav format) in the following directory:
data/val/audio
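The clips should match the sampling rate set in the configuration file (44.1 kHz by default). As a minimal, hypothetical helper (not part of this repository), you could resample and copy your clips with librosa and soundfile; the source directory and target rate below are assumptions:

import os
import glob
import librosa
import soundfile as sf

SRC_DIR = "raw_audio"          # assumption: wherever your original clips live
DST_DIR = "data/train/audio"   # training directory expected by preprocess.py
TARGET_SR = 44100              # should match the sampling rate in your config

os.makedirs(DST_DIR, exist_ok=True)
for path in glob.glob(os.path.join(SRC_DIR, "*.wav")):
    # load as mono and resample to the target rate
    audio, _ = librosa.load(path, sr=TARGET_SR, mono=True)
    sf.write(os.path.join(DST_DIR, os.path.basename(path)), audio, TARGET_SR)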
Then run
python preprocess.py -c configs/full.yaml
for a model using hybrid additive and subtractive synthesis, or run
python preprocess.py -c configs/sins.yaml
for a model using additive synthesis only, or run
python preprocess.py -c configs/sawsub.yaml
for a model using subtractive synthesis only.
You can modify the configuration file configs/<model_name>.yaml
before preprocessing. The default configuration assumes 44.1 kHz audio, a training set of about a few hours, and a GTX 1660 graphics card.
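If you want to see the available options before editing, a minimal sketch using PyYAML (the file path is one of the configs above; the key names vary per model):

import yaml

with open("configs/full.yaml") as f:
    config = yaml.safe_load(f)

# dump the whole configuration tree to see which keys can be changed
print(yaml.dump(config, default_flow_style=False))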
# train a full model as an example
python train.py -c configs/full.yaml
The command line for training other models is similar.
You can safely interrupt training; running the same command line again will resume it.
You can also fine-tune the model: interrupt training, re-preprocess with the new dataset or change the training parameters (batch size, learning rate, etc.), and then run the same command line.
# check the training status using tensorboard
tensorboard --logdir=exp
# Copy-synthesising test
# wav -> mel, f0 -> wav
python main.py -i <input.wav> -m <model_file.pt> -o <output.wav> -k <keychange (semitones)>
# Pitch-shifting test
# wav -> mel, f0 -> mel (unchanged), f0 (shifted) -> wav
python main.py -i <input.wav> -m <model_file.pt> -o <output.wav> -k <key(semitones)>
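For example, a hypothetical pitch-shifting call that raises the pitch by two semitones (the checkpoint path is only an illustration; use the one produced by your own training run):
python main.py -i input.wav -m exp/full/model_best.pt -o output_up2.wav -k 2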
It is recommended to try the "Full" model first, which generally has a low multi-scale STFT loss and relatively good quality when applying a pitch shift.
However, this loss sometimes cannot reflect the subjective sense of hearing.
If the "Full" model does not work well, it is recommended to switch to the "Sins" model.
The "Sins" model works also well when applying copy synthesis, but it changes the formant when applying a pitch shift, which changes the timbre.
The "SawSub" model is not recommended due to artifacts in unvoiced phonemes, although it probably has the best formant invariance in pitch-shifting cases.
For a seen speaker, the sound quality of a well-trained DDSP vocoder is better than that of the WORLD or Griffin-Lim vocoders, and it can compete with GAN-based vocoders when the total amount of data is relatively small. For a large amount of data, however, the upper limit of sound quality will be lower than that of generative-model-based vocoders.
For unseen speakers, the performance may be unsatisfactory.