SLM Adversarial Training did not start when finetuning #227
Same issue.
You seem to be missing the configuration options for when the second training starts. See lines 6 and 7 in the LibriTTS config file. You should be able to kick off second-stage training by loading your current model checkpoint and setting `epochs_1st` to 0.
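In config terms, that suggestion would look something like the sketch below (the checkpoint path is a placeholder, the epoch count is illustrative, and the key names are assumed from the stock configs):

```yaml
epochs_1st: 0     # skip first-stage training entirely
epochs_2nd: 100   # number of epochs for second-stage training
pretrained_model: "Models/LibriTTS/your_checkpoint.pth"  # placeholder path
load_only_params: true  # load model weights only, not optimizer state
```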
Thanks, do I also need the
This did not fix the issue, unfortunately.
I've managed to get to where things go sour: if you have a batch size of 2, then it will always be 1, meaning SLMADV never starts. You need to change `batch_percentage`.
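If the batch selection works the way this comment describes (an assumption about the training code, not a quote from it), the arithmetic is roughly `int(batch_size * batch_percentage)`: with the stock `batch_percentage` of 0.5 and a batch size of 2, that rounds down to 1. A config sketch of the workaround:

```yaml
# Assumed mechanism, paraphrasing the comment above: the SLM adversarial
# step only receives int(batch_size * batch_percentage) samples per batch,
# so int(2 * 0.5) = 1 with the stock batch_percentage of 0.5.
batch_size: 2
slmadv_params:
  batch_percentage: 1   # hand the full batch to SLM adversarial training
```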
@78Alpha
They're going to be zero for a while unless the conditions it's looking for are met. Over about 1 epoch of training, my TensorBoard only showed 60 steps' worth of SLM training when I set `batch_percentage` to 1. I don't know what exactly it's looking for.
@78Alpha
Yeah, that's what it should look like. All graphs being filled is the sign that all parts are working.
I tinkered around with the config_ft.yml file: I set `max_len` to 120, `batch_percentage` to 1, `slmadv_params` `min_len` to 100, and `slmadv_params` `max_len` to 120, with the batch size set to 2. Now the DiscLM and GenLM loss stats are no longer at 0. I'm using an RX 7900 XTX. Note that I'm training a model with style diffusion in one fine-tuning session and adversarial training in another session. Here's a pic from my TensorBoard folder.

Edit: I discovered I can do style diffusion and SLM adversarial training together in one session. I set `max_len` to 252, epochs to 100, `batch_size` to 2, `batch_percentage` to 1, `slmadv_params` `min_len` to 180, `slmadv_params` `max_len` to 190, `diff_epoch` to 10, and `joint_epoch` to 50, using the Vokan model as the base model. I also rented an H100 from RunPod with `slmadv_params` at the default settings (`min_len: 400` and `max_len: 500`), batch size at 2, and `batch_percentage` at 1, and SLM adversarial training never started. So I tinkered with config_ft.yml again: `max_len` set to 252, `batch_percentage` to 1, `slmadv_params` `min_len` to 100, `slmadv_params` `max_len` to 500, batch size 2. Now the DiscLM and GenLM loss stats are only occasionally at 0, all in one session.

Second edit: Turns out this is bad. I had set `slmadv_params` `min_len` even higher to get better quality. Here's a screenshot of the VRAM usage.
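Pulling the reported one-session values together, the relevant part of config_ft.yml would look roughly like this (a sketch assembled from the numbers above, with key placement assumed from the stock fine-tuning config; the Vokan path is a placeholder):

```yaml
epochs: 100
batch_size: 2
max_len: 252                  # training segment length
pretrained_model: "Vokan.pth" # placeholder path to the Vokan base model

loss_params:
  diff_epoch: 10              # style diffusion starts here
  joint_epoch: 50             # SLM adversarial (joint) training starts here

slmadv_params:
  min_len: 180                # kept below max_len, or SLMADV never kicks in
  max_len: 190
  batch_percentage: 1         # use the whole batch for SLM training
```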
This is what I did in RunPod:

- I install the required packages.
- I use the `pwd` command to find directory/filepath information.
- I put the training dataset in a zip file and upload it to either https://catbox.moe/ or https://litterbox.catbox.moe/ (which lets you upload a 1 GB file).
- I upload the Vokan base model to gofile.io and then download it on the pod with gofile-downloader.
- I unzip the dataset file.
- I download the gofile upload script file.
- I give the script execute permissions.
- I upload the resulting .pth file to https://gofile.io/.
Third edit: the combinations I tried:

- `max_len` set to 252, `slmadv_params` `min_len` set to 180, `slmadv_params` `max_len` set to 190.
- `max_len` set to 252, `slmadv_params` `min_len` set to 252, `slmadv_params` `max_len` set to 252.
- `max_len` set to 260, `slmadv_params` `min_len` set to 260, `slmadv_params` `max_len` set to 260.
- `max_len` set to 280, `slmadv_params` `min_len` set to 280, `slmadv_params` `max_len` set to 280.
Which one is the best source to train a StyleTTS2 model?
@PriyamJha0124
I tried to do fine-tuning on a small dataset with 2 speakers. I set `epochs=25`, `diff_epoch=8`, and `joint_epoch=15`. The style diffusion training started as expected, but SLM adversarial training never started throughout the entire fine-tuning process.
My config is
What have I missed? Thanks!
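For reference, the epoch settings described above map onto config_ft.yml roughly like this (a sketch assuming the stock key layout; the actual attached config is not reproduced here, and the `slmadv_params` shown are the stock defaults):

```yaml
epochs: 25
loss_params:
  diff_epoch: 8     # style diffusion started here, as observed
  joint_epoch: 15   # SLM adversarial training should start here but never did
slmadv_params:
  min_len: 400      # stock defaults; per the discussion above, SLMADV may be
  max_len: 500      # skipped when these exceed the training max_len
```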