All voices have a "female pitch" #449

Closed
DaEmpty opened this issue Dec 7, 2024 · 2 comments

DaEmpty commented Dec 7, 2024

Describe the bug
I started using the AllTalk beta in standalone mode last month and have been working with a commit from 29th November 2024 at 04:41.
Everything was fine.

Today I tried to package my projects and tested the current version (luckily in a new folder).
But all voices now sound female.
I have gone back to the old version, but the problem still exists with new "installations" of the old project (checking out, executing setup, starting).

I tried copying over the "confignew.json", but I had only changed delete_output_wavs anyway.

Testing was done with a simple request from Postman (identical for both instances).
Request

It feels like a configuration error, but I'm not sure how to track down the problem, as I'm not able to get proper output with a fresh setup.

Text/logs
Here is the startup output from the old and the new installation of the same November version (the problem also occurs with the current version).
old:

[AllTalk TTS] Config file update: No Updates required
[AllTalk TTS] Start-up Mode : Standalone mode
[AllTalk TTS] WAV file deletion : Disabled
[AllTalk TTS] Github updated : 29th November 2024 at 04:41 Branch: alltalkbeta
[AllTalk ENG] Transcoding : ffmpeg found
[AllTalk ENG] DeepSpeed version : 0.14.0+ce78a63
[AllTalk ENG] Python Version : 3.11.10
[AllTalk ENG] PyTorch Version : 2.2.2+cu121
[AllTalk ENG] CUDA Version : 12.1
[AllTalk ENG]
[AllTalk ENG] Model/Engine : xttsv2_2.0.3 loading into cuda
[AllTalk ENG] Model License: https://coqui.ai/cpml.txt
[AllTalk ENG] Load time : 10.25 seconds.

new:
[AllTalk TTS] Config file update: No Updates required
[AllTalk TTS] Start-up Mode : Standalone mode
[AllTalk TTS] WAV file deletion : Disabled
[AllTalk TTS] Github updated : 29th November 2024 at 04:41 Branch: alltalkbeta
[AllTalk ENG] Transcoding : ffmpeg found
[AllTalk ENG] DeepSpeed version : 0.14.0+ce78a63
[AllTalk ENG] Python Version : 3.11.11
[AllTalk ENG] PyTorch Version : 2.2.1
[AllTalk ENG] CUDA Version : 12.1
[AllTalk ENG]
[AllTalk ENG] Model/Engine : xttsv2_2.0.3 loading into cuda
[AllTalk ENG] Model License: https://coqui.ai/cpml.txt
[AllTalk ENG] Load time : 9.92 seconds.
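
One way to spot environment differences between two startup logs like these is to parse each line into a field and a value and report only the fields that differ. The following is a minimal sketch, not part of AllTalk; the function names `parse_startup_log` and `diff_startup_logs` are hypothetical, and it assumes every relevant line has the `[AllTalk XXX] Field : value` shape shown above.

```python
import re

# Matches lines such as "[AllTalk ENG] PyTorch Version : 2.2.2+cu121".
# The key is everything up to the first colon; the value is the rest.
LINE_RE = re.compile(r"^\[AllTalk \w+\]\s*(?P<key>[^:]+?)\s*:\s*(?P<value>.+?)\s*$")


def parse_startup_log(text):
    """Return a {field: value} dict built from AllTalk startup output."""
    fields = {}
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:  # lines without a "field : value" pair are skipped
            fields[match.group("key")] = match.group("value")
    return fields


def diff_startup_logs(old_text, new_text):
    """Return {field: (old_value, new_value)} for every field that differs."""
    old = parse_startup_log(old_text)
    new = parse_startup_log(new_text)
    keys = sorted(set(old) | set(new))
    return {k: (old.get(k), new.get(k)) for k in keys if old.get(k) != new.get(k)}
```

Applied to the two logs above, this would flag Python Version, PyTorch Version and Load time as the only differing fields. Note that the startup output does not list every installed package, so a clean diff here does not rule out other differences between the environments.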

Desktop (please complete the following information):
AllTalk was updated: 29th November 2024 at 04:41
Also tested with the current version.
Custom Python environment: no
Text-generation-webUI was updated: standalone

Additional context
Windows Environment

erew123 (Owner) commented Dec 7, 2024

Hi @DaEmpty, I am away travelling; see #377 for details.

As such, I am currently unable to test, but here are a few things you can check.

  1. Are you using the same XTTS model in both versions? XTTS 2.0.2 and XTTS 2.0.3 do sound somewhat different. Check the model version in the models folder name.
  2. Are you using the XTTS or the API generation method in both versions? You can confirm this in the "load different model" dropdown: https://github.com/erew123/alltalk_tts/wiki/AllTalk-V2-QuickStart-Guide#4-generate-tts-tab
  3. The people who actually maintain the Coqui TTS engine (https://github.com/idiap/coqui-ai-TTS/releases) released a version update a few days ago. You can run start_diagnostics in each of your installations to create a diagnostics log file for each. You can downgrade the Coqui TTS engine version by running start_environment and then pip install --force-reinstall coqui-tts==0.24.2, and see if that makes a difference.
  4. The only other possible difference I can think of is that recent versions only precompile the latents once rather than on every generation. This shouldn't cause a female sound, but you can delete the latent file that matches your wav file in the voices folder, and it will be recalculated on the next TTS generation for that voice.

Those are my only initial thoughts. AllTalk is effectively handing the text over to the Coqui TTS engine and, assuming you haven't enabled RVC voices or pitch adjustment, should just generate absolutely normally.

As mentioned, I am travelling. If you look into the possibilities above but still feel there is an issue, would you please upload a diagnostics log for both your old and new builds of AllTalk? Also, if you wish to share what you consider bad generations, you can upload a couple of samples to https://easyupload.io/ for me to listen to.

Thanks

erew123 (Owner) commented Dec 8, 2024

Found the issue to be a problem with the latest Coqui TTS engine (not something I've done, thankfully).

Downgrade by running:

  • start_environment.bat (or .sh if on Linux)
  • pip install --force-reinstall coqui-tts==0.24.3

All should be working fine again after that. I have set this in the requirements files, so future installations shouldn't be affected.
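
To confirm that the downgrade took effect, the installed version can be compared against the pin from inside the AllTalk Python environment. This is a minimal sketch using only the standard library's `importlib.metadata`; the helper names `parse_pin` and `check_pin` are hypothetical and not part of AllTalk, and the distribution name `coqui-tts` is taken from the pip command above.

```python
from importlib import metadata


def parse_pin(requirement):
    """Split an exact pin such as 'coqui-tts==0.24.3' into (name, version)."""
    name, sep, version = requirement.partition("==")
    if not sep or not name.strip() or not version.strip():
        raise ValueError(f"not an exact pin: {requirement!r}")
    return name.strip(), version.strip()


def check_pin(requirement):
    """Return (ok, installed_version); installed_version is None if missing."""
    name, wanted = parse_pin(requirement)
    try:
        installed = metadata.version(name)
    except metadata.PackageNotFoundError:
        return False, None
    return installed == wanted, installed


if __name__ == "__main__":
    ok, installed = check_pin("coqui-tts==0.24.3")
    print(f"coqui-tts installed: {installed}, matches pin: {ok}")
```

Run it after start_environment so that it inspects the same environment AllTalk uses, not the system Python.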

Thanks

erew123 closed this as completed Dec 8, 2024