Last PR before merge to main #57

Merged: 3 commits, Nov 20, 2024

Conversation

ROBERT-MCDOWELL
Collaborator

With this last PR, I think we're ready for the merge into your main branch.
We've reached 1k stars, so it's time to bring those users over to the new version.
It will certainly bring more contributors too.

@DrewThomasson
Owner

Yes yes agreed :D

I'll run a test in Vietnamese and in English to make sure it can successfully use both the fairseq and XTTS models

And then double check the readme

And then merge 👌

@DrewThomasson DrewThomasson merged commit abb9810 into DrewThomasson:v2.0 Nov 20, 2024
@ROBERT-MCDOWELL
Collaborator Author

ROBERT-MCDOWELL commented Nov 20, 2024

I will push a new PR today; there are still things to change. We still have to integrate the fairseq model.

@DrewThomasson
Owner

Yes yes yes, I was also thinking about setting a preferred tts model order so each language defaults to the best sounding compatible model.

Right now the order is:

Best: XTTS → Next best: fairseq
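A minimal sketch of how such a preferred order could be applied, assuming each engine exposes a set of supported language codes. The names `ENGINE_PRIORITY`, `SUPPORTED_LANGUAGES`, and `pick_engine` are hypothetical, and the language sets are illustrative placeholders rather than the project's real support tables.

```python
from typing import Optional

# Preference order, best-sounding first (taken from the ordering stated above).
ENGINE_PRIORITY = ["xtts", "fairseq"]

# Hypothetical support table: illustrative values only, not the real lists.
SUPPORTED_LANGUAGES = {
    "xtts": {"en", "fr", "de"},
    "fairseq": {"en", "fr", "de", "vi"},
}


def pick_engine(language: str) -> Optional[str]:
    """Return the first engine in priority order that supports `language`."""
    for engine in ENGINE_PRIORITY:
        if language in SUPPORTED_LANGUAGES.get(engine, set()):
            return engine
    return None  # no compatible engine for this language
```

With this layout, adding a third engine later only means inserting it at the right position in `ENGINE_PRIORITY`.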

@ROBERT-MCDOWELL
Collaborator Author

I have to recheck the language_mapping again, wrong ISO codes.

@ROBERT-MCDOWELL
Collaborator Author

facebook/mm-tts is about 1 TB :-| so it's not a good idea to import it all locally...

@DrewThomasson
Owner

lol yeah, we'll just have them download the fairseq models on the fly.

It's only around 135 MB each anyway

@DrewThomasson
Owner

I have to recheck the language_mapping again, wrong ISO codes.

Oh yeah,

I think the ISO code values in language_mapping are fine. The problem is that it only sends the XTTS-format language codes for every language, when it needs to send the ISO-style codes specifically for fairseq model inference.

You're probs already 10 steps ahead of me on this though XD
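One way the per-engine code selection described above could look, as a sketch only. `XTTS_TO_ISO3`, `language_code_for`, and `fairseq_model_name` are hypothetical names, and the table is a tiny illustrative subset rather than the project's language_mapping. The `tts_models/<iso code>/fairseq/vits` pattern reflects my understanding of how Coqui TTS names its fairseq models and should be checked against the installed version.

```python
# Illustrative subset only: short codes mapped to ISO 639-3 codes.
XTTS_TO_ISO3 = {"en": "eng", "fr": "fra", "de": "deu", "vi": "vie"}


def language_code_for(engine: str, short_code: str) -> str:
    """Return the language code in the format the given engine expects."""
    if engine == "fairseq":
        if short_code not in XTTS_TO_ISO3:
            raise ValueError(f"no ISO 639-3 mapping for {short_code!r}")
        return XTTS_TO_ISO3[short_code]
    # Every other engine (XTTS here) keeps the short code unchanged.
    return short_code


def fairseq_model_name(iso3: str) -> str:
    """Build the model identifier (assumed Coqui naming pattern)."""
    return f"tts_models/{iso3}/fairseq/vits"
```

Keeping the conversion in one function means the rest of the pipeline can keep passing a single code format around and only translate at the point of inference.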

@DrewThomasson
Owner

DrewThomasson commented Nov 20, 2024

oh yeah, and it's probably good to have the fairseq models generate:

  • With voice cloning inference if voice is given

  • Without voice cloning inference if no voice is given (for faster inference, if it's using VITS for the voice cloning)
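A rough sketch of that branching, assuming a Coqui-style `TTS` object is passed in. The wrapper `synthesize` and its return labels are hypothetical. As far as I know, `tts_to_file` and `tts_with_vc_to_file` are the relevant Coqui TTS API methods, but the exact signatures should be verified against the installed version.

```python
from typing import Optional


def synthesize(tts, text: str, out_path: str, voice_wav: Optional[str] = None) -> str:
    """Use the voice-cloning path only when a reference voice is supplied."""
    if voice_wav:
        # Slower path: synthesize, then convert to the reference voice.
        tts.tts_with_vc_to_file(text, speaker_wav=voice_wav, file_path=out_path)
        return "with_voice_cloning"
    # Faster path: plain synthesis with the model's default voice.
    tts.tts_to_file(text=text, file_path=out_path)
    return "without_voice_cloning"
```

Passing the engine object in as a parameter keeps the decision logic easy to check without loading a model.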

@DrewThomasson
Owner

DrewThomasson commented Nov 20, 2024

I'll run some fairseq speed testing on my end with and without voice cloning to double check 👌

@ROBERT-MCDOWELL
Collaborator Author

oh yeah, and it's probably good to have the fairseq models generate:

  • With voice cloning inference if voice is given
  • Without voice cloning inference if no voice is given (for faster inference, if it's using VITS for the voice cloning)

we can do it with

tts --out_path output/path/speech.wav --model_name "//<model_name>" --source_wav <path/to/speaker/wav> --target_wav <path/to/reference/wav>
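If the command above is driven from Python, the argument list could be assembled as in the sketch below. The helper `build_vc_command` is hypothetical. It uses only the flags shown in the command and leaves the model name for the caller to supply.

```python
import shlex
from typing import List


def build_vc_command(model_name: str, source_wav: str,
                     target_wav: str, out_path: str) -> List[str]:
    """Assemble the voice-conversion command as an argv list (no shell quoting needed)."""
    return [
        "tts",
        "--out_path", out_path,
        "--model_name", model_name,
        "--source_wav", source_wav,
        "--target_wav", target_wav,
    ]


def describe(cmd: List[str]) -> str:
    """Render the argv list as a copy-pasteable shell line for logging."""
    return shlex.join(cmd)
```

The list can then be handed to `subprocess.run(cmd, check=True)`, which avoids quoting problems with paths that contain spaces.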

@DrewThomasson
Owner

DrewThomasson commented Nov 20, 2024

Ah so voice conversion 👌

Looks like that should work, because I was having trouble on my end trying to run a fairseq model with voice cloning in a single command

Good to know you found a way then 👍

@DrewThomasson
Owner

I'll see about finding a more automated way of generating a bunch more of the test ebook files

@DrewThomasson
Owner

#58

?🙏

@DrewThomasson
Owner

I might have some free time in a bit to have a go at implementing the fairseq model for the final 2.0 update, unless you're already working on it lol.
I just don't wanna get in the way with conflicting pushes if you're already working on it 😅

@ROBERT-MCDOWELL
Collaborator Author

ROBERT-MCDOWELL commented Nov 21, 2024

I've already been working on it for 2 days now, preparing the repo for the 1142 languages... should be ok by tonight.
btw I saw vits2 coming into coqui-tts in a PR there... will look at it later.

@DrewThomasson
Owner

DrewThomasson commented Nov 21, 2024

kk 👍

oooo vits2? 👀 👀 👀

oh also

  • In the meantime, should I fix the README files in the other languages to get them aligned with the ebook2audiobook.sh stuff, or are you already set with that?

@ROBERT-MCDOWELL
Collaborator Author

Yes, the README should be updated: add the new options (--ebook-dir, for example) and the new behavior (no need to add True to an option such as --headless), and update the video screenshot too :)
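For the README, the flag behavior described above might be illustrated with a small argparse sketch like this one. It is an assumption about how the options could be declared, not the project's actual parser. The `prog` name and help strings are hypothetical.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Sketch of a parser where boolean options need no explicit True value."""
    parser = argparse.ArgumentParser(prog="ebook2audiobook")  # hypothetical prog name
    # store_true: passing --headless alone enables it; omitting it leaves it False.
    parser.add_argument("--headless", action="store_true",
                        help="run without the web interface")
    # argparse exposes --ebook-dir as args.ebook_dir.
    parser.add_argument("--ebook-dir", default=None,
                        help="directory of ebooks to convert")
    return parser
```

For example, `build_parser().parse_args(["--headless", "--ebook-dir", "books"])` gives `headless=True` and `ebook_dir="books"`.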

@DrewThomasson
Owner

👍👍 I'll get on that 🫡

2 participants