Feature Request: Change test for openAI compatible endpoints #440

Open
luckycold opened this issue Feb 9, 2025 · 8 comments

@luckycold

Some OpenAI-compatible servers, like https://github.com/remsky/Kokoro-FastAPI, don't support the models endpoint. They do, however, respond on the voices endpoint. If the connection test could use that endpoint instead, and the API settings allowed specifying the model manually, it would be possible to use this extremely high-quality new TTS model.

@ken107
Owner

ken107 commented Feb 10, 2025

The voices endpoint appears to be non-standard. I don't see it in the OpenAI API documentation. If we switch to it, I'm afraid it will break functionality for existing users.

But we can allow the user to specify the URL of the voices endpoint, which could even be a static file. This endpoint would be used for validation, as well as to retrieve the voice list and the associated model. The output needs to be standardized like:

[{
voice: "af_bella",
model: "kokoro"
}, ...]
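As an illustration, here is a minimal Python sketch of validating a voice list in that shape. It assumes the endpoint or static file returns strict JSON (quoted keys) rather than the JavaScript-style literal above; `parse_voice_list` is a hypothetical name, not read-aloud code.

```python
import json
from typing import TypedDict


class VoiceEntry(TypedDict):
    voice: str
    model: str


def parse_voice_list(text: str) -> list[VoiceEntry]:
    """Parse a voice list in the proposed [{voice, model}, ...] shape.

    Raises ValueError when the document does not match that shape, which is
    how a validation step could reject a wrong URL or a malformed file.
    """
    data = json.loads(text)
    if not isinstance(data, list):
        raise ValueError("voice list must be a JSON array")
    entries: list[VoiceEntry] = []
    for i, item in enumerate(data):
        if not isinstance(item, dict):
            raise ValueError(f"entry {i} is not an object")
        voice, model = item.get("voice"), item.get("model")
        if not isinstance(voice, str) or not isinstance(model, str):
            raise ValueError(f"entry {i} needs string 'voice' and 'model' fields")
        entries.append({"voice": voice, "model": model})
    return entries
```

Because the same document carries both the voice names and their model, one fetch can serve validation and populate the voice picker.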

@luckycold
Author

I see. That makes sense. For this to work, though, you'd also have to specify the model manually, so you might need to add a new entry for this specific use case. I opened another issue on the other TTS project, but I'm not sure whether that will end up being supported. To avoid breaking anything else, maybe an extra entry for a manual model could be added as an optional parameter? The downside is that it would break the check, since you currently use the models endpoint to test the connection to the API. So, on top of that, another parameter that lets you skip the connection check would also make this work. Otherwise, this model uses all the same OpenAI endpoints for the actual audio streaming.
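A rough sketch of those two optional settings in Python. The names `OpenAICompatSettings`, `manual_model`, and `skip_connection_check` are hypothetical and not part of read-aloud; leaving both options unset keeps the current behavior for existing users.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class OpenAICompatSettings:
    """Hypothetical settings for an OpenAI-compatible TTS backend."""
    api_url: str
    api_key: str = ""
    manual_model: Optional[str] = None   # optional override, e.g. "kokoro"
    skip_connection_check: bool = False  # bypass the models-endpoint test


def resolve_model(settings: OpenAICompatSettings, default_model: str) -> str:
    # Prefer the user's manual model; otherwise fall back to the default.
    return settings.manual_model or default_model


def validate(settings: OpenAICompatSettings, check: Callable[[str], bool]) -> bool:
    # The connection check runs unless the user explicitly opted out of it.
    if settings.skip_connection_check:
        return True
    return check(settings.api_url)
```

Passing the check in as a callable keeps the sketch independent of how the real connection test is performed.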

@luckycold
Author

@ken107 The developer of kokoro-fastapi actually just added the models endpoint! However, I noticed a new issue: the voice list in read-aloud doesn't seem to recognize the proper list. Is the voice list hard-coded for the OpenAI endpoint? That's what it looks like at the moment, because some of the voices work and others just outright break. But I do have something working now.

Right now the models endpoint outputs this:

{
  "object": "list",
  "data": [
    {
      "id": "tts-1",
      "object": "model",
      "created": 1686935002,
      "owned_by": "kokoro"
    },
    {
      "id": "tts-1-hd",
      "object": "model",
      "created": 1686935002,
      "owned_by": "kokoro"
    },
    {
      "id": "kokoro",
      "object": "model",
      "created": 1686935002,
      "owned_by": "kokoro"
    }
  ]
}

The kokoro model is especially important here; however, I'm not sure how read-aloud polls for voices. Does it use the models list for that? If so, the developer seems receptive to modifying the output further.
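For illustration, a minimal Python sketch of a connection check built on the models response shown above. It assumes an OpenAI-style base URL such as `http://localhost:8880/v1`; the `/v1` prefix and both function names are assumptions, not read-aloud's actual code.

```python
import json
import urllib.request


def model_ids(models_response: dict) -> list[str]:
    """Return the model ids from an OpenAI-style models list response."""
    if models_response.get("object") != "list":
        raise ValueError("expected an object of type 'list'")
    return [m["id"] for m in models_response.get("data", []) if m.get("object") == "model"]


def fetch_models(base_url: str, api_key: str = "", timeout: float = 5.0) -> dict:
    """GET {base_url}/models, sending a Bearer token only if api_key is set."""
    req = urllib.request.Request(base_url.rstrip("/") + "/models")
    if api_key:
        req.add_header("Authorization", f"Bearer {api_key}")
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)
```

A check could then treat the server as reachable if `fetch_models` succeeds, and confirm that `"kokoro"` appears in `model_ids(...)`. Note that this list carries model ids only, no voice names, which is why it cannot drive the voice list by itself.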

@ken107
Owner

ken107 commented Feb 12, 2025

Since there's no standard API for listing voices, a simple solution is to let the user edit the list of voices. Currently it's hardcoded to the OpenAI voice list. If a user is savvy enough to run Kokoro locally, they should be able to modify a simple JSON document, I presume.
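One way this could look, as a Python sketch: the user's edited JSON overrides a built-in default list. The defaults below are OpenAI's six original TTS voices paired with `tts-1`, an assumption standing in for read-aloud's actual hardcoded list; `effective_voice_list` is a hypothetical name.

```python
import json
from typing import Optional

# Stand-in default list (assumption): OpenAI's six original TTS voices.
DEFAULT_VOICES = [
    {"voice": v, "model": "tts-1"}
    for v in ("alloy", "echo", "fable", "onyx", "nova", "shimmer")
]


def effective_voice_list(user_json: Optional[str]) -> list[dict]:
    """Use the user's edited JSON voice list if present, else the defaults."""
    if not user_json or not user_json.strip():
        return DEFAULT_VOICES
    voices = json.loads(user_json)
    if not isinstance(voices, list) or not all(
        isinstance(v, dict) and isinstance(v.get("voice"), str) for v in voices
    ):
        raise ValueError("voice list must be an array of objects with a 'voice' string")
    return voices
```

Falling back to the defaults when the field is blank means users of the real OpenAI API never have to touch the JSON.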

@luckycold
Copy link
Author

luckycold commented Feb 12, 2025

Yeah, I think that's fair enough. If it's implemented that way, where would that JSON document be accessible, though?

@ken107
Owner

ken107 commented Feb 13, 2025

Could you test this version and see if it works for you? You should be able to set the API URL (to localhost:8880 for kokoro-fastapi), leave the API key blank, and edit the voice list to match Kokoro's voice list.
https://github.com/ken107/read-aloud/tree/kokoro
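For reference, a Python sketch of the audio request such a setup sends. It assumes the standard OpenAI `POST /audio/speech` JSON body (`model`, `input`, `voice`) and a base URL of `http://localhost:8880/v1`; the `/v1` prefix and `build_speech_request` are assumptions. A blank API key simply omits the `Authorization` header.

```python
import json
import urllib.request


def build_speech_request(base_url: str, text: str, voice: str,
                         model: str = "kokoro", api_key: str = "") -> urllib.request.Request:
    """Build a POST to the OpenAI-style {base_url}/audio/speech endpoint."""
    body = json.dumps({"model": model, "input": text, "voice": voice}).encode("utf-8")
    req = urllib.request.Request(
        base_url.rstrip("/") + "/audio/speech",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    if api_key:  # a blank key means no Authorization header is sent
        req.add_header("Authorization", f"Bearer {api_key}")
    return req


# Sending it would return the audio bytes, for example:
# with urllib.request.urlopen(build_speech_request(url, "Hello", "af_bella")) as resp:
#     audio = resp.read()
```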

@luckycold
Copy link
Author

Tested on Chrome and it works great! As for Firefox, I couldn't get it to build for some reason, probably just me not knowing how to do it right. But the foundation is totally there. It works great!

2025-02-13.08-09-19.mp4

@ken107
Owner

ken107 commented Feb 13, 2025

Great! Firefox is on a different branch with separate code. Once this is checked in, I'll merge it into the FF branch. Thanks for testing.
