Voice Cloning API

A FastAPI-based REST API wrapper for the Fish-Speech voice cloning model. It lets you clone a voice from a reference recording and generate new speech from custom text.

Credits

This project is a REST API wrapper built around the Fish-Speech model. All credit for the underlying voice cloning technology goes to:

Original Repository: Fish-Speech
License: Original License
Authors: Fish-Speech Team

Features

Upload reference audio file
Provide reference and target text
Generate cloned voice speaking the target text
CPU/CUDA support
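
CPU/CUDA support in a PyTorch project usually comes down to choosing a device at startup. The following is a minimal sketch of that choice, not this project's actual code: `pick_device` is a hypothetical helper, and it falls back to CPU when PyTorch is not installed.

```python
def pick_device() -> str:
    """Return "cuda" when a CUDA-capable GPU is usable, otherwise "cpu"."""
    try:
        import torch  # imported lazily so the sketch still runs without PyTorch
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


if __name__ == "__main__":
    print(f"Running inference on: {pick_device()}")
```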

Prerequisites

Python 3.8+
FastAPI
PyTorch
Fish-Speech model checkpoints

Installation

Clone the repository:

git clone [your-repo-url]
cd voice-cloning-app

Create and activate virtual environment:

python -m venv venv
# On Windows
.\venv\Scripts\activate
# On Linux/Mac
source venv/bin/activate

Install dependencies:

pip install -r requirements.txt

Download the Fish-Speech model:

huggingface-cli download fishaudio/fish-speech-1.5 --local-dir checkpoints/fish-speech-1.5
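
Before starting the server, it can help to confirm that the download actually populated the checkpoint directory. The sketch below only checks that the directory from the command above exists and contains at least one file; `checkpoints_ready` is a hypothetical helper, and it does not verify which files Fish-Speech requires.

```python
from pathlib import Path

# Directory passed to --local-dir in the download command above.
CHECKPOINT_DIR = Path("checkpoints/fish-speech-1.5")


def checkpoints_ready(directory: Path = CHECKPOINT_DIR) -> bool:
    """Return True if the directory exists and contains at least one file."""
    if not directory.is_dir():
        return False
    return any(entry.is_file() for entry in directory.rglob("*"))


if __name__ == "__main__":
    if checkpoints_ready():
        print(f"Found model files in {CHECKPOINT_DIR}")
    else:
        print(f"No model files in {CHECKPOINT_DIR}; run the download command first")
```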

Usage

Start the FastAPI server:

uvicorn app.main:app --reload

Access the API documentation at http://localhost:8000/docs

Example Usage

Upload Reference Audio and Generate New Speech

Access the Swagger UI at http://localhost:8000/docs.

Use the /clone-voice endpoint with:

- Audio File: Your reference voice recording
- Reference Text: The exact words spoken in your reference audio
- Target Text: The new text you want to generate

Example Request:

Audio File: your_voice.wav
Reference Text: "Welcome to the podcast. Let’s dive into today’s topic." (reference text must match exactly what's said in the input audio)
Target Text: "Today's episode will focus on AI and its impact on society."
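
The same request can be sent from a script instead of the Swagger UI. The sketch below assumes the endpoint accepts multipart/form-data and uses the third-party `requests` library. The form field names (`audio_file`, `reference_text`, `target_text`) and both helper functions are assumptions for illustration; the real field names are listed at http://localhost:8000/docs.

```python
from pathlib import Path

# Endpoint path is from this README; host and port are uvicorn's defaults.
API_URL = "http://localhost:8000/clone-voice"


def build_form_fields(reference_text: str, target_text: str) -> dict:
    """Build the text fields of the request (field names are assumptions)."""
    if not reference_text.strip() or not target_text.strip():
        raise ValueError("reference_text and target_text must be non-empty")
    return {"reference_text": reference_text, "target_text": target_text}


def clone_voice(audio_path: str, reference_text: str, target_text: str) -> dict:
    """POST the audio file and both texts as multipart/form-data."""
    import requests  # third-party: pip install requests

    fields = build_form_fields(reference_text, target_text)
    with Path(audio_path).open("rb") as audio:
        # "audio_file" is a hypothetical field name; check /docs for the real one.
        files = {"audio_file": (Path(audio_path).name, audio, "audio/wav")}
        response = requests.post(API_URL, data=fields, files=files, timeout=300)
    response.raise_for_status()
    return response.json()
```

With the server running, calling `clone_voice("your_voice.wav", reference_text, target_text)` with the two texts from the example request returns the parsed JSON response.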

A successful response looks like this:

{
  "status": "success",
  "message": "Voice cloning successful",
  "output_path": "fake.wav"
}
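
A client will typically check `status` before using `output_path`. The sketch below handles the response shape shown above. The format of an error response is not documented here, so the failure branch is an assumption, and `output_path_from_response` is a hypothetical helper.

```python
import json


def output_path_from_response(body: str) -> str:
    """Return output_path from a JSON response body, or raise on failure."""
    payload = json.loads(body)
    if payload.get("status") != "success":
        # The error shape is not documented; surface whatever message is present.
        raise RuntimeError(payload.get("message", "voice cloning failed"))
    return payload["output_path"]


if __name__ == "__main__":
    sample = (
        '{"status": "success", '
        '"message": "Voice cloning successful", '
        '"output_path": "fake.wav"}'
    )
    print(output_path_from_response(sample))
```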

Project Structure

voice-cloning-app/
├── app/
│   ├── main.py              # FastAPI application
│   └── services/
│       ├── voice_clone.py   # Voice cloning service
│       └── audio_service.py # Audio handling service
├── tools/                   # Fish-Speech inference tools
├── checkpoints/            # Model checkpoints
└── uploads/               # Temporary upload directory

Important Notes

The reference text must match exactly what is being said in the input audio file
Input audio should be clear and around 10 seconds long for best results
The API currently saves the output as 'fake.wav' in the project directory
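
The length recommendation can be checked before uploading. The sketch below uses Python's standard `wave` module, so it only works for uncompressed PCM WAV files. The helper names and the 5-second tolerance are assumptions for illustration; the only guidance given here is "around 10 seconds".

```python
import wave


def wav_duration_seconds(path: str) -> float:
    """Return the duration of an uncompressed WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def is_good_reference_length(
    path: str, target: float = 10.0, tolerance: float = 5.0
) -> bool:
    """True if the duration is within `tolerance` seconds of `target`."""
    return abs(wav_duration_seconds(path) - target) <= tolerance
```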
