Ensure that you have the necessary Python packages installed by following these steps (Python 3.9 is recommended):
Execute the install.bat file to activate a Conda environment. Afterward, launch the application using env/python.exe rvc.py
instead of the conventional python rvc.py
command.
chmod +x install.sh
./install.sh
Download the necessary models and executables by running the following command:
python rvc.py prerequisites
More information about the prerequisites command here
For detailed information and command-line options, refer to the help command:
python rvc.py -h
This command provides a clear overview of the available modes and their corresponding parameters, facilitating effective utilization of the RVC CLI.
python rvc.py infer --f0up_key "f0up_key" --filter_radius "filter_radius" --index_rate "index_rate" --hop_length "hop_length" --rms_mix_rate "rms_mix_rate" --protect "protect" --f0autotune "f0autotune" --f0method "f0method" --input_path "input_path" --output_path "output_path" --pth_path "pth_path" --index_path "index_path" --split_audio "split_audio" --clean_audio "clean_audio" --clean_strength "clean_strength" --export_format "export_format"
Parameter Name | Required | Default | Valid Options | Description |
---|---|---|---|---|
f0up_key |
No | 0 | -24 to +24 | Set the pitch of the audio, the higher the value, thehigher the pitch. |
filter_radius |
No | 3 | 0 to 10 | If the number is greater than or equal to three, employing median filtering on the collected tone results has the potential to decrease respiration. |
index_rate |
No | 0.3 | 0.0 to 1.0 | Influence exerted by the index file; a higher value corresponds to greater influence. However, opting for lower values can help mitigate artifacts present in the audio. |
hop_length |
No | 128 | 1 to 512 | Denotes the duration it takes for the system to transition to a significant pitch change. Smaller hop lengths require more time for inference but tend to yield higher pitch accuracy. |
rms_mix_rate |
No | 1 | 0 to 1 | Substitute or blend with the volume envelope of the output. The closer the ratio is to 1, the more the output envelope is employed. |
protect |
No | 0.33 | 0 to 0.5 | Safeguard distinct consonants and breathing sounds to prevent electro-acoustic tearing and other artifacts. Pulling the parameter to its maximum value of 0.5 offers comprehensive protection. However, reducing this value might decrease the extent of protection while potentially mitigating the indexing effect. |
f0autotune |
No | False | True or False | Apply a soft autotune to your inferences, recommended for singing conversions. |
f0method |
No | rmvpe | pm, harvest, dio, crepe, crepe-tiny, rmvpe, fcpe, hybrid[crepe+rmvpe], hybrid[crepe+fcpe], hybrid[rmvpe+fcpe], hybrid[crepe+rmvpe+fcpe] | Pitch extraction algorithm to use for the audio conversion. The default algorithm is rmvpe, which is recommended for most cases. |
input_path |
Yes | None | Full path to the input audio file | Full path to the input audio file |
output_path |
Yes | None | Full path to the output audio file | Full path to the output audio file |
pth_path |
Yes | None | Full path to the pth file | Full path to the pth file |
index_path |
Yes | None | Full index file path | Full index file path |
split_audio |
No | False | True or False | Split the audio into chunks for inference to obtain better results in some cases. |
clean_audio |
No | False | True or False | Clean your audio output using noise detection algorithms, recommended for speaking audios. |
clean_strength |
No | 0.7 | 0.0 to 1.0 | Set the clean-up level to the audio you want, the more you increase it the more it will clean up, but it is possible that the audio will be more compressed. |
export_format |
No | WAV | WAV, MP3, FLAC, OGG, M4A | File audio format |
embedder_model |
No | hubert | hubert or contentvec | Embedder model to use for the audio conversion. The default model is hubert, which is recommended for most cases. |
upscale_audio |
No | False | True or False | Upscale the audio to 48kHz for better results. |
Refer to python rvc.py infer -h
for additional help.
python rvc.py batch_infer --f0up_key "f0up_key" --filter_radius "filter_radius" --index_rate "index_rate" --hop_length "hop_length" --rms_mix_rate "rms_mix_rate" --protect "protect" --f0autotune "f0autotune" --f0method "f0method" --input_folder_path "input_folder_path" --output_folder_path "output_folder_path" --pth_path "pth_path" --index_path "index_path" --split_audio "split_audio" --clean_audio "clean_audio" --clean_strength "clean_strength" --export_format "export_format"
Parameter Name | Required | Default | Valid Options | Description |
---|---|---|---|---|
f0up_key |
No | 0 | -24 to +24 | Set the pitch of the audio, the higher the value, thehigher the pitch. |
filter_radius |
No | 3 | 0 to 10 | If the number is greater than or equal to three, employing median filtering on the collected tone results has the potential to decrease respiration. |
index_rate |
No | 0.3 | 0.0 to 1.0 | Influence exerted by the index file; a higher value corresponds to greater influence. However, opting for lower values can help mitigate artifacts present in the audio. |
hop_length |
No | 128 | 1 to 512 | Denotes the duration it takes for the system to transition to a significant pitch change. Smaller hop lengths require more time for inference but tend to yield higher pitch accuracy. |
rms_mix_rate |
No | 1 | 0 to 1 | Substitute or blend with the volume envelope of the output. The closer the ratio is to 1, the more the output envelope is employed. |
protect |
No | 0.33 | 0 to 0.5 | Safeguard distinct consonants and breathing sounds to prevent electro-acoustic tearing and other artifacts. Pulling the parameter to its maximum value of 0.5 offers comprehensive protection. However, reducing this value might decrease the extent of protection while potentially mitigating the indexing effect. |
f0autotune |
No | False | True or False | Apply a soft autotune to your inferences, recommended for singing conversions. |
f0method |
No | rmvpe | pm, harvest, dio, crepe, crepe-tiny, rmvpe, fcpe, hybrid[crepe+rmvpe], hybrid[crepe+fcpe], hybrid[rmvpe+fcpe], hybrid[crepe+rmvpe+fcpe] | Pitch extraction algorithm to use for the audio conversion. The default algorithm is rmvpe, which is recommended for most cases. |
input_folder_path |
Yes | None | Full path to the input audio folder (The folder may only contain audio files) | Full path to the input audio folder |
output_folder_path |
Yes | None | Full path to the output audio folder | Full path to the output audio folder |
pth_path |
Yes | None | Full path to the pth file | Full path to the pth file |
index_path |
Yes | None | Full path to the index file | Full path to the index file |
split_audio |
No | False | True or False | Split the audio into chunks for inference to obtain better results in some cases. |
clean_audio |
No | False | True or False | Clean your audio output using noise detection algorithms, recommended for speaking audios. |
clean_strength |
No | 0.7 | 0.0 to 1.0 | Set the clean-up level to the audio you want, the more you increase it the more it will clean up, but it is possible that the audio will be more compressed. |
export_format |
No | WAV | WAV, MP3, FLAC, OGG, M4A | File audio format |
embedder_model |
No | hubert | hubert or contentvec | Embedder model to use for the audio conversion. The default model is hubert, which is recommended for most cases. |
upscale_audio |
No | False | True or False | Upscale the audio to 48kHz for better results. |
Refer to python rvc.py batch_infer -h
for additional help.
python rvc.py tts_infer --tts_text "tts_text" --tts_voice "tts_voice" --f0up_key "f0up_key" --filter_radius "filter_radius" --index_rate "index_rate" --hop_length "hop_length" --rms_mix_rate "rms_mix_rate" --protect "protect" --f0autotune "f0autotune" --f0method "f0method" --output_tts_path "output_tts_path" --output_rvc_path "output_rvc_path" --pth_path "pth_path" --index_path "index_path"--split_audio "split_audio" --clean_audio "clean_audio" --clean_strength "clean_strength" --export_format "export_format"
Parameter Name | Required | Default | Valid Options | Description |
---|---|---|---|---|
tts_text |
Yes | None | Text for TTS synthesis | Text for TTS synthesis |
tts_voice |
Yes | None | Voice for TTS synthesis | Voice for TTS synthesis |
f0up_key |
No | 0 | -24 to +24 | Set the pitch of the audio, the higher the value, thehigher the pitch. |
filter_radius |
No | 3 | 0 to 10 | If the number is greater than or equal to three, employing median filtering on the collected tone results has the potential to decrease respiration. |
index_rate |
No | 0.3 | 0.0 to 1.0 | Influence exerted by the index file; a higher value corresponds to greater influence. However, opting for lower values can help mitigate artifacts present in the audio. |
hop_length |
No | 128 | 1 to 512 | Denotes the duration it takes for the system to transition to a significant pitch change. Smaller hop lengths require more time for inference but tend to yield higher pitch accuracy. |
rms_mix_rate |
No | 1 | 0 to 1 | Substitute or blend with the volume envelope of the output. The closer the ratio is to 1, the more the output envelope is employed. |
protect |
No | 0.33 | 0 to 0.5 | Safeguard distinct consonants and breathing sounds to prevent electro-acoustic tearing and other artifacts. Pulling the parameter to its maximum value of 0.5 offers comprehensive protection. However, reducing this value might decrease the extent of protection while potentially mitigating the indexing effect. |
f0autotune |
No | False | True or False | Apply a soft autotune to your inferences, recommended for singing conversions. |
f0method |
No | rmvpe | pm, harvest, dio, crepe, crepe-tiny, rmvpe, fcpe, hybrid[crepe+rmvpe], hybrid[crepe+fcpe], hybrid[rmvpe+fcpe], hybrid[crepe+rmvpe+fcpe] | Pitch extraction algorithm to use for the audio conversion. The default algorithm is rmvpe, which is recommended for most cases. |
output_tts_path |
Yes | None | Full path to the output TTS audio file | Full path to the output TTS audio file |
output_rvc_path |
Yes | None | Full path to the input RVC audio file | Full path to the input RVC audio file |
pth_path |
Yes | None | Full path to the pth file | Full path to the pth file |
index_path |
Yes | None | Full path to the index file | Full path to the index file |
split_audio |
No | False | True or False | Split the audio into chunks for inference to obtain better results in some cases. |
clean_audio |
No | False | True or False | Clean your audio output using noise detection algorithms, recommended for speaking audios. |
clean_strength |
No | 0.7 | 0.0 to 1.0 | Set the clean-up level to the audio you want, the more you increase it the more it will clean up, but it is possible that the audio will be more compressed. |
export_format |
No | WAV | WAV, MP3, FLAC, OGG, M4A | File audio format |
embedder_model |
No | hubert | hubert or contentvec | Embedder model to use for the audio conversion. The default model is hubert, which is recommended for most cases. |
upscale_audio |
No | False | True or False | Upscale the audio to 48kHz for better results. |
Refer to python rvc.py tts_infer -h
for additional help.
python rvc.py preprocess --model_name "model_name" --dataset_path "dataset_path" --sampling_rate "sampling_rate"
Parameter Name | Required | Default | Valid Options | Description |
---|---|---|---|---|
model_name |
Yes | None | Name of the model | Name of the model |
dataset_path |
Yes | None | Full path to the dataset folder (The folder may only contain audio files) | Full path to the dataset folder |
sampling_rate |
Yes | None | 32000, 40000, or 48000 | Sampling rate of the audio data |
Refer to python rvc.py preprocess -h
for additional help.
python rvc.py extract --model_name "model_name" --rvc_version "rvc_version" --pitch_guidance "pitch_guidance" --hop_length "hop_length" --sampling_rate "sampling_rate"
Parameter Name | Required | Default | Valid Options | Description |
---|---|---|---|---|
model_name |
Yes | None | Name of the model | Name of the model |
rvc_version |
No | v2 | v1 or v2 | Version of the model |
pitch_guidance |
No | True | True or False | By employing pitch guidance, it becomes feasible to mirror the intonation of the original voice, including its pitch. This feature is particularly valuable for singing and other scenarios where preserving the original melody or pitch pattern is essential. |
hop_length |
No | 128 | 1 to 512 | Denotes the duration it takes for the system to transition to a significant pitch change. Smaller hop lengths require more time for inference but tend to yield higher pitch accuracy. |
sampling_rate |
Yes | None | 32000, 40000, or 48000 | Sampling rate of the audio data |
embedder_model |
No | hubert | hubert or contentvec | Embedder model to use for the audio conversion. The default model is hubert, which is recommended for most cases. |
python rvc.py train --model_name "model_name" --rvc_version "rvc_version" --save_every_epoch "save_every_epoch" --save_only_latest "save_only_latest" --save_every_weights "save_every_weights" --total_epoch "total_epoch" --sampling_rate "sampling_rate" --batch_size "batch_size" --gpu "gpu" --pitch_guidance "pitch_guidance" --overtraining_detector "overtraining_detector" --overtraining_threshold "overtraining_threshold" --sync_graph "sync_graph" --pretrained "pretrained" --custom_pretrained "custom_pretrained" [--g_pretrained "g_pretrained"] [--d_pretrained "d_pretrained"]
Parameter Name | Required | Default | Valid Options | Description |
---|---|---|---|---|
model_name |
Yes | None | Name of the model | Name of the model |
rvc_version |
No | v2 | v1 or v2 | Version of the model |
save_every_epoch |
Yes | None | 1 to 50 | Determine at how many epochs the model will saved at. |
save_only_latest |
No | False | True or False | Enabling this setting will result in the G and D files saving only their most recent versions, effectively conserving storage space. |
save_every_weights |
No | True | True or False | This setting enables you to save the weights of the model at the conclusion of each epoch. |
total_epoch |
No | 1000 | 1 to 10000 | Specifies the overall quantity of epochs for the model training process. |
sampling_rate |
Yes | None | 32000, 40000, or 48000 | Sampling rate of the audio data |
batch_size |
No | 8 | 1 to 50 | It's advisable to align it with the available VRAM of your GPU. A setting of 4 offers improved accuracy but slower processing, while 8 provides faster and standard results. |
gpu |
No | 0 | 0 to ∞ separated by - | Specify the number of GPUs you wish to utilize for training by entering them separated by hyphens (-). |
pitch_guidance |
No | True | True or False | By employing pitch guidance, it becomes feasible to mirror the intonation of the original voice, including its pitch. This feature is particularly valuable for singing and other scenarios where preserving the original melody or pitch pattern is essential. |
overtraining_detector |
No | False | True or False | Utilize the overtraining detector to prevent overfitting. This feature is particularly valuable for scenarios where the model is at risk of overfitting. |
overtraining_threshold |
No | 50 | 1 to 100 | Set the threshold for the overtraining detector. The lower the value, the more sensitive the detector will be. |
pretrained |
No | True | True or False | Utilize pretrained models when training your own. This approach reduces training duration and enhances overall quality. |
custom_pretrained |
No | False | True or False | Utilizing custom pretrained models can lead to superior results, as selecting the most suitable pretrained models tailored to the specific use case can significantly enhance performance. |
g_pretrained |
No | None | Full path to pretrained file G, only if you have used custom_pretrained | Full path to pretrained file G |
d_pretrained |
No | None | Full path to pretrained file D, only if you have used custom_pretrained | Full path to pretrained file D |
sync_graph |
No | False | True or False | Synchronize the graph of the tensorbaord. Only enable this setting if you are training a new model. |
Refer to python rvc.py train -h
for additional help.
python rvc.py index --model_name "model_name" --rvc_version "rvc_version"
Parameter Name | Required | Default | Valid Options | Description |
---|---|---|---|---|
model_name |
Yes | None | Name of the model | Name of the model |
rvc_version |
Yes | None | v1 or v2 | Version of the model |
Refer to python rvc.py index -h
for additional help.
python uvr.py [audio_file] [options]
Parameter Name | Required | Default | Valid Options | Description |
---|---|---|---|---|
audio_file |
Yes | None | Any valid audio file path | The path to the audio file you want to separate, in any common format. |
-d , --debug |
No | False | Enable debug logging. | |
-e , --env_info |
No | False | Print environment information and exit. | |
-l , --list_models |
No | False | List all supported models and exit. | |
--log_level |
No | info | info, debug, warning | Log level. |
Parameter Name | Required | Default | Valid Options | Description |
---|---|---|---|---|
-m , --model_filename |
No | UVR-MDX-NET-Inst_HQ_3.onnx | Any valid model file path | Model to use for separation. |
--output_format |
No | WAV | Any common audio format | Output format for separated files. |
--output_dir |
No | None | Any valid directory path | Directory to write output files. |
--model_file_dir |
No | /tmp/audio-separator-models/ | Any valid directory path | Model files directory. |
Parameter Name | Required | Default | Valid Options | Description |
---|---|---|---|---|
--invert_spect |
No | False | Invert secondary stem using spectrogram. | |
--normalization |
No | 0.9 | Any float value | Max peak amplitude to normalize input and output audio to. |
--single_stem |
No | None | Instrumental, Vocals, Drums, Bass, Guitar, Piano, Other | Output only a single stem. |
--sample_rate |
No | 44100 | Any integer value | Modify the sample rate of the output audio. |
Parameter Name | Required | Default | Valid Options | Description |
---|---|---|---|---|
--mdxc_segment_size |
No | 256 | Any integer value | Size of segments for MDXC architecture. |
--mdxc_override_model_segment_size |
No | False | Opverride model default segment size instead of using the model default value. | |
--mdxc_overlap |
No | 8 | 2 to 50 | Amount of overlap between prediction windows for MDXC architecture. |
--mdxc_batch_size |
No | 1 | Any integer value | Batch size for MDXC architecture. |
--mdxc_pitch_shift |
No | 0 | Any integer value | Shift audio pitch by a number of semitones while processing for MDXC architecture. |
Parameter Name | Required | Default | Valid Options | Description |
---|---|---|---|---|
--mdx_segment_size |
No | 256 | Any integer value | Size of segments for MDX architecture. |
--mdx_overlap |
No | 0.25 | 0.001 to 0.999 | Amount of overlap between prediction windows for MDX architecture. |
--mdx_batch_size |
No | 1 | Any integer value | Batch size for MDX architecture. |
--mdx_hop_length |
No | 1024 | Any integer value | Hop length for MDX architecture. |
--mdx_enable_denoise |
No | False | Enable denoising during separation for MDX architecture. |
Parameter Name | Required | Default | Valid Options | Description |
---|---|---|---|---|
--demucs_segment_size |
No | Default | Any integer value | Size of segments for Demucs architecture. |
--demucs_shifts |
No | 2 | Any integer value | Number of predictions with random shifts for Demucs architecture. |
--demucs_overlap |
No | 0.25 | 0.001 to 0.999 | Overlap between prediction windows for Demucs architecture. |
--demucs_segments_enabled |
No | True | Enable segment-wise processing for Demucs architecture. |
Parameter Name | Required | Default | Valid Options | Description |
---|---|---|---|---|
--vr_batch_size |
No | 4 | Any integer value | Batch size for VR architecture. |
--vr_window_size |
No | 512 | Any integer value | Window size for VR architecture. |
--vr_aggression |
No | 5 | -100 to 100 | Intensity of primary stem extraction for VR architecture. |
--vr_enable_tta |
No | False | Enable Test-Time-Augmentation for VR architecture. | |
--vr_high_end_process |
No | False | Mirror the missing frequency range of the output for VR architecture. | |
--vr_enable_post_process |
No | False | Identify leftover artifacts within vocal output for VR architecture. | |
--vr_post_process_threshold |
No | 0.2 | 0.1 to 0.3 | Threshold for post-process feature for VR architecture. |
python rvc.py model_extract --pth_path "pth_path" --model_name "model_name" --sampling_rate "sampling_rate" --pitch_guidance "pitch_guidance" --rvc_version "rvc_version" --epoch "epoch" --step "step"
Parameter Name | Required | Default | Valid Options | Description |
---|---|---|---|---|
pth_path |
Yes | None | Path to the pth file | Full path to the pth file |
model_name |
Yes | None | Name of the model | Name of the model |
sampling_rate |
Yes | None | 32000, 40000, or 48000 | Sampling rate of the audio data |
pitch_guidance |
Yes | None | True or False | By employing pitch guidance, it becomes feasible to mirror the intonation of the original voice, including its pitch. This feature is particularly valuable for singing and other scenarios where preserving the original melody or pitch pattern is essential. |
rvc_version |
Yes | None | v1 or v2 | Version of the model |
epoch |
Yes | None | 1 to 10000 | Specifies the overall quantity of epochs for the model training process. |
step |
Yes | None | 1 to ∞ | Specifies the overall quantity of steps for the model training process. |
python rvc.py model_information --pth_path "pth_path"
Parameter Name | Required | Default | Valid Options | Description |
---|---|---|---|---|
pth_path |
Yes | None | Path to the pth file | Full path to the pth file |
python rvc.py model_blender --model_name "model_name" --pth_path_1 "pth_path_1" --pth_path_2 "pth_path_2" --ratio "ratio"
Parameter Name | Required | Default | Valid Options | Description |
---|---|---|---|---|
model_name |
Yes | None | Name of the model | Name of the model |
pth_path_1 |
Yes | None | Path to the first pth file | Full path to the first pth file |
pth_path_2 |
Yes | None | Path to the second pth file | Full path to the second pth file |
ratio |
No | 0.5 | 0.0 to 1 | Value for blender ratio |
python rvc.py tensorboard
Run the download script with the following command:
python rvc.py download --model_link "model_link"
Parameter Name | Required | Default | Valid Options | Description |
---|---|---|---|---|
model_link |
Yes | None | Link of the model (enclosed in double quotes; Google Drive or Hugging Face) | Link of the model |
Refer to python rvc.py download -h
for additional help.
python rvc.py audio_analyzer --input_path "input_path"
Parameter Name | Required | Default | Valid Options | Description |
---|---|---|---|---|
input_path |
Yes | None | Full path to the input audio file | Full path to the input audio file |
Refer to python rvc.py audio_analyzer -h
for additional help.
python rvc.py prerequisites --pretraineds_v1 "pretraineds_v1" --pretraineds_v2 "--pretraineds_v2" --models "models" --exe "exe"
Parameter Name | Required | Default | Valid Options | Description |
---|---|---|---|---|
pretraineds_v1 |
No | True | True or False | Download pretrained models for v1 |
pretraineds_v2 |
No | True | True or False | Download pretrained models for v2 |
models |
No | True | True or False | Download models for v1 and v2 |
exe |
No | True | True or False | Download the necessary executable files for the CLI to function properly (FFmpeg and FFprobe) |
python rvc.py api --host "host" --port "port"
Parameter Name | Required | Default | Valid Options | Description |
---|---|---|---|---|
host |
No | 127.0.0.1 | Value for host IP | Value for host IP |
port |
No | 8000 | Value for port number | Value for port number |
To use the RVC CLI via the API, utilize the provided script. Make API requests to the following endpoints:
- Docs:
/docs
- Ping:
/ping
- Infer:
/infer
- Batch Infer:
/batch_infer
- TTS:
/tts
- Preprocess:
/preprocess
- Extract:
/extract
- Train:
/train
- Index:
/index
- Model Information:
/model_information
- Model Fusion:
/model_fusion
- Download:
/download
Make POST requests to these endpoints with the same required parameters as in CLI mode.
The RVC CLI builds upon the foundations of the following projects:
- ContentVec by auspicious3000
- HIFIGAN by jik876
- audio-slicer by openvpi
- python-audio-separator by karaokenerds
- RMVPE by Dream-High
- FCPE by CNChTu
- VITS by jaywalnut310
- So-Vits-SVC by svc-develop-team
- Harmonify by Eempostor
- Retrieval-based-Voice-Conversion-WebUI by RVC-Project
- Mangio-RVC-Fork by Mangio621
- anyf0 by SoulMelody
We acknowledge and appreciate the contributions of the respective authors and communities involved in these projects.