Ensure that you have the necessary Python packages installed by following these steps (Python 3.9 is recommended):
Windows
Execute the install.bat file to activate a Conda environment. Afterward, launch the application using env/python.exe rvc.py instead of the conventional python rvc.py command.
Linux
chmod +x install.sh
./install.sh
Getting Started
Download the necessary models and executables by running the following command:
python rvc.py prerequisites
More information about the prerequisites command here
For detailed information and command-line options, refer to the help command:
python rvc.py -h
This command provides a clear overview of the available modes and their corresponding parameters, facilitating effective utilization of the RVC CLI.
Set the pitch of the audio; the higher the value, the higher the pitch.
filter_radius
No
3
0 to 10
If the value is 3 or greater, median filtering is applied to the extracted pitch results, which can reduce breathiness.
index_rate
No
0.3
0.0 to 1.0
Influence exerted by the index file; a higher value corresponds to greater influence. However, opting for lower values can help mitigate artifacts present in the audio.
hop_length
No
128
1 to 512
Denotes the duration it takes for the system to transition to a significant pitch change. Smaller hop lengths require more time for inference but tend to yield higher pitch accuracy.
rms_mix_rate
No
1
0 to 1
Substitute or blend with the volume envelope of the output. The closer the ratio is to 1, the more the output envelope is employed.
protect
No
0.33
0 to 0.5
Safeguard distinct consonants and breathing sounds to prevent electro-acoustic tearing and other artifacts. Pulling the parameter to its maximum value of 0.5 offers comprehensive protection. However, reducing this value might decrease the extent of protection while potentially mitigating the indexing effect.
f0autotune
No
False
True or False
Apply a soft autotune to your inferences, recommended for singing conversions.
Pitch extraction algorithm to use for the audio conversion. The default algorithm is rmvpe, which is recommended for most cases.
input_path
Yes
None
Full path to the input audio file
Full path to the input audio file
output_path
Yes
None
Full path to the output audio file
Full path to the output audio file
pth_path
Yes
None
Full path to the pth file
Full path to the pth file
index_path
Yes
None
Full index file path
Full index file path
split_audio
No
False
True or False
Split the audio into chunks for inference to obtain better results in some cases.
clean_audio
No
False
True or False
Clean your audio output using noise detection algorithms; recommended for speech audio.
clean_strength
No
0.7
0.0 to 1.0
Set the clean-up strength for the audio. Higher values clean more aggressively, but the audio may sound more compressed.
export_format
No
WAV
WAV, MP3, FLAC, OGG, M4A
Audio file format
embedder_model
No
hubert
hubert or contentvec
Embedder model to use for the audio conversion. The default model is hubert, which is recommended for most cases.
upscale_audio
No
False
True or False
Upscale the audio to 48kHz for better results.
Refer to python rvc.py infer -h for additional help.
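The numeric ranges in the table above can be checked before a conversion is launched. The following is a minimal sketch, not part of the RVC CLI: `RANGES`, `validate` and `build_infer_command` are hypothetical helper names, and the `--name value` flag style for `infer` is an assumption based on the `index` command shown in this README. Confirm the actual flags with python rvc.py infer -h.

```python
# Numeric ranges copied from the parameter table above.
RANGES = {
    "filter_radius": (0, 10),
    "index_rate": (0.0, 1.0),
    "hop_length": (1, 512),
    "rms_mix_rate": (0, 1),
    "protect": (0, 0.5),
    "clean_strength": (0.0, 1.0),
}


def validate(params):
    """Return a list of error messages for out-of-range numeric values."""
    errors = []
    for name, value in params.items():
        if name in RANGES:
            low, high = RANGES[name]
            if not low <= value <= high:
                errors.append(f"{name}={value} is outside {low} to {high}")
    return errors


def build_infer_command(params, python="python", script="rvc.py"):
    """Build an argument list; the --name value flag style is an assumption."""
    command = [python, script, "infer"]
    for name, value in params.items():
        command += [f"--{name}", str(value)]
    return command
```

The resulting list could be passed to `subprocess.run` once `validate` returns no errors. On Windows, `python` would be replaced by `env/python.exe`, as described in the installation steps.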
Set the pitch of the audio; the higher the value, the higher the pitch.
filter_radius
No
3
0 to 10
If the value is 3 or greater, median filtering is applied to the extracted pitch results, which can reduce breathiness.
index_rate
No
0.3
0.0 to 1.0
Influence exerted by the index file; a higher value corresponds to greater influence. However, opting for lower values can help mitigate artifacts present in the audio.
hop_length
No
128
1 to 512
Denotes the duration it takes for the system to transition to a significant pitch change. Smaller hop lengths require more time for inference but tend to yield higher pitch accuracy.
rms_mix_rate
No
1
0 to 1
Substitute or blend with the volume envelope of the output. The closer the ratio is to 1, the more the output envelope is employed.
protect
No
0.33
0 to 0.5
Safeguard distinct consonants and breathing sounds to prevent electro-acoustic tearing and other artifacts. Pulling the parameter to its maximum value of 0.5 offers comprehensive protection. However, reducing this value might decrease the extent of protection while potentially mitigating the indexing effect.
f0autotune
No
False
True or False
Apply a soft autotune to your inferences, recommended for singing conversions.
Pitch extraction algorithm to use for the audio conversion. The default algorithm is rmvpe, which is recommended for most cases.
input_folder_path
Yes
None
Full path to the input audio folder (The folder may only contain audio files)
Full path to the input audio folder
output_folder_path
Yes
None
Full path to the output audio folder
Full path to the output audio folder
pth_path
Yes
None
Full path to the pth file
Full path to the pth file
index_path
Yes
None
Full path to the index file
Full path to the index file
split_audio
No
False
True or False
Split the audio into chunks for inference to obtain better results in some cases.
clean_audio
No
False
True or False
Clean your audio output using noise detection algorithms; recommended for speech audio.
clean_strength
No
0.7
0.0 to 1.0
Set the clean-up strength for the audio. Higher values clean more aggressively, but the audio may sound more compressed.
export_format
No
WAV
WAV, MP3, FLAC, OGG, M4A
Audio file format
embedder_model
No
hubert
hubert or contentvec
Embedder model to use for the audio conversion. The default model is hubert, which is recommended for most cases.
upscale_audio
No
False
True or False
Upscale the audio to 48kHz for better results.
Refer to python rvc.py batch_infer -h for additional help.
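Because the input folder may only contain audio files, it can help to check the folder before starting a batch run. This is a small sketch under assumptions: `non_audio_files` is a hypothetical helper, and the extension set is taken from the export_format options in the table above, which may not match the full set of input formats the CLI accepts.

```python
from pathlib import Path

# Assumption: extensions derived from the export_format options listed above.
AUDIO_EXTENSIONS = {".wav", ".mp3", ".flac", ".ogg", ".m4a"}


def non_audio_files(folder):
    """Return sorted names of entries in `folder` that do not look like audio files."""
    return sorted(
        entry.name
        for entry in Path(folder).iterdir()
        if entry.is_dir() or entry.suffix.lower() not in AUDIO_EXTENSIONS
    )
```

An empty result suggests the folder is safe to pass as input_folder_path; any names returned are candidates to move elsewhere first.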
Set the pitch of the audio; the higher the value, the higher the pitch.
filter_radius
No
3
0 to 10
If the value is 3 or greater, median filtering is applied to the extracted pitch results, which can reduce breathiness.
index_rate
No
0.3
0.0 to 1.0
Influence exerted by the index file; a higher value corresponds to greater influence. However, opting for lower values can help mitigate artifacts present in the audio.
hop_length
No
128
1 to 512
Denotes the duration it takes for the system to transition to a significant pitch change. Smaller hop lengths require more time for inference but tend to yield higher pitch accuracy.
rms_mix_rate
No
1
0 to 1
Substitute or blend with the volume envelope of the output. The closer the ratio is to 1, the more the output envelope is employed.
protect
No
0.33
0 to 0.5
Safeguard distinct consonants and breathing sounds to prevent electro-acoustic tearing and other artifacts. Pulling the parameter to its maximum value of 0.5 offers comprehensive protection. However, reducing this value might decrease the extent of protection while potentially mitigating the indexing effect.
f0autotune
No
False
True or False
Apply a soft autotune to your inferences, recommended for singing conversions.
Pitch extraction algorithm to use for the audio conversion. The default algorithm is rmvpe, which is recommended for most cases.
output_tts_path
Yes
None
Full path to the output TTS audio file
Full path to the output TTS audio file
output_rvc_path
Yes
None
Full path to the output RVC audio file
Full path to the output RVC audio file
pth_path
Yes
None
Full path to the pth file
Full path to the pth file
index_path
Yes
None
Full path to the index file
Full path to the index file
split_audio
No
False
True or False
Split the audio into chunks for inference to obtain better results in some cases.
clean_audio
No
False
True or False
Clean your audio output using noise detection algorithms; recommended for speech audio.
clean_strength
No
0.7
0.0 to 1.0
Set the clean-up strength for the audio. Higher values clean more aggressively, but the audio may sound more compressed.
export_format
No
WAV
WAV, MP3, FLAC, OGG, M4A
Audio file format
embedder_model
No
hubert
hubert or contentvec
Embedder model to use for the audio conversion. The default model is hubert, which is recommended for most cases.
upscale_audio
No
False
True or False
Upscale the audio to 48kHz for better results.
Refer to python rvc.py tts_infer -h for additional help.
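The filter_radius description mentions median filtering of the pitch results. The sketch below illustrates median filtering in general on a made-up pitch contour; it is not the RVC implementation, and how filter_radius maps to a window size is not stated in this README, so the `window` argument here is an assumption.

```python
from statistics import median


def median_filter(values, window=3):
    """Replace each value with the median of a window centred on it.

    The window shrinks at the edges instead of padding.
    """
    half = window // 2
    return [
        median(values[max(0, i - half): i + half + 1])
        for i in range(len(values))
    ]


# Made-up pitch values in Hz with a single spike at index 2.
pitch = [220, 221, 440, 222, 223]
print(median_filter(pitch))
```

The isolated spike at 440 is replaced by a neighbouring value, which is how median filtering suppresses short outliers in a pitch track.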
By employing pitch guidance, it becomes feasible to mirror the intonation of the original voice, including its pitch. This feature is particularly valuable for singing and other scenarios where preserving the original melody or pitch pattern is essential.
hop_length
No
128
1 to 512
Denotes the duration it takes for the system to transition to a significant pitch change. Smaller hop lengths require more time for inference but tend to yield higher pitch accuracy.
sampling_rate
Yes
None
32000, 40000, or 48000
Sampling rate of the audio data
embedder_model
No
hubert
hubert or contentvec
Embedder model to use for the audio conversion. The default model is hubert, which is recommended for most cases.
Determines the interval, in epochs, at which the model is saved.
save_only_latest
No
False
True or False
Enabling this setting will result in the G and D files saving only their most recent versions, effectively conserving storage space.
save_every_weights
No
True
True or False
This setting enables you to save the weights of the model at the conclusion of each epoch.
total_epoch
No
1000
1 to 10000
Specifies the overall quantity of epochs for the model training process.
sampling_rate
Yes
None
32000, 40000, or 48000
Sampling rate of the audio data
batch_size
No
8
1 to 50
It's advisable to align it with the available VRAM of your GPU. A setting of 4 offers improved accuracy but slower processing, while 8 provides faster and standard results.
gpu
No
0
0 to ∞ separated by -
Specify the GPUs you wish to use for training by entering their indices separated by hyphens (-).
pitch_guidance
No
True
True or False
By employing pitch guidance, it becomes feasible to mirror the intonation of the original voice, including its pitch. This feature is particularly valuable for singing and other scenarios where preserving the original melody or pitch pattern is essential.
overtraining_detector
No
False
True or False
Utilize the overtraining detector to prevent overfitting. This feature is particularly valuable for scenarios where the model is at risk of overfitting.
overtraining_threshold
No
50
1 to 100
Set the threshold for the overtraining detector. The lower the value, the more sensitive the detector will be.
pretrained
No
True
True or False
Utilize pretrained models when training your own. This approach reduces training duration and enhances overall quality.
custom_pretrained
No
False
True or False
Utilizing custom pretrained models can lead to superior results, as selecting the most suitable pretrained models tailored to the specific use case can significantly enhance performance.
g_pretrained
No
None
Full path to pretrained file G, only if you have used custom_pretrained
Full path to pretrained file G
d_pretrained
No
None
Full path to pretrained file D, only if you have used custom_pretrained
Full path to pretrained file D
sync_graph
No
False
True or False
Synchronize the TensorBoard graph. Only enable this setting if you are training a new model.
Refer to python rvc.py train -h for additional help.
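Two of the training parameters above have formats that are easy to get wrong: the hyphen-separated gpu value and the save interval. The sketch below uses hypothetical helpers, `parse_gpu_ids` and `checkpoint_epochs`, to show how such values could be interpreted. Whether the trainer saves at exactly these epochs is an assumption, not something this README confirms.

```python
def parse_gpu_ids(value):
    """Split a hyphen-separated value such as "0-1" into a list of integers."""
    return [int(part) for part in str(value).split("-")]


def checkpoint_epochs(total_epoch, interval):
    """List the epochs at which a save would occur when saving every `interval` epochs."""
    return list(range(interval, total_epoch + 1, interval))


print(parse_gpu_ids("0-1"))
print(checkpoint_epochs(1000, 250))
```

Listing the save points ahead of time gives a rough idea of how many checkpoint files a run will produce when save_only_latest is left at False.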
Generate Index File
python rvc.py index --model_name "model_name" --rvc_version "rvc_version"
| Parameter Name | Required | Default | Valid Options | Description |
|---|---|---|---|---|
| model_name | Yes | None | Name of the model | Name of the model |
| rvc_version | Yes | None | v1 or v2 | Version of the model |
Refer to python rvc.py index -h for additional help.
UVR
python uvr.py [audio_file] [options]
Info and Debugging
| Parameter Name | Required | Default | Valid Options | Description |
|---|---|---|---|---|
| audio_file | Yes | None | Any valid audio file path | The path to the audio file you want to separate, in any common format. |
| -d, --debug | No | False | | Enable debug logging. |
| -e, --env_info | No | False | | Print environment information and exit. |
| -l, --list_models | No | False | | List all supported models and exit. |
| --log_level | No | info | info, debug, warning | Log level. |
Separation I/O Params
| Parameter Name | Required | Default | Valid Options | Description |
|---|---|---|---|---|
| -m, --model_filename | No | UVR-MDX-NET-Inst_HQ_3.onnx | Any valid model file path | Model to use for separation. |
| --output_format | No | WAV | Any common audio format | Output format for separated files. |
| --output_dir | No | None | Any valid directory path | Directory to write output files. |
| --model_file_dir | No | /tmp/audio-separator-models/ | Any valid directory path | Model files directory. |
Common Separation Parameters
| Parameter Name | Required | Default | Valid Options | Description |
|---|---|---|---|---|
| --invert_spect | No | False | | Invert secondary stem using spectrogram. |
| --normalization | No | 0.9 | Any float value | Max peak amplitude to normalize input and output audio to. |
| --single_stem | No | None | Instrumental, Vocals, Drums, Bass, Guitar, Piano, Other | Output only a single stem. |
| --sample_rate | No | 44100 | Any integer value | Modify the sample rate of the output audio. |
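The --normalization option above is described as a maximum peak amplitude. As a general illustration of peak normalization, and not the code the separator itself runs, the sketch below scales a list of float samples so that the largest absolute value equals the target peak. The function name `normalize_peak` is hypothetical.

```python
def normalize_peak(samples, peak=0.9):
    """Scale samples so the largest absolute value equals `peak`."""
    current = max((abs(s) for s in samples), default=0.0)
    if current == 0:
        # Silent or empty input: nothing to scale.
        return list(samples)
    scale = peak / current
    return [s * scale for s in samples]


print(normalize_peak([0.5, -0.25, 0.1]))
```

A lower target leaves more headroom before clipping, and a value near 1.0 uses almost the full range.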
MDXC Architecture Parameters
| Parameter Name | Required | Default | Valid Options | Description |
|---|---|---|---|---|
| --mdxc_segment_size | No | 256 | Any integer value | Size of segments for MDXC architecture. |
| --mdxc_override_model_segment_size | No | False | | Override the model's default segment size instead of using the model default value. |
| --mdxc_overlap | No | 8 | 2 to 50 | Amount of overlap between prediction windows for MDXC architecture. |
| --mdxc_batch_size | No | 1 | Any integer value | Batch size for MDXC architecture. |
| --mdxc_pitch_shift | No | 0 | Any integer value | Shift audio pitch by a number of semitones while processing for MDXC architecture. |
MDX Architecture Parameters
| Parameter Name | Required | Default | Valid Options | Description |
|---|---|---|---|---|
| --mdx_segment_size | No | 256 | Any integer value | Size of segments for MDX architecture. |
| --mdx_overlap | No | 0.25 | 0.001 to 0.999 | Amount of overlap between prediction windows for MDX architecture. |
| --mdx_batch_size | No | 1 | Any integer value | Batch size for MDX architecture. |
| --mdx_hop_length | No | 1024 | Any integer value | Hop length for MDX architecture. |
| --mdx_enable_denoise | No | False | | Enable denoising during separation for MDX architecture. |
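To give a feel for what a fractional overlap between prediction windows means, the sketch below computes window start offsets for a given segment size and overlap. It is a generic illustration with a hypothetical helper, `window_starts`. The units of the segment size and the way the separator actually steps through the audio are assumptions; only the 0.001 to 0.999 range comes from the table above.

```python
def window_starts(total_length, segment_size, overlap):
    """Return start offsets of windows that overlap by the given fraction."""
    if not 0.001 <= overlap <= 0.999:
        raise ValueError("overlap must be between 0.001 and 0.999")
    # Each new window starts one step after the previous one;
    # a larger overlap means a smaller step and more windows.
    step = max(1, int(segment_size * (1 - overlap)))
    return list(range(0, total_length, step))


print(window_starts(1000, 256, 0.25))
```

Raising the overlap increases the number of windows, which generally costs more processing time in exchange for smoother transitions between segments.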
Demucs Architecture Parameters
| Parameter Name | Required | Default | Valid Options | Description |
|---|---|---|---|---|
| --demucs_segment_size | No | Default | Any integer value | Size of segments for Demucs architecture. |
| --demucs_shifts | No | 2 | Any integer value | Number of predictions with random shifts for Demucs architecture. |
| --demucs_overlap | No | 0.25 | 0.001 to 0.999 | Overlap between prediction windows for Demucs architecture. |
| --demucs_segments_enabled | No | True | | Enable segment-wise processing for Demucs architecture. |
VR Architecture Parameters
| Parameter Name | Required | Default | Valid Options | Description |
|---|---|---|---|---|
| --vr_batch_size | No | 4 | Any integer value | Batch size for VR architecture. |
| --vr_window_size | No | 512 | Any integer value | Window size for VR architecture. |
| --vr_aggression | No | 5 | -100 to 100 | Intensity of primary stem extraction for VR architecture. |
| --vr_enable_tta | No | False | | Enable Test-Time-Augmentation for VR architecture. |
| --vr_high_end_process | No | False | | Mirror the missing frequency range of the output for VR architecture. |
| --vr_enable_post_process | No | False | | Identify leftover artifacts within vocal output for VR architecture. |
| --vr_post_process_threshold | No | 0.2 | 0.1 to 0.3 | Threshold for post-process feature for VR architecture. |
By employing pitch guidance, it becomes feasible to mirror the intonation of the original voice, including its pitch. This feature is particularly valuable for singing and other scenarios where preserving the original melody or pitch pattern is essential.
rvc_version
Yes
None
v1 or v2
Version of the model
epoch
Yes
None
1 to 10000
Specifies the overall quantity of epochs for the model training process.
step
Yes
None
1 to ∞
Specifies the overall quantity of steps for the model training process.