I'd make it more simple :Đ
No statistics are available yet on voice timbre leakage.
Compared with Diff-SVC and So-VITS-SVC: Diff-SVC performs much better when the training data is of extremely high quality, but this repository can perform better on lower-quality datasets. In inference speed, this repository is much faster than Diff-SVC and slightly slower than So-VITS-SVC, but it improves the model's pronunciation a lot.
This is a singing voice conversion (SVC) model. It uses the Whisper PPG encoder (Whisper Large V3) to extract features from the input audio; these features are sent into VITS along with the F0, in place of the original input, to achieve voice conversion with a Mixed Decoder.
- Whisper Large V3: large-v3
  - Place under pretrain.
# For simple downloading.
# Whisper
wget -P pretrain/ https://openaipublic.azureedge.net/main/whisper/models/e5b1a55b89c1367dacf97e3e19bfd829a01529dbfdeefa8caeb59b3f1b81dadb/large-v3.pt
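In OpenAI's Whisper release URLs, the long hexadecimal path segment is the SHA-256 digest of the checkpoint file, so the download can be checked once wget finishes. The following is a minimal Python sketch, assuming the file was saved as pretrain/large-v3.pt; the helper names `sha256_of` and `matches_url_digest` are hypothetical and are not part of this repository.

```python
import hashlib
from pathlib import Path

WHISPER_URL = (
    "https://openaipublic.azureedge.net/main/whisper/models/"
    "e5b1a55b89c1367dacf97e3e19bfd829a01529dbfdeefa8caeb59b3f1b81dadb/large-v3.pt"
)


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_url_digest(path, url=WHISPER_URL):
    """Compare the file's digest with the second-to-last URL path segment."""
    expected = url.split("/")[-2]
    return sha256_of(path) == expected


if __name__ == "__main__":
    # Assumed location, matching the wget -P pretrain/ command.
    checkpoint = Path("pretrain") / "large-v3.pt"
    if checkpoint.is_file():
        print("checksum ok" if matches_url_digest(checkpoint) else "checksum mismatch")
    else:
        print(f"{checkpoint} not found")
```

A mismatch usually points to an interrupted download, in which case the file can simply be fetched again.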
Simply place the data under the dataset_raw folder, following the structure shown below.
dataset_raw
├───speaker0
│   ├───xxx1-xxx1.wav
│   ├───...
│   └───Lxx-0xx8.wav
└───speaker1
    ├───xx2-0xxx2.wav
    ├───...
    └───xxx7-xxx007.wav
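Before resampling, it can help to confirm that the folder really follows this layout. The sketch below is not part of the repository; `summarize_dataset` is a hypothetical helper that counts the .wav files directly inside each speaker folder, assuming one sub-folder per speaker as in the tree above.

```python
from pathlib import Path


def summarize_dataset(root="dataset_raw"):
    """Map each speaker folder name to the number of .wav files directly inside it."""
    root_path = Path(root)
    summary = {}
    for speaker_dir in sorted(p for p in root_path.iterdir() if p.is_dir()):
        wav_count = sum(
            1
            for f in speaker_dir.iterdir()
            if f.is_file() and f.suffix.lower() == ".wav"
        )
        summary[speaker_dir.name] = wav_count
    return summary


if __name__ == "__main__":
    if Path("dataset_raw").is_dir():
        for speaker, count in summarize_dataset().items():
            flag = "" if count else "  <- no .wav files found"
            print(f"{speaker}: {count} file(s){flag}")
```

Files placed directly in dataset_raw, outside any speaker folder, are ignored by this check.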
- Resample
python resample.py
- Preprocess
python preprocess.py
# Notice.
# The n_speakers value in the config is set automatically according to the number of speakers in the dataset.
# To reserve space for speakers added to the dataset later, n_speakers is set to twice the actual number.
# If you want even more room for additional data, you can edit the n_speakers value in the config after running this step.
# This value cannot be changed once training starts.
After running the step above, the dataset folder will contain all the pre-processed data; you can then delete the dataset_raw folder.
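The n_speakers note in the preprocessing step can be illustrated with a short sketch. It is only an illustration under stated assumptions: the key path `config["model"]["n_speakers"]` is assumed, not confirmed by this README, and `reserved_speaker_slots` and `set_n_speakers` are hypothetical helpers. Any such change has to happen before training starts.

```python
import json
from pathlib import Path


def reserved_speaker_slots(actual_speakers, factor=2):
    """Slots to reserve: `factor` times the real speaker count (2x per the notice)."""
    return actual_speakers * factor


def set_n_speakers(config_path, n_speakers):
    """Rewrite n_speakers in the config; refuses to go below the current value."""
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8"))
    # ASSUMPTION: the value lives at config["model"]["n_speakers"].
    current = config["model"]["n_speakers"]
    if n_speakers < current:
        # Precaution chosen for this sketch, not a rule stated by the repository.
        raise ValueError(f"n_speakers={n_speakers} is smaller than current value {current}")
    config["model"]["n_speakers"] = n_speakers
    # ensure_ascii=False keeps any non-ASCII speaker names readable in the file.
    path.write_text(json.dumps(config, indent=2, ensure_ascii=False), encoding="utf-8")
    return n_speakers
```

For example, a dataset with 3 speakers would get 6 slots by default, and `set_n_speakers("configs/config.json", 10)` would raise that to 10 if the assumed key path matches the generated config.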
python train.py -c configs/config.json -m 32k
- Edit model_path to your newest checkpoint.
- Place the input audio under the raw folder.
- Change clean_names to the output file name.
- Use trans to edit the pitch shifting amount (semitones).
- Change spk_list to the speaker name.
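Since trans is given in semitones, it may help to see what a value means in terms of frequency. The sketch below uses the standard equal-temperament relation (12 semitones double the frequency); how this repository applies trans internally is not shown in this README, so the function names and the convention that 0.0 marks an unvoiced frame are assumptions.

```python
def semitones_to_ratio(trans):
    """Frequency ratio for a shift of `trans` semitones in 12-tone equal temperament."""
    return 2.0 ** (trans / 12.0)


def shift_f0(f0_hz, trans):
    """Scale F0 values (Hz) by the shift; 0.0 (assumed unvoiced) stays 0.0."""
    ratio = semitones_to_ratio(trans)
    return [f * ratio for f in f0_hz]


if __name__ == "__main__":
    for trans in (-12, 0, 12):
        print(trans, semitones_to_ratio(trans))
```

A value of 12 raises the pitch by one octave and -12 lowers it by one octave, which is a common starting point when converting between typical male and female vocal ranges.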
Use onnx_export.py
- Create a new folder called checkpoints.
- Create a project folder inside the checkpoints folder with the desired name for your project; let's use myproject as an example. The folder structure looks like ./checkpoints/myproject.
- Rename your model to model.pth, rename your config file to config.json, then move them into the myproject folder.
- Modify onnx_export.py: where it says path = "NyaruTaffy", change NyaruTaffy to your project name; here it will be path = "myproject".
- Run onnx_export.py.
- Once it finishes, a model.onnx will be generated in the myproject folder; that's the model you just exported.
- Notice: if you want to export a 48K model, please follow the instructions below or use model_onnx_48k.py directly.
  - Open model_onnx.py and change hps={"sampling_rate": 32000...} to hps={"sampling_rate": 48000} in class SynthesizerTrn.
  - Open nvSTFT and replace all 32000 with 48000.
- All training functions and transformations have been removed; only when all of them are removed are you actually using ONNX.
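The folder setup in the steps above (a checkpoints folder, a project folder inside it, and the files model.pth and config.json) can be sketched in Python as follows. `prepare_project` is a hypothetical helper, not a script shipped with this repository, and it copies the files instead of moving them so the originals stay in place.

```python
import shutil
from pathlib import Path


def prepare_project(model_file, config_file, project="myproject", root="checkpoints"):
    """Copy a checkpoint and its config into <root>/<project>/ under the expected names."""
    project_dir = Path(root) / project
    project_dir.mkdir(parents=True, exist_ok=True)
    shutil.copy2(model_file, project_dir / "model.pth")
    shutil.copy2(config_file, project_dir / "config.json")
    return project_dir
```

A call such as `prepare_project("my_checkpoint.pth", "configs/config.json")`, where my_checkpoint.pth stands for your own checkpoint file, would produce ./checkpoints/myproject/model.pth and ./checkpoints/myproject/config.json.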
Use sovits_gradio.py to run Gradio WebUI
- Create a new folder called checkpoints.
- Create a project folder inside the checkpoints folder with the desired name for your project; let's use myproject as an example. The folder structure looks like ./checkpoints/myproject.
- Rename your model to model.pth, rename your config file to config.json, then move them into the myproject folder.
- Run sovits_gradio.py.