Training fails when the wav file extension contains uppercase letters #185

sabipipe · 2024-11-22T09:03:02Z

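Solution

Training appears to fail when a wav file's extension contains uppercase letters (for example, J80.WAV). I was using very old data from the Windows 95 era, so the file names, including the extensions, were all uppercase. This seems to need either a note in the documentation or a code fix.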
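
As a workaround until the code or documentation is updated, the extensions can be normalized before transcription and preprocessing. The following is a minimal sketch and not part of Style-Bert-VITS2: the function name `lowercase_wav_extensions` and the directory passed to it are hypothetical, and it assumes the rename happens before any file lists are generated, so that they reference the new names.

```python
from pathlib import Path


def lowercase_wav_extensions(dataset_dir: str) -> list[tuple[str, str]]:
    """Rename files such as J80.WAV to J80.wav and return (old, new) name pairs."""
    renamed = []
    for path in sorted(Path(dataset_dir).rglob("*")):
        if not path.is_file():
            continue
        suffix = path.suffix
        # Only touch WAV files whose extension is not already all lowercase.
        if suffix.lower() != ".wav" or suffix == ".wav":
            continue
        target = path.with_suffix(".wav")
        # On a case-sensitive filesystem a distinct J80.wav may already exist;
        # skip the file rather than overwrite it.
        if target.exists() and not target.samefile(path):
            continue
        path.rename(target)
        renamed.append((path.name, target.name))
    return renamed
```

Calling `lowercase_wav_extensions("path/to/dataset")` (a placeholder path) returns the list of files that were renamed, which makes it easy to confirm what changed before starting transcription.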

Problem description

Model training cannot be started. Transcription and preprocessing work, but starting training produces the following error.

UnboundLocalError: local variable 'bert_ori' referenced before assignment

Judging from the source code, the following WARNING appears to be the cause.

11-22 17:39:20 |WARNING | data_utils.py:174 | Bert load Failed
11-22 17:39:20 |WARNING | data_utils.py:175 | unpickling stack underflow
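
One mechanism that would produce exactly this pair of symptoms is sketched below. It is an assumption, not a copy of data_utils.py: the helper names `derive_bert_path` and `get_bert` and the `.bert.pt` naming are hypothetical, and the sketch assumes that the feature path is derived with a case-sensitive `.wav` replacement and that the file is read with a pickle-based loader. Under those assumptions, `J80.WAV` maps to itself, so the loader is handed the audio file. A WAV file starts with `RIFF`, and `R` is the pickle REDUCE opcode, so unpickling fails on the first byte. The except branch only logs, so the variable is never assigned.

```python
import pickle


def derive_bert_path(wav_path: str) -> str:
    # Hypothetical helper: str.replace is case-sensitive, so "J80.WAV" is unchanged.
    return wav_path.replace(".wav", ".bert.pt")


def get_bert(wav_path: str, files: dict[str, bytes]):
    # `files` stands in for the filesystem: it maps a path to the file's bytes.
    bert_path = derive_bert_path(wav_path)
    try:
        bert_ori = pickle.loads(files[bert_path])
    except Exception as e:
        # Logging and continuing leaves bert_ori unassigned.
        print("Bert load Failed:", e)
    # Raises UnboundLocalError whenever the load above failed.
    return bert_ori
```

If this is what happens, either renaming the files or making the path derivation independent of extension case would avoid the error.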

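I tried several combinations of versions, but none of them worked (details below).
Training succeeded with HEAD as of 2024/1/8. This time I updated the code to the latest HEAD and rebuilt the virtual environment. That alone left problems with transcription, so I deleted the whole repository and cloned it again.
- ↑ I had made changes to the dataset.
I also have an Ubuntu environment, but it runs into the issue "API server does not start (Ubuntu)" #168, so I currently have no way to train.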

Expected behavior

Training starts normally.

Current behavior

11-22 16:43:30 |  INFO  | subprocess.py:23 | Running: train_ms_jp_extra.py --config Data\Someone-v0.3\config.json --model Data\Someone-v0.3
11-22 16:43:38 |  INFO  | train_ms_jp_extra.py:117 | Loading configuration from config 0
11-22 16:43:38 |  INFO  | train_ms_jp_extra.py:117 | Loading configuration from config localhost
11-22 16:43:38 |  INFO  | train_ms_jp_extra.py:117 | Loading configuration from config 10086
11-22 16:43:38 |  INFO  | train_ms_jp_extra.py:117 | Loading configuration from config 0
11-22 16:43:38 |  INFO  | train_ms_jp_extra.py:117 | Loading configuration from config 1
11-22 16:43:38 |  INFO  | train_ms_jp_extra.py:119 | Loading environment variables
MASTER_ADDR: localhost,
MASTER_PORT: 10086,
WORLD_SIZE: 1,
RANK: 0,
LOCAL_RANK: 0
11-22 16:43:38 |  INFO  | default_style.py:54 | At least 2 subdirectories are required for generating style vectors with respect to them, found 0.
11-22 16:43:38 |  INFO  | default_style.py:57 | Generating only neutral style vector instead.
11-22 16:43:39 |  INFO  | default_style.py:28 | Saved mean style vector to model_assets\Someone-v0.3
11-22 16:43:39 |  INFO  | default_style.py:36 | Saved style config to model_assets\Someone-v0.3\config.json
11-22 16:43:39 |WARNING | __init__.py:247 | C:\Users\hogehoge\ghq\github.com\litagin02\Style-Bert-VITS2\style_bert_vits2\models\utils is not a git repository, therefore hash value comparison will be ignored.
11-22 16:43:39 |  INFO  | data_utils.py:69 | Init dataset...
100%|█████████████████████████████████████████████████████████████████████████████| 124/124 [00:00<00:00, 41042.75it/s]
11-22 16:43:39 |  INFO  | data_utils.py:84 | skipped: 0, total: 124
11-22 16:43:39 |  INFO  | data_utils.py:348 | Bucket info: [115, 2, 1]
11-22 16:43:39 |  INFO  | data_utils.py:69 | Init dataset...
100%|████████████████████████████████████████████████████████████████████████████████████████████| 6/6 [00:00<?, ?it/s]
11-22 16:43:39 |  INFO  | data_utils.py:84 | skipped: 0, total: 6
11-22 16:43:39 |  INFO  | train_ms_jp_extra.py:274 | Using noise scaled MAS for VITS2
11-22 16:43:42 |WARNING | safetensors.py:42 | Missing key: enc_p.style_proj.weight
11-22 16:43:42 |WARNING | safetensors.py:42 | Missing key: enc_p.style_proj.bias
11-22 16:43:42 |WARNING | safetensors.py:42 | Missing key: emb_g.weight
11-22 16:43:42 |  INFO  | safetensors.py:48 | Loaded 'Data\Someone-v0.3\models\G_0.safetensors'
11-22 16:43:43 |  INFO  | safetensors.py:48 | Loaded 'Data\Someone-v0.3\models\D_0.safetensors'
11-22 16:43:43 |  INFO  | safetensors.py:48 | Loaded 'Data\Someone-v0.3\models\WD_0.safetensors'
11-22 16:43:43 |  INFO  | train_ms_jp_extra.py:492 | Loaded the pretrained models.
11-22 16:43:45 |  INFO  | train_ms_jp_extra.py:540 | Start training.
  0%|                                                                                        | 0/11800 [00:00<?, ?it/s]11-22 16:43:50 |WARNING | data_utils.py:174 | Bert load Failed
11-22 16:43:50 |WARNING | data_utils.py:175 | unpickling stack underflow
11-22 16:43:50 |WARNING | data_utils.py:174 | Bert load Failed
11-22 16:43:50 |WARNING | data_utils.py:175 | unpickling stack underflow
11-22 16:43:50 |WARNING | data_utils.py:174 | Bert load Failed
11-22 16:43:50 |WARNING | data_utils.py:175 | unpickling stack underflow
  0%|                                                                                        | 0/11800 [00:10<?, ?it/s]
11-22 16:43:56 | ERROR  | subprocess.py:33 | Error: train_ms_jp_extra.py --config Data\Someone-v0.3\config.json --model Data\Someone-v0.3
Some weights of the model checkpoint at ./slm/wavlm-base-plus were not used when initializing WavLMModel: ['encoder.pos_conv_embed.conv.weight_g', 'encoder.pos_conv_embed.conv.weight_v']
- This IS expected if you are initializing WavLMModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing WavLMModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of WavLMModel were not initialized from the model checkpoint at ./slm/wavlm-base-plus and are newly initialized: ['encoder.pos_conv_embed.conv.parametrizations.weight.original0', 'encoder.pos_conv_embed.conv.parametrizations.weight.original1']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
[rank0]: Traceback (most recent call last):
[rank0]:   File "C:\Users\hogehoge\ghq\github.com\litagin02\Style-Bert-VITS2\train_ms_jp_extra.py", line 1130, in <module>
[rank0]:     run()
[rank0]:   File "C:\Users\hogehoge\ghq\github.com\litagin02\Style-Bert-VITS2\train_ms_jp_extra.py", line 557, in run
[rank0]:     train_and_evaluate(
[rank0]:   File "C:\Users\hogehoge\ghq\github.com\litagin02\Style-Bert-VITS2\train_ms_jp_extra.py", line 695, in train_and_evaluate
[rank0]:     for batch_idx, (
[rank0]:   File "C:\Users\hogehoge\ghq\github.com\litagin02\Style-Bert-VITS2\venv\lib\site-packages\torch\utils\data\dataloader.py", line 631, in __next__
[rank0]:     data = self._next_data()
[rank0]:   File "C:\Users\hogehoge\ghq\github.com\litagin02\Style-Bert-VITS2\venv\lib\site-packages\torch\utils\data\dataloader.py", line 1346, in _next_data
[rank0]:     return self._process_data(data)
[rank0]:   File "C:\Users\hogehoge\ghq\github.com\litagin02\Style-Bert-VITS2\venv\lib\site-packages\torch\utils\data\dataloader.py", line 1372, in _process_data
[rank0]:     data.reraise()
[rank0]:   File "C:\Users\hogehoge\ghq\github.com\litagin02\Style-Bert-VITS2\venv\lib\site-packages\torch\_utils.py", line 705, in reraise
[rank0]:     raise exception
[rank0]: UnboundLocalError: Caught UnboundLocalError in DataLoader worker process 0.
[rank0]: Original Traceback (most recent call last):
[rank0]:   File "C:\Users\hogehoge\ghq\github.com\litagin02\Style-Bert-VITS2\venv\lib\site-packages\torch\utils\data\_utils\worker.py", line 308, in _worker_loop
[rank0]:     data = fetcher.fetch(index)  # type: ignore[possibly-undefined]
[rank0]:   File "C:\Users\hogehoge\ghq\github.com\litagin02\Style-Bert-VITS2\venv\lib\site-packages\torch\utils\data\_utils\fetch.py", line 51, in fetch
[rank0]:     data = [self.dataset[idx] for idx in possibly_batched_index]
[rank0]:   File "C:\Users\hogehoge\ghq\github.com\litagin02\Style-Bert-VITS2\venv\lib\site-packages\torch\utils\data\_utils\fetch.py", line 51, in <listcomp>
[rank0]:     data = [self.dataset[idx] for idx in possibly_batched_index]
[rank0]:   File "C:\Users\hogehoge\ghq\github.com\litagin02\Style-Bert-VITS2\data_utils.py", line 199, in __getitem__
[rank0]:     return self.get_audio_text_speaker_pair(self.audiopaths_sid_text[index])
[rank0]:   File "C:\Users\hogehoge\ghq\github.com\litagin02\Style-Bert-VITS2\data_utils.py", line 97, in get_audio_text_speaker_pair
[rank0]:     bert, ja_bert, en_bert, phones, tone, language = self.get_text(
[rank0]:   File "C:\Users\hogehoge\ghq\github.com\litagin02\Style-Bert-VITS2\data_utils.py", line 183, in get_text
[rank0]:     ja_bert = bert_ori
[rank0]: UnboundLocalError: local variable 'bert_ori' referenced before assignment


11-22 16:43:56 | ERROR  | train.py:360 | Train failed.

Steps to reproduce

powershell -c "irm https://astral.sh/uv/install.ps1 | iex"
git clone https://github.com/litagin02/Style-Bert-VITS2.git
cd Style-Bert-VITS2
uv venv venv
venv\Scripts\activate
uv pip install "torch<2.4" "torchaudio<2.4" --index-url https://download.pytorch.org/whl/cu118
uv pip install -r requirements.txt
python initialize.py
python app.py

Start training from the GUI.

Version information

I tried the following combinations, and all of them behaved the same way. The OS is Windows 11.

| Source code | Python | PyTorch | Notes |
| --- | --- | --- | --- |
| `065a7ff` | 3.10.11 | 2.3.1+cu121 | |
| `065a7ff` | 3.10.11 | 2.2.2+cu121 | |
| 2.6.0 | 3.9.13 | 2.3.1+cu121 | numpy raised an error, so it was replaced with numpy==1.26.4 |


sabipipe · 2024-11-22T12:22:19Z

The uppercase extension on the dataset files appears to be the problem, so I have revised the description.
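
If the fix is made in code rather than in the documentation, one option is to derive companion file names from the path itself instead of matching a lowercase `.wav` string. A minimal sketch follows; `bert_path_for` and the `.bert.pt` naming are assumptions for illustration and are not confirmed from the repository.

```python
from pathlib import PurePath


def bert_path_for(wav_path: str) -> str:
    """Derive the feature path by swapping the suffix, ignoring extension case."""
    path = PurePath(wav_path)
    # Accept .wav, .WAV, .Wav and so on, and reject anything else explicitly
    # instead of silently returning the input path.
    if path.suffix.lower() != ".wav":
        raise ValueError(f"not a wav file: {wav_path}")
    return str(path.with_suffix(".bert.pt"))
```

With this approach `J80.WAV` and `J80.wav` both map to `J80.bert.pt`, and an unexpected extension fails loudly at the point where the path is built.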

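sabipipe changed the title ~~Bug: Bert load Failed~~ docs: Training fails when a wav file name contains uppercase letters Nov 22, 2024

sabipipe changed the title ~~docs: Training fails when a wav file name contains uppercase letters~~ Training fails when a wav file name contains uppercase letters Nov 22, 2024

sabipipe changed the title ~~Training fails when a wav file name contains uppercase letters~~ Training fails when the wav file extension contains uppercase letters Nov 22, 2024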
