Training fails when a wav file extension contains uppercase letters #185

Open
sabipipe opened this issue Nov 22, 2024 · 1 comment

Comments


sabipipe commented Nov 22, 2024

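Solution

Training appears to fail when a wav file's extension contains uppercase letters (for example, J80.WAV). I was using very old data from the Windows 95 era, so the file names, including the extensions, were all uppercase. This should either be documented or fixed in the code.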
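
Until this is documented or fixed, one possible workaround is to normalize the extensions before transcription and preprocessing. The following is a minimal sketch, not part of the repository: the function name `lowercase_wav_extensions` and the example directory are hypothetical, and it assumes that only the extension's case matters. It renames through a temporary name so that a case-only rename also works on case-insensitive file systems.

```python
import uuid
from pathlib import Path


def lowercase_wav_extensions(root: Path) -> list[Path]:
    """Rename files such as J80.WAV to J80.wav under root; return the new paths."""
    renamed = []
    # Materialize the listing first so renaming does not disturb iteration.
    for src in sorted(root.rglob("*")):
        if not src.is_file():
            continue
        # Only touch wav files whose extension is not already lowercase.
        if src.suffix.lower() != ".wav" or src.suffix == ".wav":
            continue
        dst = src.with_suffix(".wav")
        # Skip if a *different* file already occupies the target name.
        if dst.exists() and not dst.samefile(src):
            continue
        # Two-step rename: a direct case-only rename may be treated as a
        # no-op on some case-insensitive file systems.
        tmp = src.with_name(f"{src.stem}.{uuid.uuid4().hex}.tmp")
        src.rename(tmp)
        tmp.rename(dst)
        renamed.append(dst)
    return renamed


# Hypothetical usage; adjust the path to wherever the raw audio lives:
# lowercase_wav_extensions(Path("path/to/dataset"))
```

If transcription or preprocessing has already been run, any generated file lists may still refer to the old names, so it is probably safest to rename first and then rerun those steps.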

Problem description

Model training cannot be started. Transcription and preprocessing work, but attempting to train produces the following error.

UnboundLocalError: local variable 'bert_ori' referenced before assignment

Judging from the source code, the following WARNING appears to be the cause.

11-22 17:39:20 |WARNING | data_utils.py:174 | Bert load Failed
11-22 17:39:20 |WARNING | data_utils.py:175 | unpickling stack underflow
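  • I tried several version combinations, but none of them worked (details below)
  • Training succeeded with HEAD as of 2024/1/8. This time I updated the code to the latest HEAD and rebuilt the virtual environment. That alone left transcription broken, so I deleted the entire repository and cloned it again.
    • ↑ I had made changes to the dataset
  • I also have an Ubuntu environment, but it runs into the issue "API server does not start (Ubuntu)" #168, so I currently have no way to train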
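
The warning followed by the UnboundLocalError is consistent with a load failure being logged and then ignored, so that the variable is never assigned. A minimal, self-contained sketch of that pattern is shown here. It is not the actual data_utils.py code: it uses `pickle` in place of the real loader, and the function names are hypothetical.

```python
import logging
import pickle

logger = logging.getLogger("sketch")


def load_feature(raw: bytes):
    """Pattern under suspicion: the failure is logged but then ignored."""
    try:
        bert_ori = pickle.loads(raw)
    except Exception as e:
        logger.warning("Bert load Failed")
        logger.warning(e)
    # If loading failed, bert_ori was never assigned, so this line raises
    # UnboundLocalError and hides the original problem.
    return bert_ori


def load_feature_strict(raw: bytes):
    """Alternative: fail immediately with a message that names the cause."""
    try:
        return pickle.loads(raw)
    except Exception as e:
        raise RuntimeError(f"could not load feature data: {e}") from e


# b"RIFF" (the first bytes of a WAV header) is not a valid pickle stream.
try:
    load_feature(b"RIFF")
except UnboundLocalError as err:
    print("secondary error:", err)
```

Raising at the point of failure, as in the second function, would surface the underlying load error rather than the less informative UnboundLocalError.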

Expected behavior

Training starts normally.

Current behavior

11-22 16:43:30 |  INFO  | subprocess.py:23 | Running: train_ms_jp_extra.py --config Data\Someone-v0.3\config.json --model Data\Someone-v0.3
11-22 16:43:38 |  INFO  | train_ms_jp_extra.py:117 | Loading configuration from config 0
11-22 16:43:38 |  INFO  | train_ms_jp_extra.py:117 | Loading configuration from config localhost
11-22 16:43:38 |  INFO  | train_ms_jp_extra.py:117 | Loading configuration from config 10086
11-22 16:43:38 |  INFO  | train_ms_jp_extra.py:117 | Loading configuration from config 0
11-22 16:43:38 |  INFO  | train_ms_jp_extra.py:117 | Loading configuration from config 1
11-22 16:43:38 |  INFO  | train_ms_jp_extra.py:119 | Loading environment variables
MASTER_ADDR: localhost,
MASTER_PORT: 10086,
WORLD_SIZE: 1,
RANK: 0,
LOCAL_RANK: 0
11-22 16:43:38 |  INFO  | default_style.py:54 | At least 2 subdirectories are required for generating style vectors with respect to them, found 0.
11-22 16:43:38 |  INFO  | default_style.py:57 | Generating only neutral style vector instead.
11-22 16:43:39 |  INFO  | default_style.py:28 | Saved mean style vector to model_assets\Someone-v0.3
11-22 16:43:39 |  INFO  | default_style.py:36 | Saved style config to model_assets\Someone-v0.3\config.json
11-22 16:43:39 |WARNING | __init__.py:247 | C:\Users\hogehoge\ghq\github.com\litagin02\Style-Bert-VITS2\style_bert_vits2\models\utils is not a git repository, therefore hash value comparison will be ignored.
11-22 16:43:39 |  INFO  | data_utils.py:69 | Init dataset...
100%|█████████████████████████████████████████████████████████████████████████████| 124/124 [00:00<00:00, 41042.75it/s]
11-22 16:43:39 |  INFO  | data_utils.py:84 | skipped: 0, total: 124
11-22 16:43:39 |  INFO  | data_utils.py:348 | Bucket info: [115, 2, 1]
11-22 16:43:39 |  INFO  | data_utils.py:69 | Init dataset...
100%|████████████████████████████████████████████████████████████████████████████████████████████| 6/6 [00:00<?, ?it/s]
11-22 16:43:39 |  INFO  | data_utils.py:84 | skipped: 0, total: 6
11-22 16:43:39 |  INFO  | train_ms_jp_extra.py:274 | Using noise scaled MAS for VITS2
11-22 16:43:42 |WARNING | safetensors.py:42 | Missing key: enc_p.style_proj.weight
11-22 16:43:42 |WARNING | safetensors.py:42 | Missing key: enc_p.style_proj.bias
11-22 16:43:42 |WARNING | safetensors.py:42 | Missing key: emb_g.weight
11-22 16:43:42 |  INFO  | safetensors.py:48 | Loaded 'Data\Someone-v0.3\models\G_0.safetensors'
11-22 16:43:43 |  INFO  | safetensors.py:48 | Loaded 'Data\Someone-v0.3\models\D_0.safetensors'
11-22 16:43:43 |  INFO  | safetensors.py:48 | Loaded 'Data\Someone-v0.3\models\WD_0.safetensors'
11-22 16:43:43 |  INFO  | train_ms_jp_extra.py:492 | Loaded the pretrained models.
11-22 16:43:45 |  INFO  | train_ms_jp_extra.py:540 | Start training.
  0%|                                                                                        | 0/11800 [00:00<?, ?it/s]11-22 16:43:50 |WARNING | data_utils.py:174 | Bert load Failed
11-22 16:43:50 |WARNING | data_utils.py:175 | unpickling stack underflow
11-22 16:43:50 |WARNING | data_utils.py:174 | Bert load Failed
11-22 16:43:50 |WARNING | data_utils.py:175 | unpickling stack underflow
11-22 16:43:50 |WARNING | data_utils.py:174 | Bert load Failed
11-22 16:43:50 |WARNING | data_utils.py:175 | unpickling stack underflow
  0%|                                                                                        | 0/11800 [00:10<?, ?it/s]
11-22 16:43:56 | ERROR  | subprocess.py:33 | Error: train_ms_jp_extra.py --config Data\Someone-v0.3\config.json --model Data\Someone-v0.3
Some weights of the model checkpoint at ./slm/wavlm-base-plus were not used when initializing WavLMModel: ['encoder.pos_conv_embed.conv.weight_g', 'encoder.pos_conv_embed.conv.weight_v']
- This IS expected if you are initializing WavLMModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing WavLMModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of WavLMModel were not initialized from the model checkpoint at ./slm/wavlm-base-plus and are newly initialized: ['encoder.pos_conv_embed.conv.parametrizations.weight.original0', 'encoder.pos_conv_embed.conv.parametrizations.weight.original1']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
[rank0]: Traceback (most recent call last):
[rank0]:   File "C:\Users\hogehoge\ghq\github.com\litagin02\Style-Bert-VITS2\train_ms_jp_extra.py", line 1130, in <module>
[rank0]:     run()
[rank0]:   File "C:\Users\hogehoge\ghq\github.com\litagin02\Style-Bert-VITS2\train_ms_jp_extra.py", line 557, in run
[rank0]:     train_and_evaluate(
[rank0]:   File "C:\Users\hogehoge\ghq\github.com\litagin02\Style-Bert-VITS2\train_ms_jp_extra.py", line 695, in train_and_evaluate
[rank0]:     for batch_idx, (
[rank0]:   File "C:\Users\hogehoge\ghq\github.com\litagin02\Style-Bert-VITS2\venv\lib\site-packages\torch\utils\data\dataloader.py", line 631, in __next__
[rank0]:     data = self._next_data()
[rank0]:   File "C:\Users\hogehoge\ghq\github.com\litagin02\Style-Bert-VITS2\venv\lib\site-packages\torch\utils\data\dataloader.py", line 1346, in _next_data
[rank0]:     return self._process_data(data)
[rank0]:   File "C:\Users\hogehoge\ghq\github.com\litagin02\Style-Bert-VITS2\venv\lib\site-packages\torch\utils\data\dataloader.py", line 1372, in _process_data
[rank0]:     data.reraise()
[rank0]:   File "C:\Users\hogehoge\ghq\github.com\litagin02\Style-Bert-VITS2\venv\lib\site-packages\torch\_utils.py", line 705, in reraise
[rank0]:     raise exception
[rank0]: UnboundLocalError: Caught UnboundLocalError in DataLoader worker process 0.
[rank0]: Original Traceback (most recent call last):
[rank0]:   File "C:\Users\hogehoge\ghq\github.com\litagin02\Style-Bert-VITS2\venv\lib\site-packages\torch\utils\data\_utils\worker.py", line 308, in _worker_loop
[rank0]:     data = fetcher.fetch(index)  # type: ignore[possibly-undefined]
[rank0]:   File "C:\Users\hogehoge\ghq\github.com\litagin02\Style-Bert-VITS2\venv\lib\site-packages\torch\utils\data\_utils\fetch.py", line 51, in fetch
[rank0]:     data = [self.dataset[idx] for idx in possibly_batched_index]
[rank0]:   File "C:\Users\hogehoge\ghq\github.com\litagin02\Style-Bert-VITS2\venv\lib\site-packages\torch\utils\data\_utils\fetch.py", line 51, in <listcomp>
[rank0]:     data = [self.dataset[idx] for idx in possibly_batched_index]
[rank0]:   File "C:\Users\hogehoge\ghq\github.com\litagin02\Style-Bert-VITS2\data_utils.py", line 199, in __getitem__
[rank0]:     return self.get_audio_text_speaker_pair(self.audiopaths_sid_text[index])
[rank0]:   File "C:\Users\hogehoge\ghq\github.com\litagin02\Style-Bert-VITS2\data_utils.py", line 97, in get_audio_text_speaker_pair
[rank0]:     bert, ja_bert, en_bert, phones, tone, language = self.get_text(
[rank0]:   File "C:\Users\hogehoge\ghq\github.com\litagin02\Style-Bert-VITS2\data_utils.py", line 183, in get_text
[rank0]:     ja_bert = bert_ori
[rank0]: UnboundLocalError: local variable 'bert_ori' referenced before assignment


11-22 16:43:56 | ERROR  | train.py:360 | Train failed.

Steps to reproduce

powershell -c "irm https://astral.sh/uv/install.ps1 | iex"
git clone https://github.com/litagin02/Style-Bert-VITS2.git
cd Style-Bert-VITS2
uv venv venv
venv\Scripts\activate
uv pip install "torch<2.4" "torchaudio<2.4" --index-url https://download.pytorch.org/whl/cu118
uv pip install -r requirements.txt
python initialize.py
python app.py

Start training from the GUI

Version information

I tried the following combinations, and all of them behaved the same way. The OS is Windows 11.

Source code  Python   PyTorch      Notes
065a7ff      3.10.11  2.3.1+cu121
065a7ff      3.10.11  2.2.2+cu121
2.6.0        3.9.13   2.3.1+cu121  numpy raised an error, so I replaced it with numpy==1.26.4
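@sabipipe sabipipe changed the title from "Bug: Bert load Failed" to "docs: Training fails when a wav file name contains uppercase letters" Nov 22, 2024
@sabipipe sabipipe changed the title from "docs: Training fails when a wav file name contains uppercase letters" to "Training fails when a wav file name contains uppercase letters" Nov 22, 2024
@sabipipe sabipipe changed the title from "Training fails when a wav file name contains uppercase letters" to "Training fails when a wav file extension contains uppercase letters" Nov 22, 2024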
@sabipipe
Author

The problem appears to be that the dataset files have uppercase extensions, so I revised the description.
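
One plausible mechanism, which is not confirmed in this thread, is that the path of the precomputed feature file is derived from the audio path by a case-sensitive replacement of `.wav`. In that case a name such as J80.WAV would be left unchanged, and the loader would be pointed at the audio file itself. The sketch below only illustrates that hypothesis: the `.bert.pt` suffix and both function names are assumptions, not taken from the repository.

```python
import os


def feature_path_case_sensitive(wav_path: str) -> str:
    # str.replace is case-sensitive, so ".WAV" is not matched and the
    # audio path is returned unchanged.
    return wav_path.replace(".wav", ".bert.pt")


def feature_path_case_insensitive(wav_path: str) -> str:
    # Split off the extension and compare it without regard to case.
    stem, ext = os.path.splitext(wav_path)
    if ext.lower() != ".wav":
        raise ValueError(f"not a wav file: {wav_path}")
    return stem + ".bert.pt"


for name in ("J80.wav", "J80.WAV"):
    print(name, "->", feature_path_case_sensitive(name),
          "|", feature_path_case_insensitive(name))
```

If the real code behaves like the first function, deriving the path from a lowercased extension, as in the second, might be one way to accept uppercase names without asking users to rename their files.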
