split and process valid set #25
Conversation
What happens when processing datasets other than LibriTTS and LJSpeech? I suspect line 39 will cause a bug, since there is no valid.json for the other datasets.
Besides, is there a corresponding design for the valid and test sets in the trainer?
Rewrote metadata.py to automatically adapt to the different JSON files. The valid set is split to distinguish it from the test set, and it is also used to compute validation loss in the trainer.
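Adapting to datasets that may lack a valid.json could look like the following sketch. This is a hypothetical helper (`load_dataset_splits` is not the actual function in the PR), illustrating how missing split files can be skipped instead of crashing:

```python
import json
import os


def load_dataset_splits(dataset_dir):
    """Collect whichever split files exist for a dataset.

    Hypothetical sketch: datasets like LJSpeech may ship only train.json,
    so missing splits are skipped rather than raising FileNotFoundError.
    """
    splits = {}
    for split in ("train", "valid", "test"):
        path = os.path.join(dataset_dir, f"{split}.json")
        if os.path.exists(path):  # only load splits that are present
            with open(path, encoding="utf-8") as f:
                splits[split] = json.load(f)
    return splits
```

A dataset with only train.json then yields a dict containing just the "train" key, and downstream code can iterate over `splits.items()` without special-casing each dataset.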
preprocessors/metadata.py (Outdated)
@@ -8,7 +8,7 @@
 from tqdm import tqdm


-def cal_metadata(cfg):
+def cal_metadata(cfg, dataset_types):
For SVC, TTA, and Vocoder, the existing calls to this function are cal_metadata(cfg), so the implementation here will cause a bug. Maybe you can set a default value for dataset_types.
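The backward-compatible signature the reviewer suggests could be sketched like this (the fallback list and the stubbed body are assumptions for illustration, not the project's confirmed default):

```python
def cal_metadata(cfg, dataset_types=None):
    """Sketch of a backward-compatible signature (body stubbed out).

    Defaulting dataset_types keeps the older SVC/TTA/Vocoder call sites,
    which invoke cal_metadata(cfg), working unchanged.
    """
    if dataset_types is None:
        # Assumed fallback when the caller does not pass a split list.
        dataset_types = ["train", "test"]
    for dataset_type in dataset_types:
        ...  # process each split's metadata as before
    return dataset_types
```

With this default, cal_metadata(cfg) behaves as before, while the new TTS path can pass an explicit list such as ["train", "valid", "test"].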
This PR supports splitting and processing the validation set to standardize the dataset.