Experimentation:
Debugging checkpoints:
- v0 LJ model: no masking, no stop predictions (unmasked)
- v0.5 LJ model: masking, no stop predictions
- v1 LJ model: masking + stop predictions
- General
- Generate train / validation evaluation loss numbers
- Indic
* Rerun baseline training from scratch with tf.data pipeline, masked loss, stop prediction (v1)
* Directly fine-tune pretrained model from LJSpeech (hyperparam search)
* Unsupervised training on mixed language audio
* Transfer learning with unsupervised initialization
* Transfer with frozen params, lower LR
- Transfer unsupervised from Indic to LJ data
- Train SSRN on Indic dataset
- LJSpeech
* Retrain M4 model with masked loss, stop predictions
* Modify attention with position encodings, w/ and w/o guided attention
- Multi-task learning, joint optimization (long shot)
General Implementation:
- For Transfer learning experiments
* Implement direct transfer learning
* Implement partial loading of LJSpeech model
* Modify training graph for unsupervised training
- Modify data loader to use both languages
- Data pipeline (Refer: https://github.com/tensorflow/tensor2tensor/blob/master/docs/overview.md)
* Implement dataset preprocessing
* Implement tf.data.Dataset based pipeline
* Fix deprecation warning: switch to tf.train.MonitoredTrainingSession
! - Bucket data generator by sequence length
! - Implement training + validation during training
- Fix SSRN patch-wise data generator with tfrecords
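Note on the bucketing item above: a minimal sketch of grouping examples by sequence length so that each batch needs little padding. This is plain Python for illustration only, not the tf.data pipeline itself; the function name `bucket_by_length`, the `boundaries` argument, and the choice to flush partial batches at the end are all assumptions.

```python
from collections import defaultdict


def bucket_by_length(examples, boundaries, batch_size):
    """Group sequences of similar length into batches (hypothetical helper).

    boundaries: sorted, exclusive upper bounds of the buckets. Sequences
    at least as long as the last boundary land in a final overflow bucket.
    """
    buckets = defaultdict(list)
    batches = []
    for ex in examples:
        length = len(ex)
        # Bucket index = number of boundaries the length has reached.
        idx = sum(1 for b in boundaries if length >= b)
        buckets[idx].append(ex)
        if len(buckets[idx]) == batch_size:
            # Emit a full batch and start the bucket over.
            batches.append(buckets.pop(idx))
    # Flush partially filled buckets, shortest bucket first.
    for idx in sorted(buckets):
        if buckets[idx]:
            batches.append(buckets[idx])
    return batches
```

With `boundaries=[3, 6]` and `batch_size=2`, short and long utterances never share a batch, so padding stays bounded by the bucket width.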
- Other improvements/bugfixes
* Implement fix: loss masking for padded batches (Text2Mel, SSRN)
* Implement stop prediction
- Implement dropout, layernorm / other training tricks
- Globally use only params.data_dir, remove wav, csv, other path specifications
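Note on the loss-masking item above: a minimal sketch of what masking a padded batch means, assuming an L1 reconstruction loss and nested-list inputs. The name `masked_l1_loss` and the data layout are illustrative assumptions, not the project's actual loss_op.

```python
def masked_l1_loss(pred, target, lengths):
    """Mean absolute error over valid (non-padded) frames only.

    pred, target: batch of sequences, each a list of frames, each frame a
    list of floats; all sequences are padded to the same length.
    lengths: true (unpadded) number of frames per sequence.
    """
    total, count = 0.0, 0
    for p_seq, t_seq, n in zip(pred, target, lengths):
        for t in range(n):  # frames at index >= n are padding and skipped
            for p, y in zip(p_seq[t], t_seq[t]):
                total += abs(p - y)
                count += 1
    return total / count if count else 0.0
```

Dividing by the number of valid elements rather than the padded size keeps the loss comparable across batches with different amounts of padding.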
- Documentation
- Document params file parameters
Core Implementations:
* TextEnc implementation:
* implement, test highway conv
* implement, test text_enc block
* test initialization of highway conv layers
* Decoder implementation:
* implement, test causal conv
* implement, test causal highway conv
* implement, test causal decoder F2d
* implement, test causal decoder d2F
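Note on the causal conv items above: a minimal sketch of a 1-D causal convolution on a single channel, assuming left zero-padding of (kernel_size - 1) * dilation so that output t depends only on inputs at or before t. The function name and list-based layout are illustrative assumptions, not the project's layer.

```python
def causal_conv1d(x, kernel, dilation=1):
    """Causal 1-D convolution over a list of floats (hypothetical helper)."""
    k = len(kernel)
    pad = (k - 1) * dilation
    # Pad on the left only, so no output can see future inputs.
    padded = [0.0] * pad + list(x)
    out = []
    for t in range(len(x)):
        acc = 0.0
        for i, w in enumerate(kernel):
            # The last kernel tap (i == k - 1) lines up with x[t].
            acc += w * padded[t + i * dilation]
        out.append(acc)
    return out
```

A simple test for causality: changing x[t] must leave every output before t unchanged.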
* Attention implementation:
* implement basic attention mechanism
* add, test guided attention loss
* test training with positional encoding
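Note on the guided attention item above: a minimal sketch, assuming the penalty form W[n][t] = 1 - exp(-(n/N - t/T)^2 / (2 g^2)) from the DCTTS paper (Tachibana et al.), averaged against the attention matrix. The default g = 0.2 and the function names are assumptions here, and masking of padded positions is left out.

```python
import math


def guided_attention_weights(n_text, n_frames, g=0.2):
    """Penalty matrix: near 0 along the diagonal, approaching 1 far from it."""
    return [
        [
            1.0 - math.exp(-((n / n_text - t / n_frames) ** 2) / (2.0 * g * g))
            for t in range(n_frames)
        ]
        for n in range(n_text)
    ]


def guided_attention_loss(attention, g=0.2):
    """Mean of attention * penalty; attention is a [n_text][n_frames] list."""
    n_text, n_frames = len(attention), len(attention[0])
    w = guided_attention_weights(n_text, n_frames, g)
    total = sum(
        attention[n][t] * w[n][t]
        for n in range(n_text)
        for t in range(n_frames)
    )
    return total / (n_text * n_frames)
```

A perfectly diagonal attention matrix incurs zero penalty, while attention mass far from the diagonal is penalized.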
* SSRN implementation:
* implement, test transposed convolution
* implement full SSRN module
* modify build graph
* modify data pipeline for faster patch-based training
* Misc implementation:
* predict_op, loss_op
* train_op, train/predict on batch
* training, checkpointing, evaluation framework
* train/dev data load
* Inference/Synthesis graph
* implement restoring of variables
* implement output feedback
! - Fix inference with stop prediction (crop mag generation)
- Implement constrained attention at inference
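Note on the stop-prediction inference fix above: a minimal sketch of cropping generated frames at the first predicted stop, so that later stages only see frames up to that point. The threshold of 0.5, the choice to keep the stop frame itself, and the name `crop_at_stop` are all assumptions.

```python
def crop_at_stop(frames, stop_probs, threshold=0.5):
    """Keep frames up to and including the first predicted stop frame.

    frames: generated frames, one entry per time step.
    stop_probs: per-frame stop probability, same length as frames.
    """
    for t, p in enumerate(stop_probs):
        if p > threshold:
            return frames[: t + 1]
    # No stop predicted: return everything that was generated.
    return frames
```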