<html>
<head>
<meta charset="UTF-8">
<title>TTS</title>
</head>
<body>
<h2>Audio samples for "On-the-fly data augmentation for Text-to-speech style transfer"</h2>
<p>
<b>Authors: Raymond Chung and Brian Mak</b><br>
</p>
<hr>
<h3>Sample audio in the training dataset</h3>
<p>Each speaker appears in only one style. </p>
<table border="1">
<tr>
<th>Michelle (neutral voice)</th><th>VOA's news (newscasting)</th><th>BC2017 (storytelling)</th><th>Hillary's speech (public speaking)</th>
</tr>
<tr>
<td><audio controls style="width: 300px;"><source src="./tts_da/Becoming01.0002.wav"></audio></td>
<td><audio controls style="width: 300px;"><source src="./tts_da/161203.0002.0001.wav"></audio></td>
<td><audio controls style="width: 300px;"><source src="./tts_da/audiobook_fls_Aamids_Badv_Cyou_Dmp3_00006.0006.wav"></audio></td>
<td><audio controls style="width: 300px;"><source src="./tts_da/segment3_0033.0002.wav"></audio></td>
</tr>
</table>
<h4>Michelle's voice imitating the styles of other speakers (unseen text)</h4>
<p>The trained model was effective at style transfer. </p>
<table border="1">
<tr>
<th style="width: 130px;"></th><th>newscasting</th><th>public speaking</th><th>storytelling </th>
</tr>
<tr>
<td>Ground truth uttered by the target speaker</td>
<td><audio controls style="width: 300px;"><source src="./gt/210126_02.0005.mp3"></audio></td>
<td><audio controls style="width: 300px;"><source src="./gt/segment3_0031.0001.mp3"></audio></td>
<td><audio controls style="width: 300px;"><source src="./gt/audiobook_fls_Awindi_Bxxx_Cxxx_Dmp3_00012.0003.mp3"></audio></td>
</tr>
<tr>
<td>Michelle's neutral voice</td>
<td><audio controls style="width: 300px;"><source src="./stts/210126_02.0005_2.mp3"></audio></td>
<td><audio controls style="width: 300px;"><source src="./stts/p2_2.mp3"></audio></td>
<td><audio controls style="width: 300px;"><source src="./stts/23_2.mp3"></audio></td>
</tr>
<tr>
<td>Michelle's stylized voice</td>
<td><audio controls style="width: 300px;"><source src="./stts/210126_02.0005.mp3"></audio></td>
<td><audio controls style="width: 300px;"><source src="./stts/p2.mp3"></audio></td>
<td><audio controls style="width: 300px;"><source src="./stts/23.mp3"></audio></td>
</tr>
</table>
<hr>
<h4>More synthesized audio of unseen text in Michelle's voice with style selection</h4>
<table border="1">
<tr>
<th style="width: 130px;">newscasting</th><th>news's unseen text</th><th>news's unseen text</th>
</tr>
<tr>
<td>neutral</td>
<td><audio controls style="width: 300px;"><source src="./stts/8_393.0001_VOA2_2.mp3"></audio></td>
<td><audio controls style="width: 300px;"><source src="./stts/mo_news_5a.mp3"></audio></td>
</tr>
<tr>
<td>with style</td>
<td><audio controls style="width: 300px;"><source src="./stts/8_393.0001_VOA2.mp3"></audio></td>
<td><audio controls style="width: 300px;"><source src="./stts/mo_news_5b.mp3"></audio></td>
</tr>
</table>
<br>
<table border="1">
<tr>
<th style="width: 130px;">public speaking</th><th>speech's unseen text</th><th>speech's unseen text</th>
</tr>
<tr>
<td>neutral</td>
<td><audio controls style="width: 300px;"><source src="./stts/p22_2.mp3"></audio></td>
<td><audio controls style="width: 300px;"><source src="./stts/4_17_hillspeech_2.mp3"></audio></td>
</tr>
<tr>
<td>with style</td>
<td><audio controls style="width: 300px;"><source src="./stts/p22.mp3"></audio></td>
<td><audio controls style="width: 300px;"><source src="./stts/4_17_hillspeech.mp3"></audio></td>
</tr>
</table>
<br>
<table border="1">
<tr>
<th style="width: 130px;">storytelling</th><th>story's unseen text</th><th>story's unseen text</th>
</tr>
<tr>
<td>neutral</td>
<td><audio controls style="width: 300px;"><source src="./stts/1_2.mp3"></audio></td>
<td><audio controls style="width: 300px;"><source src="./stts/longstoryline1_neutral.mp3"></audio></td>
</tr>
<tr>
<td>with style</td>
<td><audio controls style="width: 300px;"><source src="./stts/1.mp3"></audio></td>
<td><audio controls style="width: 300px;"><source src="./stts/longstoryline1_story.mp3"></audio></td>
</tr>
</table>
</body>
</html>