<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="UTF-8">
    <title>Audio samples: TTS style transfer with on-the-fly data augmentation</title>
  </head>
  
  <body>

    <h2>Audio samples for "On-the-fly data augmentation for Text-to-speech style transfer"</h2>
    <p>
      <b>Authors: Raymond Chung and Brian Mak</b><br>
    </p>

    <hr>

      <h3>Sample audio from the training dataset</h3>
      <p>Each speaker in the training dataset speaks in only one style.</p>
      <table border="1">
        <tr>
          <th>Michelle (neutral voice)</th><th>VOA news (newscasting)</th><th>BC2017 (storytelling)</th><th>Hillary's speech (public speaking)</th>
        </tr>

        <tr>
            <td><audio controls style="width: 300px;"><source src="./tts_da/Becoming01.0002.wav"></audio></td>
            <td><audio controls style="width: 300px;"><source src="./tts_da/161203.0002.0001.wav"></audio></td>
            <td><audio controls style="width: 300px;"><source src="./tts_da/audiobook_fls_Aamids_Badv_Cyou_Dmp3_00006.0006.wav"></audio></td>
            <td><audio controls style="width: 300px;"><source src="./tts_da/segment3_0033.0002.wav"></audio></td>
        </tr>
        
      </table>

      
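      <h4>Michelle's voice imitating the styles of other speakers (unseen text)</h4>
      <p>The trained model transfers the target styles effectively: compare Michelle's neutral and style-transferred voices with the ground truth from each target speaker.</p>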
      <table border="1">
        <tr>
          <th style="width: 130px;"></th><th>newscasting</th><th>public speaking</th><th>storytelling</th>
        </tr>

        <tr>
            <td>Ground truth from the target speaker</td>
            <td><audio controls style="width: 300px;"><source src="./gt/210126_02.0005.mp3"></audio></td>
            <td><audio controls style="width: 300px;"><source src="./gt/segment3_0031.0001.mp3"></audio></td>
            <td><audio controls style="width: 300px;"><source src="./gt/audiobook_fls_Awindi_Bxxx_Cxxx_Dmp3_00012.0003.mp3"></audio></td>
   
        </tr>     
        
        <tr>
            <td>Michelle's neutral voice</td>        
            <td><audio controls style="width: 300px;"><source src="./stts/210126_02.0005_2.mp3"></audio></td>
            <td><audio controls style="width: 300px;"><source src="./stts/p2_2.mp3"></audio></td>
            <td><audio controls style="width: 300px;"><source src="./stts/23_2.mp3"></audio></td>
        </tr>        
        
        <tr>
            <td>Michelle's voice with transferred style</td>
            <td><audio controls style="width: 300px;"><source src="./stts/210126_02.0005.mp3"></audio></td>
            <td><audio controls style="width: 300px;"><source src="./stts/p2.mp3"></audio></td>
            <td><audio controls style="width: 300px;"><source src="./stts/23.mp3"></audio></td>
        </tr>
       
      </table>      
    <hr>      
      
      <h4>More synthesized audio of unseen text in Michelle's voice, with style selection</h4>
      <table border="1">
        <tr>
          <th style="width: 130px;">newscasting</th><th>unseen news text 1</th><th>unseen news text 2</th>
        </tr>

        
        <tr>
            <td>neutral</td>        
            <td><audio controls style="width: 300px;"><source src="./stts/8_393.0001_VOA2_2.mp3"></audio></td>
            <td><audio controls style="width: 300px;"><source src="./stts/mo_news_5a.mp3"></audio></td>
        </tr>        
        
        <tr>
            <td>with style</td>        

            <td><audio controls style="width: 300px;"><source src="./stts/8_393.0001_VOA2.mp3"></audio></td>
            <td><audio controls style="width: 300px;"><source src="./stts/mo_news_5b.mp3"></audio></td>
        </tr>
       
      </table>
      
      <br>
      <table border="1">
        <tr>
          <th style="width: 130px;">public speaking</th><th>unseen speech text 1</th><th>unseen speech text 2</th>
        </tr>

        <tr>
            <td>neutral</td>        
            <td><audio controls style="width: 300px;"><source src="./stts/p22_2.mp3"></audio></td>
            <td><audio controls style="width: 300px;"><source src="./stts/4_17_hillspeech_2.mp3"></audio></td>
        </tr>        
        
        <tr>
            <td>with style</td>        
            <td><audio controls style="width: 300px;"><source src="./stts/p22.mp3"></audio></td>
            <td><audio controls style="width: 300px;"><source src="./stts/4_17_hillspeech.mp3"></audio></td>
        </tr>
       
      </table>

      <br>
      <table border="1">
        <tr>
          <th style="width: 130px;">storytelling</th><th>unseen story text 1</th><th>unseen story text 2</th>
        </tr>

        <tr>
            <td>neutral</td>
            <td><audio controls style="width: 300px;"><source src="./stts/1_2.mp3"></audio></td>
            <td><audio controls style="width: 300px;"><source src="./stts/longstoryline1_neutral.mp3"></audio></td>

        </tr>        
        
        <tr>
            <td>with style</td>        
            <td><audio controls style="width: 300px;"><source src="./stts/1.mp3"></audio></td>
            <td><audio controls style="width: 300px;"><source src="./stts/longstoryline1_story.mp3"></audio></td>

        </tr>
       
      </table>
      

  </body>
</html>