Custom Dataset support + Gentle-based custom dataset preprocessing support #78
Conversation
…f arguments)

  File "synthesis.py", line 137, in <module>
    model, text, p=replace_pronunciation_prob, speaker_id=speaker_id, fast=True)
  File "synthesis.py", line 66, in tts
    sequence, text_positions=text_positions, speaker_ids=speaker_ids)
  File "H:\envs\pytorch\lib\site-packages\torch\nn\modules\module.py", line 325, in __call__
    result = self.forward(*input, **kwargs)
  File "H:\Tensorflow_Study\git\deepvoice3_pytorch\deepvoice3_pytorch\__init__.py", line 79, in forward
    text_positions, frame_positions, input_lengths)
  File "H:\envs\pytorch\lib\site-packages\torch\nn\modules\module.py", line 325, in __call__
    result = self.forward(*input, **kwargs)
  File "H:\Tensorflow_Study\git\deepvoice3_pytorch\deepvoice3_pytorch\__init__.py", line 116, in forward
    text_sequences, lengths=input_lengths, speaker_embed=speaker_embed)
  File "H:\envs\pytorch\lib\site-packages\torch\nn\modules\module.py", line 325, in __call__
    result = self.forward(*input, **kwargs)
  File "H:\Tensorflow_Study\git\deepvoice3_pytorch\deepvoice3_pytorch\deepvoice3.py", line 75, in forward
    x = self.embed_tokens(text_sequences)  <- change this to long!
  File "H:\envs\pytorch\lib\site-packages\torch\nn\modules\module.py", line 325, in __call__
    result = self.forward(*input, **kwargs)
  File "H:\envs\pytorch\lib\site-packages\torch\nn\modules\sparse.py", line 103, in forward
    self.scale_grad_by_freq, self.sparse
  File "H:\envs\pytorch\lib\site-packages\torch\nn\_functions\thnn\sparse.py", line 59, in forward
    output = torch.index_select(weight, 0, indices.view(-1))
TypeError: torch.index_select received an invalid combination of arguments - got (torch.cuda.FloatTensor, int, torch.cuda.IntTensor), but expected (torch.cuda.FloatTensor source, int dim, torch.cuda.LongTensor index)

Changed text_sequence to long, as required by torch.index_select.
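The fix boils down to casting the token-id tensor to int64 before it reaches nn.Embedding, which indexes its weight matrix via torch.index_select internally. A minimal sketch of the same failure mode and fix, assuming token ids arrive as 32-bit integers (the tensor contents here are illustrative):

```python
import torch
import torch.nn as nn

# nn.Embedding looks up rows with torch.index_select, which requires
# LongTensor (int64) indices. Token ids loaded as int32 trigger the
# TypeError above, so cast them with .long() first.
embed = nn.Embedding(num_embeddings=256, embedding_dim=8)
text_sequences = torch.tensor([[5, 12, 42]], dtype=torch.int32)  # hypothetical ids
x = embed(text_sequences.long())  # .long() converts int32 -> int64
```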
This reverts commit 5214c24.
In windows, this causes WinError 123
Windows Specific Filename bugfix (r9y9#58) reverse PR
Supports JSON format for dataset creation. Ensures compatibility with http://github.com/carpedm20/multi-Speaker-tacotron-tensorflow
PR for Version Up in upstream repos
Reverse PR
gitignore change
Overall looks great! I'm happy to see your contributions. Once a few comments are addressed I'd like to merge this. Let me know if you mess up squashing commits; I can hit "Squash and merge" if you want.
.gitignore (Outdated)

    presets/deepvoice3_got.json
    presets/deepvoice3_gotOnly.json
    presets/deepvoice3_stest.json
    presets/deepvoice3_test.json
Can these be safely removed? I'm assuming they are for your local setup only.
Yes I will remove it! :) Thanks for telling me
gentle_web_align.py (Outdated)

    server_addr = arguments['--server_addr']
    port = int(arguments['--port'])
    max_unalign = float(arguments['--max_unalign'])
    if arguments['--nested-directories'] == None:
nit: I'd slightly prefer `is None` to `== None`.
Great! I will change this too.
gentle_web_align.py (Outdated)

    Created on Sat Apr 21 09:06:37 2018
    Phoneme alignment and conversion in HTK-style label file using Web-served Gentle
    This works on any type of english dataset.
    This allows its usage on Windows (Via Docker) and external server.
Just to be sure, the reason for using server-based Gentle rather than the Python API is that it allows use on Windows, right? Any other reasons?
Yep, and also because Gentle is Python 2 compatible only, while this repo is Python 3 compatible.
In addition, if we use server-based Gentle, we can also use an external server.
    if os.path.splitext(wav_name)[0] != os.path.splitext(txt_name)[0]:
        print(' [!] wav name and transcript name does not match - exiting...')
        return response
    with open(txt_path, 'r', encoding='utf-8-sig') as txt_file:
I'm guessing `encoding='utf-8-sig'` is (almost) Windows-specific..? Did you see a UnicodeError with `encoding='utf-8'`?
Well, it was in my case (probably because I am currently mixing Windows (for running PyTorch) and Linux (for data preparation/alignment)), and I think that setting encoding='utf-8-sig' when opening files is better for ensuring compatibility.
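The difference is easy to reproduce: utf-8-sig transparently strips the byte-order mark that Windows editors often prepend, while plain utf-8 leaves it in the decoded text. A small self-contained sketch (the file name is illustrative):

```python
import codecs
import os
import tempfile

# Write a transcript file the way many Windows editors do: UTF-8 with a BOM.
path = os.path.join(tempfile.mkdtemp(), "transcript.txt")
with open(path, "wb") as f:
    f.write(codecs.BOM_UTF8 + "hello world".encode("utf-8"))

with open(path, "r", encoding="utf-8") as f:
    plain = f.read()      # BOM survives as a '\ufeff' prefix
with open(path, "r", encoding="utf-8-sig") as f:
    stripped = f.read()   # BOM is removed by the -sig codec
```

On BOM-less files the two codecs behave identically, which is why utf-8-sig is the safer default for mixed Windows/Linux pipelines.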
hparams.py (Outdated)

    @@ -125,6 +125,14 @@
     # Forced garbage collection probability
     # Use only when MemoryError continues in Windows (Disabled by default)
     #gc_probability = 0.001,

     # json_meta mode only
     # 0: "use all",
Please consider spaces rather than tabs.
Oops O.o will change it.
Another vestigial element
nikl_m.py (Outdated)

    @@ -4,6 +4,7 @@
     import os
     import audio
     import re
     from hparams import hparams
nit: this is not necessary because #74 was merged.
will change it! thanks!
setup.py (Outdated)

    "numba",
    "lws <= 1.0",
    "nltk",
    "requests",
Please consider spaces
Another vestigial element :/
will change it
setup.py (Outdated)

    @@ -82,10 +82,11 @@ def create_readme_rst():
     "torch >= 0.3.0",
     "unidecode",
     "inflect",
    -"librosa",
    +"librosa == 0.5.1",
Could this be loosened? I'm using a development version of librosa.
For me, using the latest stable version (from pip) caused a synthesis error (librosa/librosa#640).
Did it get fixed? (The issue is reported against the dev version; does the fix apply to a stable release?)
I don't use `librosa.output.write_wav` anymore, so librosa/librosa#640 won't be a problem for me (and the repo). I can fix issues if you give me reproducible code. If you have code calling `librosa.output.write_wav` locally, try replacing it with `scipy.io.wavfile.write`.
Discovered it was due to my own modifications. Thanks :)
hparams.py (Outdated)

    # 1: "ignore only unmatched_alignment",
    # 2: "fully ignore recognition",
    ignore_recognition_level = 2,
    min_text=20,
I was also thinking about this, and about something like `min_frames` to remove short audio clips from the training data. Just out of curiosity, did you get improvements from this? I believe the parameter highly depends on the dataset, and I'd be happy if you could leave a comment, for example: "min_text=20 works well for dataset A but can be adjusted depending on the dataset".
Actually it was implemented for a few reasons.
- My automatic alignment tool (which I am going to release soon) cannot handle short speeches well.
- From my experience, short speeches in non-dedicated datasets (especially those extracted from movie clips) were prone to noise and a different cadence of speech. (e.g. the word "help" in "The help that is needed is not there." vs. "help" in "HELP!!!")
- (From my experience with other deep-learning-based TTS) Even if the dataset is nearly noise-free and has uniform cadence, short speeches tend to interfere with the result. (probably because my test set is usually at least 3 words long)

But it was implemented as a quick fix, and I do know that min_frames is a much better solution.
Will leave the comments :)
nikl_m.py (Outdated)

    spectrogram = audio.spectrogram(wav).astype(np.float32)
    except:
        print(wav_path)
        print(wav)
    n_frames = spectrogram.shape[1]
This is just for debugging?
O.O I thought I had removed it already.
Will remove it.
TL;DR: Given that it's harder to learn alignments within samples that have long pauses, it would be good to have a max_pause param as well. The set of params I think are needed includes phrase_length_min, phrase_length_max, and max_silence_length.
@rafaelvalle Great idea!
@engiecat It's important to have thresholds for phrase length so that we can optimize for batch size and GPU usage. Remember that the longest sample in a batch dominates the batch length, because all shorter samples are padded to match the length of the longest sample. Actually, I haven't trained DeepVoice3 before and would be interested to know how well it fares with datasets that have a lot of variety in silence or speech rate.
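The padding cost described above is easy to quantify: every sample in a batch is padded to the longest sample's length, so the wasted fraction grows with length variance. A small sketch:

```python
def padding_overhead(lengths):
    """Fraction of a padded batch that is padding: every sample is
    padded up to the longest sample's length."""
    longest = max(lengths)
    total = longest * len(lengths)  # slots actually allocated
    return (total - sum(lengths)) / total

# A batch of near-equal lengths wastes little; one long outlier wastes a lot,
# which is why length thresholds (or bucketing by length) help GPU usage.
```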
@rafaelvalle
r9y9#53 (comment) issue solved in PyTorch 0.4
@r9y9 +) I fixed the issue of #5 by changing the matplotlib backend from Tkinter (TkAgg) to PyQt5 (Qt5Agg). ++) Also, I discovered that the issue given in #53 (comment) seems to be solved after the PyTorch 0.4 upgrade. So, instead of changing hparams, I changed the code to give just a warning.
LGTM. Thanks!
…pport (r9y9#78)

* Fixed TypeError (torch.index_select received an invalid combination of arguments): changed text_sequence to long, as required by torch.index_select. (The full traceback is quoted earlier in this thread.)
* Fixed NoneType error in collect_features
* requirements.txt fix
* Memory leakage bugfix + hparams change
* Pre-PR modifications
* Pre-PR modifications 2
* Pre-PR modifications 3
* Post-PR modification
* Remove requirements.txt
* num_workers to 1 in train.py
* Windows log filename bugfix
* Revert "Windows log filename bugfix" (this reverts commit 5214c24)
* merge 2
* Windows filename bugfix (in Windows, this causes WinError 123)
* Cleanup before PR
* JSON format metadata support: supports JSON format for dataset creation; ensures compatibility with http://github.com/carpedm20/multi-Speaker-tacotron-tensorflow
* Web-based Gentle aligner support
* README change + gentle patch
* .gitignore change
* Flake8 fix
* Post PR commit - also fixed #5; r9y9#53 (comment) issue solved in PyTorch 0.4
* Post-PR 2 - .gitignore
Loss: 0.24586915151745664
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
Hello.
I have developed several new functionalities for the repo.
1. Custom dataset support.
Until now, dataset preprocessing had been limited to several well-known datasets (unless someone creates their own dataset and implements their own preprocessing code).
I created a custom dataset import option, which supports JSON (as described in carpedm20/multi-speaker-tacotron-tensorflow) and CSV metadata formats.
Some datasets, especially an auto-generated Korean dataset, are currently available in this format, and the format itself is quite straightforward.
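As a sketch of what such a loader might accept: the shape below (each audio path mapped to its transcript, possibly as a list of candidates) is my reading of the carpedm20-style JSON metadata and may differ in detail from the actual format:

```python
import json

# Assumed carpedm20-style metadata: a JSON object mapping each audio path
# to its transcript (a plain string, or a list of candidate transcripts).
meta_json = '{"audio/1.wav": "First line.", "audio/2.wav": ["Second line."]}'

def load_metadata(text):
    pairs = []
    for path, transcript in json.loads(text).items():
        if isinstance(transcript, list):  # take the first candidate
            transcript = transcript[0]
        pairs.append((path, transcript))
    return pairs
```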
Below are the results obtained with the custom dataset (the LJSpeech dataset plus a proprietary audiobook, automatically aligned).
LJSpeech
https://www.dropbox.com/s/du807tjcpyw3ddj/step000240000_text5_multispeaker0_predicted.wav?dl=0
Audiobook
https://www.dropbox.com/s/0xrl2z30d88z5e8/step000240000_text5_multispeaker1_predicted.wav?dl=0
2. Custom dataset HTK-style preprocessing
I have created code for phoneme alignment of a custom dataset using Gentle (assuming its server is running somewhere).
Although Festival- and Merlin-based alignment was better for the VCTK dataset, performance was severely undermined when a noisy dataset was introduced.
e.g. for this audio, where a pause from 2.6s to 4.5s can be observed, the Festival/Merlin-generated label file is as follows.
label file
This does not show the pause between 2.6s and 4.5s.
In contrast, with Gentle-based alignment, the generated HTK-style label file is as follows.
Gentle-generated label file
Here silence between 2.33s and 4.48s can be observed (though the actual silence begins at approx. 2.6s).
After applying Gentle-based phoneme alignment, performance improved moderately.
(Using the same custom dataset, with the same hparams)
Without phoneme alignment
https://www.dropbox.com/s/9gd6r2rh4ppfwy6/step000260000_text5_multispeaker3_predicted.wav?dl=0
With gentle-based phoneme alignment
https://www.dropbox.com/s/uankezza1vfpo88/Gentlestep000260000_text5_multispeaker3_predicted.wav?dl=0
I am also considering using the phoneme alignment results to trim long in-speech pauses (like the one shown above), by setting a maximum threshold on the interval between each phoneme (e.g. 0.5s).
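That thresholding could be sketched directly on the HTK-style labels: scan for silence segments longer than the cap. Times in HTK label files are in 100 ns units; the silence label names and the 0.5 s default below are my assumptions, not anything fixed by this PR:

```python
HTK_UNIT = 1e-7  # seconds per HTK time unit (100 ns)

def long_pauses(label_lines, max_pause=0.5, silence=("sil", "pau", "sp")):
    """Return (start_s, end_s, duration_s) for silence segments longer
    than max_pause, given HTK-style "<start> <end> <label>" lines."""
    pauses = []
    for line in label_lines:
        start, end, label = line.split()
        dur = (int(end) - int(start)) * HTK_UNIT
        if label in silence and dur > max_pause:
            pauses.append((int(start) * HTK_UNIT, int(end) * HTK_UNIT, dur))
    return pauses
```

The flagged segments could then be cut from the waveform (or used to split the clip) before training.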
PS. Sorry for the messy commits, I will try to squash them after the merge. :/