PercepNet (still needs tuning)
Unofficial implementation of PercepNet: A Perceptually-Motivated Approach for Low-Complexity, Real-Time Enhancement of Fullband Speech
https://www.researchgate.net/publication/343568932_A_Perceptually-Motivated_Approach_for_Low-Complexity_Real-Time_Enhancement_of_Fullband_Speech
Unlike https://github.com/jzi040941/PercepNet, this version is implemented using Keras.
----------------------------------------------------------
Because GitHub limits file size to 100 MB, rnn_data.c is compressed to rnn_data.c.tgz.
This file needs to be extracted before compiling.
% cd src
% tar -xzvf rnn_data.c.tgz
% cd ..
To compile, just type:
% ./autogen.sh
% ./configure
% make
A simple command-line tool is
provided as an example. It operates on RAW 16-bit (machine endian) mono
PCM files sampled at 48 kHz. It can be used as:
./examples/rnnoise_demo <noisy speech> <output denoised>
The output is also a 16-bit raw PCM file.
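For preparing or inspecting such files, the following is a minimal Python sketch of reading and writing headerless 16-bit mono PCM in the machine's byte order. The helper names (read_raw_pcm, write_raw_pcm, duration_seconds) are hypothetical and not part of this repository; only the sample format and the 48 kHz rate come from the description above.

```python
import array

SAMPLE_RATE = 48000  # Hz, as stated for the demo tool's input


def read_raw_pcm(path):
    """Read a headerless 16-bit mono PCM file in the machine's byte order."""
    with open(path, "rb") as f:
        data = f.read()
    samples = array.array("h")  # "h" = signed 16-bit integer, native endianness
    # Drop a trailing odd byte, if any, so the buffer holds whole samples only.
    samples.frombytes(data[: len(data) - len(data) % 2])
    return samples


def write_raw_pcm(path, samples):
    """Write signed 16-bit samples as headerless PCM in the machine's byte order."""
    with open(path, "wb") as f:
        array.array("h", samples).tofile(f)


def duration_seconds(samples):
    """Duration of a mono signal at SAMPLE_RATE."""
    return len(samples) / SAMPLE_RATE
```

Because the files carry no header, the sample rate and channel count cannot be checked from the file itself; a file with a different rate would be read without error but processed incorrectly.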
------------------------------------------------------------
How to train:
(change to the src subdirectory; the clean and noise file directories are assumed to be under ~/DNS-Challenge/datasets/rnnoise3/)
cd ~/percepnet/src
./denoise_training ~/DNS-Challenge/datasets/rnnoise3/clean ~/DNS-Challenge/datasets/rnnoise3/noise 80000000 training.f32
(change to training subdirectory)
cd ../training
python bin2hdf5.py …/src/training.f32 80000000 138 training.h5
python rnn_train.py
python dump_rnn_float.py weights.hdf5 rnn_data.c rnn_data.h orig
cp rnn_data.c ../src/
(change to percepnet directory)
cd ~/percepnet/
make clean
make
(change to the examples subdirectory)
cd examples
./rnnoise_demo test2.raw test2_denoised.raw
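The arguments "80000000 138" passed to bin2hdf5.py above read as a frame count and a per-frame width. Assuming training.f32 is a flat sequence of native-endian float32 values with 138 values per frame (an assumption based on those arguments, not confirmed against the script), a small sketch for spot-checking the file before conversion could look like this. The names iter_frames and count_frames are hypothetical.

```python
import array

FLOAT_SIZE = 4  # bytes per float32 value


def iter_frames(path, width=138):
    """Yield one list of `width` float32 values per frame from a flat binary file."""
    frame_bytes = width * FLOAT_SIZE
    with open(path, "rb") as f:
        while True:
            chunk = f.read(frame_bytes)
            if len(chunk) < frame_bytes:
                break  # end of file; an incomplete trailing frame is ignored
            frame = array.array("f")  # "f" = 32-bit float, native endianness
            frame.frombytes(chunk)
            yield frame.tolist()


def count_frames(path, width=138):
    """Count complete frames, e.g. to compare with the count given on the command line."""
    return sum(1 for _ in iter_frames(path, width))
```

Reading frame by frame keeps memory use constant, which matters for a file of this size; counting every frame of the full training file this way would be slow and is meant for smaller test dumps.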
----------------------------------------------------------------
More:
The performance of this version needs further optimization and tuning.
In some cases, it is worse than RNNoise.
Any comments on how to optimize or tune it are welcome.
test_gr in src/ tests the classical processing on its own: it applies the computed g and r (not the ones from the deep learning model) directly, to check
whether g and r work or not.
The overall framework is based on https://github.com/xiph/rnnoise
The speech signal processing code is from https://github.com/jzi040941/PercepNet
Compared with RNNoise, the training data is standardized, and the weights are written as float (not quantized) when converted to rnn_data.c.
During training, clip_norm is set to 0.1; otherwise the loss becomes NaN.
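The two training choices above can be illustrated in plain Python. This is only a sketch of the general techniques, not the code in rnn_train.py: it assumes standardization means scaling each feature to zero mean and unit variance, and that clip_norm corresponds to gradient clipping by L2 norm (Keras optimizers expose this as the clipnorm argument). The function names are hypothetical.

```python
import math


def standardize(column):
    """Scale one feature column to zero mean and unit variance."""
    mean = sum(column) / len(column)
    var = sum((x - mean) ** 2 for x in column) / len(column)
    std = math.sqrt(var) or 1.0  # constant column: avoid dividing by zero
    return [(x - mean) / std for x in column]


def clip_by_norm(grad, max_norm=0.1):
    """Rescale a gradient vector so that its L2 norm does not exceed max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    if norm <= max_norm:
        return list(grad)  # already within the limit, leave unchanged
    scale = max_norm / norm
    return [g * scale for g in grad]
```

Clipping keeps the direction of the gradient and only shortens it, so one very large step cannot push the weights into a region where the loss overflows, which is consistent with the NaN behaviour described above.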
The training data are from:
https://github.com/microsoft/DNS-Challenge
The WAV file processing code (wav.h, wav.c) is from https://faculty.fiu.edu/~wgillam/wavfiles.html