Skip to content

Latest commit

 

History

History
69 lines (43 loc) · 4.41 KB

A Gentle Summary of Postfilter-related Work.md

File metadata and controls

69 lines (43 loc) · 4.41 KB

A Gentle Summary of Postfilter-related Work

近年来基于深度学习方法的 postfilter 的工作大致有以下几种:

  1. 通过训练目标等设计 encoder-decoder 结构中的 decoder 起到 postfilter 的作用 [1];
  2. 先进行预分离,再通过基于 NN 的 postfilter 进行进一步分离 [2,3];
  3. 在 AEC 的任务中,讲传统的 postfilter 模块替换成 NN 的 [4,5];
  4. 添加额外网络进行后处理 [6];
  5. 用 GAN 的判别器对增强后的语谱进行后处理 [7,8]。

下面是这些文章的详细介绍

NN-based Postfilter (Decoder)

Decoder: signal filtering & reconstruction (两个 encoder 分别对 mag 和 phase 进行 masking 和 mapping)

先在时频域预分离,然后将混合语音和预分离的语音同时作为输入,通过 一维卷积和 attention 进行特征融合,融合的特征送入 TCN-based postfilter

先预分离,再根据 speaker 信息进一步分离 笔记

用两个全卷积循环网络 FCRN,首先是 AEC 模块估计回声,再用后置滤波模块进行残留回声抑制。

Follow $Y^2$-Net 的工作,增加了频带宽度扩展 Bandwidth Extension。

decoder 后面接上超分网络 + 渐进学习恢复降采样带来的信息损失 笔记

GAN-related

Generated spectra typically lack the fine structures that are close to those of the true data. Propose a GAN-based postfilter that is implicitly optimized to match the true feature distribution in adversarial learning.

GAN cannot be easily trained for very high-dimensional data such as STFT spectra. Thus take divide-and-concatenate strategy: first divide the spectrograms into multiple freq bands with overlap, reconstruct the individual bands using the GAN-based postfilter trained for each band, and connect th bands with overlap.

Conventional

Loss

利用音素相关的信息计算 enhanced speech 和 clean speech 之间的loss (PFPL)

Other

Multiple inputs (features extracted from far-end reference and the echo estimated by the Linear Adaptive Filter) are weighted by a feature attention module.

加入 feature attention module 来更好地融合远端参考信号、LAF输出的 $E(k,f)$ 和 估计的线性回声 $C(k,f)$ 三种输入,而非简单的拼接。

如题

Filterbank design for end-to-end speech separation, Manuel Pariente et al., ICASSP 2020