Yueyue Na1, Ziteng Wang1, Liang Wang2, Qiang Fu1
1Alibaba Group, China
{yueyue.nyy, ziteng.wzt, fq153277}@alibaba-inc.com
2School of Electronics and Communication Engineering
Sun Yat-sen University (SYSU), Guangzhou, Guangdong, 510275, China
[email protected]
Keyword spotting is necessary for triggering human-machine speech interaction. It is especially challenging at low signal-to-noise ratios and in moving scenarios, such as on a sweeping robot with strong ego-noise. This paper proposes a novel approach for joint ego-noise suppression and keyword detection. The keyword detection model accepts outputs from multi-look adaptive beamformers. The noise covariance matrix in the beamformer is in turn updated using the keyword absence probability given by the model, forming an end-to-end loop-back. The keyword model also adopts multi-channel feature fusion using self-attention, and a hidden Markov model for online decoding. The performance of the proposed approach is verified on real-world datasets recorded on a sweeping robot.
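The loop-back between the keyword model and the beamformer can be illustrated with a minimal NumPy sketch. It is not the paper's implementation: the recursive averaging rule, the forgetting factor `alpha`, the diagonal loading, the use of an MVDR beamformer, and the function names `update_noise_covariance` and `mvdr_weights` are all assumptions made for illustration. The only idea taken from the text is that the noise covariance matrix is updated according to the keyword absence probability.

```python
import numpy as np


def update_noise_covariance(R_n, x, p_absent, alpha=0.95):
    """Recursively update a noise covariance estimate for one frequency bin.

    R_n      : (M, M) complex Hermitian matrix, current noise covariance.
    x        : (M,) complex vector, multi-channel STFT snapshot.
    p_absent : float in [0, 1], keyword absence probability from the
               keyword model (hypothetical interface).
    alpha    : forgetting factor applied when the keyword is surely absent
               (assumed value, not from the paper).
    """
    # Effective forgetting factor: with p_absent = 0 the estimate is frozen,
    # with p_absent = 1 it is updated at the full rate (1 - alpha).
    a = 1.0 - (1.0 - alpha) * p_absent
    return a * R_n + (1.0 - a) * np.outer(x, x.conj())


def mvdr_weights(R_n, d, diag_load=1e-6):
    """MVDR weights w = R^{-1} d / (d^H R^{-1} d) for steering vector d.

    MVDR is used here only as a familiar example of an adaptive beamformer
    driven by a noise covariance matrix.
    """
    M = R_n.shape[0]
    # Small diagonal loading keeps the linear solve well conditioned.
    R = R_n + diag_load * np.real(np.trace(R_n)) / M * np.eye(M)
    Rinv_d = np.linalg.solve(R, d)
    return Rinv_d / (d.conj() @ Rinv_d)
```

In a frame-by-frame loop, the beamformer output `np.vdot(w, x)` would be passed to the keyword model, and the absence probability it returns would drive the next call to `update_noise_covariance`. Frames that likely contain the keyword then contribute little to the noise estimate.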
ICASSP 2022 Paper Sharing: Joint Optimization of Speech Enhancement and Keyword Detection Applied to Sweeping Robots
The differential beamformers used in this paper are generated by solving the following complex optimization problem with CVX [1].
Where
- [1] [CVX](http://cvxr.com/cvx/)