JOINT EGO-NOISE SUPPRESSION AND KEYWORD SPOTTING ON SWEEPING ROBOTS

nay0648/ego2022

Yueyue Na<sup>1</sup>, Ziteng Wang<sup>1</sup>, Liang Wang<sup>2</sup>, Qiang Fu<sup>1</sup>

<sup>1</sup>Alibaba Group, China
{yueyue.nyy, ziteng.wzt, fq153277}@alibaba-inc.com

<sup>2</sup>School of Electronics and Communication Engineering
Sun Yat-sen University (SYSU), Guangzhou, Guangdong, 510275, China
[email protected]

ABSTRACT

Keyword spotting is necessary for triggering human-machine speech interaction. It is a challenging task, especially in low signal-to-noise ratio and moving scenarios, such as on a sweeping robot with strong ego-noise. This paper proposes a novel approach for joint ego-noise suppression and keyword detection. The keyword detection model accepts outputs from multi-look adaptive beamformers. The noise covariance matrix in the beamformer is in turn updated using the keyword absence probability given by the model, forming an end-to-end loop-back. The keyword model also adopts multi-channel feature fusion using self-attention, and a hidden Markov model for online decoding. The performance of the proposed approach is verified on real-world datasets recorded on a sweeping robot.

Links

ICASSP 2022 paper sharing: joint optimization of speech enhancement and keyword detection applied to sweeping robots

Generate Differential Beamformers

The differential beamformers used in this paper are generated by solving the following complex-valued optimization problem with CVX [1].

$$ \begin{aligned} \mathbf{w} = &\arg\min_{\mathbf{w}} \mathbf{w}^H \mathbf{\Phi} \mathbf{w} \\\\ s.t. \quad &\mathbf{w}^H \mathbf{a} = 1 \\\\ &\mathbf{w}^H \mathbf{w} \le 10^{g_{min} / 10} \end{aligned} $$

Here $\mathbf{w}$ is the beamformer weight vector, $\mathbf{\Phi}$ is the noise covariance matrix, $\mathbf{a}$ is the steering vector of the look direction, and $g_{min}$ is the white noise gain threshold in dB. The first constraint is the distortionless constraint, which prevents the target speech from being cancelled. The second constraint bounds the squared norm of the weights, which avoids white noise amplification.
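For readers without MATLAB and CVX, the same problem can be sketched in dependency-free Python. This is not the authors' code: instead of a general convex solver it uses the fact that, for a Hermitian positive definite $\mathbf{\Phi}$, the optimum has the diagonally loaded form $\mathbf{w} = (\mathbf{\Phi} + \mu \mathbf{I})^{-1}\mathbf{a} / (\mathbf{a}^H (\mathbf{\Phi} + \mu \mathbf{I})^{-1}\mathbf{a})$, and searches for the loading level $\mu \ge 0$ by bisection. All function names are hypothetical, and the example inputs (two microphones 4 cm apart, 1 kHz, a diffuse-noise coherence model, an endfire look direction) are assumptions chosen only for illustration. The norm bound is used exactly as written above.

```python
import cmath
import math


def solve(A, b):
    """Solve A x = b for complex A, b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    M = [list(A[i]) + [b[i]] for i in range(n)]  # augmented matrix [A | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    x = [0j] * n
    for i in reversed(range(n)):  # back substitution
        s = M[i][n] - sum(M[i][k] * x[k] for k in range(i + 1, n))
        x[i] = s / M[i][i]
    return x


def norm_sq(w):
    """w^H w."""
    return sum(abs(wi) ** 2 for wi in w)


def noise_power(phi, w):
    """w^H Phi w (real for Hermitian Phi)."""
    n = len(w)
    return sum(w[i].conjugate() * phi[i][j] * w[j]
               for i in range(n) for j in range(n)).real


def loaded_mvdr(phi, a, mu):
    """Distortionless weights for the loaded matrix Phi + mu*I (w^H a = 1)."""
    n = len(a)
    loaded = [[phi[i][j] + (mu if i == j else 0.0) for j in range(n)]
              for i in range(n)]
    v = solve(loaded, a)
    denom = sum(ai.conjugate() * vi for ai, vi in zip(a, v))  # a^H v
    return [vi / denom for vi in v]


def design_beamformer(phi, a, g_min_db, iters=60):
    """Minimise w^H Phi w s.t. w^H a = 1 and w^H w <= 10^(g_min/10)."""
    limit = 10.0 ** (g_min_db / 10.0)
    # The smallest norm any distortionless w can reach is 1 / (a^H a).
    if limit <= 1.0 / norm_sq(a):
        raise ValueError("norm bound too tight for the distortionless constraint")
    w = loaded_mvdr(phi, a, 0.0)
    if norm_sq(w) <= limit:  # norm constraint inactive, no loading needed
        return w
    lo, hi = 0.0, 1.0
    while norm_sq(loaded_mvdr(phi, a, hi)) > limit:  # bracket the loading level
        hi *= 2.0
    for _ in range(iters):  # w^H w decreases as mu grows, so bisect on mu
        mid = 0.5 * (lo + hi)
        if norm_sq(loaded_mvdr(phi, a, mid)) > limit:
            lo = mid
        else:
            hi = mid
    return loaded_mvdr(phi, a, hi)


# Illustrative inputs (assumptions, not taken from the paper).
kd = 2.0 * math.pi * 1000.0 * 0.04 / 343.0   # 2*pi*f*d/c
gamma = math.sin(kd) / kd                    # diffuse-field coherence
phi = [[1.0, gamma], [gamma, 1.0]]
a = [1.0 + 0j, cmath.exp(-1j * kd)]          # endfire steering vector
g_min_db = 0.0

w = design_beamformer(phi, a, g_min_db)
print("w =", w)
print("w^H w =", norm_sq(w), " w^H Phi w =", noise_power(phi, w))
```

With these inputs the unloaded solution violates the norm bound, so the search settles on a positive loading level and the returned weights sit on the bound. A general-purpose solver such as CVX remains the more flexible choice when further constraints are added.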

References
