Time-frequency masking strategies for single-channel low-latency speech enhancement using neural networks
Parviainen, Mikko; Pertila, Pasi; Virtanen, Tuomas; Grosche, Peter (2018-11-02)
IEEE
02.11.2018
The permanent address of the publication is
https://urn.fi/URN:NBN:fi:tuni-202001161338
Description
Peer reviewed
Abstract
This paper presents a low-latency speech enhancement system based on neural networks. Low-latency operation is critical for speech communication applications. The system uses time-frequency (TF) masking to retain speech and remove non-speech content from the observed signal. The ideal TF masks are obtained by supervised training of neural networks. As the main contribution, different neural network models are compared experimentally in terms of computational complexity and speech enhancement performance. The proposed system is trained and tested on noisy speech data with signal-to-noise ratios (SNR) ranging from -5 dB to +5 dB. The results show a significant reduction of non-speech content in the resulting signal while still meeting the low-latency criterion, defined here as less than 20 ms.
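As a rough illustration of the TF masking idea described in the abstract, the sketch below computes an oracle mask from known speech and noise and applies it to one frame of the mixture. It is not the paper's system: the ideal ratio mask is one common mask definition chosen here as an assumption, the function names `ideal_ratio_mask` and `apply_mask` are hypothetical, the sine tone is a stand-in for speech, and the 16 kHz sampling rate with a 320-sample (20 ms) frame is picked only to echo the latency bound mentioned above. In a trained system, a neural network would predict the mask from the noisy input instead of computing it from the clean signals.

```python
import numpy as np

def ideal_ratio_mask(speech_mag, noise_mag, eps=1e-8):
    """Oracle mask in [0, 1] per frequency bin (assumed mask definition)."""
    speech_pow = speech_mag ** 2
    noise_pow = noise_mag ** 2
    return speech_pow / (speech_pow + noise_pow + eps)

def apply_mask(mixture_spec, mask):
    """Scale each complex bin of the mixture spectrum by the mask."""
    return mask * mixture_spec

# Toy single-frame example (all values are illustrative assumptions).
fs = 16000                 # sampling rate in Hz
frame_len = 320            # 20 ms at 16 kHz
t = np.arange(frame_len) / fs
speech = np.sin(2 * np.pi * 500 * t)           # stand-in for a speech frame
rng = np.random.default_rng(0)
noise = 0.5 * rng.standard_normal(frame_len)   # stand-in for non-speech content
mixture = speech + noise

window = np.hanning(frame_len)
S = np.fft.rfft(window * speech)
N = np.fft.rfft(window * noise)
X = np.fft.rfft(window * mixture)

mask = ideal_ratio_mask(np.abs(S), np.abs(N))
enhanced = np.fft.irfft(apply_mask(X, mask), n=frame_len)

# Squared error against the windowed clean frame, before and after masking.
err_mixture = np.sum((window * mixture - window * speech) ** 2)
err_enhanced = np.sum((enhanced - window * speech) ** 2)
```

The mask keeps bins dominated by the tone and attenuates bins dominated by noise, so `err_enhanced` comes out smaller than `err_mixture` in this toy setup. A full system would process overlapping frames and resynthesize the signal by overlap-add, which this sketch omits.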
Collections
- TUNICRIS publications [16977]