Time-frequency masking strategies for single-channel low-latency speech enhancement using neural networks

Parviainen, Mikko; Pertila, Pasi; Virtanen, Tuomas; Grosche, Peter (2018-11-02)

Publisher: IEEE

Date: 2018-11-02

DOI: 10.1109/IWAENC.2018.8521400

The permanent address of this publication is
https://urn.fi/URN:NBN:fi:tuni-202001161338

Description

Peer reviewed

Abstract

This paper presents a low-latency, neural-network-based speech enhancement system. Low-latency operation is critical for speech communication applications. The system uses the time-frequency (TF) masking approach to retain speech and remove the non-speech content from the observed signal. The ideal TF masks are obtained by supervised training of neural networks. As the main contribution, different neural network models are compared experimentally to investigate computational complexity and speech enhancement performance. The proposed system is trained and tested on noisy speech data where the signal-to-noise ratio (SNR) ranges from -5 dB to +5 dB. The results show a significant reduction of non-speech content in the resulting signal while still meeting the low-latency operation criterion, which is here considered to be less than 20 ms.
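The TF masking idea in the abstract can be illustrated with a minimal sketch. The abstract does not say which mask definition the paper uses, so the ideal ratio mask below is an assumption, as are all function names, the toy signals, and the 16 kHz / 256-sample framing. In the paper a neural network estimates the mask from the noisy input; here the mask is computed directly from the known speech and noise, which is how such an "ideal" training target is typically formed.

```python
import cmath
import math


def dft(frame):
    """Naive DFT of one real-valued frame (O(n^2), adequate for a sketch)."""
    n = len(frame)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame))
        for k in range(n // 2 + 1)
    ]


def ideal_ratio_mask(speech_spec, noise_spec, eps=1e-12):
    """Assumed mask definition: |S| / (|S| + |N|) per frequency bin, in [0, 1]."""
    return [abs(s) / (abs(s) + abs(v) + eps) for s, v in zip(speech_spec, noise_spec)]


def apply_mask(mask, mixture_spec):
    """Scale each mixture bin by its mask value; the mixture phase is kept."""
    return [m * y for m, y in zip(mask, mixture_spec)]


def frame_latency_ms(frame_len, sample_rate):
    """Time needed to buffer one analysis frame, in milliseconds."""
    return 1000.0 * frame_len / sample_rate


# Toy frame: a "speech" tone in bin 4 and a "noise" tone in bin 20 (hypothetical data).
n = 64
speech = [math.sin(2 * math.pi * 4 * t / n) for t in range(n)]
noise = [0.5 * math.sin(2 * math.pi * 20 * t / n) for t in range(n)]
mixture = [s + v for s, v in zip(speech, noise)]

S, N, Y = dft(speech), dft(noise), dft(mixture)
mask = ideal_ratio_mask(S, N)
enhanced = apply_mask(mask, Y)

# Hypothetical framing: 256 samples at 16 kHz, compared with the 20 ms criterion.
latency = frame_latency_ms(256, 16000)
meets_criterion = latency < 20.0
```

The mask is close to 1 in the bin dominated by the speech tone and close to 0 in the bin dominated by the noise tone, so multiplying the mixture spectrum by it suppresses the noise component. The latency helper only accounts for buffering a single frame; lookahead and computation time would add to it in a real system.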
