A High SIR Low-overhead Implementation of Single-channel Speech Source Separation

Lawrence Nwaogo; Jerker Björkqvist

doi:10.1109/SAM53842.2022.9827866

Lawrence Nwaogo^*, Jerker Björkqvist

^*Corresponding author for this work

Information Technology

Research output: Chapter in Book/Conference proceeding › Conference contribution › Scientific › peer-review

Abstract

In speech signal processing, separating the sources in a speech mixture is a known challenge. It is addressed by finding the closest estimate of the original speech source from the speech mixture. Source separation solutions can be based on a multi-channel or a single-channel model. The multi-channel model assumes multiple speakers and multiple microphones, while the single-channel model assumes multiple speakers and a single microphone. One of the most widely used algorithms in the single-channel model is the Ideal Ratio Mask (IRM). Although IRM is efficient, it has a major drawback: a high memory footprint, because it stores all frequency components of the Short-time Fourier transform (STFT). This makes it less suitable for embedded applications. We propose a solution based on the optimization of Mel-frequency Cepstrum Coefficient (MFCC) and Non-centroid K-nearest neighbor (Nk-nn) algorithms that minimizes memory utilization and achieves a high Signal-to-Interference Ratio (SIR). Our experimental results show that the proposed solution improves SIR while reducing memory requirements compared to the reference IRM.
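
As a rough illustration of the quantities named in the abstract, the following Python sketch computes an ideal ratio mask from two magnitude spectrograms, an SIR value in decibels, and the number of values kept per frame for an STFT versus an MFCC representation. It is not the authors' implementation and does not reproduce the MFCC/Nk-nn method. The function names, the commonly used mask exponent `beta = 0.5`, the 512-point FFT, and the 13 MFCCs are all assumptions chosen for illustration; none of them are taken from the paper.

```python
import math


def ideal_ratio_mask(target_mag, interferer_mag, beta=0.5, eps=1e-12):
    """Per-bin ratio mask from two magnitude spectrograms.

    Both inputs are lists of frames, each frame a list of STFT bin
    magnitudes. Uses the common definition
    (|S|^2 / (|S|^2 + |N|^2)) ** beta; beta = 0.5 is an assumed default.
    """
    mask = []
    for t_frame, i_frame in zip(target_mag, interferer_mag):
        row = []
        for s, n in zip(t_frame, i_frame):
            s2, n2 = s * s, n * n
            # eps avoids division by zero when both sources are silent
            row.append((s2 / (s2 + n2 + eps)) ** beta)
        mask.append(row)
    return mask


def sir_db(target_energy, interference_energy):
    """Signal-to-Interference Ratio in dB from two energy values."""
    return 10.0 * math.log10(target_energy / interference_energy)


def values_per_frame(n_fft, n_mfcc):
    """Values stored per frame: one-sided STFT bins versus MFCCs."""
    return n_fft // 2 + 1, n_mfcc


if __name__ == "__main__":
    # One frame, two bins: equal energy in bin 0, interferer only in bin 1.
    mask = ideal_ratio_mask([[1.0, 0.0]], [[1.0, 2.0]])
    print(mask)
    print(sir_db(100.0, 1.0))
    # Hypothetical sizes: 512-point FFT versus 13 cepstral coefficients.
    print(values_per_frame(512, 13))
```

Under these assumed sizes, a one-sided STFT frame holds 257 values while an MFCC frame holds 13, which gives a sense of why a cepstral representation can be attractive when memory is constrained.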

Original language	English
Title of host publication	2022 IEEE 12th Sensor Array and Multichannel Signal Processing Workshop (SAM)
Publisher	IEEE
Pages	440-444
ISBN (Electronic)	9781665406338
ISBN (Print)	978-1-6654-0634-5
DOIs	https://doi.org/10.1109/SAM53842.2022.9827866
Publication status	Published - 22 Jul 2022
MoE publication type	A4 Article in a conference publication
Event	Sensor Array and Multichannel Signal Processing Workshop - Duration: 20 Jun 2022 → …

Conference

Conference	Sensor Array and Multichannel Signal Processing Workshop
Period	20/06/22 → …

Keywords

Source separation
Memory management
Signal processing algorithms
Cepstrum
Production facilities
Speech processing
Mel frequency cepstral coefficient

Access to Document

10.1109/SAM53842.2022.9827866

a93-nwaogo final: Accepted author manuscript, 200 KB. Licence: Publisher rights policy

https://urn.fi/URN:NBN:fi-fe2022091559097

Cite this

@inproceedings{145835fdde06469c8b0124617958333f,
  title = "A High SIR Low-overhead Implementation of Single-channel Speech Source Separation",
  abstract = "In the field of speech signal processing, speech source mixture separation is a known challenge. It is addressed by finding the closest estimate of the original speech source from the speech mixture. Source separation solutions can be based on multiple channels or single channel model. In multiple channels, multiple speakers and microphones are assumed while in single channel multiple speakers and a single microphone are assumed. One of the most widely used algorithms in the single-channel model is the Ideal Ratio Mask (IRM). Although IRM is efficient, it has a major drawback; the high memory footprint as it stores all frequency components of the Short-time Fourier transform (STFT). This makes it less suitable for embedded applications. We propose a solution based on the optimization of Mel-frequency Cepstrum Coefficient (MFCC) and Non-centroid K-nearest neighbor (Nk-nn) algorithms that minimizes memory utilization and achieves high Signal-to-Interference Ratio (SIR). Our experimental results show that the proposed solution improves SIR while minimizing memory requirements compared to the reference IRM.",
  keywords = "Source separation, Memory management, Signal processing algorithms, Cepstrum, Production facilities, Speech processing, Mel frequency cepstral coefficient",
  author = "Lawrence Nwaogo and Jerker Bj{\"o}rkqvist",
  note = "https://www.ieee.org/publications/rights/author-posting-policy.html; Sensor Array and Multichannel Signal Processing Workshop ; Conference date: 20-06-2022",
  year = "2022",
  month = jul,
  day = "22",
  doi = "10.1109/SAM53842.2022.9827866",
  language = "English",
  isbn = "978-1-6654-0634-5",
  pages = "440--444",
  booktitle = "2022 IEEE 12th Sensor Array and Multichannel Signal Processing Workshop (SAM)",
  publisher = "IEEE",
  address = "United States",
}

TY - GEN
T1 - A High SIR Low-overhead Implementation of Single-channel Speech Source Separation
AU - Nwaogo, Lawrence
AU - Björkqvist, Jerker
N1 - https://www.ieee.org/publications/rights/author-posting-policy.html
PY - 2022/7/22
Y1 - 2022/7/22
N2 - In the field of speech signal processing, speech source mixture separation is a known challenge. It is addressed by finding the closest estimate of the original speech source from the speech mixture. Source separation solutions can be based on multiple channels or single channel model. In multiple channels, multiple speakers and microphones are assumed while in single channel multiple speakers and a single microphone are assumed. One of the most widely used algorithms in the single-channel model is the Ideal Ratio Mask (IRM). Although IRM is efficient, it has a major drawback; the high memory footprint as it stores all frequency components of the Short-time Fourier transform (STFT). This makes it less suitable for embedded applications. We propose a solution based on the optimization of Mel-frequency Cepstrum Coefficient (MFCC) and Non-centroid K-nearest neighbor (Nk-nn) algorithms that minimizes memory utilization and achieves high Signal-to-Interference Ratio (SIR). Our experimental results show that the proposed solution improves SIR while minimizing memory requirements compared to the reference IRM.
AB - In the field of speech signal processing, speech source mixture separation is a known challenge. It is addressed by finding the closest estimate of the original speech source from the speech mixture. Source separation solutions can be based on multiple channels or single channel model. In multiple channels, multiple speakers and microphones are assumed while in single channel multiple speakers and a single microphone are assumed. One of the most widely used algorithms in the single-channel model is the Ideal Ratio Mask (IRM). Although IRM is efficient, it has a major drawback; the high memory footprint as it stores all frequency components of the Short-time Fourier transform (STFT). This makes it less suitable for embedded applications. We propose a solution based on the optimization of Mel-frequency Cepstrum Coefficient (MFCC) and Non-centroid K-nearest neighbor (Nk-nn) algorithms that minimizes memory utilization and achieves high Signal-to-Interference Ratio (SIR). Our experimental results show that the proposed solution improves SIR while minimizing memory requirements compared to the reference IRM.
KW - Source separation
KW - Memory management
KW - Signal processing algorithms
KW - Cepstrum
KW - Production facilities
KW - Speech processing
KW - Mel frequency cepstral coefficient
U2 - 10.1109/SAM53842.2022.9827866
DO - 10.1109/SAM53842.2022.9827866
M3 - Conference contribution
SN - 978-1-6654-0634-5
SP - 440
EP - 444
BT - 2022 IEEE 12th Sensor Array and Multichannel Signal Processing Workshop (SAM)
PB - IEEE
T2 - Sensor Array and Multichannel Signal Processing Workshop
Y2 - 20 June 2022
ER -
