Formanttien, perustaajuuden ja spektrin kaltevuuden muutokset imitoinnissa
LESKELÄ, JOHANNA (2011)
Puhetekniikka ja vokologia - Speech Technique and Vocology
Kasvatustieteiden yksikkö - School of Education
This publication is copyrighted. You may download, display and print it for Your own personal use. Commercial use is prohibited.
Date of approval
2011-09-13
The permanent address of the publication is
https://urn.fi/urn:nbn:fi:uta-1-21781
Abstract
Changes in F0, formant frequencies and spectral slope in imitation
Voice quality, pitch register, prosody and speech style are some of the features that are important in voice imitations (Zetterholm, 2002). Timbre may be seen as a synonym for voice quality and thus as consisting of characteristics of the voice source (type of phonation) and the vocal tract filter (formants). Sometimes ‘timbre’ has been used to refer to filter characteristics only (voice colour). How much is an imitator able to change and control timbre, voice quality and the other factors that make our voices personal?
The purpose of this research was to find out how an imitator changes voice quality, timbre, F0 and articulation in an imitation task, and how close he gets to the target voice in terms of fundamental frequency, formant frequencies and the spectral slope that characterizes the type of phonation. The role of these changes in the perceived success of the imitation was studied. The relationship between auditory perception and acoustic analysis was also considered.
The research consisted of two parts: 1) acoustic analysis of the imitator’s own voice, the original target voices and the imitations of the target voices and 2) auditory comparison of the imitations and the original voices. The subject was an experienced Finnish professional imitator whose repertoire includes over 120 voices, including singing voices, and over 100 technical voices. For this study, he listened to 11 voices of Finnish politicians, opera singers and radio announcers. The samples consisted of nine text-reading samples and two separate short sentences. The imitator imitated the voices, and he also read the texts used in the imitations in his own voice. For all three sample types (original voice, imitated voice and the imitator’s own voice), the mean fundamental frequency (F0) and the spectral slope (in this case the alpha ratio, i.e. the sound level difference above and below 1 kHz) were calculated over the whole text. In addition, one vowel [a:] was extracted from the text samples, and its four lowest formant frequencies were measured. Vowels were then synthesized from the measured formant frequencies using the singing synthesizer MADDE. In the listening tests, the listeners (N=10) were first asked to evaluate the similarity between the whole imitation and the original speech samples. The features rated for similarity (on a scale from 1 to 3) were voice quality, voice timbre, pitch and articulation. The synthesized vowels were rated for similarity in voice quality and voice timbre (on a scale from 1 to 3).
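The alpha ratio mentioned above can be illustrated with a minimal sketch. It assumes that a long-term power spectrum of the sample is already available as paired lists of frequencies and power values, that the ratio is taken as the level above 1 kHz minus the level below it, and that the analysis band runs from 50 Hz to 5 kHz. The band limits, the sign convention and the function name are assumptions made for illustration; the abstract states only the 1 kHz split.

```python
import math


def alpha_ratio_db(freqs_hz, powers, split_hz=1000.0, low_hz=50.0, high_hz=5000.0):
    """Level difference (dB) between the bands above and below split_hz.

    freqs_hz and powers are equal-length sequences describing a power
    spectrum. low_hz and high_hz are assumed band limits, not values
    taken from the thesis.
    """
    if len(freqs_hz) != len(powers):
        raise ValueError("freqs_hz and powers must have the same length")

    # Sum the spectral power on each side of the split frequency.
    low_power = sum(p for f, p in zip(freqs_hz, powers) if low_hz <= f < split_hz)
    high_power = sum(p for f, p in zip(freqs_hz, powers) if split_hz <= f <= high_hz)

    if low_power <= 0 or high_power <= 0:
        raise ValueError("both bands must contain positive power")

    # Power ratio expressed in decibels; more negative means a steeper slope.
    return 10.0 * math.log10(high_power / low_power)


# Toy spectrum (illustrative values only): strong low band, weak high band.
example_freqs = [200.0, 500.0, 2000.0, 3000.0]
example_powers = [1.0, 1.0, 0.01, 0.01]
example_alpha = alpha_ratio_db(example_freqs, example_powers)
```

Under this convention, a more negative value indicates relatively less energy above 1 kHz, i.e. a steeper spectral slope.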
In most of the imitated samples, many formants changed towards the original voice’s formants. Spectral slope differed between the imitation and the imitator’s own voice in nearly all cases; in 5/11 samples the change was towards the original voice. The average F0 also changed towards the original voice’s values in 5/11 imitations. The F0 results of the acoustic analysis are in line with Watts and Hall’s (2008) finding that voice timbre affects the ability to reach the target F0. The mean differences between the imitations and the original voices were 0-39 % for F0, 0-27 % for F1, 0-18 % for F2, 0-45 % for F3, 0-15 % for F4 and 5-85 % for the spectral slope. There was no statistically significant correlation between the listening test results and the acoustic analysis results. Perceptual evaluation of the synthesized samples showed a correlation between formant frequencies and rated similarity of timbre.
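The two kinds of comparison reported above, a percentage difference between imitation and original and a change "towards the original voice", can be sketched as follows. The abstract does not give the exact formulas, so both definitions here are assumptions: the difference is taken relative to the original voice's value, and a change counts as towards the original when the imitation lies closer to it than the imitator's own voice does. The function names and numbers are hypothetical and are not data from the thesis.

```python
def percent_difference(imitation, original):
    """Absolute difference between imitation and original, in percent of the original.

    Assumed definition; abs(original) keeps the result positive for
    negative-valued measures such as a spectral slope in dB.
    """
    if original == 0:
        raise ValueError("original value must be non-zero")
    return abs(imitation - original) / abs(original) * 100.0


def moved_toward_original(own, imitation, original):
    """True if the imitation is closer to the original than the imitator's own voice."""
    return abs(imitation - original) < abs(own - original)


# Hypothetical mean F0 values in Hz (illustrative only).
own_f0, imitated_f0, original_f0 = 110.0, 130.0, 140.0
f0_difference = percent_difference(imitated_f0, original_f0)
f0_toward = moved_toward_original(own_f0, imitated_f0, original_f0)
```

The same two functions apply unchanged to each formant frequency and to the alpha ratio, which is how a count such as 5/11 could be tallied across samples.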
In general, the imitator is able, to a certain extent, to change the formants of his voice to bring them closer to those of the target voice. The results are in line with Endres, Bambach and Flösser’s (1970) finding that imitation is a combination of many different features. Systematic changes across all the features are hard to detect, since the features are used to different extents. More specific synthesis experiments are needed to find out which conditions are sufficient for voices to be perceived as similar.
Keywords: Voice imitation, formant frequencies, F0, spectral slope, timbre (Finnish: Imitaatio, formanttitaajuudet, F0, spektrin kaltevuus, äänenväri)