The Aalto system based on fine-tuned AudioSet features for DCASE2018 task2 - general purpose audio tagging

Conference article in proceedings
Date
2018-11
Language
en
Pages
5
24-28
Series
Proceedings of the Detection and Classification of Acoustic Scenes and Events 2018 Workshop (DCASE2018)
Abstract
In this paper, we present a neural network system for DCASE 2018 Task 2, general-purpose audio tagging. We fine-tuned the Google AudioSet feature generation model with different settings for the 41 given classes, on top of a fully connected layer with 100 units. We then used the fine-tuned models to generate 128-dimensional features for each 0.960 s audio segment. We tried different neural network structures, including LSTM and multi-level attention models; in our experiments, the multi-level attention model outperformed the others. Our experiments also used silence truncation, repeating and splitting the audio into fixed-length segments, pitch-shifting augmentation, and mixup. The proposed system achieved a MAP@3 score of 0.936, which outperforms the baseline result of 0.704 and ranks in the top 8% of the public leaderboard.
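For readers unfamiliar with the MAP@3 metric quoted in the abstract, the following is a minimal sketch of how it can be computed, assuming each clip has exactly one true label and the system outputs one score per class. The function name map_at_3 and the array layout are illustrative assumptions, not taken from the paper or from the official evaluation code.

```python
import numpy as np


def map_at_3(scores, labels):
    """Mean average precision at 3, assuming one true label per clip.

    scores: array of shape (n_clips, n_classes), higher means more likely.
    labels: array of shape (n_clips,), integer index of the true class.
    Both the name and the input layout are hypothetical, for illustration.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)
    # Indices of the three highest-scoring classes per clip, best first.
    top3 = np.argsort(-scores, axis=1)[:, :3]
    # hits[i, k] is True when the guess at rank k+1 equals the true label.
    hits = top3 == labels[:, None]
    # With a single true label, a hit at rank r contributes 1/r; a miss gives 0.
    weights = 1.0 / np.arange(1, 4)
    return float((hits * weights).sum(axis=1).mean())
```

Under this definition a clip scores 1 when the true class is ranked first, 1/2 when it is ranked second, 1/3 when it is ranked third, and 0 otherwise; the reported figure is the mean over all clips.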
Citation
Xu, Z, Smit, P & Kurimo, M 2018, The Aalto system based on fine-tuned AudioSet features for DCASE2018 task2 - general purpose audio tagging. in Proceedings of the Detection and Classification of Acoustic Scenes and Events 2018 Workshop (DCASE2018), 29, Tampere University of Technology, pp. 24-28, Detection and Classification of Acoustic Scenes and Events, Surrey, United Kingdom, 19/11/2018. <http://dcase.community/documents/workshop2018/proceedings/DCASE2018Workshop_Xu_29.pdf>