Finnish conversational chat corpus, source

Suomenkielinen chat-keskustelukorpus, lähdemateriaali

finchat-src

Persistent Identifier of this resource:

http://urn.fi/urn:nbn:fi:lb-2022060801

Access location:

This resource is available for download in Kielipankki – the Language Bank of Finland.

The FinChat corpus consists of 85 Finnish chat dialogs collected in 2019-2020. The participants (N=62) were native speakers of Finnish in three age-based user groups: high school students (16-19 years), university students (20-25 years) and university staff (25 years or above). The data were collected in multiple sessions, where each session had participants from the same group. In each session, participants were paired randomly and used fake names to maintain anonymity. At the beginning of the session, participants were given a topic to discuss. After chatting for 10-15 minutes, the conversation partners were switched and a new topic was given. In a session, each participant had two or three conversations. After each conversation, participants self-evaluated their conversation with a questionnaire.

In addition to the conversation content, each message in the FinChat dataset includes a timestamp, the sender's ID, and metadata. The metadata covers participant ID, age group, topic, and questionnaire results. For some conversations, however, evaluations or answers are missing from the metadata, and some evaluation questions were answered more than once.
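As a rough illustration of the structure described above, the following Python sketch models a conversation with its messages and metadata, and shows one way to cope with missing and repeated questionnaire answers. All class, field, and function names are hypothetical and are not taken from the distributed files, whose actual layout may differ. Keeping the last of several repeated answers is an arbitrary policy chosen for this sketch, not something the corpus prescribes.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# NOTE: every name below is hypothetical; the real FinChat files
# may organise the same information differently.


@dataclass
class Message:
    timestamp: str   # kept as a raw string; no timestamp format is assumed
    sender_id: str
    text: str


@dataclass
class Conversation:
    topic: str
    age_group: str
    participant_ids: List[str]
    messages: List[Message] = field(default_factory=list)
    # Raw questionnaire answers as (participant_id, question, answer),
    # in the order they were given; may contain gaps or repeats.
    raw_answers: List[Tuple[str, str, str]] = field(default_factory=list)


def resolve_answers(
    conv: Conversation, questions: List[str]
) -> Dict[str, Dict[str, Optional[str]]]:
    """Return one answer per participant and question.

    Missing answers become None. When a question was answered more than
    once, the last answer wins (an assumption made for this sketch).
    """
    resolved: Dict[str, Dict[str, Optional[str]]] = {
        pid: {q: None for q in questions} for pid in conv.participant_ids
    }
    for pid, question, answer in conv.raw_answers:
        if pid in resolved and question in resolved[pid]:
            resolved[pid][question] = answer
    return resolved


# Placeholder values only; these are not taken from the corpus.
example = Conversation(
    topic="placeholder-topic",
    age_group="university students",
    participant_ids=["A", "B"],
    raw_answers=[("A", "q1", "3"), ("A", "q1", "4")],
)
resolved = resolve_answers(example, ["q1", "q2"])
```

Here participant A answered the placeholder question `q1` twice and never answered `q2`, while participant B gave no answers at all, so the result keeps A's second answer to `q1` and marks every other entry as `None`.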

Before each session, the participants were instructed to ask one question at a time and wait for their partner’s reply, to use conversational language, and not to use any abusive language. They were also asked not to share any personal information or copyrighted material. Parts of the conversations that did not follow these instructions were removed.

This dataset was retrieved from a public GitHub repository that was previously available at https://github.com/aalto-speech/FinChat.

For further details, see the article below.

Please cite the following paper when using the corpus:
K. Leino, J. Leinonen, M. Singh, S. Virpioja and M. Kurimo. (2020) Corpus and Evaluation Setup for Finnish Chat Conversations on Everyday Topics. In Proceedings of Interspeech 2020. ISCA - International Speech Communication Association, Baixas, pp. 429-433. Available: https://doi.org/10.21437/Interspeech.2020-2511
