Donate Speech Corpus: Training data (100h)


Last update: 2023-04-20

Lahjoita puhetta -aineisto: Opetusdata (100h)

puhelahjat-train

Persistent Identifier of this resource:

http://urn.fi/urn:nbn:fi:lb-2022060123

This resource is available for download in Kielipankki - The Language Bank of Finland as part of "Donate Speech: Selected dataset", http://urn.fi/urn:nbn:fi:lb-2022060127.

The resource contains a subset of 100 hours of transcribed speech that was selected from the Donate Speech Corpus and used for training an ASR system at Aalto University.

The training data includes speech from 1129 different speakers (according to the metadata accompanying the original recordings). Note that just over 20% of the speakers in the training dataset are male, whereas 40% of the speakers in the puhelahjat-test and puhelahjat-dev sets are male.

For speech technology development purposes, the training dataset can be used together with the puhelahjat-test and puhelahjat-dev datasets. There is no overlap of speakers between these three sets.
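The speaker-disjointness property stated above can be checked with a short script. This is only a minimal sketch: the speaker IDs below are hypothetical, and the record does not describe how speaker information is stored in the corpus metadata, so the loading step is left out.

```python
from itertools import combinations


def find_speaker_overlap(splits):
    """Return {(split_a, split_b): shared_ids} for every pair of splits that share speakers."""
    overlaps = {}
    for (name_a, ids_a), (name_b, ids_b) in combinations(splits.items(), 2):
        shared = set(ids_a) & set(ids_b)
        if shared:
            overlaps[(name_a, name_b)] = shared
    return overlaps


# Hypothetical speaker IDs, for illustration only (not taken from the corpus).
example_splits = {
    "puhelahjat-train": ["spk001", "spk002", "spk003"],
    "puhelahjat-dev": ["spk101", "spk102"],
    "puhelahjat-test": ["spk201", "spk202"],
}

overlap = find_speaker_overlap(example_splits)
print("speaker overlap:", overlap or "none")
```

An empty result means that no speaker appears in more than one set, which is the condition needed for a fair evaluation on the dev and test data.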

Distribution

Availability

Available - Restricted Use

Start date: 03/24/2023

Licence

Other

Restrictions: Attribution, Commercial Use, No Redistribution, Other

User Nature: Commercial

Licensors:

University of Helsinki

Distribution rights holders:

University of Helsinki

CLARIN RES

Restrictions: Academic - Non Commercial Use, Attribution, No Redistribution, Other

User Nature: Academic

Licensors:

University of Helsinki

Distribution rights holders:

University of Helsinki

Contact Person

Anssi Moisio

text
audio

Monolingual text corpus

Languages

Finnish

Linguality

Linguality type: Monolingual

Size

6,200 Minutes

103 Hours

Monolingual audio corpus

Languages

Finnish

Linguality

Linguality type: Monolingual

Size

6,229 Files

6,200 Minutes

103 Hours
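The size figures above are related by simple arithmetic, sketched below. The average file duration is derived here from the listed totals and is not stated in the record itself; it assumes the 6,200 minutes are spread over all 6,229 files.

```python
# Size figures as listed in the record.
total_minutes = 6200
total_files = 6229

# 6,200 minutes expressed in hours; the record lists this rounded to 103.
total_hours = total_minutes / 60

# Derived value (an assumption, not in the record): mean duration per file in seconds.
mean_seconds_per_file = total_minutes * 60 / total_files

print(f"total: {total_hours:.1f} hours")
print(f"mean file duration: {mean_seconds_per_file:.1f} seconds")
```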

Resource Creation

Resource Creator

Anssi Moisio

Metadata

Created: 06/01/2022

Last Updated: 04/20/2023

Metadata Creator

Mietta Lennes

Version

Version: 1

Relation

Related Resource: Donate Speech Corpus, http://urn.fi/urn:nb...

Relation Type: IsPartOf
