TallVocabL2Fi: Measurements of 15 L2 Finnish learners' vocabularies

Last update: 2022-06-17

TallVocabL2Fi: Mitat 15 S2-opiskelijan sanavarastosta

tallvocabl2fi

Persistent Identifier of this resource:

http://urn.fi/urn:nbn:fi:lb-2022041921

Access location: http://urn.fi/urn:nbn:fi:lb-2022041922

The TallVocabL2Fi dataset comprises responses from 15 participants to two tasks: a "tall" 12,000-word self-rating task on a 5-point scale, and a 100-word confirmatory word translation task. The 15 participants were split by native language (5 English, 4 Hungarian and 6 Russian) and by self-reported CEFR reading level (5 B1, 4 B2, 5 C1 and 2 C2). The data was gathered through a website from paid participants resident in Finland over a period of 3 months, from September to November 2021. In total there are 180 thousand word knowledge self-rating responses and 1.5 thousand word translation responses.

The dataset is unique in combining a tall data collection setup, in which responses are collected for many words, with learners of varied backgrounds, Finnish prompt words, and triangulation against a word translation test. The dataset can be used for vocabulary acquisition research in general, but it is particularly suited to evaluating Vocabulary Inventory Prediction (VIP), including techniques based on Computer-Adaptive Testing (CAT).

The dataset is relational/tabular. It is distributed as a series of TSV files along with a SQL schema exported from DuckDB.
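As a minimal sketch of how one such TSV file might be read in Python, the example below parses tab-separated rows and averages a rating per participant. The column names (`participant_id`, `word`, `rating`) and the sample values are hypothetical placeholders, not taken from the dataset; the readme and SQL schema included with the data define the actual tables and columns.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample standing in for one TSV file from the dataset.
# Column names and values are invented for illustration only.
SAMPLE_TSV = (
    "participant_id\tword\trating\n"
    "1\ttalo\t5\n"
    "1\tkissa\t4\n"
    "2\ttalo\t3\n"
)


def read_tsv(handle):
    """Parse tab-separated rows with a header line into a list of dicts."""
    return list(csv.DictReader(handle, delimiter="\t"))


def mean_rating_by_participant(rows):
    """Average the (hypothetical) 'rating' column for each participant."""
    totals = defaultdict(lambda: [0, 0])  # participant -> [sum, count]
    for row in rows:
        entry = totals[row["participant_id"]]
        entry[0] += int(row["rating"])
        entry[1] += 1
    return {pid: total / count for pid, (total, count) in totals.items()}


rows = read_tsv(io.StringIO(SAMPLE_TSV))
means = mean_rating_by_participant(rows)
```

For a file on disk, the same `read_tsv` function can be given a handle from `open(path, encoding="utf-8", newline="")`, where `path` is whichever TSV file from the download is of interest.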

The TallVocabL2Fi dataset is available for download via Kielipankki – The Language Bank of Finland.

Further information about the schema and the collection process is available in the readme included with the data, and in the accompanying publication:

Robertson, F., Chang, L., & Söyrinki, S. (2022). TallVocabL2Fi: An Extensive Mapping of 15 Finnish L2 Learners' Vocabulary. In Language Resources and Evaluation Conference (LREC 2022).

Distribution

Availability

Available - Unrestricted Use

Licence

CC-ZERO

Distribution Access/Medium: Downloadable

Attribution Details: Robertson, Frankie (to be published in 2022). TallVocabL2Fi: Measurements of 15 L2 Finnish learners' vocabularies [dataset]. Kielipankki. Available: http://urn.fi/urn:nbn:fi:lb-2022041921.

Licensors:

Frankie R. Robertson

Distribution rights holders:

University of Helsinki

IPR Holder

Frankie R. Robertson

Contact Person

Frankie R. Robertson

text

Monolingual text corpus

Languages

Finnish

Linguality

Linguality type: Monolingual

Text Format

text/tsv

Size

12,000 Words

Character encoding

UTF-8

Modalities

Written Language

Resource Creation

Resource Creator

Frankie R. Robertson

Metadata

Created: 04/07/2022

Last Updated: 06/17/2022

Metadata Creator

Mietta Lennes

Documentation

Resource group page: http://urn.fi/urn:nb...

Document Type: Other

License: TallVocabL2Fi, http://urn.fi/urn:nb..., 2022

Editor: FIN-CLARIN

Document Language: English

How to cite: https://www.kielipan...
