Finnish OpenSubtitles 2017, VRT


Suomenkielinen OpenSubtitles 2017, VRT

opensub-fi-2017-vrt

Persistent Identifier of this resource:

http://urn.fi/urn:nbn:fi:lb-2021081701


The corpus contains Finnish subtitles for movies and TV series from http://www.opensubtitles.org/. It is a derivative of the multilingual [OPUS OpenSubtitles2018](http://opus.nlpl.eu/OpenSubtitles2018.php) corpus. The processing of the material up to sentence splitting is described in the original publication, Lison & Tiedemann (2016). The corpus has been tokenized and annotated with morpho-syntactic analyses produced by the [Turku Dependency Parser](http://turkunlp.github.io/Finnish-dep-parser/).

The corpus has been divided into 65 separate VRT files so that each file except the last contains 1 000 text elements.
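The file layout described above can be checked with a short script. The following is a minimal sketch, not part of the resource itself: it assumes the usual VRT conventions (one token per line with tab-separated attributes, structural tags such as `<text ...>` on lines of their own, and literal angle brackets in tokens escaped as entities). The function name `count_vrt_structures`, the `sentence` element name, and the attribute names in the sample are hypothetical and chosen only for illustration.

```python
import io


def count_vrt_structures(lines):
    """Count opening structural tags and token lines in VRT input.

    Assumption: structural tags stand alone on a line, and every other
    non-empty line is one token with tab-separated attributes.
    """
    structures = {}
    tokens = 0
    for raw in lines:
        line = raw.rstrip("\n")
        if not line:
            continue
        if line.startswith("<") and line.endswith(">"):
            # Skip closing tags and comments; count only opening tags.
            if line.startswith("</") or line.startswith("<!--"):
                continue
            name = line[1:-1].split()[0]
            structures[name] = structures.get(name, 0) + 1
        else:
            tokens += 1
    return structures, tokens


# Hypothetical two-text sample; real files carry more attributes per token.
sample = """<text title="Example A">
<sentence id="1">
Hei\thei
!\t!
</sentence>
<sentence id="2">
Kiitos\tkiitos
</sentence>
</text>
<text title="Example B">
<sentence id="1">
Moi\tmoi
</sentence>
</text>
"""

structures, tokens = count_vrt_structures(io.StringIO(sample))
print(structures, tokens)
```

On a real file, passing an open file handle (for example `open(path, encoding="utf-8")`, where `path` is one of the 65 VRT files) in place of the `io.StringIO` object should report 1 000 under `text` for every file except the last, if the assumptions above hold.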

P. Lison and J. Tiedemann, 2016. OpenSubtitles2016: Extracting Large Parallel Corpora from Movie and TV Subtitles. In Proceedings of the 10th International Conference on Language Resources and Evaluation (LREC 2016).

License: CC BY https://creativecommons.org/licenses/by/4.0/


The material covers the Finnish subtitles for movies and TV programmes distributed by the Opensubtitles.org site. It is a derivative of the multilingual OpenSubtitles2018 corpus (http://opus.nlpl.eu/OpenSubtitles2018.php). The material has been divided into sources and sentences. The sentences have been morphosyntactically parsed with the dependency parser developed by the Turku BioNLP group (http://turkunlp.github.io/Finnish-dep-parser/).

The material has been divided into 65 separate VRT files so that each file except the last contains 1 000 text elements.

License: CC BY https://creativecommons.org/licenses/by/4.0/
