Oulu Corpus

Oulun korpus

Persistent Identifier of this resource:

http://urn.fi/urn:nbn:fi:lb-201403266

The resource is available in Kielipankki - the Language Bank of Finland via the computing environment (puhti.csc.fi). Researchers may apply for individual access rights for research use, see instructions at https://www.kielipankki.fi/support/access/. The purpose of the resource use must be outlined in a research plan.

The Oulu Corpus is a collection of research material on Standard Finnish of the 1960s. The original material was collected by a group led by Prof. Pauli Saukkonen at the University of Oulu. The original corpus project aimed to collect a representative sample of the Standard Finnish used in 1960s media in order to create a frequency dictionary of Finnish. The annotated text material was converted into SGML format by the Institute for the Languages of Finland in 1997.

The corpus is divided into the following sections:

- fiction published 1961-1967
- radio talks broadcast between 29 September 1968 and 26 May 1969
- newspapers and journals from 1967
- non-fiction from 1961-1967

The corpus also contains some transcripts of interviews in Finnish. The corpus does not include language as used in television.

Each section contains further subdivisions. The text samples in each genre were selected randomly.

The content was grammatically annotated using CQP (Corpus Query Processor).

Change log:
26.11.2018 the link http://islrn.org/resources/875-594-014-737-2 was removed.

The material is available in the computing environment of the Language Bank of Finland (puhti.csc.fi). Researchers may be granted personal access rights to the material for research purposes upon application. The intended use must be described in a research plan; see https://www.kielipankki.fi/kayttajaksi/.

The Oulu Corpus is an electronic research material of 1960s Standard Finnish, converted into SGML format at the Institute for the Languages of Finland in 1997. The original analysed text material was compiled in Oulu for a frequency dictionary of Finnish by a group led by Prof. Pauli Saukkonen.

The material contains 5,800 samples, comprising a total of 429,058 words and about 29,000 sentences.

The corpus contains the following sections:

- fiction published 1961-1967
- radio speech as text from the period 29 September 1968 to 26 May 1969
- newspapers and magazines from 1967
- non-fiction from 1961-1967

In addition, the corpus contains some interviews in Finnish. The material does not include language used on television.

Each section has a more detailed subdivision. The text samples for each section were selected randomly.

The material was annotated using CQP (Corpus Query Processor).

Distribution

Availability

Available - Restricted Use

Licence

CLARIN RES

Restrictions: Academic - Non Commercial Use, Attribution, No Redistribution, Other

Distribution Access/Medium: Accessible Through Interface

Attribution Details: See Documentation section.

Licensors:

Kotimaisten kielten keskus, Institute for the Languages of Finland

University of Oulu

Distribution rights holders:

CSC - Tieteen tietotekniikan keskus Oy, CSC - IT Center for Science Ltd

University of Helsinki

IPR Holder

Kotimaisten kielten keskus, Institute for the Languages of Finland

University of Oulu

Contact Person

User support at CSC - IT Center for Science Ltd. The Language Bank of Finland

text

Monolingual text corpus

Languages

Finnish

Linguality

Linguality type: Monolingual

Size

29,000 Sentences

429,058 Words
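
The size figures can be turned into rough averages with simple arithmetic. The sketch below is only illustrative: it uses the word and sentence counts listed here together with the sample count (5,800) stated in the resource description, and it assumes the approximate sentence count of 29,000 is close enough for one-decimal averages. The function and variable names are hypothetical.

```python
# Figures as stated in the resource description.
SAMPLES = 5800
WORDS = 429_058
SENTENCES = 29_000  # given as an approximate figure ("about 29,000")


def average(total: int, count: int) -> float:
    """Return total / count, rounded to one decimal place."""
    return round(total / count, 1)


words_per_sentence = average(WORDS, SENTENCES)
words_per_sample = average(WORDS, SAMPLES)
sentences_per_sample = average(SENTENCES, SAMPLES)

print(words_per_sentence, words_per_sample, sentences_per_sample)
```

Because the sentence count is rounded in the source, the per-sentence and per-sample averages should be read as estimates rather than exact corpus statistics.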

Modalities

Written Language

Metadata

Created: 06/15/2012

Last Updated: 05/10/2023

Metadata Language: English, Finnish (en, fi)

Revision: link to English group page added

Metadata Creator

Usage

Foreseen Use: Human Use

Use NLP Specific: Linguistic Research

Relation

Related Resource: Oulu Corpus, Helsinki Korp Version http://urn.fi/urn:nb...

Relation Type: IsOriginalFormOf

Documentation

Pauli Saukkonen. 1982. Oulun korpus. 1960-luvun suomen yleiskielen tutkimusmateriaali. Oulun yliopiston suomen ja saamen kielen laitoksen tutkimusraportteja 1. Oulu.

Pauli Saukkonen, Marjatta Haipus, Antero Niemikorpi and Helena Sulkala. 1979. Suomen kielen taajuussanasto. WSOY.

Resource group page (in Finnish): http://urn.fi/urn:nb...

License: http://urn.fi/urn:nb...

Resource group page: http://urn.fi/urn:nb...

Document Type: Other

Attribution Details, https://www.kielipan...
