wordvec – Word embeddings trained with word2vec

This resource collection contains word embeddings trained with word2vec from various corpora.

The embedding file uses the simple, easily parsed text format produced by word2vec. The first line of the file gives the vocabulary size and the vector dimension. Each subsequent line begins with a vocabulary item, followed by a space and then 128 floating-point numbers in textual form, each followed by a space.
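The format described above can be read with a few lines of code. The following is a minimal sketch, not an official loader: the function name `read_embeddings` and the sample data are hypothetical, and the vector dimension is taken from the header line rather than hard-coded to 128.

```python
import io
from typing import Dict, List, TextIO


def read_embeddings(stream: TextIO) -> Dict[str, List[float]]:
    """Parse word2vec textual embeddings from an open text stream."""
    # Header line: "<vocabulary size> <dimension>".
    # The vocabulary size is read but not enforced in this sketch.
    header = stream.readline().split()
    vocab_size, dim = int(header[0]), int(header[1])

    vectors: Dict[str, List[float]] = {}
    for line in stream:
        # Every number is followed by a space, so strip the trailing
        # space and newline before splitting.
        parts = line.rstrip().split(" ")
        if len(parts) < dim + 1:
            continue  # skip blank or truncated lines
        # The last `dim` fields are the vector; whatever precedes them
        # is the vocabulary item.
        word = " ".join(parts[:-dim])
        vectors[word] = [float(x) for x in parts[-dim:]]
    return vectors


# Tiny made-up sample in the same layout (2 words, 3 dimensions).
sample = "2 3\nkissa 0.1 0.2 0.3 \nkoira -0.4 0.5 0.6 \n"
emb = read_embeddings(io.StringIO(sample))
print(emb["kissa"])
```

For a downloaded file, the same function could be called on an opened file object, for example `open(path, encoding="utf-8")`. The UTF-8 encoding is an assumption here and should be checked against the metadata of the specific version.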

Latest versions/subcorpora:
Word embeddings trained with word2vec from the Finnish Text Collection (Metadata and license; Attribution instructions; Download the resource)
Word embeddings trained with word2vec from the Suomi24 corpus (Metadata and license; Attribution instructions; Download the resource)
Search for all versions of this resource in META-SHARE

Several versions of this language resource are (or will be) published in the Language Bank of Finland. The versions are available through the Language Bank Download Service and/or the Korp concordance tool. Links to the individual versions can be found in the list above.

Detailed information on the content, user rights, and licenses of each version can be found in its specific metadata record in META-SHARE.

This resource group page has a Persistent Identifier: http://urn.fi/urn:nbn:fi:lb-2022041401


Contact

The Language Bank's technical support:
kielipankki (at) csc.fi
tel. +358 9 4572001

Requests related to language resources:
fin-clarin (at) helsinki.fi
tel. +358 29 4129317

