Finnish Tagtools 1.5

finnish-tagtools

Persistent Identifier of this resource:

http://urn.fi/urn:nbn:fi:lb-2021042101

Access location:

This software package provides finnish-postag, a part-of-speech and morphology tagger for Finnish, and finnish-nertag, a named entity recogniser for Finnish.
This software is also installed in CSC's computing environment (module load finnish-tagtools).

Both tools take running text from standard input and produce tabular output (one token per line) to standard output. See --help messages for more details.
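
As a rough illustration of the stdin/stdout interface described above, the following Python sketch pipes running text to one of the tools and splits the one-token-per-line output into fields. It assumes that the tool is installed and on the path, that the columns are separated by tab characters, and that empty lines are separators rather than tokens; none of these details is stated here, so check the --help messages. The helper names parse_tabular and tag_text are hypothetical and not part of the package.

```python
import subprocess


def parse_tabular(output: str) -> list[list[str]]:
    """Split one-token-per-line output into rows of fields.

    Assumption: fields are tab-separated and empty lines carry no token.
    """
    rows = []
    for line in output.splitlines():
        if not line.strip():
            continue  # skip empty lines (assumed to be separators)
        rows.append(line.split("\t"))
    return rows


def tag_text(text: str, tool: str = "finnish-postag") -> list[list[str]]:
    """Send running text to the tool's standard input and parse its standard output."""
    result = subprocess.run(
        [tool],
        input=text,
        capture_output=True,
        text=True,
        encoding="utf-8",
        check=True,  # raise CalledProcessError if the tool exits with an error
    )
    return parse_tabular(result.stdout)
```

With the package installed, a call such as tag_text("Helsinki on Suomen pääkaupunki.") would return one list of fields per token; the meaning of each column depends on the tool and its options.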

An installer is provided in the form of a Makefile. More information can be found in the README file in the download folder.

ChangeLog:

1.1 Initial release

1.2 Intermediate version (not published at the Language Bank)

1.3

General
- finnish-nertag and finnish-postag tokenize identically; tokenization no longer allows multi-word tokens
- fixed tokenization-related bugs
- added new version of OMorFi
- fixed several glaring FinnPOS-related bugs and improved POS tagging and lemmatization
- reduced the size of omorfi_tokenize.pmatch and ftb.omorfi.model

finnish-nertag:
- implemented nested annotations
- added options --no-tokenize, --show-analyses and --show-nested
- more reliable and extensive lemma normalization with normalize-lemmas.py
- Capture() memory is wiped at XML closing tags such as </text>, </body> etc.
- FiNER rules:
  - added subcategory EnamexPrsAnm (animals)
  - restored EnamexPrsTit (titles)
  - restored and expanded numerical expressions (NumexMsrXxx, NumexMsrCur)
  - rewrote and expanded EnamexProXxx rules to include foods and cultivars
  - TimexTmeDat: years that are divisible by 10 are now recognized more reliably
- greatly improved recall and precision
- fixed the all-caps bug: consecutive all-caps input strings no longer cause finnish-nertag to slow down or freeze

1.3.1 Maintenance update
- now uses a natively compiled hfst-pmatch if one is found in the path (the precompiled version can be slow)
- added --no-tokenize option to finnish-postag
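
To check in advance whether a natively compiled hfst-pmatch would be picked up, one can look for it on the path. The sketch below uses Python's standard shutil.which for that; it only illustrates the idea and is not the lookup the package itself performs. The function name find_native_tool is hypothetical.

```python
import shutil
from typing import Optional


def find_native_tool(name: str = "hfst-pmatch") -> Optional[str]:
    """Return the full path of an executable called `name` on PATH, or None."""
    return shutil.which(name)


if __name__ == "__main__":
    path = find_native_tool()
    if path is None:
        print("hfst-pmatch not found on PATH")
    else:
        print(f"hfst-pmatch found at {path}")
```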

1.3.2 Bugfix update
- fixed a bug in finnish-tokenize

1.4.0
- Updated NER rules and tests for them

1.5.0
- added BIO tagging
- new FinnPos model for disambiguating morphology

1.5.1
- fixed tests to match the changes in morphological disambiguation
