Morphosyntactic Tagging of Slovene Legal Language.
Informatica 2006, Dec, 30, 4
Publisher Description
Part-of-speech tagging or, more accurately, morphosyntactic tagging, is a procedure that assigns to each word token in a text its morphosyntactic description, e.g. "masculine singular common noun in the genitive case". Morphosyntactic tagging is an important component of many language technology applications, such as machine translation, speech synthesis, and information extraction. In this paper we report on an experiment in morphosyntactic tagging of Slovene, carried out on a sample of Slovene legal language. We evaluate the accuracy of the TnT tagger, which had been trained on the MULTEXT-East language resources for Slovene. The test data come from the freely available parallel English-Slovene corpus SVEZ-IJS, which contains Slovene translations of European Union legal acts. We present the details of the manually corrected test corpus and an analysis of the tagging errors. The paper also discusses a simple transformation-based program that fixes some of the more common errors, and concludes with some directions for future work. Summary: The paper describes an experiment in morphosyntactic tagging on a sample of Slovene legal texts.
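To make the evaluation setup concrete, the following is a minimal sketch of how token-level tagging accuracy against a manually corrected corpus could be computed, and how a single transformation-style correction rule might be applied. It is not the paper's program: the function names, the rule, and the two-token example are all assumptions made for illustration, and the tag strings merely imitate the MULTEXT-East style (for instance, "Ncmsg" read as a common masculine singular noun in the genitive).

```python
from typing import List, Tuple

# A tagged text is a list of (word form, morphosyntactic tag) pairs.
Tagged = List[Tuple[str, str]]


def accuracy(predicted: Tagged, gold: Tagged) -> float:
    """Share of tokens whose predicted tag equals the gold-standard tag."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must contain the same tokens")
    if not gold:
        return 0.0
    correct = sum(p_tag == g_tag for (_, p_tag), (_, g_tag) in zip(predicted, gold))
    return correct / len(gold)


def apply_rule(tagged: Tagged, from_tag: str, to_tag: str, prev_tag: str) -> Tagged:
    """Hypothetical transformation rule: rewrite from_tag as to_tag
    whenever the preceding token (in the tagger's output) has prev_tag."""
    corrected: Tagged = []
    for i, (word, tag) in enumerate(tagged):
        if tag == from_tag and i > 0 and tagged[i - 1][1] == prev_tag:
            tag = to_tag
        corrected.append((word, tag))
    return corrected


# Toy data (invented): "brez zakona" = "without a law".
# The imagined tagger output mislabels the noun as accusative.
tagger_output = [("brez", "Spsg"), ("zakona", "Ncmsa")]
gold_standard = [("brez", "Spsg"), ("zakona", "Ncmsg")]

before = accuracy(tagger_output, gold_standard)
# Illustrative rule: a noun tagged accusative right after a
# genitive-governing preposition is retagged as genitive.
fixed = apply_rule(tagger_output, from_tag="Ncmsa", to_tag="Ncmsg", prev_tag="Spsg")
after = accuracy(fixed, gold_standard)
print(before, after)  # 0.5 1.0
```

The rule consults the tagger's original output for the left context rather than already corrected tags, which keeps each correction independent of the order in which tokens are visited.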