Named Entity Recognition Using Appropriate Unlabeled Data, Post-Processing and Voting (Technical Report)


Informatica, vol. 34, no. 1, March 2010

    • $5.99

Publisher Description

This paper reports how appropriately selected unlabeled data, post-processing, and voting can improve the performance of a Named Entity Recognition (NER) system. The proposed method combines three classifiers: Maximum Entropy (ME), Conditional Random Field (CRF), and Support Vector Machine (SVM). The training set consists of approximately 272K wordforms, and the method is tested on Bengali. A semi-supervised learning technique has been developed that uses the unlabeled data during training. We show that simply relying on large corpora during training is not in itself sufficient to improve performance, and we describe measures to automatically select effective documents and sentences from the unlabeled data. In addition, we use a number of techniques to post-process the output of each model. Finally, we apply a weighted voting approach to combine the models. Experimental results show the effectiveness of the proposed approach, with overall average recall, precision, and f-score values of 93.79%, 91.34%, and 92.55%, respectively. This is an improvement of 19.4% in f-score over the least performing baseline (the ME-based system) and of 15.19% in f-score over the best performing baseline (the SVM-based system).

Summary: A named entity recognition method has been developed that is based on weighted voting of several classifiers.
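The description names weighted voting over the ME, CRF, and SVM models but does not say how the weights are chosen. The sketch below is only an illustration under stated assumptions: the per-model weights, the tag names (`PER`, `LOC`, `O`), and the function names `weighted_vote` and `f_score` are hypothetical and are not taken from the paper. It also checks that the reported f-score is the harmonic mean of the reported recall and precision.

```python
from collections import defaultdict


def weighted_vote(predictions, weights):
    """Combine per-token tags from several models by weighted voting.

    predictions: dict mapping model name -> list of tags (one per token)
    weights:     dict mapping model name -> vote weight (hypothetical values)
    """
    n_tokens = len(next(iter(predictions.values())))
    voted = []
    for i in range(n_tokens):
        scores = defaultdict(float)
        for model, tags in predictions.items():
            scores[tags[i]] += weights[model]
        # Highest total weight wins; ties are broken alphabetically by tag.
        voted.append(max(sorted(scores), key=scores.get))
    return voted


def f_score(recall, precision):
    """Balanced f-score: the harmonic mean of recall and precision."""
    return 2 * precision * recall / (precision + recall)


# Hypothetical outputs of the three models for a three-token sentence.
predictions = {
    "ME": ["PER", "O", "LOC"],
    "CRF": ["PER", "O", "O"],
    "SVM": ["O", "O", "LOC"],
}
# Hypothetical weights, e.g. each model's f-score on held-out data.
weights = {"ME": 0.70, "CRF": 0.80, "SVM": 0.75}

combined = weighted_vote(predictions, weights)
reported_f = round(f_score(93.79, 91.34), 2)
```

With these made-up inputs, the first token is tagged `PER` because ME and CRF together outweigh SVM, and the last token is tagged `LOC` because ME and SVM together outweigh CRF.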

GENRE: Business & Personal Finance
RELEASED: March 1, 2010
LANGUAGE: English (EN)
LENGTH: 65 pages
PUBLISHER: Slovenian Society Informatika
SELLER: The Gale Group, Inc., a Delaware corporation and an affiliate of Cengage Learning, Inc.
SIZE: 370.2 KB

More Books Like This

Feature Engineering for Machine Learning and Data Analytics (2018)
Practical Text Analytics (2018)
Improving Part-Of-Speech Tagging Accuracy for Croatian by Morphological Analysis (Report) (2009)
Efficient Morphological Parsing with a Weighted Finite State Transducer (Report) (2009)
Future Data and Security Engineering (2015)
Real World Data Mining Applications (2014)

More Books by Informatica

A Performance Evaluation of Distributed Algorithms on Shared Memory and Message Passing Middleware Platforms (Javaspaces, CORBA) (2005)
Tuning Chess Evaluation Function Parameters Using Differential Evolution Algorithm (Report) (2011)
Theory of K-Representations As a Comprehensive Formal Framework for Developing a Multilingual Semantic Web (Report) (2010)
On the Crossing Number of Almost Planar Graphs (2006)
Efficient Morphological Parsing with a Weighted Finite State Transducer (Report) (2009)
The Modelling of Manpower by Markov Chains--a Case Study of the Slovenian Armed Forces (Report) (2008)