Semi-Automated Identification of Faceted Categories from Large Corpora

Academy of Information and Management Sciences Journal 2009, Jan-July, 12, 1-2

    • US$5.99

Publisher Description

INTRODUCTION

This paper describes FFID (Fast Facet Identifier), a system that computes facets from a corpus of documents. FFID uses a fast, simplified clustering algorithm that identifies hundreds of facet clusters from a corpus of hundreds of thousands of sentences in seconds. The automatic identification of facets may be a powerful tool for designing better information retrieval systems.

The goal of information retrieval is to support people in searching for the information they need. Given an information problem, finding relevant (let alone high-quality) documents is difficult, and the sheer amount of information available online makes it harder still. The size of the web is debatable (Markoff, 2005), but by now it must be at least 12,000 million pages. If each of these web pages were printed on a standard A4 sheet of paper (21 cm wide) and the sheets were laid side by side in a straight line, the line would run about 2.5 million km, or roughly 60 times the circumference of the earth. This is a lot of information.

People learn about their information problem and about the information resource they are using through interaction with the resource. Human-computer interaction is the crucial phenomenon of the information retrieval process. Fast algorithms, hardware for storage and processing, and data and knowledge structures are important, but they are useless if we do not understand how humans interact with machines when looking for information. All the techniques we use must first take into account who we are doing this for: the user. Users encounter several problems when they approach an information resource:

Genre
Computers & Internet
Released
January 1, 2009
Language
EN
English
Length
32 pages
Publisher
The DreamCatchers Group, LLC
Seller
The Gale Group, Inc., a Delaware corporation and an affiliate of Cengage Learning, Inc.
Size
218.7 KB
Ontology Learning and Population from Text
2006
Computational Linguistics and Intelligent Text Processing
2009
Information Retrieval Technology
2008
Advances in Natural Language Processing
2008
Cognitive Approach to Natural Language Processing
2017
Computational Linguistics and Intelligent Text Processing
2023
Employee Performance Evaluation Using the Analytic Hierarchy Process (Manuscripts)
2003
Heuristics for Scheduling Operations in MRP: Flowshop Case (Material Requirements Planning)
2006
The Ebay Factor: The Online Auction Solution to the Riddle of Reverse Logistics (Manuscripts)
2005
Six Sigma and Innovation (Manuscripts)
2003
E-Commerce Security Standards and Loopholes (Manuscripts)
2000
Customer Relationship Management Strategies for the Internet (Company Overview)
2001