Cluster Based Mixed Coding Schemes for Inverted File Index Compression


Journal of Digital Information Management, Vol. 6, No. 1, February 2008


Publisher Description

ABSTRACT: The cluster property of document collections in today's search engines provides valuable information for index compression. By clustering the d-gaps of an inverted list based on a threshold, and then encoding clustered and non-clustered d-gaps with different methods, we can tailor the coding to the specific properties of each kind of d-gap and achieve a better compression ratio. Based on this idea, this paper proposes a cluster-based approach and presents two new codes for inverted file index compression: the mixed gamma/flat binary code and the mixed delta/flat binary code. Experimental results show that the two new codes achieve compression ratios better than or equal to that of the interpolative code, which is considered the most efficient bitwise code at present. In addition, the two new codes have much lower complexity than the interpolative code and therefore enable faster encoding and decoding. Adjusting the parameters of the mixed codes may yield even better results. Experiments show promising results with our approaches.
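The abstract does not give the exact bit layout of the two mixed codes, so the following is only a hypothetical sketch of the general idea, not the authors' method. It turns a sorted posting list into d-gaps, treats a gap as "clustered" when it is at most a threshold T, writes clustered gaps in flat binary using ceil(log2 T) bits, and writes all other gaps with Elias gamma or Elias delta. The one-bit flag per gap, the offset of non-clustered gaps by T, and the names `encode_mixed`, `decode_mixed` and `threshold` are assumptions made for illustration.

```python
from typing import Callable, List, Tuple


def d_gaps(doc_ids: List[int]) -> List[int]:
    """Turn a strictly increasing posting list (doc IDs >= 1) into d-gaps."""
    gaps, prev = [], 0
    for d in doc_ids:
        gaps.append(d - prev)
        prev = d
    return gaps


def gamma(n: int) -> str:
    """Elias gamma code (n >= 1): floor(log2 n) zeros, then n in binary."""
    b = bin(n)[2:]
    return "0" * (len(b) - 1) + b


def delta(n: int) -> str:
    """Elias delta code (n >= 1): gamma code of the bit length, then the bits after the leading 1."""
    b = bin(n)[2:]
    return gamma(len(b)) + b[1:]


def read_gamma(bits: str, pos: int) -> Tuple[int, int]:
    """Decode one gamma codeword starting at pos; return (value, next position)."""
    zeros = 0
    while bits[pos] == "0":
        zeros += 1
        pos += 1
    return int(bits[pos:pos + zeros + 1], 2), pos + zeros + 1


def read_delta(bits: str, pos: int) -> Tuple[int, int]:
    """Decode one delta codeword starting at pos; return (value, next position)."""
    length, pos = read_gamma(bits, pos)
    return int("1" + bits[pos:pos + length - 1], 2), pos + length - 1


def fixed_bits(value: int, width: int) -> str:
    """Flat binary: exactly `width` bits, most significant first (empty if width == 0)."""
    return "".join("1" if (value >> i) & 1 else "0" for i in reversed(range(width)))


def encode_mixed(doc_ids: List[int], threshold: int,
                 outlier: Callable[[int], str] = gamma) -> str:
    """Hypothetical mixed code (threshold >= 1): one flag bit per gap."""
    width = (threshold - 1).bit_length()  # ceil(log2(threshold)) bits cover gaps 1..threshold
    out = []
    for g in d_gaps(doc_ids):
        if g <= threshold:  # clustered gap: flag 0, then g - 1 in flat binary
            out.append("0" + fixed_bits(g - 1, width))
        else:               # non-clustered gap: flag 1, then g - threshold (>= 1) in gamma/delta
            out.append("1" + outlier(g - threshold))
    return "".join(out)


def decode_mixed(bits: str, threshold: int,
                 reader: Callable[[str, int], Tuple[int, int]] = read_gamma) -> List[int]:
    """Inverse of encode_mixed; `reader` must match the `outlier` code used to encode."""
    width = (threshold - 1).bit_length()
    doc_ids, prev, pos = [], 0, 0
    while pos < len(bits):
        flag, pos = bits[pos], pos + 1
        if flag == "0":
            g = (int(bits[pos:pos + width], 2) if width else 0) + 1
            pos += width
        else:
            value, pos = reader(bits, pos)
            g = value + threshold
        prev += g
        doc_ids.append(prev)
    return doc_ids


postings = [3, 4, 5, 7, 40, 41, 42, 100]
gamma_bits = encode_mixed(postings, threshold=4)                 # mixed gamma/flat binary
delta_bits = encode_mixed(postings, threshold=4, outlier=delta)  # mixed delta/flat binary
assert decode_mixed(gamma_bits, 4) == postings
assert decode_mixed(delta_bits, 4, read_delta) == postings
```

Passing `delta` and `read_delta` instead of the gamma defaults gives the delta variant. Whether a layout like this beats plain gamma or interpolative coding depends on T and on how the gaps are distributed; this sketch makes no claim about the compression ratios reported in the paper.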

Genre: Computers & Internet
Released: 1 February 2008
Language: English (EN)
Pages: 33
Publisher: Digital Information Research Foundation
Seller: The Gale Group, Inc., a Delaware corporation and an affiliate of Cengage Learning, Inc.
Size: 212.8 KB
Error Correction Coding (2020)
Introduction to Data Compression (2017)
Coding for Data and Computer Communications (2006)
Algorithms and Architectures for Cryptography and Source Coding in Non-Volatile Flash Memories (2021)
Coding Theory and Applications (2017)
The Burrows-Wheeler Transform (2008)
Semantic Notation and Retrieval in Art and Architecture Image Collections (2005)
A Model to Predict Whether an Online RPG Makes Gamers Loyal (2003)
Collaborative Information Searching in an Information-Intensive Work Domain: Preliminary Results (2004)
The City in Four Dimensions: The Nu.M.E. Project (2004)
T-Stem: A Superior Stemmer and Temporal Extractor for Arabic Texts (2005)
Citation Auctions As a Method to Improve Selection of Scientific Papers (Report) (2008)