Document Clustering using Sequential Information Bottleneck Method

Published 11 Apr 2010 in cs.IR | (1004.1796v1)

Abstract: This paper illustrates the Principal Direction Divisive Partitioning (PDDP) algorithm, describes its drawbacks, and introduces a combinatorial framework for PDDP. It then describes a simplified version of the EM algorithm, the spherical Gaussian EM (sGEM) algorithm, and the Information Bottleneck (IB) method, considering accuracy, complexity, and time and space costs. The central logic of PDDP is to recursively split the data samples into two subclusters using the hyperplane normal to the principal direction derived from the covariance matrix. However, PDDP can yield poor results, especially when clusters are not well separated from one another. To improve the quality of the clustering results, cluster memberships are reallocated using the IB algorithm with different settings. The IB method is accurate but more time-consuming. Furthermore, building on the theoretical background of the sGEM algorithm and the sequential Information Bottleneck (sIB) method, the framework can be extended to estimate the number of clusters using the Bayesian Information Criterion (BIC). Experimental results compare the proposed algorithm with the existing algorithm and show its effectiveness.
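The recursive PDDP split described in the abstract can be sketched in plain Python. This is a minimal illustration, not the authors' implementation. It assumes the principal direction is found by power iteration on the sample covariance matrix, that the splitting hyperplane passes through the cluster centroid, and that the cluster with the largest scatter is split next (the abstract does not state a selection rule). The names `pddp`, `pddp_split`, `principal_direction`, and `scatter` are hypothetical, and the IB/sIB reallocation step that the paper uses to refine the result is not included.

```python
import math
import random
from typing import List

Vector = List[float]


def mean(points: List[Vector]) -> Vector:
    """Component-wise centroid of a non-empty list of points."""
    d = len(points[0])
    return [sum(p[j] for p in points) / len(points) for j in range(d)]


def principal_direction(points: List[Vector], iters: int = 200) -> Vector:
    """Leading eigenvector of the sample covariance matrix (assumed: power iteration)."""
    c = mean(points)
    centered = [[x - m for x, m in zip(p, c)] for p in points]
    d = len(c)
    # Covariance matrix: average outer product of the centered rows.
    cov = [[sum(r[i] * r[j] for r in centered) / len(points) for j in range(d)]
           for i in range(d)]
    rng = random.Random(0)  # fixed seed so the split is reproducible
    v = [rng.random() + 0.1 for _ in range(d)]
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # zero covariance: every direction is equivalent
            return v
        v = [x / norm for x in w]
    return v


def pddp_split(points: List[Vector]):
    """Split by the hyperplane through the centroid, normal to the principal direction."""
    c = mean(points)
    u = principal_direction(points)
    left, right = [], []
    for p in points:
        # Sign of the projection of the centered point onto the principal direction.
        proj = sum((x - m) * ui for x, m, ui in zip(p, c, u))
        (left if proj <= 0 else right).append(p)
    return left, right


def scatter(points: List[Vector]) -> float:
    """Sum of squared distances to the centroid."""
    c = mean(points)
    return sum(sum((x - m) ** 2 for x, m in zip(p, c)) for p in points)


def pddp(points: List[Vector], k: int) -> List[List[Vector]]:
    """Divisively bisect until k clusters exist or no cluster can be split."""
    clusters = [points]
    while len(clusters) < k:
        # Hypothetical selection rule: split the cluster with the largest scatter.
        idx = max(range(len(clusters)), key=lambda i: scatter(clusters[i]))
        target = clusters[idx]
        if len(target) < 2 or scatter(target) == 0.0:
            break
        left, right = pddp_split(target)
        if not left or not right:
            break
        clusters[idx:idx + 1] = [left, right]
    return clusters


if __name__ == "__main__":
    data = [[0.0, 0.0], [0.2, 0.1], [0.1, 0.3],
            [5.0, 5.0], [5.2, 4.9], [4.8, 5.1]]
    for cluster in pddp(data, 2):
        print(sorted(cluster))
```

On well-separated data like this toy set, the single split recovers the two groups; the abstract's point is that when clusters overlap, this hard hyperplane split can misassign points, which is what the IB-based reallocation is meant to correct.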

Citations (4)