Hybrid Hierarchical Clustering: An Experimental Anallysis

Keerthiram Murugesan, and Jun Zhang
Laboratory for High Performance Scientific Computing and Computer Simulation
Department of Computer Science
University of Kentucky
Lexington, KY 40506-0046, USA

Abstract

In this paper, we present a hybrid clustering method that combines the divisive hierarchical clustering with the agglomerative hierarchical clustering. We used the bisect K-means divisive clustering algorithm in our method. First, we cluster the document collection using bisect K-means clustering algorithm with K' > K as the total number of clusters. Second, we calculate the centroids of K' clusters obtained from the previous step. Then we apply the Unweighted Pair Group Method with Arithmetic Mean (UPGMA) agglomerative hierarchical algorithm on these centroids for the given K. After the UPGMA nds K clusters in these K' centroids, if two centroids ended up in the same cluster, then all of their documents will belong to the same cluster. We compared the goodness of clusters generated by the standard bi- sect K-means algorithm and the proposed hybrid algorithm, measured on various cluster evaluation metrics. Our experimental results show that the proposed method outperforms the standard bisect K-means algorithm. At the end, we show that picking a value for K' has no major impact in the end results by analyzing the relation between the value of K' and the quality of the clusters.


Key words: Clustering, Hybrid clustering, Hierarchical clustering

Mathematics Subject Classification:


Download the the PDF file keer11.pdf.
Technical Report CMIDA-HiPSCCS 001-11, Department of Computer Science, University of Kentucky, Lexington, KY, 2011.