
Data Clustering: A Review

MSU-CSE-00-16

Anil K. Jain, M. N. Murty, and P. J. Flynn
August, 2000

Clustering is the unsupervised classification of patterns (observations, data items, or feature vectors) into groups (clusters). The clustering problem has been addressed in many contexts and by researchers in many disciplines; this reflects its broad appeal and usefulness as one of the steps in exploratory data analysis. However, clustering is a combinatorially difficult problem, and differences in assumptions among communities have slowed the transfer of useful generic concepts and methodologies. This paper presents an overview of pattern clustering methods from a statistical pattern recognition perspective, with the goal of providing useful advice and references to fundamental concepts accessible to the broad community of clustering practitioners. We present a taxonomy of clustering techniques and identify cross-cutting themes and recent advances. We also describe some important applications of clustering algorithms, such as image segmentation, object recognition, and information retrieval.





You are granted permission for the non-commercial reproduction, distribution, display, and performance of this technical report in any format.

