
Simple and Efficient Projective Clustering
Clark F. Olson and Henry J. Lyons In Proceedings of the International Conference on Knowledge Discovery and Information Retrieval, pages 4555, October 2010. Download (1145 K) Keywords: Subspace clustering, projected clustering We describe a new Monte Carlo algorithm for projective clustering that is both simple and efficient. Like previous Monte Carlo algorithms, we perform trials that sample a small subset of the data points to determine the dimensions in which the points are sufficiently close to form a cluster and then search the rest of the data for data points that are part of the cluster. However, our algorithm differs from previous algorithms in the method by which the dimensions of the cluster are determined and the method for determining the points in the cluster. This allows us to use smaller subsets of the data to determine the cluster dimensions and achieve improved efficiency over previous algorithms. The complexity of our algorithm is O(nd^{1+log alpha/log beta}), where n is the number of data points, d is the number of dimensions in the space, and alpha and beta are parameters that specify which clusters should be found. To our knowledge, this is the lowest published complexity for an algorithm that is able to place high bounds on the probability of success. We present experiments that show that our algorithm outperforms previous algorithms on real and synthetic data. 