Title: The Effectiveness of Lloyd-type Methods for the k-Means Problem

Chaitanya Swamy
Center for the Mathematics of Information, California Institute of Technology

ABSTRACT: We consider the k-means problem: given a set of n data points in a high-dimensional Euclidean space, find k "centers" (points in the space) so as to minimize the sum of the squared distances from each data point to its nearest center. We investigate a widely used iterative heuristic for this problem, called Lloyd's method (or Lloyd-type method), in an attempt to explain its popularity among practitioners and to suggest improvements in its application. We propose and justify a clusterability criterion for data sets, and show that a simple Lloyd-type method quickly leads to provably near-optimal solutions on well-clusterable instances. This is the first performance guarantee for any variant of Lloyd's heuristic. Furthermore, some of our algorithms are candidates for being faster in practice than currently used variants of Lloyd's method. The main algorithmic contribution is a novel probabilistic seeding process for the starting configuration of a Lloyd-type iteration, which leads to the desired performance guarantee.

The talk will be self-contained. This is joint work with Rafail Ostrovsky, Yuval Rabani, and Leonard Schulman.
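
As an illustration of the objective and the iteration described in the abstract, the following is a minimal Python sketch of the k-means cost and of a basic Lloyd iteration (assign each point to its nearest center, then move each center to the mean of its assigned points). The function names (`sq_dist`, `kmeans_cost`, `uniform_seed`, `lloyd`) are hypothetical. The uniform random seeding shown here is only a placeholder assumption: it is not the probabilistic seeding process proposed in the talk, whose details the abstract does not give.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans_cost(points, centers):
    """k-means objective: sum over points of squared distance to nearest center."""
    return sum(min(sq_dist(p, c) for c in centers) for p in points)


def uniform_seed(points, k, seed=0):
    """Placeholder seeding (an assumption, NOT the talk's method):
    pick k distinct data points uniformly at random as starting centers."""
    return random.Random(seed).sample(points, k)


def lloyd(points, centers, max_iters=100):
    """Basic Lloyd iteration starting from the given centers."""
    centers = [tuple(c) for c in centers]
    k = len(centers)
    for _ in range(max_iters):
        # Assignment step: group each point with its nearest current center.
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda i: sq_dist(p, centers[i]))
            clusters[j].append(p)

        # Update step: move each center to the mean of its cluster.
        new_centers = []
        for j, members in enumerate(clusters):
            if members:
                dim = len(members[0])
                mean = tuple(
                    sum(m[d] for m in members) / len(members) for d in range(dim)
                )
                new_centers.append(mean)
            else:
                # Empty cluster: keep the old center (one simple convention).
                new_centers.append(centers[j])

        if new_centers == centers:  # no center moved, so the iteration has converged
            break
        centers = new_centers
    return centers
```

For a list `points` of coordinate tuples, `lloyd(points, uniform_seed(points, k=2))` returns two centers. Neither step of the iteration increases the cost, but from an arbitrary starting configuration the result may be only a local optimum, which is why the choice of seeding matters.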