Optimization, Computational Geometry, Graphs and High-Dimensional Data

Jose Costa
Center for the Mathematics of Information
California Institute of Technology

ABSTRACT: Increasingly intricate and rich data sets at the heart of today's most common applications, from video surveillance to biomedical information systems, are raising new questions in data storage, access and, especially, data exploration. The high-dimensional nature of such data sets makes them prone to the curse of dimensionality, and standard inference tools consequently fail. This series of two lectures will address some challenging problems in analyzing high-dimensional data and information that must be transmitted, stored and processed accurately using manageable computational resources. Our original motivation was computational: we demonstrate how computationally efficient and scalable graph constructions can encode both statistical and spatial information, and how they address the problems of dimension reduction and structure discovery in high-dimensional data with provable results. In this first lecture, we will discuss the asymptotic behavior of graphs such as minimal spanning trees and k-nearest-neighbor graphs, and show how this behavior can be used to estimate the intrinsic dimension and entropy of data sets that span a high-dimensional space but whose fundamental features are concentrated on a low-dimensional manifold.
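The idea of estimating intrinsic dimension from graph asymptotics can be illustrated with a minimal sketch. It assumes the classical growth law for the power-weighted minimal spanning tree length L_n = Σ|e|^γ: for n samples from a density on an m-dimensional set (m ≥ 2, 0 < γ < m), L_n grows like n^{(m−γ)/m}. Fitting log L_n = a·log n + b over several sample sizes therefore gives m ≈ γ/(1 − a). The sketch uses plain Euclidean distances rather than geodesic ones, which is only appropriate because the synthetic data lie on a flat plane. The function names `mst_length` and `estimate_intrinsic_dimension`, the sample sizes and the test data are hypothetical choices for illustration, not the lecture's own implementation.

```python
import math
import random


def mst_length(points, gamma=1.0):
    """Power-weighted length sum(|e|**gamma) of the Euclidean minimal spanning tree.

    Uses Prim's algorithm in O(n^2). The MST is the same for any gamma > 0
    because x -> x**gamma is monotone, so edges are chosen by plain distance.
    """
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n  # shortest known edge from each vertex to the tree
    best[0] = 0.0
    total = 0.0
    for _ in range(n):
        # Add the non-tree vertex closest to the current tree.
        u = min((i for i in range(n) if not in_tree[i]), key=best.__getitem__)
        in_tree[u] = True
        total += best[u] ** gamma
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v] = d
    return total


def estimate_intrinsic_dimension(points, sizes, trials=4, gamma=1.0, seed=0):
    """Fit log L_n = a*log n + b over random subsamples and return gamma / (1 - a)."""
    rng = random.Random(seed)
    xs, ys = [], []
    for n in sizes:
        for _ in range(trials):
            sample = rng.sample(points, n)
            xs.append(math.log(n))
            ys.append(math.log(mst_length(sample, gamma)))
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    # Ordinary least-squares slope of log L_n against log n.
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return gamma / (1.0 - a)


# Hypothetical test data: 2000 points uniform on a 2-D plane linearly embedded in R^3,
# where Euclidean and geodesic distances coincide.
rng = random.Random(42)
plane = [(u, v, 0.5 * (u + v)) for u, v in ((rng.random(), rng.random()) for _ in range(2000))]
m_hat = estimate_intrinsic_dimension(plane, sizes=[64, 128, 256, 512])
print(f"estimated intrinsic dimension: {m_hat:.2f}")
```

On this synthetic plane the estimate should land near 2 even though the points live in R^3. Finite-sample and boundary effects bias such slope estimates, which is why the sketch averages over several subsample sizes and repeated trials; curved manifolds would additionally call for geodesic (graph-based) distances instead of `math.dist`.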