Sanjoy Dasgupta
UCSD

Title: Active learning of linear separators

Abstract: In the well-studied setting of "supervised learning", the goal is to learn a classifier from *labeled* data: for instance, given a collection of credit card transactions which have been labeled as legitimate or fraudulent, to learn a rule which will accurately classify future transactions. The problem with this model is that often, *unlabeled* data is easy to come by but labels are expensive. For instance, if you're building a speech recognizer, it's easy enough to get raw speech samples -- just walk around with a microphone -- but labeling even one of these samples is a tedious process in which a human must examine the speech signal and carefully segment it into phonemes.

In the field of "active learning", the goal is as always to construct an accurate classifier, but the labels of the data points are initially hidden and there is a charge for each label you want revealed. The hope is that by intelligent adaptive querying, you can get away with significantly fewer labels than you would need in the regular supervised learning framework (that is, when all points automatically come labeled).

I'll give detailed background on active learning, and then show that for a wide range of data distributions, it is possible to learn linear separators using exponentially fewer labels than would be needed in supervised learning.
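
To make "exponentially fewer labels" concrete, here is a minimal sketch of the simplest case: a linear separator in one dimension, that is, a threshold on the line. It is not the algorithm from the talk, only the textbook binary-search illustration, and it assumes noise-free labels (0 below a hidden threshold, 1 at or above it). The names `active_learn_threshold` and `query_label` are hypothetical, chosen for this example. Labeling all n points would cost n queries; adaptive querying needs only about log2(n).

```python
def active_learn_threshold(points, query_label):
    """Find the 0/1 boundary among 1-D points by binary search.

    Assumes (for illustration) that query_label(x) returns 0 for every
    point below some hidden threshold and 1 for every point at or above it.
    Returns the sorted points, the index of the first point labeled 1
    (len(points) if there is none), and the number of labels requested.
    """
    xs = sorted(points)
    lo, hi = 0, len(xs)  # the index of the first positive point lies in [lo, hi]
    queries = 0
    while lo < hi:
        mid = (lo + hi) // 2
        queries += 1  # each call to query_label is one paid-for label
        if query_label(xs[mid]) == 1:
            hi = mid        # boundary is at mid or to its left
        else:
            lo = mid + 1    # boundary is strictly to the right of mid
    return xs, lo, queries


# Usage: 1000 unlabeled points, hidden threshold 0.375 (an arbitrary choice).
points = [i / 1000 for i in range(1000)]
oracle = lambda x: 1 if x >= 0.375 else 0
xs, first_positive, queries = active_learn_threshold(points, oracle)
print(f"labels requested: {queries} of {len(xs)}")
```

Every point left of `first_positive` can then be labeled 0 and every other point 1 without further queries, so the whole sample is classified consistently after roughly log2(n) label requests rather than n.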