

Title: Algorithmic Challenges in Machine Learning
Abstract:
In this talk, we address two algorithmic challenges in machine learning.
First, with the increase in electronic record-keeping, many datasets that learning algorithms work with relate to sensitive information about individuals. Thus the problem of privacy-preserving learning -- how to design learning algorithms that operate on the sensitive data of individuals while still guaranteeing the privacy of individuals in the training set -- has achieved great practical importance. In this talk, we address the problem of privacy-preserving classification, and we present an efficient classifier which is private in the differential privacy model of Dwork et al. Our classifier works in the ERM (empirical loss minimization) framework, and includes privacy preserving logistic regression and privacy preserving support vector machines. We show that our classifier is private, provide analytical bounds on the sample requirement of our classifier, and evaluate it on some real data.
Second, we address the problem of clustering, when data is available from multiple domains or views. For example, when clustering a document corpus such as Wikipedia, we have access to the contents of the documents and their link structure. In this talk, we address this problem of Multiview Clustering, and show how to use information from multiple views to improve clustering performance. We present an algorithm for multiview clustering, provide analytical bounds on the performance of our algorithm under certain statistical assumptions, and finally evaluate our algorithm on some real data.
Based on joint work with Sham Kakade (UPenn), Karen Livescu (TTI Chicago), Claire Monteleoni (CCLS Columbia), Anand Sarwate (ITA UCSD), and Karthik Sridharan (TTI Chicago).