Representation Learning

600.479/679 Fall 2015

Raman Arora


Course Description: Unsupervised learning of useful features, or representations, is one of the most basic challenges of machine learning. Too often, the success of a data-science project depends on the choice of features used. Machine learning has made great progress in training classification, regression, and recognition systems when “good” representations, or features, of the input data are available. However, much human effort is spent designing good features, which are usually knowledge-based and engineered by domain experts over years of trial and error. A natural question to ask, then, is “Can we automate the learning of useful features from raw data?” Unsupervised representation learning techniques capitalize on unlabeled data, which is often cheap and abundant, and sometimes virtually unlimited. The goal of these ubiquitous techniques is to learn a representation that reveals intrinsic low-dimensional structure in the data, disentangles underlying factors of variation by incorporating universal AI priors such as smoothness and sparsity, and is useful across multiple tasks and domains. This course will focus on theory and methods for representation learning that scale easily to large amounts of unlabeled, multi-modal, and heterogeneous data.

A tentative list of topics that we will cover:

1. Subspace learning [principal component analysis (PCA), sparse PCA, robust PCA, independent component analysis (ICA), stochastic optimization, stochastic approximation algorithms]

2. Manifold learning [kernel k-means, kernel PCA, similarity-based clustering, non-negative matrix factorization, incremental matrix factorization algorithms]

3. Deep learning [restricted Boltzmann machines, autoencoders, deep belief networks, convolutional neural networks, state-of-the-art models in speech recognition and image classification]

4. Multi-view learning [partial least squares, canonical correlation analysis (CCA), kernel CCA, Deep CCA]

5. Spectral learning [spectral methods, spectral clustering, singular value decomposition (SVD), co-training, spectral learning of hidden Markov models (HMMs), tensor factorization, latent variable PCFGs, multivariate latent tree structures]   
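As a concrete taste of the first topic, the principal components of a dataset can be computed from the singular value decomposition of the centered data matrix. The sketch below is illustrative only, not course material: the toy matrix `X` and the choice `k = 2` are hypothetical, chosen just to show the mechanics in NumPy.

```python
import numpy as np

# Hypothetical toy data: 6 samples, 3 features (numbers chosen for illustration).
X = np.array([[2.5, 2.4, 0.5],
              [0.5, 0.7, 1.1],
              [2.2, 2.9, 0.4],
              [1.9, 2.2, 0.6],
              [3.1, 3.0, 0.2],
              [2.3, 2.7, 0.5]])

# Center the data: PCA looks for directions of maximal variance about the mean.
Xc = X - X.mean(axis=0)

# SVD of the centered matrix. The rows of Vt are the principal directions,
# and the squared singular values are proportional to the variance captured.
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

k = 2                                 # keep the top-2 principal components
Z = Xc @ Vt[:k].T                     # low-dimensional representation of the data
explained = s[:k] ** 2 / (s ** 2).sum()  # fraction of total variance per component
```

The same decomposition underlies several of the other topics above: kernel PCA replaces `Xc` with a feature-space Gram matrix, and spectral methods apply eigendecompositions to similarity or moment matrices.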

Prerequisites: The class is accessible to both undergraduate and graduate students and assumes only a background in basic machine learning, or basic probability and linear algebra. Key mathematical concepts will be reviewed before they are used, but a certain level of mathematical maturity is expected.

Grading: Grades will be based on homework assignments (30%), class participation (10%), a project (35%), and an in-class midterm exam (25%).

Discussions: This term we will be using Piazza for class discussion. Rather than emailing questions to the teaching staff, I encourage you to post your questions on Piazza. Our class page can be found there.

Instructor: Raman Arora

Time: TR (3:00PM-4:15PM)

Location: Hodson Hall 110