Representation Learning

600.479/679 Fall 2014

Raman Arora


Course Description: Often the success of a machine learning project depends on the choice of features used. Machine learning has made great progress in training classification, regression, and recognition systems when "good" representations, or features, of the input data are available. However, much human effort is spent on designing good features, which are usually knowledge-based and engineered by domain experts over years of trial and error. A natural question to ask, then, is: "Can we automate the learning of useful features from raw data?" Representation learning algorithms, such as principal component analysis (PCA), aim to discover better representations of the inputs by learning transformations of the data that disentangle the factors of variation while retaining most of the information. The success of such data-driven approaches to feature learning depends not only on how much data we can process but also on how well the learned features correlate with the underlying, unknown labels (the semantic content of the data). This course will focus on scalable machine learning approaches for learning representations from large amounts of unlabeled, multi-modal, and heterogeneous data.
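
To make this concrete, here is a minimal PCA sketch in NumPy. It is an illustrative example only, not course code: the toy data and variable names (X, W, Z) are assumptions made for this sketch. It learns a linear map that projects centered data onto the directions of greatest variance.

    import numpy as np

    rng = np.random.default_rng(0)

    # Toy data: 500 points in R^10 that mostly vary along 2 latent directions.
    latent = rng.normal(size=(500, 2))
    X = latent @ rng.normal(size=(2, 10)) + 0.05 * rng.normal(size=(500, 10))

    # Center the data; the top-k right singular vectors of the centered
    # matrix span the k-dimensional subspace that retains the most variance.
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

    k = 2
    W = Vt[:k].T          # learned linear transformation, R^10 -> R^2
    Z = Xc @ W            # low-dimensional representation of the inputs

    # Fraction of total variance retained by the top-k components.
    print("variance retained:", (s[:k]**2).sum() / (s**2).sum())

On this toy data the printed value should be close to 1, since the points were constructed to lie near a two-dimensional subspace.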


A tentative list of topics that we will cover:


1. Subspace learning [principal component analysis (PCA), sparse PCA, robust PCA, independent component analysis (ICA), stochastic optimization, stochastic approximation algorithms]

2. Manifold learning [kernel k-means, kernel PCA, similarity-based clustering, non-negative matrix factorization, incremental matrix factorization algorithms]

3. Deep learning [restricted Boltzmann machines, autoencoders, deep belief networks, convolutional neural networks, state-of-the-art models in speech recognition and image classification]

4. Multi-view learning [partial least squares, canonical correlation analysis (CCA), kernel CCA, deep CCA] (a small CCA sketch follows this list)

5. Spectral learning [spectral methods, spectral clustering, singular value decomposition (SVD), co-training, spectral learning of hidden Markov models (HMMs), tensor factorization, latent-variable PCFGs, multivariate latent tree structures]
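
As a small taste of the multi-view topic (item 4 above), the sketch below implements classical CCA by whitening each view and taking an SVD of the whitened cross-covariance. The function name, the regularization parameter, and the toy data are assumptions made for illustration, not course-provided code.

    import numpy as np

    def cca(X, Y, k, reg=1e-6):
        # Classical CCA: whiten each view, then take the SVD of the whitened
        # cross-covariance; the singular values are the canonical correlations.
        n = X.shape[0]
        Xc, Yc = X - X.mean(0), Y - Y.mean(0)
        Cxx = Xc.T @ Xc / n + reg * np.eye(X.shape[1])
        Cyy = Yc.T @ Yc / n + reg * np.eye(Y.shape[1])
        Cxy = Xc.T @ Yc / n
        Wx = np.linalg.inv(np.linalg.cholesky(Cxx)).T  # whitening for view X
        Wy = np.linalg.inv(np.linalg.cholesky(Cyy)).T  # whitening for view Y
        U, rho, Vt = np.linalg.svd(Wx.T @ Cxy @ Wy)
        # Map the singular vectors back to the original coordinates.
        return Wx @ U[:, :k], Wy @ Vt.T[:, :k], rho[:k]

    rng = np.random.default_rng(0)
    z = rng.normal(size=(1000, 1))       # latent signal shared by both views
    X = z @ rng.normal(size=(1, 5)) + 0.5 * rng.normal(size=(1000, 5))
    Y = z @ rng.normal(size=(1, 4)) + 0.5 * rng.normal(size=(1000, 4))
    A, B, rho = cca(X, Y, k=1)
    print("top canonical correlation:", rho[0])

Because both views are noisy observations of the same latent z, the top canonical correlation should come out high, illustrating how CCA finds the directions along which two views of the data agree.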


Prerequisites: The class is accessible to both undergraduate and graduate students and assumes only a background in basic machine learning, or in basic probability and linear algebra. Key mathematical concepts will be reviewed before they are used, but a certain level of mathematical maturity is expected.


Grading: Grades will be based on homework assignments (30%), class participation (10%), a project (35%), and an in-class final exam (25%).


Discussions: This term we will be using Piazza for class discussion. Rather than emailing questions to the teaching staff, please post them on Piazza. Find our class page here.



Instructor: Raman Arora

Time: TR (3:00PM-4:15PM)

Location: Shaffer Hall 301