Introduction

What is machine learning?

*"Machine learning is a subfield of artificial intelligence (AI) concerned with algorithms that allow computers to learn. What this means in most cases, is that an algorithm is given a set of data and infers information about the properties of the data--and that information allows it to make predictions about other data it might see in the future. This is possible because almost all nonrandom data contains patterns, and these patterns allow the machine to generalize. In order to generalize, it trains a model with what it determines are the important aspects of the data."* --Toby Segaran, Programming Collective Intelligence

A more concise definition:

*Machine learning allows computers to observe input and produce a desired output, either by example or through identifying latent patterns in the input.*

This course takes an application-driven approach to current topics in machine learning. The course covers supervised learning (classification/structured prediction/regression/ranking), unsupervised learning (dimensionality reduction, Bayesian modeling, clustering) and semi-supervised learning. Additional topics may include reinforcement learning and learning theory. The course will also consider challenges resulting from learning applications. We will cover popular algorithms (naive Bayes, SVM, perceptron, HMM, k-means, maximum entropy) and will focus on how statistical learning algorithms are applied to real-world applications. Students in the course will implement several learning algorithms and develop a learning system for a final project.

Goals

This course should teach students to:

- Evaluate a potential machine learning application and decide upon the best algorithm.
- Implement modern machine learning algorithms.
- Read and understand research papers from machine learning conferences (NIPS, ICML, UAI, etc.)

Requirements

Students are expected to have:

- Strong programming skills in Java. There will be considerable programming required for the homeworks.
- Comfort with basic mathematical skills (linear algebra, taking derivatives, etc.)

Grading

- **Homeworks**: 75%
- **Project**: 25%

Homework

Since the focus of the course is on practical applications of machine learning, the bulk of the final grade comes from homework. Homeworks consist of both written problems and programming projects and are to be turned in electronically. Instructions will be provided when homeworks are assigned. There will be about eight homeworks during the semester.

Late Policy

Late homework assignments will be accepted up to 24 hours past the due date for a 25% reduction in grade. Exceptions will only be given in extreme cases. However, every student is permitted to hand in homeworks late, penalty-free, using a 72-hour grace period for the entire semester. This means that you can choose to hand in the first homework 70 hours late and the second homework 2 hours late, but then every other homework must be on time for the rest of the semester. You may divide these 72 hours as you see fit, but once you have used up all of the time, you will be given no more. I will round up to the hour (minutes don't count).
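
The arithmetic of the late policy can be sketched in a few lines of Python. This is only an illustration, not an official grading tool: the function name `apply_late_policy` and the `LateResult` type are hypothetical, and the sketch assumes that grace hours are spent automatically before any penalty applies and that a partial hour counts as a full hour. The policy text does not spell out how the grace period and the 25% penalty interact, so treat that interaction as an assumption.

```python
from dataclasses import dataclass

GRACE_HOURS_PER_SEMESTER = 72  # total penalty-free late hours (from the policy)
PENALTY_WINDOW_HOURS = 24      # late work is accepted up to 24 hours past due
LATE_MULTIPLIER = 0.75         # a 25% reduction in grade


@dataclass
class LateResult:
    multiplier: float     # factor applied to the homework grade (0.0 = not accepted)
    grace_remaining: int  # grace hours left after this submission


def apply_late_policy(minutes_late: int, grace_remaining: int) -> LateResult:
    """Hypothetical helper. Assumption: grace hours are used before any penalty."""
    if minutes_late <= 0:
        return LateResult(1.0, grace_remaining)

    hours_late = (minutes_late + 59) // 60  # round partial hours up
    grace_used = min(hours_late, grace_remaining)
    uncovered = hours_late - grace_used     # late hours not covered by grace
    grace_remaining -= grace_used

    if uncovered == 0:
        multiplier = 1.0                    # fully covered by the grace period
    elif uncovered <= PENALTY_WINDOW_HOURS:
        multiplier = LATE_MULTIPLIER        # accepted with the 25% reduction
    else:
        multiplier = 0.0                    # more than 24 uncovered hours late
    return LateResult(multiplier, grace_remaining)


# The example from the policy: 70 hours late, then 2 hours late.
first = apply_late_policy(70 * 60, GRACE_HOURS_PER_SEMESTER)
second = apply_late_policy(2 * 60, first.grace_remaining)
# A third homework, one minute late, with no grace hours left.
third = apply_late_policy(1, second.grace_remaining)
print(first, second, third)
```

Under these assumptions the first two homeworks keep full credit and exhaust the grace period, while the third, one minute late, counts as one hour late and takes the 25% reduction.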

Final Project

A significant part of the final grade will be based on a project. Projects test the application of classroom material through implementing a machine learning system or reviewing current research. There are two options for projects:

- **Machine Learning system**: Students must implement a machine learning solution for an application of interest. This requires demonstrating knowledge learned in the class and not a black box application of a machine learning software package. A writeup describing the project is required. Projects of this type can be done by teams of 1-2 students.
- **Survey**: A tutorial that surveys the state-of-the-art research in a particular area.

Each project will have three parts:

- **Proposal**: A 1-2 page description of the proposed project. This will be due halfway through the course.
- **Writeup**: The final writeup of the project. Length depends on the chosen project type.
- **Presentation**: An in-class presentation summarizing the main points of the project.

Textbook

The main textbook for this course is Chris Bishop's Pattern Recognition and Machine Learning. Most readings will be drawn from the Bishop book. However, the course material is well covered in several other books. You are free to choose which book you prefer, but you must have access to the Bishop book for homeworks.

Available books include:

- Chris Bishop. Pattern Recognition and Machine Learning. 2006

  This is an excellent book for beginners and assumes no prior knowledge of probability or statistics. Recommended for students with a general computer science background.

- Trevor Hastie, Robert Tibshirani, Jerome Friedman. The Elements of Statistical Learning: Data Mining, Inference and Prediction. 2009

  This book covers a large amount of material and is ideal for those with some prior experience with statistics.

- Tom Mitchell. Machine Learning. 1997

  This book used to be the standard for machine learning, but is a bit out of date. It covers in depth some topics that more recent books overlook.

- Ethem Alpaydin. Introduction to Machine Learning. 2004

  Comparable in quality and coverage to the Bishop book, but not often used. A nice supplement for difficult topics.

Cheating

I take cheating very seriously. I expect every student to have read the Department of Computer Science Academic Integrity Code and will hold students accountable to it. So that course policies are clear, here is a review of the relevant rules (in addition to the integrity code).

- Every exam, project, homework and any other work completed during this course must be entirely your own. Copying any material from other students or the web is expressly prohibited.
- All exams are closed book unless otherwise stated. This means that students may not reference any material during an exam that is not provided as part of the exam.
- Any collaboration between students during an exam will be considered cheating.
- If a student copies your work, even without your knowledge, you are cheating. It is your responsibility to ensure that no one has access to your work.
- There is no statute of limitations on punishing cheating. Even if I find on the last day of the semester that you cheated on a homework, you will be punished.
- Talking with other students to understand homework and course material is strongly encouraged. However, discussing an assignment and cheating are very different things. If you copy someone else's work you are cheating. If you let someone copy your work you are cheating. If someone tells you the answer you are cheating. Everything you hand in must be in your own words based on your understanding of the solution.

**Cheating**

- Copying any part of a homework from someone else.
- Verbally telling someone the answer to a homework question.
- Looking at someone else's code or solution.
- Obtaining any part of your solution or code from any online resource or software library.

**Not-Cheating**

- Explaining the homework question to someone else.
- Discussing the homework at a high level.
- Helping someone think through a problem.
- Directing someone to a section of the textbook, reading, or online resource that helps explain a concept.

I am aware that many of the programming assignments will ask students to implement algorithms already available online. I will try to avoid direct duplication when possible. However, you are not permitted to copy any part of your code from other libraries.

What happens when you cheat?

I will be carefully examining homeworks and exams for signs of cheating. If you cheat, at a minimum you will be given a 0 for the assignment or exam. More likely, you will have the total value of the homework or exam *subtracted* from your grade, i.e., if you cheat on an exam worth 15% of your grade, you will get a 0 on the exam and have an additional 15% of your grade deducted. In some cases, cheating will be reported to the appropriate university board, which can result in failing the class, suspension or expulsion.

Remember:

- **DO** help each other understand the lectures, readings and homeworks.
- **DO NOT** complete each other's homework.