Natural Language Processing

http://cs.jhu.edu/~jason/465
Course catalog entry: This course is an in-depth overview of techniques for processing human language. How should linguistic structure and meaning be represented? What algorithms can recover them from text? And crucially, how can we build statistical models to choose among the many legal answers?
The course covers methods for trees (parsing and semantic interpretation), sequences (finite-state transduction such as tagging and morphology), and words (sense and phrase induction), with applications to practical engineering tasks such as information retrieval and extraction, text classification, part-of-speech tagging, speech recognition, and machine translation. There are a number of structured but challenging programming assignments. Prerequisite: 600.226 or equivalent. [Applications, 4 credits]
Course objectives: Welcome! This course is designed to introduce you to some of the problems and solutions of NLP, and their relation to linguistics and statistics. You need to know how to program (e.g., 601.120) and use common data structures (601.226). It might also be nice—though it's not required—to have some previous familiarity with automata (600.271) and probabilities (601.475/675, 553.420/620, or 553.310/311). At the end you should agree (I hope!) that language is subtle and interesting, feel some ownership over some of NLP's formal and statistical techniques, and be able to understand research papers in the field.
Lectures:  MWF 3-4 or 3-4:15, Mergenthaler 111.
Recitations:  Tue 6-7:30, Shaffer 100.
Prof:  Jason Eisner
TAs:  Chu-Cheng Lin, Hongyuan Mei
CAs:  Xiaochen Li, Ryan Newell, Lawrence Wolf-Sonkin
Office hrs:
Prof: After class until 4:30; or by appt in Hackerman 324C
Ryan: Mon 12-1 in Malone 239
Xiaochen: Tue 5-6 in Malone 239
Hongyuan: Wed 5-6 in Malone 239
Chu-Cheng: Thu 5-6 in Malone 239
Lawrence: Fri 11-12 in Malone 239
Discussion site: 
https://piazza.com/class/j70djk8t7um40l
... public questions, discussion, announcements 
Web page:  http://cs.jhu.edu/~jason/465 
Textbook: 
Jurafsky & Martin, 2nd ed. (semi-required; P98.J87 2009 in Science Ref section on C-Level)
Roark & Sproat (recommended; P98.R63 2007 in same section)
Manning & Schütze (recommended; free online PDF version available)
Policies: 
Grading: homework 50%, participation 5%, midterm 15%, final 30%
Submission: TBA
Lateness: floating late days policy
Honesty: CS integrity code, JHU undergraduate policies, JHU graduate policies
Intellectual engagement: much encouraged
Disabilities: If you need accommodations for a disability, obtain a letter from Student Disability Services, 385 Garland, (410) 516-4720.
Announcements: Read mailing list and this page!
This class is in the "flexible time slot" MWF 3-4:30. Please keep the entire slot open. Class will usually run 3-4, followed by office hours in the classroom from 4-4:30 (stick around to get your money's worth). However, class will sometimes run till 4:15 in order to keep up with the syllabus. I'll try to give advance notice of these "long classes," which among other things make up for no-class days when I'm out of town.
We also run a once-per-week recitation led by the prof or a TA. This session will focus on solving problems together. That's meant as an efficient and cooperative way to study for an hour: it reinforces the past week's class material without adding to your homework load. Also, if you come to the recitation as recommended, you won't be startled by the exam style: the recitation problems are taken from past exams and are generally interesting.
Warning: The schedule below may change. Links to future lectures and assignments may also change (they currently point to last year's versions).
Warning: Use the PPT slides if possible. The PDF export versions don't have animations and they may be out of date relative to the PPT files (although I do try to update them before exams).
Week  Monday  Wednesday  Friday  Suggested Reading  
8/28  Wed: Class is on Thursday, not Wednesday as shown; Introduction (ppt)  Fri: Assignment 1 given: Designing CFGs; Modeling grammaticality (ppt)
9/4  Mon: No class (Labor Day)  Wed: Language models (ppt)  Fri: Probability concepts (ppt; video lecture)
9/11  Mon: Bayes' Theorem (ppt); Smoothing n-grams (ppt)  Wed: Assignment 2 given: Probabilities; Smoothing continued  Fri: Assignment 1 due (& another sign meant 3 ... ?)
9/18  Mon: Assignment 3 given: Language Models; Context-free parsing (ppt)  Wed: Probably no class (Rosh Hashanah)  Fri: Context-free parsing
9/25  Mon: Assignment 2 due; Earley's algorithm (ppt)  Wed: Extending CFG (summary; ppt)  Fri: Quick in-class quiz: Log-linear models; Probabilistic parsing (ppt)
10/2  Mon: Assignment 4 given: Parsing; Parsing tricks (ppt)  Wed: Assignment 3 due; Human sentence processing (ppt)  Fri: Semantics (ppt)
10/9  Mon: Midterm exam (3-4:30 in classroom)  Wed: Semantics continued  Fri: Assignment 5 given: Semantics; Semantics continued
10/16  Mon: Learning in the limit (ppt)  Wed: Forward-backward algorithm (ppt) (Excel spreadsheet; Viterbi version; lesson plan; video lecture)  Fri: No class (fall break)
10/23  Mon: Forward-backward continued  Wed: Assignment 4 due; Assignment 6 given: Hidden Markov Models; Expectation Maximization (ppt)  Fri: Finite-state algebra (ppt)
10/30  Mon: Finite-state machines  Wed: Finite-state implementation (ppt)  Fri: Assignment 5 due; Assignment 7 given: Finite-State Modeling; Noisy channels and FSTs (ppt)
11/6  Mon: Noisy-channel FSTs continued  Wed: Finite-state tagging (ppt)  Fri: Programming with regexps (ppt)
11/13  Mon: Assignment 6 due  Wed: Optimal paths in graphs  Fri: Structured prediction (ppt)
11/20  Mon: No class (Thanksgiving break)  Wed: No class (Thanksgiving break)  Fri: No class (Thanksgiving break)
11/27  Mon: Current NLP tasks and competitions (ppt)  Wed: Applied NLP continued  Fri: Applied NLP continued
12/4  Mon: Topic models (ppt)  Wed: Graphical models, deep learning, ...  Fri: Assignment 7 due; Machine translation
Final exam: Wed 12/20, 9am-noon
