Natural Language Processing

http://cs.jhu.edu/~jason/465
Course catalog entry: This course is an in-depth overview of techniques for processing human language. How should linguistic structure and meaning be represented? What algorithms can recover them from text? And crucially, how can we build statistical models to choose among the many legal answers?
The course covers methods for trees (parsing and semantic interpretation), sequences (finite-state transduction such as tagging and morphology), and words (sense and phrase induction), with applications to practical engineering tasks such as information retrieval and extraction, text classification, part-of-speech tagging, speech recognition, and machine translation. There are a number of structured but challenging programming assignments. Prerequisite: 600.226 or equivalent. [Eisner, Applications, Fall] 3 credits
More information: Welcome! This course is designed to introduce you to some of the problems and solutions of NLP, and their relation to linguistics and statistics. You need to know how to program (e.g., 600.120) and use common data structures (600.226). It might also be nice to have some previous familiarity with automata (600.271) and probabilities (600.475, 550.420, or 550.310). At the end you should agree (I hope!) that language is subtle and interesting, feel some ownership over some of NLP's formal and statistical techniques, and be able to understand research papers in the field.
Lectures:  MWF 3-4 or 3-4:15, Maryland 109.  
Prof:  Jason Eisner  
TA:  Ryan Cotterell (ryan dot cotterell at gmail dot com)  
CA:  Kieran Magee (kbrantn1 at jhu dot edu)
 
Office hrs: 
For Prof: After class until 4:30, or by appt, in Hackerman 324C
For TA/CA: TBA  
Discussion session:  TA-led session (optional) for activities/discussion/questions/review: TBA  
Discussion site:  http://piazza.com/jhu/fall2013/600465 ... public questions, discussion, announcements  
Web page:  http://cs.jhu.edu/~jason/465  
Textbook: 
Jurafsky & Martin, 2nd ed. (semi-required; P98.J87 2009 in Science Ref section on C-Level)
Roark & Sproat (recommended; P98.R63 2007 in same section)
Manning & Schütze (recommended; free online PDF version here!)  
Policies: 
Grading: homework 50%, participation 5%, midterm 15%, final 30%
Submission: TBA
Lateness: floating late days policy
Honesty: here's what it means
Intellectual engagement: much encouraged
Announcements: Read mailing list and this page!  
Related sites: 

This class is in the "flexible time slot" MWF 3-4:30. Please keep the entire slot open. Ordinarily we'll have lecture from 3-4, followed by office hours from 4-4:30 in the classroom, for those of you who have questions or are interested in further discussion. However, from time to time, lecture will run till 4:15 in order to keep up with the syllabus. I'll give advance notice of these occasional "long lectures," which among other things make up for no-class days when I'll be out of town.
We'll also schedule a once-per-week discussion session led by your TA. This optional session will focus on solving problems together. That's meant as an efficient and cooperative way to study for an hour: it reinforces the past week's class material without adding to your homework load. Also, if you come to discussion session as recommended, you won't be startled by the exam style, since the discussion problems are taken from past exams and are generally interesting.
Warning: The schedule below may change. Links to future lectures and assignments may also change (they currently point to last year's versions).
Warning: I sometimes turn off the PDF links when they are not up to date with the PPT links. If they don't work, just click on "ppt" instead.
Week  Monday  Wednesday  Friday  Suggested Reading  
9/2  No class (Labor Day)  No class (Rosh Hashanah) 
Introduction (ppt)



9/9 
Assignment 1 given: Designing CFGs; Chomsky hierarchy (ppt) 
Language models (ppt)

Probability concepts (ppt; video lecture)
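(Not part of the course materials: a minimal sketch of the kind of model this week covers, a bigram language model with add-one smoothing. The toy corpus and function names are illustrative.)

```python
from collections import Counter

def train_bigram_model(tokens):
    """Count unigrams and bigrams over a token list, with a <s> start marker."""
    tokens = ["<s>"] + tokens
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    return unigrams, bigrams, set(tokens)

def bigram_prob(prev, word, unigrams, bigrams, vocab):
    """Add-one (Laplace) smoothed estimate of p(word | prev)."""
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + len(vocab))

# Toy corpus, purely for illustration.
uni, bi, vocab = train_bigram_model("the cat sat on the mat".split())
p = bigram_prob("the", "cat", uni, bi, vocab)  # (1+1)/(2+6) = 0.25
```

Without the +1 terms this reduces to the unsmoothed relative-frequency estimate, which assigns probability zero to any unseen bigram; smoothing methods like this are the subject of the 9/16 lecture.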



9/16 
Bayes' Theorem (ppt); Smoothing n-grams (ppt) 
Assignment 2 given: Probabilities; Limitations of CFG 
Improving CFG with attributes (ppt)



9/23 
Assignment 3 given: Language Models; Context-free parsing (ppt) 
Assignment 2 due; Context-free parsing 
Earley's algorithm (ppt)
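(Not part of the course materials: a minimal CKY recognizer, a close cousin of the chart-parsing algorithms covered this week; unlike Earley's algorithm it requires the grammar in Chomsky normal form. The tiny grammar is made up for illustration.)

```python
def cky_recognize(words, grammar, start="S"):
    """CKY recognizer: grammar maps RHS tuples (CNF) to sets of LHS nonterminals."""
    n = len(words)
    # chart[i][j] holds the nonterminals that derive words[i:j]
    chart = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):
        chart[i][i + 1] |= grammar.get((w,), set())
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):          # split point
                for b in chart[i][k]:
                    for c in chart[k][j]:
                        chart[i][j] |= grammar.get((b, c), set())
    return start in chart[0][n]

# Tiny illustrative grammar in Chomsky normal form.
grammar = {
    ("NP", "VP"): {"S"},
    ("V", "NP"): {"VP"},
    ("she",): {"NP"},
    ("eats",): {"V"},
    ("fish",): {"NP"},
}
ok = cky_recognize("she eats fish".split(), grammar)  # True
```

The three nested span/split loops give the characteristic O(n^3) runtime of chart parsing.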



9/30 
(not covered this year) Extending CFG (summary (ppt)) 
Probabilistic parsing (ppt)

Assignment 3 due; Assignment 4 given: Parsing; Parsing tricks (ppt) 
 
10/7 
Catch-up day (we'll be behind schedule by now) 
(not covered this year) Human sentence processing (ppt) 
No class (student NLP colloquium at UMBC) 


10/14 
(Monday 10/14 is fall break day;
but class meets on Tuesday 10/15,
which will follow a Monday schedule) Semantics (ppt) 
Semantics continued

Assignment 5 given: Semantics; Semantics continued 


10/21 
Midterm exam (3-4:30 in classroom) 
Forward-backward algorithm (ppt)
(Excel spreadsheet; Viterbi version; lesson plan; video lecture)

Forward-backward continued
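(Not part of the course materials: a minimal sketch of the forward pass, which computes the total probability of an observation sequence under a hidden Markov model, the building block of forward-backward. The numbers are made up, in the spirit of the classic ice-cream example used in the spreadsheet lesson.)

```python
def forward(obs, states, start_p, trans_p, emit_p):
    """Forward algorithm: total probability of obs under a hidden Markov model."""
    alpha = {s: start_p[s] * emit_p[s][obs[0]] for s in states}
    for o in obs[1:]:
        alpha = {s: sum(alpha[r] * trans_p[r][s] for r in states) * emit_p[s][o]
                 for s in states}
    return sum(alpha.values())

# Made-up parameters: hidden Hot/Cold days, observed ice creams eaten (1-3).
states = ["H", "C"]
start_p = {"H": 0.5, "C": 0.5}
trans_p = {"H": {"H": 0.7, "C": 0.3}, "C": {"H": 0.4, "C": 0.6}}
emit_p = {"H": {"1": 0.2, "2": 0.4, "3": 0.4},
          "C": {"1": 0.5, "2": 0.4, "3": 0.1}}
p = forward(["3", "1"], states, start_p, trans_p, emit_p)
```

Replacing the `sum` over previous states with a `max` (and remembering the argmax) turns this into the Viterbi algorithm for the single best state sequence.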

 
10/28 
Assignment 4 due; Assignment 6 given: Hidden Markov Models; Expectation Maximization (ppt) 
Finite-state algebra (ppt)

Finite-state machines



11/4 
Finite-state implementation (ppt)

Finite-state tagging (ppt)

Assignment 5 due; Noisy channels and FSTs (ppt) 


11/11 
More FST examples (ppt)

Programming with regexps (ppt)




11/18 
Assignment 6 due; Assignment 7 given: Finite-State Modeling; Optimal paths in graphs 
Structured prediction (ppt)

Current NLP tasks and competitions (ppt)



11/25  Applied NLP continued (ppt)  No class (Thanksgiving break) 
No class (Thanksgiving break) 

12/2  Applied NLP continued (ppt) 
Topic models

Assignment 7 due; Machine translation 


12/10  Thu 12/12 is absolute deadline for late assignments 
Final exam: Tue 12/17, 9am-noon 