Natural Language Processing
Welcome! This course is designed to introduce you to some of the problems and solutions of NLP, and their relation to linguistics and statistics. You need to know how to program (e.g., 600.120) and use common data structures (600.226). It might also be nice to have some previous familiarity with automata (600.271) and probabilities (550.310). At the end you should agree (I hope!) that language is subtle and interesting, feel some ownership over some of NLP's formal and statistical techniques, and be able to understand research papers in the field.
Course catalog entry: This course is an in-depth overview of techniques for processing human language. How should linguistic structure and meaning be represented? What algorithms can recover them from text? And crucially, how can we build statistical models to choose among the many legal answers? The course covers methods for trees (parsing and semantic interpretation), sequences (finite-state transduction such as morphology), and words (sense and phrase induction), with applications to practical engineering tasks such as information retrieval and extraction, text classification, part-of-speech tagging, speech recognition and machine translation. There are a number of structured but challenging programming assignments. Prerequisite: 600.226. [Eisner, Applications, Fall] 3 credits
| Lectures: | MWF 3-4 pm (sometimes 3-4:30), CSEB B17 | |
| Prof: | Jason Eisner | |
| TA: | Jason Smith | |
| CA: | Eric Kim | |
| Office hrs: | For Prof: MW 4pm after class, or by appt in CSEB 324C; For TA: M 11-12:30 in CSEB 323, or by appt | |
| Mailing list: | probably ... public questions, discussion, announcements | |
| Web page: | http://cs.jhu.edu/~jason/465 | |
| Textbook: | Jurafsky & Martin, 2nd ed. (semi-required - P98.J87 2009 in Science Ref section on C-Level); Manning & Schütze (recommended - online PDF version is accessible for free from within JHU) | |
| Policies: |
Grading: homework 45%, participation 10%, midterm 15%, final 30%
Submission: via this web form
Lateness: floating late days policy
Honesty: here's what it means
Intellectual engagement: much encouraged
Announcements: Read mailing list and this page! | |
This class is in the 3-4:30 time slot, a "flex slot" that is intended to permit either three 50-minute lectures or two 75-minute lectures per week. Usually, we have three 50-minute lectures per week (MWF). However, you will see on the schedule below that I am fairly likely to cancel class on M 9/29, M 10/27, W 11/19, and M 12/8. I may therefore ask you to let me make up the time with a few 75-minute lectures, for example, on M and W in the week before a canceled class.
Warning: For future lectures and assignments, the links below take you to last year's versions, which are subject to change.
Warning: The Jurafsky & Martin chapter numbers refer to the 1st edition, not the 2nd. We will update them shortly.
| Week | Monday | Wednesday | Friday | Suggested Reading | |
| 9/1 | Introduction (ppt) | J&M chapter 1 | |||
| 9/8 | Assignment 1 given: Designing CFGs; Chomsky hierarchy (ppt) | Language models (ppt) | Probability concepts (ppt); Bayes' Theorem (ppt) | J&M chapters 13, 6.2 (see also M's slides); for assignment, J&M 9 (or M&S 3) | |
| 9/15 | Smoothing n-grams (ppt) | (postponed till 10/3 or 10/10) Human sentence processing (ppt) | (& another sign meant 3 ... ?) Assignment 2 given: Using n-Grams; Limitations of CFG | M&S chapters 2, 6; Rosenfeld (2000) survey | |
| 9/22 | Improving CFG with features (ppt) | Context-free parsing (ppt) | Context-free parsing | J&M 10, 11.1-11.4 | |
| 9/29 | No class (Rosh Hashanah). But could have a Q&A / homework help session with the TA ... | Earley's algorithm (ppt) | Extending CFG (summary (ppt)) | J&M 10 | |
| 10/6 | Probabilistic parsing (ppt) | Parsing tricks (ppt); A song about parsing | Assignment 2 due; Assignment 3 given: Parsing; Learning is impossible? (ppt) | J&M 12 (or M&S 11.1-11.3 and 12.1.1-12.1.5) | |
| 10/13 | No class (fall break) | Semantics (ppt) | Semantics continued | J&M 14-15; also this web page, up to but not including "denotational semantics" section; and you could try the Penn Lambda Calculator; and how about lambda calculus for kids? | |
| 10/20 | Assignment 4 given: Semantics; Finite-state functions (ppt) | Finite-state implementation (ppt) | Midterm exam (in class) | chap 2 of xfst book draft (only accessible from barley and other Solaris machines at JHU CS; don't distribute) | |
| 10/27 | No class (prof. traveling) | Programming with Regexps (ppt); Noisy Channels and FSTs (ppt); Assignment 3 due; Morphology and Phonology (ppt) | chap 3 of xfst book draft; perhaps also this paper | |
| 11/3 | Finite-state parsing | Finite-state tagging (ppt) | HMMs | J&M 8 or M&S 10 | |
| 11/10 | Assignment 4 due; Assignment 5 given: Finite-State Grammars; Forward-backward algorithm (Excel spreadsheet; Viterbi version; lesson plan) | Forward-backward continued | Expectation Maximization (ppt) | J&M chapter 6 (2nd ed.) or perhaps Allen pp. 195-208 (handout); M&S 11 | |
| 11/17 | Final FSM Examples (ppt) | No class?? (day before Thanksgiving); Assignment 5 due; Assignment 6 given: Training an HMM | No class (Thanksgiving break) | M&S 14 | |
| 11/24 | Grouping words (ppt; Excel spreadsheet) | Splitting words (ppt) | Words vs. senses in IR (ppt) | M&S 7, 5, 15.2, 15.4 (since J&M 16-17 covers only some of this) | |
| 12/1 | Machine Translation | (may replace this & next lecture with more general overview of machine learning in NLP) Text categorization (ppt) | Maximum entropy (ppt) | Kevin Knight's great MT tutorial and workbook; M&S 13, 16 | |
| 12/8 | No class?? (prof traveling); Assignment 6 due; Current and Future Research (ppt) | Sun 12/14 is absolute deadline for late assignments ---> | Final exam: Thu 12/18, 9am-noon ---> |