600.465 Syllabus: Introduction to NLP (Fall 2000)

Where: Shaffer 300 (MTW)
When: MTW 2-3

Instructor: Jan Hajic
  Email: hajic@cs.jhu.edu
  WWW: http://www.cs.jhu.edu/~hajic
  Office: New Engineering Building 326
  Phone: (410) 516-8438
  Office hours: Mon 3-4, Tue 3-4, and by appointment

TA: Gideon S. Mann
  Email: gsm@cs.jhu.edu
  WWW: http://www.cs.jhu.edu/~gsm
  Office: NLP Lab "North", NEB 329
  Phone: (410) 516-7052
  Office hours: TBA



Prerequisites & Relation to Other Courses:

Students should have substantial programming experience in C, C++, Java, or Perl, and should have taken 600.226 (Data Structures) or its equivalent. Knowledge of Perl, or a willingness to learn its basics as you go (on your own), is also important.

The material in this course is selected so that, upon completing it, you should be able to understand papers in the field of Natural Language Processing. It should also make your life easier when taking 600.466, 600.666, and eventually 520.779 (although it is not a prerequisite for them).

Please note that there is a NEW ADVANCED COURSE in the NLP area, taught by Jason Eisner, on Finite-State Methods in Natural Language Processing (600.405, a 1-credit short course). Please see the department's course schedule for details.

No background in NLP is necessary.

Readings:


Assignments & Due Dates:

Turning in the Assignments

Policies

The Assignments

No.  Due date  Task                                      Resources
#1   Oct 02    Exploring Entropy and Language Modeling   hops:~hajic/cs465/TEXT{EN,CZ}1.txt
#2   Oct 25    Word Classes                              hops:~hajic/cs465/TEXT{EN,CZ}1.{txt,ptg}
#3   Nov 29    Tagging                                   hops:~hajic/cs465/text{en,cz}2.ptg
#4   Dec 11    baseNP chunking
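As a taste of what Assignment #1 ("Exploring Entropy and Language Modeling") involves, here is a minimal sketch of estimating the per-symbol entropy of a text under a unigram model. This is an illustration only, not the official assignment code; the course data lives in the `hops:~hajic/cs465/` files listed above.

```python
# Sketch: per-symbol entropy under a unigram model,
# H(X) = -sum_x p(x) * log2 p(x), with p(x) estimated
# by relative frequency in the text.
from collections import Counter
from math import log2

def unigram_entropy(text: str) -> float:
    counts = Counter(text)
    total = sum(counts.values())
    return -sum((c / total) * log2(c / total) for c in counts.values())

# Two equiprobable symbols give exactly 1 bit per symbol:
print(unigram_entropy("abab"))  # -> 1.0
```

The assignment goes beyond this sketch (e.g., comparing the English and Czech texts and experimenting with n-gram models), but the counting-and-summing pattern is the same.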

Additional resources:


Tentative Course Schedule:

Week 1
INTRODUCTION; BASIC PROBABILITY & INFORMATION THEORY
M 09/11 Introduction, Organization, Homeworks. Course Overview: Intro to NLP. Main Issues.
T 09/12 The Very Basics on Probability Theory.
W 09/13 Elements of Information Theory I.
Week 2
LANGUAGE MODELING
M 09/18 Elements of Information Theory II.
T 09/19 Language Modeling in General and the Noisy Channel Model.
W 09/20 Class canceled.
Week 3
LANGUAGE MODELING
M 09/25 Language Modeling in General and the Noisy Channel Model (cont'd).
T 09/26 Smoothing and the EM algorithm.
W 09/27 Class canceled.
Week 4
LINGUISTICS
M 10/02 Linguistics: Phonology and Morphology I.
T 10/03 Linguistics: Phonology and Morphology II.
W 10/04 Linguistics: Syntax (Phrase Structure vs. Dependency).
Week 5
WORDS & THE LEXICON
M 10/09 Word Classes and Lexicography. Mutual Information (the "pointwise" version). The t-score. The Chi-square test.
T 10/10 Word Classes for NLP tasks. Parameter Estimation. The Partitioning Algorithm.
W 10/11 HW 1 discussion. Complexity Issues of Word Classes. Programming Tricks & Tips.
Week 6
HIDDEN MARKOV MODELS
M 10/16 Fall Break - NO CLASS
T 10/17 Markov models, Hidden Markov Models (HMMs).
W 10/18 The Trellis & the Viterbi Algorithms.
Week 7
TAGGING (INTRODUCTION)
M 10/23 Estimating the Parameters of HMMs. The Forward-Backward Algorithm. Implementation Issues.
T 10/24 The task of Tagging. Tagsets, Morphology, Lemmatization.
W 10/25 Tagging methods. Manually designed Rules and Grammars. Statistical Methods (overview). Mid-term Review.
Week 8
STATISTICAL TAGGING & EVALUATION
M 10/30 MID-TERM EXAM. (Questions, Answers)
T 10/31 HMM Tagging (Supervised, Unsupervised). Evaluation methodology (examples from tagging). Precision, Recall, Accuracy.
W 11/01 Statistical Transformation Rule-Based Tagging.
Week 9
TAGGING WITH FEATURES, OTHER LANGUAGES
M 11/06 Maximum Entropy.
T 11/07 Maximum Entropy Tagging.
W 11/08 Feature Based Tagging. Results on Tagging Various Natural Languages.
Week 10
GRAMMARS & PARSING ALGORITHMS
M 11/13 Introduction to Parsing. Generative Grammars. Properties of Regular and Context-free Grammars. Non-statistical Parsing Algorithms (An Overview). Simple top-down parser with backtracking.
T 11/14 Shift-reduce parser. Introduction.
W 11/15 Tagging Homework - Review. No slides.
Week 11
PROBABILISTIC PARSING. TREEBANKS
M 11/20 Shift-Reduce Parsers in Detail. (Same set of slides as Wednesday last week.)
T 11/21 Treebanks and Treebanking. Evaluation of Parsers.
W 11/22 Probabilistic Parsing. Introduction.
Week 12
PCFG PARAMETER ESTIMATION
M 11/27 PCFG Parameter Estimation. Common slides with Wednesday 11/22.
T 11/28 PCFG: Best parse. Probability of a string.
W 11/29 HW #4 Review.
Week 13
STATISTICAL PARSING. MACHINE TRANSLATION
M 12/04 Lexicalized PCFG.
T 12/05 Statistical Machine Translation (MT).
W 12/06 Alignment and Parameter Estimation for MT.
Week 14
REVIEW
M 12/11 Final review session.


Grading Weights:

Assignments (4)       60%
Mid-term exam         12%
Final exam            21%
Class participation    7%


Exams:

Exam      Date, Time                     Where
Mid-term  Oct. 30, 2000, 2-2:30         Shaffer 300
Final     Dec. 16, 2000 (Sat.!), 9-12   Shaffer 303