600.465 Syllabus: Introduction to NLP (Fall 1999)

Where: Shaffer 301 (MW), Shaffer 303 (T)
When: MTW 2-3
(See the CS Dept. listing for other courses and for authoritative dates/places.)

Instructor: Jan Hajic
Email: hajic@cs.jhu.edu
WWW: http://www.cs.jhu.edu/~hajic
Office: New Engineering Building 326
Phone: (410) 516-8438
Office hours: Mon 10-11, Tue 3-4, and by appointment

TA: Gideon S. Mann
Email: gsm@loon.cs.jhu.edu
WWW: http://www.cs.jhu.edu/~gsm
Office: Graduate Students Office, NEB 332
Phone: (410) 516-4650
Office hours: Thu 2-3 and by appointment


Note: regular class slides are posted to the course web pages without prior announcement, usually the night before the class.

Prerequisites & Relation to Other Courses:

Students should have substantial programming experience in C, C++, Java, and/or Perl, and should have taken 600.226 (Data Structures) or its equivalent. Knowledge of Perl, or a willingness to learn its basics as you go (on your own), is also important.

The material covered in this course is selected so that, upon completing it, you should be able to understand papers in the field of Natural Language Processing. It should also make your life easier when taking 600.466, 600.666, and eventually 520.779 (although it is not a prerequisite for any of them).

No background in NLP is necessary.

Readings:


Assignments & Due Dates:

Turning in the Assignments

Policies

The Assignments

No.   Due date   Task                                                         Resources
#1    Oct 06     Exploring Entropy and Language Modeling (see sketch below)   hops:~hajic/cs465/TEXT{EN,CZ}1.txt
#2    Oct 27     Word Classes                                                 hops:~hajic/cs465/TEXT{EN,CZ}1.{txt,ptg}
#3    Dec 08     Tagging                                                      hops:~hajic/cs465/text{en,cz}2.ptg
#4    Dec 13     Noun Phrase Chunking (see Instructions)
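As a minimal illustrative sketch only (not the assignment specification), the Perl script below estimates the unigram word entropy H = -sum_w p(w) log2 p(w) of a whitespace-tokenized text file such as TEXTEN1.txt; Perl is the language the course suggests picking up. The script name, tokenization, and unigram model are assumptions here; consult the actual assignment handout for the required setup.

    #!/usr/bin/perl -w
    # entropy.pl (hypothetical name) -- unigram word entropy of a text file.
    # Usage: perl entropy.pl TEXTEN1.txt
    use strict;

    my %count;        # word => frequency
    my $total = 0;    # total number of tokens

    while (my $line = <>) {
        # split ' ' tokenizes on runs of whitespace and skips leading blanks
        for my $word (split ' ', $line) {
            $count{$word}++;
            $total++;
        }
    }

    die "No tokens read.\n" unless $total > 0;

    my $entropy = 0;
    for my $c (values %count) {
        my $p = $c / $total;
        $entropy -= $p * log($p) / log(2);   # log() is natural log; divide to get bits
    }

    printf "%d tokens, %d distinct words, unigram entropy = %.4f bits per word\n",
        $total, scalar(keys %count), $entropy;

The same counting loop extends naturally to character or bigram statistics, which is the kind of exploration the assignment title suggests.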

Additional resources:


Tentative Course Schedule:

Week 1
INTRODUCTION; BASIC PROBABILITY & INFORMATION THEORY
M 09/13 Introduction, Organization, Homeworks. Course Overview: Intro to NLP. Main Issues.
T 09/14 The Very Basics on Probability Theory.
W 09/15 Elements of Information Theory I.
Week 2
LANGUAGE MODELING
M 09/20 Elements of Information Theory II.
T 09/21 Language Modeling in General and the Noisy Channel Model.
W 09/22 Smoothing and the EM algorithm.
Week 3
LINGUISTICS
M 09/27 Linguistics: Phonology and Morphology.
T 09/28 Linguistics: Syntax (Phrase Structure vs. Dependency).
W 09/29 Linguistics: Syntax (cont.). For slides, see Tuesday 09/28.
Week 4
WORDS & THE LEXICON
M 10/04 Word Classes and Lexicography. Mutual Information (the "pointwise" version). The t-score. The Chi-square test.
T 10/05 Word Classes for NLP tasks. Parameter Estimation. The Partitioning Algorithm.
W 10/06 Complexity Issues of Word Classes. Programming Tricks & Tips.
Week 5
HIDDEN MARKOV MODELS
M 10/11 Markov models, Hidden Markov Models (HMMs).
T 10/12 The Trellis & the Viterbi Algorithms.
W 10/13 Estimating the Parameters of HMMs. The Forward-Backward Algorithm. Implementation Issues.
Week 6
TAGGING (INTRODUCTION)
M 10/18 Fall Break - NO CLASS
T 10/19 The task of Tagging. Tagsets, Morphology, Lemmatization.
W 10/20 Morphological Analysis and Generation. For slides, see Tuesday 10/19 (The Task of Tagging).
Week 7
TAGGING METHODS
M 10/25 Tagging methods. Manually designed Rules and Grammars. Statistical Methods (overview).
T 10/26 Homework No. 1 - Review and Results. Homework No. 4 - Introduction to the task.
W 10/27 Mid-term Review.
Week 8
STATISTICAL TAGGING & EVALUATION
M 11/01 MID-TERM EXAM. (Questionnaire, Answers)
T 11/02 HMM Tagging (Supervised, Unsupervised). Evaluation methodology (examples from tagging). Precision, Recall, Accuracy.
W 11/03 Statistical Transformation Rule-Based Tagging.
Week 9
TAGGING WITH FEATURES, OTHER LANGUAGES
M 11/08 Maximum Entropy.
T 11/09 Maximum Entropy Tagging.
W 11/10 Feature Based Tagging. Results on Tagging Various Natural Languages.
Week 10
GRAMMARS & PARSING ALGORITHMS
M 11/15 Introduction to Parsing. Generative Grammars. Properties of Regular and Context-free Grammars. Non-statistical Parsing Algorithms (An Overview). Simple top-down parser with backtracking.
T 11/16 Shift-reduce parser. Introduction. (For slides, see Monday next week.)
W 11/17 Tagging Homework - Review. No slides.
Week 11
PROBABILISTIC PARSING. TREEBANKS
M 11/22 Shift-Reduce Parsers in Detail.
T 11/23 Treebanks and Treebanking. Evaluation of Parsers.
W 11/24 Probabilistic Parsing. Introduction.
Week 12
PCFG PARAMETER ESTIMATION
M 11/29 PCFG Parameter Estimation. Common slides with Wednesday 11/24.
T 11/30 PCFG: Best parse. Probability of a string.
W 12/01 Class dismissed.
Week 13
STATISTICAL PARSING. MACHINE TRANSLATION
M 12/06 Lexicalized PCFG.
T 12/07 Statistical Machine Translation (MT).
W 12/08 Alignment and Parameter Estimation for MT.
Week 14
REVIEW
M 12/13 Final review session.


Course Requirements and Weights:

Assignments (4)       60%
Mid-term exam         12%
Final exam            21%
Class Participation    7%


Exams:

Exam                                      Date, Time               Where
Mid-term (Questionnaire, Answers)         Nov. 01 1999, 2-2:30     Shaffer 301
Final                                     Dec. 21 1999, 9-12       Shaffer 301
Make-up Final for "Incomplete" grades     Jan. 19 2000, 2-5pm      Meet at NEB 326