Statistical Language Learning
Prof. Jason Eisner
Course # 600.665 - Spring 2002
"When the going gets tough, the tough get empirical" -- Jon Carroll
Catalog description: This course focuses on past and present research that has attempted,
with mixed success, to induce the structure of language from raw data
such as text. Lectures will be intermixed with reading and discussion of
the primary literature. Students will critique the readings, answer
open-ended homework questions, and undertake a final project.
Prereq: 600.465 or perm
The main goals of the seminar are (a) to cover some techniques
people have tried for inducing hidden structure from text, (b) to get
you thinking about how to do it better.
Since most of the techniques in (a) don't perform that well, (b) is
The course should also help to increase your comfort with the
building blocks of statistical NLP - weighted transducers,
probabilistic grammars, graphical models, etc., and the supervised
training procedures for these building blocks.
Lectures: MTW 2-3 pm, Shaffer 304 (but we'll
move to the NEB 325a conference room if we're not too big)
Prof: Jason Eisner - MW 3-4 pm, or by appt, in NEB 326
Mailing list: email@example.com (cs665 also works on NLP lab machines)
Textbook: none, but the textbooks for 465 may come in handy
Grading: 30% written responses (graded as check/check-plus, etc.), 30% class participation, 40% project.
Announcements: New readings announced by email and posted below.
Submission: Email me written responses to the whole week's
readings by 11 am each Monday.
Academic honesty: dept. policy (but you can work in pairs on reading responses)
Readings and Responses
Generally we will discuss about 3 related papers each week. Since we
may flit from paper to paper, comparing and contrasting,
you should read all the papers by the start of the week.
A centerpiece of the course is the requirement to respond
thoughtfully to each paper in writing. You should email me your
responses to the upcoming week's papers, in separate plaintext or
postscript messages, by noon each Monday. (Include "665
response" and the paper's authors in the subject line.) I will print
the responses out for everyone, and they will anchor our class
discussion. They will also be a useful source of ideas for your final
project.
A typical response is 1-3 paragraphs; in a given week you might
respond at greater length to some papers than others. It's okay to
work with another person. What should you write about? Some
possibilities:
- Idea for a new experiment, model or other research opportunity
inspired by the reading
- A clearer explanation of some point that everyone probably had to
- Unremarked consequences of the experimental design or results
- Additional experiments you really wish the author had done
- Other ways the research could be improved (e.g., flaws you spotted)
- Non-obvious connections to other work you know about from class or elsewhere
Please be as concrete as possible - and write clearly, since your
classmates will be reading your words of wisdom!
Suggestions for readings are welcome, especially well in advance.
- Week of Jan. 28: Bootstrapping
We will read one or two of these for Wednesday (to be chosen
in class on Monday).
- Week of Feb. 4: Classes of "interchangeable" words
- Chapter 3 of: Lillian Lee (1997). Similarity-based approaches to natural
language processing. Ph.D. thesis.
Harvard University Technical Report TR-11-97.
- Chapter 4 of: The same thing.
- Deerwester, S., Dumais, S. T., Landauer, T. K., Furnas, G. W. and
Harshman, R. A. (1990). Indexing by latent semantic analysis. Journal of the
Society for Information Science, 41(6), 391-407.
scanned version with figures
- Week of Feb. 11: Word meanings, word boundaries
- Carl de Marcken (1996). Linguistic structure as
composition and perturbation. Proceedings of ACL-96.
- Chengxiang Zhai (1997). Exploiting context to identify lexical atoms: A statistical view of
Proceedings of the International and Interdisciplinary Conference on Modelling and Using Context
(CONTEXT-97), Rio de Janeiro, Brazil, Feb. 4-6, 1997. 119-129.
- Jeffrey Mark Siskind:
- (1995) `Robust Lexical Acquisition Despite Extremely Noisy Input,' Proceedings of the 19th Boston University
Conference on Language Development (edited by
D. MacLaughlin and S. McEwen), Cascadilla Press, March.
- Section 6 of: (1996) A Computational Study of Cross-Situational Techniques for Learning Word-to-Meaning Mappings.
Cognition 61(1-2): 39-91, October/November.
- Week of Feb. 18: HMMs and Part-of-Speech Tagging
- Week of Feb. 25: Unsupervised Finite-State Topology
- Eric Brill (1995). Unsupervised Learning of
Disambiguation Rules for Part of Speech Tagging. Proc. of 3rd
Workshop on Very Large Corpora, MIT, June. Also appears in
Natural Language Processing Using Very Large Corpora,
- Sections 2.4-2.5 and Chapter 3 of: Andreas Stolcke (1994). Bayesian Learning of
Probabilistic Language Models. Ph.D. thesis, University of
California at Berkeley.
- Jose Oncina (1998). The data driven approach applied to the OSTIA algorithm.
In Proceedings of the Fourth International Colloquium on Grammatical Inference
Lecture Notes on Artificial Intelligence Vol. 1433, pp. 50-56
Springer-Verlag, Berlin 1998. ftp://altea.dlsi.ua.es/people/oncina/articulos/icgi98.ps.gz
Please also glance at the following papers so that you
roughly understand a couple of the variants that Oncina and his colleagues
have proposed: section 1 of this
paper on learning stochastic DFAs, and section 3 of this
paper dealing with OSTIA-D and OSTIA-R.
- Week of Mar. 4: Learning Tied Finite-State Parameters
- Kevin Knight and Jonathan Graehl (1998). Machine
Transliteration. Computational Linguistics
24(4):599-612, December. [Hardcopy available and preferred; in a pinch,
read the slightly less detailed ACL-97 version.]
- Richard Sproat and Michael Riley (1996). Compilation of
Weighted Finite-State Transducers from Decision
Trees. Proceedings of ACL. http://arXiv.org/ps/cmp-lg/9606018
- Jason Eisner (2002). Parameter Estimation for
Probabilistic Finite-State Transducers. Submitted to ACL.
- Week of Mar. 11: Inside-Outside Algorithm
If you need to review the inside-outside algorithm, check the
slides before reading the following papers. The slide fonts are
unfortunately a bit screwy unless you view under Windows.
- K. Lari and S. Young (1990). The estimation of
stochastic context-free grammars using the inside-outside
algorithm. Computer Speech and Language 4:35-56. scanned PDF version
- Fernando Pereira and Yves Schabes (1992). Inside-outside
reestimation from partially bracketed corpora. Proceedings of
the 20th Meeting of the Association for Computational
Linguistics. scanned PDF version
- Carl de Marcken (1995). On the unsupervised induction of
phrase-structure grammars. Proc. of the 3rd Workshop on Very
Large Corpora. http://bobo.link.cs.cmu.edu/grammar/demarcken.ps
- Week of Mar. 18: Spring break!
- Week of Mar. 25: More CFG Learning
- Week of Apr. 2: Maximum Entropy Parsing Models
- Week of Apr. 9: Bootstrapping Syntax
- Week of Apr. 16: Neural nets
- Week of Apr. 23
- John M. Zelle and Raymond J. Mooney (1996). Comparative Results on Using Inductive Logic Programming for Corpus-based Parser
Construction. In S. Wermter, E. Riloff and G. Scheler
(Eds.), Symbolic, Connectionist, and Statistical Approaches to
Learning for Natural Language Processing. Springer Verlag.
- Robert C. Berwick and Sam Pilato (1987). Learning Syntax
by Automata Induction. Machine Learning 2: 9-38.
scanned individual pages
Note: No class on Wednesday April 24.
- Week of Apr. 30
- Monday, May 13: Due date for final project
- Wednesday, May 15, 9am-12pm: Project presentation party (in
lieu of final exam) with 20-minute talks