
Natural Language Processing
Prof. Jason Eisner
Course # 601.465/665 — Fall 2022
Announcements
- 12/10/22 Final exam will be Mon 12/19, 6-9pm, in Bloomberg 272. I've posted practice exams on Piazza, and we'll schedule a review session.
- 11/25/22 HW7
(25 pages) is available. It is a long homework, but it walks you through a
series of exercises, holding your hand along the way.
You may work in pairs. It is due on Friday, December 9, at
11pm (as late as we could make it without cutting into reading
period).
- 11/15/22
By popular request, we've extended the HW6 deadline by 4 days, to
Tuesday, November 22, at 11pm. But please don't wait for Thanksgiving break;
get started now with the reading so that you know what lies ahead.
- 11/7/22 HW6
is available. It ties together lots of stuff from the class!
We've provided a long reading handout to review the
ideas and fill in some details. This homework shouldn't be
too hard conceptually if you followed the HMM and CRF
lectures, but you'll still have to keep track of a lot of
ideas, code, and experiments. You may work in pairs. The
deadline is Friday 11/18.
- 10/24/22 HW5 is available (actually has been available for a
couple of weeks), with a short "reading handout" appended to it. It deals with
attaching semantic λ-expressions to grammar rules. It is
due on Friday, 11/4, at 11pm.
- 10/2/22 HW4
is available, with a separate "reading handout" appended to it. You may want to do HW3 first,
but we're making HW4 available now so that you can read the
handout while parsing is still fresh in your mind from lecture.
The reading might also help you study parsing for the midterm.
This is the conceptually hardest homework project in the course, with two
major challenges: probabilistic Earley parsing, and making parsing
efficient. It is due on Monday, 10/24, at 11pm. You
may work with a partner on this one.
- 9/26/22 Oops, the midterm review was scheduled after the midterm!
To fix this, I've moved the midterm review a week earlier (to Tue 10/11)
and moved the midterm slightly later (to Wed 10/12).
- 9/25/22 HW3
is now available, with a separate "reading handout" appended to it.
The due date is Friday, 10/7, at 11pm. Start early: This is
a long and detailed homework that requires you to write
some smoothing code and experiment with its parameters and design
to see what happens. It should be manageable because we've already covered
the ideas in class and on HW2, and because we've provided
you with a good deal of code. But it may take some time to understand
that code and the libraries that it uses (especially PyTorch).
I strongly suggest
that you start reading the 25-page reading handout now, then study
the starter code and ask questions on Piazza as needed.
Spread the work out. You may work in pairs.
- 9/9/22 HW2
(11 pages) is available. It's due in a little over 2 weeks: Tue 9/27 at
2pm. This homework is mostly a problem set about manipulating
probabilities. But it is a long homework! Most
significantly, question 6 asks you to read a separate handout
and to work through a series of online lessons, preferably
with a partner or two. Question 8 asks you to write a small
program. It is okay to work on questions 6 and 8 out of
order.
- 8/31/22 HW1
(12 pages) is available. It is due on Wed 9/14 at 2pm: please
get this one in on time so we can discuss it in class an hour
later.
- 8/25/22 Room change! Our main classroom is now Shaffer 303, but on Wednesdays another class has that room so we'll be up in Bloomberg 272. The Tuesday evening meetings will be in Hodson 210.
- 8/22/22 As explained on the syllabus, please keep MWF 3-4:30 pm open to accommodate a variable class schedule as well as office hours after class. Our weekly problem discussion sessions are Tu 6-7:30 pm.
- 8/22/22 Please bookmark this page.
All enrolled students will soon be added to Piazza and Gradescope.
Key Links
- Syllabus -- reference info about
the course's staff, meetings, office hours, textbooks, goals,
expectations, and policies. May be updated on occasion.
- Piazza
site for discussion and announcements. Sign up, follow, and participate!
- Gradescope
for submitting your homework.
- Office hours for the course staff (TBA).
- Video recordings (see policy on syllabus)
Schedule
Warning: The schedule below is adapted from last year's schedule and may still change! Links to future lecture slides, homeworks, and dates currently point to last year's versions. Watch Piazza for important updates, including when assignments are given and when they are due.
What's Important? What's Hard? What's Easy? [1 week]
Mon 8/29:
Wed 8/31:
Fri 9/2:
- Uses of language models
- Language ID
- Text categorization
- Spelling correction
- Segmentation
- Speech recognition
- Machine translation
- Optional reading about n-gram language models: M&S 6 (or R&S 6)
Probabilistic Modeling [1 week]
Mon 9/5 (Labor Day: no class)
Wed 9/7,
Fri 9/9:
- Probability concepts
- Joint & conditional prob
- Chain rule and backoff
- Modeling sequences
- Surprisal, cross-entropy, perplexity
- Optional reading about probability, Bayes' Theorem, information theory: M&S 2; slides by Andrew Moore
- Smoothing n-grams (video lessons, 52 min. total)
- Maximum likelihood estimation
- Bias and variance
- Add-one or add-λ smoothing
- Cross-validation
- Smoothing with backoff
- Good-Turing, Witten-Bell (bonus slides)
- Optional reading about smoothing: M&S 6; J&M 4; Rosenfeld (2000)
- HW2 given: Probabilities
Mon 9/12:
- Bayes' Theorem
- Log-linear models (self-guided interactive visualization with handout)
- Parametric modeling: Features and their weights
- Maximum likelihood and moment-matching
- Non-binary features
- Gradient ascent
- Regularization (L2 or L1) for smoothing and generalization
- Conditional log-linear models
- Application: Language modeling
- Application: Text categorization
- Optional reading about log-linear models: Collins (pp. 1-4) or Smith (section 3.5)
Grammars and Parsers [3- weeks]
Wed 9/14:
- HW1 due
- In-class discussion of HW1
- Improving CFG with attributes (video lessons, 62 min. total)
- Morphology
- Lexicalization
- Tenses
- Gaps (slashes)
- Optional reading about syntactic attributes: J&M 15 (2nd ed.)
Wed 9/14 (continued),
Fri 9/16,
Mon 9/19:
Wed 9/21,
Fri 9/23:
Tue 9/27 (we will swap Mon and Tue this week):
- HW2 due
- Quick in-class quiz: Log-linear models
- Probabilistic parsing
- PCFG parsing
- Dependency grammar
- Lexicalized PCFGs
- Optional reading on probabilistic parsing: M&S 12, J&M 14
Wed 9/28:
Fri 9/30:
Representing Meaning [1 week]
Mon 10/3,
Wed 10/5,
Fri 10/7:
Possibly we'll have no class on 10/5, and make up the time with long lectures (3-4:15) on 9/30, 10/3, and 10/7.
- HW3 due on Fri 10/7
- Semantics
- What is understanding?
- Lambda terms
- Semantic phenomena and representations
- More semantic phenomena and representations
- Adding semantics to CFG rules
- Compositional semantics
- Optional readings on semantics:
- HW5 given: Semantics
Midterm
Wed 10/12:
- Midterm exam (3-4:30, in classroom)
Representing Everything: Deep Learning for NLP [1+ week]
Mon 10/10,
Fri 10/14,
Mon 10/17,
Wed 10/19:
- Back-propagation (video lesson, 33 min.)
- Neural architectures
- Vectors, matrices, tensors; PyTorch operations; linear and affine operations
- Log-linear models, learned features, and nonlinearities
- Vectors as an alternative semantic representation
- Training signals: Categorical labels, similarity, matching
- Multi-step prediction of structures
- Encoders and decoders
- End-to-end training, multi-task training, pretraining + fine-tuning
- Self-supervised learning
- word2vec (skip-gram / CBOW)
- Recurrent neural nets
- seq2seq
Fri 10/21 (fall break: no class)
Unsupervised Learning [1+ week]
Mon 10/24,
Wed 10/26:
Fri 10/28,
Mon 10/31:
Discriminative Modeling [1- week]
Wed 11/2,
Fri 11/4:
Finite-State Methods [1+ week]
Mon 11/7:
- Finite-state algebra
- Regexp review
- Properties
- Functions, relations, composition
- Simple applications
- Optional reading on finite-state operators: chaps 2-3 of XFST book draft
Wed 11/9:
- Finite-state implementation
- Operations on regular relations
- Weighted relations
- Finite-state constructions
- Uses of composition
- Optional reading on finite-state machines: R&S 1
Fri 11/11:
Mon 11/14:
- Noisy channels and FSTs
- Segmentation
- Spelling correction
- The noisy channel generalization
- Implementation using FSTs
- Examples:
- Baby talk
- Morphology
- Edit distance
- Transliteration
- Speech recognition
- Optional reading on finite-state NLP: Karttunen (1997)
Deep Learning for Structured Prediction [1- week]
Wed 11/16, Fri 11/18:
- Neural architectures (continued)
- Few-shot learning with prompted language models [at recitation]
- Reducing structured prediction to tagging
- BiRNN-CRFs
- Decoders: Exact, greedy, beam search, independent, dynamic programming, stochastic, Minimum Bayes Risk (MBR)
- Attention and transformers
- Tokenization
- HW6 due on Fri 11/18
Mon 11/21,
Wed 11/23,
Fri 11/25 (Thanksgiving break)
NLP Applications [2 weeks]
Mon 11/28,
Wed 11/30,
Fri 12/2, Mon 12/5:
Wed 12/7, Fri 12/9:
Final
Exam period (12/14 - 12/22):
- Final exam review session (date TBA)
- Final exam (Mon 12/19, 6pm-9pm, Bloomberg 272)
Unofficial Summary of Homework Schedule
These dates were copied from the schedule above, which is subject to change.
Homeworks are due approximately every two weeks, with longer homeworks getting more time. But the
homework periods are generally longer than two weeks -- they overlap. This gives you more flexibility
about when to do each assignment, which is useful if you have other classes and activities.
We assign homework n as soon as you've seen the lectures you need, rather than waiting
until after homework n-1 is due. So you can jump right in while the material is fresh.
- HW1 (grammar): given Wed 8/31, due Wed 9/14
- HW2 (probability): given Fri 9/9, due Tue 9/27
- HW3 (empiricism): given Fri 9/16, due Fri 10/7
- HW4 (algorithms): given Wed 9/28, due Mon 10/24
- HW5 (logic): given Fri 10/7, due Wed 11/2
- HW6 (machine learning): given Wed 10/26, due Fri 11/18 (last day before Thanksgiving break)
- HW7 (automata): given Wed 11/9, due Fri 12/9 (last day of class)
Recitation Schedule
Recitations are normally held on Tuesdays (see the syllabus). Enrolled students are expected to attend the recitation and participate in solving practice problems. This will be more helpful than an hour of solo study. The following schedule is subject to change.
Old Materials
Lectures from past years, some still useful:
Old homework: