
Natural Language Processing
Prof. Jason Eisner
Course # 601.465/665 — Fall 2021
Announcements
- 12/20/21 Final exam will be Mon 12/20, 9am-noon in the classroom. I've posted practice exams on Piazza, and we'll schedule a review session.
- 11/17/21 HW7
(25 pages) is available. It is a long homework, but it walks you through a
series of exercises, holding your hand along the way.
You may work in pairs. It is due on Monday, December 6, at
11:59pm (as late as we could make it without cutting into reading
period).
- 11/4/21 HW6
is available. It ties together lots of stuff from the class!
We've provided a long reading handout to review the
ideas and fill in some details. This homework shouldn't be
too hard conceptually if you followed the HMM and CRF
lectures, but you'll still have to keep track of a lot of
ideas, code, and experiments. You may work in pairs. We've
extended the deadline by 4 days, to Tuesday, November 23, at
11pm. But please don't wait for Thanksgiving break; get
started now with the reading so that you know what lies ahead.
- 10/19/21 The midterm has been rescheduled to Monday,
10/25. It will be held from 3-4:30 in the classroom. We'll extend the
deadline for HW4 as a result (see amended date below).
- 10/9/21 HW5 is available, with a short
"reading handout" appended to it. It deals with
attaching semantic λ-expressions to grammar rules. It is
due on Wednesday, 11/3, at 11pm.
- 10/1/21 HW4
is available, with a separate "reading handout" appended to it. You may want to do HW3 first, but we're making HW4 available now so that you can read the
handout while parsing is still fresh in your mind from lecture.
The reading might also help you study parsing for the midterm.
This is the conceptually hardest homework project in the course, with two
major challenges: probabilistic Earley parsing, and making parsing
efficient. It is due on
Wednesday, 10/27 (extended from Monday, 10/25), at 11pm. You
may work with a partner on this one.
- 9/26/21 HW3
is now available, with a separate "reading handout" appended to it.
The due date is Sunday, 10/10, at 11pm. Start early: This is
a long and detailed homework that requires you to write
some smoothing code and experiment with its parameters and design
to see what happens. It should be manageable because we've already covered
the ideas in class and on HW2, and because we've provided
you with a good deal of code. But it may take some time to understand
that code and the libraries that it uses (especially PyTorch).
I strongly suggest
that you start reading the 25-page reading handout now, then study
the starter code and ask questions on Piazza as needed.
Spread the work out. You may work in pairs.
- 9/10/21 HW2
(11 pages) is available. It's due in a little over 2 weeks: Mon 9/27 at
2pm. This homework is mostly a problem set about manipulating
probabilities. But it is a long homework! Most
significantly, question 6 asks you to read a separate handout
and to work through a series of online lessons, preferably
with a partner or two. Question 8 asks you to write a small
program. It is okay to work on questions 6 and 8 out of
order.
- 9/1/21 HW1
(12 pages) is available. It is due on Wed 9/15 at 2pm: please
get this one in on time so we can discuss it in class an hour
later.
- 8/28/21
Start of class is Monday 8/30, 3pm.
You are expected to attend class in person if possible, but
there is a seating limit of 49 due to Covid. If you are unable
to come in person (this includes waitlisted students), please join by Zoom.
The Zoom link is posted on Piazza.
- 8/28/21 Please bookmark this
page.
All enrolled and waitlisted students will soon be added to Piazza
and Gradescope.
Key Links
- Syllabus -- reference info about
the course's staff, meetings, office hours, textbooks, goals,
expectations, and policies. May be updated on occasion.
- Piazza
site for discussion and announcements. Sign up, follow, and participate!
- Gradescope
for submitting your homework.
- Zoom link if
you can't attend class in person. The passcode is posted on Piazza.
- Office hours for the course staff
- Video
recordings of class meetings. Use these in case of
emergency, or for review (but not as a reason to skip the live class).
Schedule
Warning: The schedule below is adapted from last year's schedule and may still change! Links to future lecture slides, homeworks, and dates currently point to last year's versions. Watch Piazza for important updates, including when assignments are given and when they are due.
What's Important? What's Hard? What's Easy? [1 week]
Mon 8/30:
Wed 9/1:
Fri 9/3:
- Uses of language models
- Language ID
- Text categorization
- Spelling correction
- Segmentation
- Speech recognition
- Machine translation
- Optional reading about n-gram language models: M&S 6 (or R&S 6)
Building Language Models [1 week]
Mon 9/6 (Labor Day: no class)
Wed 9/8,
Fri 9/10:
- Probability concepts
- Joint & conditional prob
- Chain rule and backoff
- Modeling sequences
- Surprisal, cross-entropy, perplexity
- Optional reading about probability, Bayes' Theorem, information theory: M&S 2; slides by Andrew Moore
- Smoothing n-grams (video lessons, 52 min. total)
- Maximum likelihood estimation
- Bias and variance
- Add-one or add-λ smoothing
- Cross-validation
- Smoothing with backoff
- Good-Turing, Witten-Bell (bonus slides)
- Optional reading about smoothing: M&S 6; J&M 4; Rosenfeld (2000)
- HW2 given: Probabilities
Mon 9/13:
- Bayes' Theorem
- Log-linear models (self-guided interactive visualization with handout)
- Parametric modeling: Features and their weights
- Maximum likelihood and moment-matching
- Non-binary features
- Gradient ascent
- Regularization (L2 or L1) for smoothing and generalization
- Conditional log-linear models
- Application: Language modeling
- Application: Text categorization
- Optional reading about log-linear models: Collins (pp. 1-4) or Smith (section 3.5)
Grammars and Parsers [3- weeks]
Wed 9/15:
- HW1 due
- In-class discussion of HW1
- Improving CFG with attributes (video lessons, 62 min. total)
- Morphology
- Lexicalization
- Tenses
- Gaps (slashes)
- Optional reading about syntactic attributes: J&M 15 (2nd ed.)
Wed 9/15 (continued),
Fri 9/17,
Mon 9/20:
Wed 9/22,
Fri 9/24:
Mon 9/27:
- HW2 due
- Quick in-class quiz: Log-linear models
- Probabilistic parsing
- PCFG parsing
- Dependency grammar
- Lexicalized PCFGs
- Optional reading on probabilistic parsing: M&S 12, J&M 14
Wed 9/29:
Fri 10/1:
Representing Meaning [1 week]
Mon 10/4,
Wed 10/6,
Fri 10/8:
- HW3 due on Sun 10/10
- Semantics
- What is understanding?
- Lambda terms
- Semantic phenomena and representations
- More semantic phenomena and representations
- Adding semantics to CFG rules
- Compositional semantics
- Optional readings on semantics:
- HW5 given: Semantics
Representing Everything: Deep Learning for NLP [1 week]
Mon 10/11,
Wed 10/13,
Fri 10/15:
- Back-propagation (video lesson, 33 min.)
- Deep learning [slides currently evolving]
- [topics will be listed here]
Unsupervised Learning [2- weeks]
Mon 10/18,
Wed 10/20:
Mon 10/25 (rescheduled from Fri 10/22):
- Midterm exam (3-4:30 in classroom)
Mon 10/25,
Wed 10/27:
Discriminative Modeling [1- week]
Fri 10/29,
Mon 11/1:
Algebraic Methods [2+ weeks]
Wed 11/3,
Fri 11/5,
Mon 11/8,
Wed 11/10:
- HW5 due on Wed 11/3
- Finite-state algebra
- Regexp review
- Properties
- Functions, relations, composition
- Simple applications
- Optional reading on finite-state operators: chaps 2-3 of XFST book draft
- Finite-state machines
- Acceptors
- Expressive power
- Weights and semirings
- Lattice parsing
- Transducers
Fri 11/12:
- Noisy channels and FSTs
- Segmentation
- Spelling correction
- The noisy channel generalization
- Implementation using FSTs
- Examples:
- Baby talk
- Morphology
- Edit distance
- Transliteration
- Speech recognition
- Optional reading on finite-state NLP: Karttunen (1997)
Mon 11/15:
- Finite-state tagging
- The task
- Hidden Markov Models
- Transformation-based
- Constraint-based
- Optional reading on tagging: J&M 8 or M&S 10
Wed 11/17:
Fri 11/19:
- HW6 due
- Morphology and phonology
- English, Turkish, Arabic
- Stemming
- Compounds, segmentation
- Two-level morphology
- Punctuation
- Rewrite rules
- OT
- Optional reading on morphology: R&S 2
Applications [1+ week]
Mon 11/22 (Thanksgiving break),
Wed 11/24 (Thanksgiving break),
Fri 11/26 (Thanksgiving break)
Mon 11/29,
Wed 12/1,
Fri 12/3:
Mon 12/6:
Exam period (12/13 - 12/21):
- Final exam review session (date TBA)
- Final exam (Mon 12/20, 9am-noon, Maryland 109)
Unofficial Summary of Homework Schedule
These dates were copied from the schedule above, which is subject to change.
Homeworks are due approximately every two weeks, with longer homeworks getting more time. But the
homework periods are generally longer than two weeks -- they overlap. This gives you more flexibility
about when to do each assignment, which is useful if you have other classes and activities.
We assign homework n as soon as you've seen the lectures you need, rather than waiting
until after homework n-1 is due. So you can jump right in while the material is fresh.
- HW1 (grammar): given Wed 9/1, due Wed 9/15
- HW2 (probability): given Fri 9/10, due Mon 9/27
- HW3 (empiricism): given Wed 9/15, due Sun 10/10
- HW4 (algorithms): given Wed 9/29, due Wed 10/27
- HW5 (logic): given Fri 10/8, due Wed 11/3
- HW6 (machine learning): given Mon 10/25, due Fri 11/19 (last day before Thanksgiving break)
- HW7 (automata): given Wed 11/10, due Mon 12/6 (last day of class)
Recitation Schedule
Recitations are normally held on Tuesdays (see the syllabus). Enrolled students are expected to attend the recitation and participate in solving practice problems. This will be more helpful than an hour of solo study. The following schedule is subject to change.
Old Materials
Lectures from past years, some still useful:
Old homework: