600.465 Introduction to NLP (Fall 2000)
Midterm Exam
Date: Oct 30, 2000 2pm (30 min.)
Name: ___________________________________________
SSN: ___________________________________________
If asked to compute something for which you have the numbers, that
really means to compute the final number, not just to write the
formula. If asked for a formula, write down the formula.
1. Probability
Let S = { a, b, c } (the sample space), and let p be the joint
distribution on a sequence of two events (i.e. on S x S, ordered). If
you know that p(a,a) [a followed by a] = 0.25, p(c,c) [c followed by
c] = 0.25, p(b,a) [b followed by a] = 0.125,
p(b,b) [b followed by b] = 0, p(a,c) [a followed by c]
= 0.25, p_L(a) [unigram probability of a as the left-hand bigram member] = 0.5,
and p_R(b) [unigram probability of b as the right-hand bigram
member] = 0.125, is it enough to compute
p(b|c) (i.e., the probability of seeing b if we already know that the
preceding event generated c)?
 Yes / No: _______
 why? _________________________________________________________________
______________________________________________________________________
______________________________________________________________________
 If yes, compute: p(b|c) = _________________________________________
2. Estimation and Cross-entropy
Use the bigram distribution p from question 1.
 Write one example of a data sequence which faithfully follows the
distribution (i.e., training data from which we would get the above
bigram distribution using the MLE method):
_____
_____
_____
_____
_____
_____
_____
_____
_____
 What is the cross-entropy H_{data}(p) in bits and the
perplexity^{1} G_{data}(p) of the bigram distribution from
question 1 if computed against the following data (use the
data-oriented formula for the conditional distribution derived from p):
data = b a a a
H_{data}(p) = ___________________________
G_{data}(p) = ___________________________
3. Mutual information
Use the bigram distribution from question 1.
 What is the pointwise mutual information of b and a (in this order)?
I_{pointwise}(b,a) = ___________________________________________
4. Smoothing and the sparse data problem
 Name three methods of smoothing:
 _________________________________________________
 _________________________________________________
 _________________________________________________

If you were to design a trigram language model, how would the final smoothed
distribution be defined if you use the linear interpolation smoothing method?
 _______________________________________________________________________
5. Classes based on Mutual Information
Suppose you have the following data:
It is interesting to watch , at least from
the foreign policy perspective , how the wannabe president George W .
differs from his father , the former president George Bush .
What is the best pair of candidates for the first merge, if you use
the greedy algorithm for classes based on bigram mutual information
(i.e. the homework #2 algorithm)? Use your judgment, not computation;
in case of two or more best candidates, write as many as you can
find.

Word(s) 1: ____________________________________________________________
Word(s) 2: ____________________________________________________________
6. Hidden Markov Models
 What is the Trellis algorithm good for? (Use max. 5 sentences for
the answer.)
_________________________________________________________________________
_________________________________________________________________________
_________________________________________________________________________
_________________________________________________________________________
_________________________________________________________________________
 What is the Viterbi algorithm good for? (Use max. 5 sentences for
the answer.)
_________________________________________________________________________
_________________________________________________________________________
_________________________________________________________________________
_________________________________________________________________________
_________________________________________________________________________
Now check if you have filled in your name and SSN. Also,
please carefully check your answers and hand the exam in.
^{1} The cross-entropy and perplexity computation is the only computation
here for which you might need a calculator; but it is ok if you use an
expression (use the appropriate (integer) numbers, though!).
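
For reference, the kind of computation the footnote has in mind can be sketched as follows. This is only a minimal illustration, assuming the usual data-oriented definitions (cross-entropy as the average negative base-2 log of the conditional probability assigned to each predicted event, and perplexity as 2 raised to the cross-entropy). The function names and the probabilities in `example` are hypothetical and are not taken from question 1.

```python
import math

def cross_entropy_bits(cond_probs):
    """Average negative log2 probability over the predicted events.

    cond_probs: the conditional probability the model assigns to each
    predicted event in the data (hypothetical input; all values > 0).
    """
    return -sum(math.log2(p) for p in cond_probs) / len(cond_probs)

def perplexity(cond_probs):
    """Perplexity is 2 raised to the cross-entropy measured in bits."""
    return 2 ** cross_entropy_bits(cond_probs)

# Hypothetical conditional probabilities, NOT the exam's distribution.
example = [0.5, 0.25, 0.25]
h = cross_entropy_bits(example)  # (1 + 2 + 2) / 3 bits
g = perplexity(example)          # 2 ** h
```

Leaving the result as an expression such as (1 + 2 + 2) / 3 is the style of answer the footnote allows when no calculator is at hand.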