600.465 Introduction to NLP (Fall 2000)
Midterm Exam Answers
Date: Oct 30, 2000 2pm (30 min.)
If asked to compute something for which you have the numbers, that
really means to compute the final number, not just to write the
formula. If asked for a formula, write down the formula.
1. Probability
Let S = { a, b, c } (the sample space), and p be the joint
distribution on a sequence of two events (i.e. on S x S, ordered). If
you know that p(a,a) [a followed by a] = 0.25, p(c,c) [c followed by
c] = 0.25, p(b,a) [b followed by a] = 0.125,
p(b,b) [b followed by b] = 0, p(a,c) [a followed by c]
= 0.25, p_{L}(a) [unigram probability of a as the left-hand bigram member] = .5,
and p_{R}(b) [unigram probability of b as the right-hand bigram
member] = 0.125, is that enough information to compute
p(b|c) (i.e., the probability of seeing b if we already know that the
preceding event generated c)?
 Yes / No: __Yes__
 why? __The given bigram probabilities sum up to 0.875; the unigram constraints
further determine that p(a,b) = 0 (from p(a,a) + p(a,b) + p(a,c) = p_{L}(a)), so the remaining
.125 must be at p(c,b) (from p(a,b) + p(b,b) + p(c,b) = p_{R}(b));
therefore p is fully defined.
We can then get p_{L}(c) = sum over i of p(c,i), and p(b|c) = p(c,b)/p_{L}(c).__
 If yes, compute: p(b|c) = ___1/3___ ( = p(c,b) / p_{L}(c) = .125 / .375)
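The reconstruction argument above can be checked mechanically. The sketch below (variable names are mine, not from the exam) fills in the missing bigram cells from the unigram constraints and then computes p(b|c):

```python
# Given bigram probabilities from question 1.
p = {('a', 'a'): 0.25, ('c', 'c'): 0.25, ('b', 'a'): 0.125,
     ('b', 'b'): 0.0, ('a', 'c'): 0.25}

# p_L(a) = 0.5 forces p(a,b) = 0.5 - p(a,a) - p(a,c) = 0.
p[('a', 'b')] = 0.5 - p[('a', 'a')] - p[('a', 'c')]

# p_R(b) = 0.125 then forces p(c,b) = 0.125 - p(a,b) - p(b,b) = 0.125.
p[('c', 'b')] = 0.125 - p[('a', 'b')] - p[('b', 'b')]

# The table now sums to 1, so the two remaining cells must be zero.
p[('b', 'c')] = p[('c', 'a')] = 0.0
assert abs(sum(p.values()) - 1.0) < 1e-12

# p(b|c) = p(c,b) / p_L(c), with p_L(c) = sum over i of p(c,i).
p_L_c = sum(p[('c', y)] for y in 'abc')   # 0.375
p_b_given_c = p[('c', 'b')] / p_L_c       # 0.125 / 0.375 = 1/3
```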
2. Estimation and Cross-entropy
Use the bigram distribution p from question 1.
 Write one example of a data sequence which faithfully follows the
distribution (i.e., a training data from which we would get the above
bigram distribution using the MLE method):
E.g.: __a c c b a a a c c__
 What is the cross-entropy H_{data}(p) in bits and the
perplexity^{1} G_{data}(p) of the bigram distribution from
question 1 if computed against the following data (use the data-oriented formula for the conditional distribution derived from p):
data = b a a a
H_{data}(p) = ____5/4______
G_{data}(p) = __ 2 x 2^{1/4} __ (twice the fourth root of 2, i.e. 2^{5/4})
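The computation can be verified as follows. This sketch scores the first word of the data with the left-hand unigram p_L (the convention the answer 5/4 implies) and each following word with the conditional bigram probability derived from p:

```python
import math

# Full bigram table as reconstructed in question 1.
p = {('a', 'a'): 0.25, ('a', 'b'): 0.0, ('a', 'c'): 0.25,
     ('b', 'a'): 0.125, ('b', 'b'): 0.0, ('b', 'c'): 0.0,
     ('c', 'a'): 0.0, ('c', 'b'): 0.125, ('c', 'c'): 0.25}
p_L = {x: sum(p[(x, y)] for y in 'abc') for x in 'abc'}

data = ['b', 'a', 'a', 'a']

# First word scored by p_L(b) = 0.125; the rest by p(cur|prev).
logp = math.log2(p_L[data[0]])
for prev, cur in zip(data, data[1:]):
    logp += math.log2(p[(prev, cur)] / p_L[prev])

H = -logp / len(data)   # cross-entropy: (3 + 0 + 1 + 1) / 4 = 5/4 bits
G = 2 ** H              # perplexity: 2^(5/4) = 2 * 2^(1/4)
```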
3. Mutual information
Use the bigram distribution from question 1.
 What is the pointwise mutual information of b and a (in this order)?
I_{pointwise}(b,a) = __ 3 - log_{2}(3) = 1.415... __ (
= log_{2}(p(b,a)/(p_{L}(b) p_{R}(a))) =
log_{2}(0.125/((0.125)(0.375))) = log_{2}(8/3) )
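As a quick numeric check of the formula above (marginals taken from the full table reconstructed in question 1):

```python
import math

p_ba = 0.125                  # p(b,a)
p_L_b = 0.125 + 0.0 + 0.0     # p(b,a) + p(b,b) + p(b,c)
p_R_a = 0.25 + 0.125 + 0.0    # p(a,a) + p(b,a) + p(c,a)

# Pointwise mutual information: log2(p(b,a) / (p_L(b) * p_R(a))).
pmi = math.log2(p_ba / (p_L_b * p_R_a))   # log2(8/3) = 3 - log2(3)
```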
4. Smoothing and the sparse data problem
 Name three methods of smoothing:
 ____"Add 1" (or "Add lambda")_____________________________
 ____Good-Turing____________________________________________
 ____Linear Interpolation__________________________________

If you were to design a trigram language model, how would the final smoothed
distribution be defined if you use the linear interpolation smoothing method?
 ____p_{3}'(w_{i}|w_{i-2},w_{i-1}) =
lambda_{3} p_{3}(w_{i}|w_{i-2},w_{i-1}) +
lambda_{2} p_{2}(w_{i}|w_{i-1}) +
lambda_{1} p_{1}(w_{i}) + lambda_{0} (1/|V|),
lambda_{0} + lambda_{1} + lambda_{2} + lambda_{3} = 1___
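The interpolated distribution can be sketched directly from the formula. The counts and lambda values below are illustrative placeholders; in practice the lambdas would be trained by EM on held-out data:

```python
from collections import Counter

def train(tokens):
    """Collect unigram, bigram, and trigram counts (MLE components)."""
    uni = Counter(tokens)
    bi = Counter(zip(tokens, tokens[1:]))
    tri = Counter(zip(tokens, tokens[1:], tokens[2:]))
    return uni, bi, tri, len(tokens)

def p_interp(w, u, v, counts, lams, V):
    """Smoothed trigram p3'(w | u, v) by linear interpolation."""
    uni, bi, tri, N = counts
    l0, l1, l2, l3 = lams                 # must sum to 1
    p3 = tri[(u, v, w)] / bi[(u, v)] if bi[(u, v)] else 0.0
    p2 = bi[(v, w)] / uni[v] if uni[v] else 0.0
    p1 = uni[w] / N
    return l3 * p3 + l2 * p2 + l1 * p1 + l0 / V

# Toy training data, purely for illustration.
tokens = ['a', 'b', 'c', 'a', 'b', 'c']
counts = train(tokens)
prob = p_interp('c', 'a', 'b', counts, (0.1, 0.2, 0.3, 0.4), V=3)
```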
5. Classes based on Mutual Information
Suppose you have the following data:
It is interesting to watch , at least from
the foreign policy perspective , how the wannabe president George W .
differs from his father , the former president George Bush .
What is the best pair of candidates for the first merge, if you use
the greedy algorithm for classes based on bigram mutual information
(i.e. the homework #2 algorithm)? Use your judgment, not computation;
in case of two or more best candidates, write as many as you can
find.
2 solutions:

Word 1: ___W________________
Word 2: ___Bush_____________

Word 1: ___wannabe__________
Word 2: ___former___________
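The greedy criterion (merge the pair that loses the least average bigram mutual information) can be sketched by brute force: try every candidate pair, recompute the class-level MI after the merge, and keep the best. This is a toy recomputation, not the efficient update formulas from the homework algorithm; names are mine:

```python
import math
from collections import Counter
from itertools import combinations

text = ("It is interesting to watch , at least from the foreign policy "
        "perspective , how the wannabe president George W . differs from "
        "his father , the former president George Bush .").split()

def avg_mi(tokens, cls):
    """Average mutual information of the class bigram distribution."""
    pairs = [(cls[a], cls[b]) for a, b in zip(tokens, tokens[1:])]
    n = len(pairs)
    bi = Counter(pairs)
    left = Counter(l for l, _ in pairs)
    right = Counter(r for _, r in pairs)
    return sum((c / n) * math.log2((c / n) / ((left[l] / n) * (right[r] / n)))
               for (l, r), c in bi.items())

base = {w: w for w in set(text)}          # every word starts in its own class

# First greedy merge: the pair whose merge keeps average MI highest.
best = max(combinations(sorted(set(text)), 2),
           key=lambda pr: avg_mi(text, {**base, pr[1]: pr[0]}))
```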
6. Hidden Markov Models
 What is the Trellis algorithm good for? (Use max. 5 sentences for
the answer.)
____To find the probability of a string based on____________________________
____a parametrized (trained) HMM which presumably_______________
____generated the string._______________________________________________
_________________________________________________________________________
_________________________________________________________________________
 What is the Viterbi algorithm good for? (Use max. 5 sentences for
the answer.)
____To find the best path through a state sequence graph given________________
____a parametrized (trained) HMM and some input data presumably_______________
____generated by that HMM._______________________________________________
_________________________________________________________________________
_________________________________________________________________________
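Both answers above can be made concrete on a toy 2-state HMM. All parameters here are made-up illustration values; the forward (trellis) pass sums over paths to get the string probability, while Viterbi maximizes over paths:

```python
states = ['H', 'C']
start = {'H': 0.6, 'C': 0.4}
trans = {'H': {'H': 0.7, 'C': 0.3}, 'C': {'H': 0.4, 'C': 0.6}}
emit = {'H': {'x': 0.5, 'y': 0.5}, 'C': {'x': 0.2, 'y': 0.8}}

def forward(obs):
    """Trellis (forward) algorithm: total probability of obs under the HMM."""
    a = {s: start[s] * emit[s][obs[0]] for s in states}
    for o in obs[1:]:
        a = {s: sum(a[r] * trans[r][s] for r in states) * emit[s][o]
             for s in states}
    return sum(a.values())

def viterbi(obs):
    """Viterbi algorithm: (probability, path) of the best state sequence."""
    v = {s: (start[s] * emit[s][obs[0]], [s]) for s in states}
    for o in obs[1:]:
        v = {s: max(((v[r][0] * trans[r][s] * emit[s][o], v[r][1] + [s])
                     for r in states), key=lambda t: t[0])
             for s in states}
    return max(v.values(), key=lambda t: t[0])
```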
Now check if you have filled in your name and SSN. Also,
please carefully check your answers and hand the exam in.
^{1} The cross-entropy and perplexity computation is the only computation
here for which you might need a calculator; it is OK to leave the answer as an
expression (but use the appropriate integer numbers in it!).