600.465 Introduction to NLP (Fall 1999)
Midterm Exam
Date: Nov 01 2pm (30 min.)
SSN:__________________________________
Name:_________________________________
If you are asked to compute something for which you have the numbers,
compute the final number; do not just write down the formula. If you
are asked for a formula, write down the formula.
1. Probability
Let S = { a, b, c } (the sample space), and let p be the joint
distribution on a sequence of two events (i.e., on S x S, ordered). If
you know that p(a,a) [a followed by a] = 0.25, p(a,b) [a followed by
b] = 0.125, p(b,c) [b followed by c] = 0.125, p(c,a) [c followed by a]
= 0.25, and p(c,c) [c followed by c] = 0.25, is that enough information
to compute p(b|a) (i.e., the probability of seeing b given that the
preceding event was a)?
- Yes / No: _______
- why? _______________________________________________________________________
____________________________________________________________________________
- If yes, compute: p(b|a) = _______________________________
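For reference, the marginalization needed here can be checked with a short
Python sketch. It assumes the four pairs not listed in the question have
probability 0, which is forced by the five listed values already summing to 1:

    # Joint bigram distribution from question 1; unlisted pairs are 0.
    p = {('a', 'a'): 0.25, ('a', 'b'): 0.125, ('b', 'c'): 0.125,
         ('c', 'a'): 0.25, ('c', 'c'): 0.25}
    # p(b|a) = p(a,b) / p(a,.), marginalizing over the second event
    p_a = sum(p.get(('a', y), 0.0) for y in 'abc')
    print(p[('a', 'b')] / p_a)    # 0.125 / 0.375 = 1/3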
2. Estimation and Cross-entropy
Use the bigram distribution from question 1.
- Write one example of a data sequence that faithfully follows the
distribution (i.e., training data from which we would get the above
bigram distribution using the MLE method):
_____
_____
_____
_____
_____
_____
_____
_____
_____
- What is the cross-entropy H_data(p) in bits and the
perplexity[1] G_data(p) of the bigram distribution from
question 1 when computed against the following data:
data = b c a
H_data(p) = ____________
G_data(p) = ____________
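One way to check this computation is the sketch below. It assumes the
convention that the cross-entropy of a bigram model is the average negative
log2 conditional probability over the bigrams in the data (two of them here),
with perplexity defined as 2 raised to the cross-entropy:

    import math

    p = {('a', 'a'): 0.25, ('a', 'b'): 0.125, ('b', 'c'): 0.125,
         ('c', 'a'): 0.25, ('c', 'c'): 0.25}

    def cond(y, x):                        # p(y | x) = p(x,y) / p(x,.)
        return p.get((x, y), 0.0) / sum(p.get((x, z), 0.0) for z in 'abc')

    data = ['b', 'c', 'a']
    bigrams = list(zip(data, data[1:]))    # [('b','c'), ('c','a')]
    H = -sum(math.log2(cond(y, x)) for x, y in bigrams) / len(bigrams)
    print(H, 2 ** H)                       # cross-entropy (bits), perplexity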
3. Mutual information
Use the bigram distribution from question 1.
- What is the pointwise mutual information of c and a (in this order)?
I_pointwise(c,a) = _________________________
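The pointwise mutual information asked for compares the joint probability
with the product of the two marginals. A minimal sketch, assuming the
convention I_pointwise(c,a) = log2 [ p(c,a) / (p(c,.) * p(.,a)) ]:

    import math

    p = {('a', 'a'): 0.25, ('a', 'b'): 0.125, ('b', 'c'): 0.125,
         ('c', 'a'): 0.25, ('c', 'c'): 0.25}
    p_c_first = sum(p.get(('c', y), 0.0) for y in 'abc')    # p(c,.)
    p_a_second = sum(p.get((x, 'a'), 0.0) for x in 'abc')   # p(.,a)
    print(math.log2(p[('c', 'a')] / (p_c_first * p_a_second)))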
4. Smoothing and the sparse data problem
- Name three methods of smoothing:
- __________________________________________________________
- __________________________________________________________
- __________________________________________________________
- If you were to design a bigram language model, how would the final smoothed
distribution be defined if you use the linear interpolation smoothing method?
- ___________________________________________________________
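As a sketch of one standard answer, the linearly interpolated bigram
distribution mixes the bigram, unigram, and uniform estimates with
nonnegative weights summing to 1. The lambda values below are illustrative
placeholders; in practice they are estimated by EM on held-out data:

    # p'(w|v) = l2 * p(w|v) + l1 * p(w) + l0 / |V|,  with l2 + l1 + l0 = 1
    def p_smoothed(w, v, p_bigram, p_unigram, vocab_size,
                   l2=0.6, l1=0.3, l0=0.1):
        return (l2 * p_bigram.get((v, w), 0.0)
                + l1 * p_unigram.get(w, 0.0)
                + l0 / vocab_size)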
5. Classes based on Mutual Information
Suppose you have the following data:
Is this question really so easy , or was it rather
the previous question , that was so difficult ?
What is the best pair of candidates for the first merge if you use the
greedy algorithm for classes based on bigram mutual information
(i.e., the homework #2 algorithm)? Use your judgment, not computation.
- Word 1: ____________________
  Word 2: ____________________
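A brute-force sketch of the greedy criterion (not the efficient homework
implementation): try every candidate pair, merge it into one class, and keep
the pair whose merge preserves the most bigram mutual information. Tokens
are lowercased here for simplicity:

    from collections import Counter
    from itertools import combinations
    from math import log2

    text = ("is this question really so easy , or was it rather "
            "the previous question , that was so difficult ?").split()

    def bigram_mi(tokens):
        pairs = list(zip(tokens, tokens[1:]))
        n = len(pairs)
        joint = Counter(pairs)
        left = Counter(x for x, _ in pairs)
        right = Counter(y for _, y in pairs)
        return sum((c / n) * log2((c / n) / ((left[x] / n) * (right[y] / n)))
                   for (x, y), c in joint.items())

    def merge(tokens, a, b):
        return [a + '+' + b if t in (a, b) else t for t in tokens]

    # The merge that maximizes the remaining MI is the one that loses least.
    best = max(combinations(sorted(set(text)), 2),
               key=lambda ab: bigram_mi(merge(text, *ab)))
    print(best)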
6. Hidden Markov Models
- What is the Viterbi algorithm good for? (Use max. 5 sentences for
the answer.)
_________________________________________________________________________
_________________________________________________________________________
_________________________________________________________________________
_________________________________________________________________________
_________________________________________________________________________
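For orientation, a minimal Viterbi sketch on a made-up two-state HMM (the
states, symbols, and numbers below are illustrative only): it recovers the
single most likely hidden state sequence for an observed sequence.

    def viterbi(obs, states, start, trans, emit):
        # delta[s]: probability of the best path so far ending in state s
        delta = {s: start[s] * emit[s][obs[0]] for s in states}
        backpointers = []
        for o in obs[1:]:
            step, back = {}, {}
            for s in states:
                prev = max(states, key=lambda r: delta[r] * trans[r][s])
                step[s] = delta[prev] * trans[prev][s] * emit[s][o]
                back[s] = prev
            delta = step
            backpointers.append(back)
        last = max(states, key=lambda s: delta[s])
        path = [last]
        for back in reversed(backpointers):   # follow backpointers home
            path.append(back[path[-1]])
        return list(reversed(path)), delta[last]

    states = ['H', 'L']                       # toy model, made-up numbers
    start = {'H': 0.5, 'L': 0.5}
    trans = {'H': {'H': 0.7, 'L': 0.3}, 'L': {'H': 0.4, 'L': 0.6}}
    emit = {'H': {'x': 0.8, 'y': 0.2}, 'L': {'x': 0.3, 'y': 0.7}}
    print(viterbi(['x', 'y', 'y'], states, start, trans, emit))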
- What is the Baum-Welch algorithm good for? (Use max. 5 sentences for
the answer.)
_________________________________________________________________________
_________________________________________________________________________
_________________________________________________________________________
_________________________________________________________________________
_________________________________________________________________________
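A companion sketch of one Baum-Welch re-estimation step (forward-backward
EM) on a single short sequence, again with made-up illustrative numbers; a
real implementation would iterate to convergence and rescale the forward and
backward probabilities to avoid underflow:

    def baum_welch_step(obs, states, start, trans, emit):
        T = len(obs)
        # E-step: forward probabilities alpha[t][s]
        alpha = [{s: start[s] * emit[s][obs[0]] for s in states}]
        for t in range(1, T):
            alpha.append({s: emit[s][obs[t]] *
                             sum(alpha[t - 1][r] * trans[r][s] for r in states)
                          for s in states})
        # backward probabilities beta[t][s]
        beta = [dict.fromkeys(states, 1.0) for _ in range(T)]
        for t in range(T - 2, -1, -1):
            for s in states:
                beta[t][s] = sum(trans[s][r] * emit[r][obs[t + 1]] *
                                 beta[t + 1][r] for r in states)
        Z = sum(alpha[T - 1][s] for s in states)    # P(obs | model)
        gamma = [{s: alpha[t][s] * beta[t][s] / Z for s in states}
                 for t in range(T)]
        xi = [{(r, s): alpha[t][r] * trans[r][s] * emit[s][obs[t + 1]] *
                       beta[t + 1][s] / Z
               for r in states for s in states}
              for t in range(T - 1)]
        # M-step: re-estimate parameters from the expected counts
        new_start = {s: gamma[0][s] for s in states}
        new_trans = {r: {s: sum(x[(r, s)] for x in xi) /
                            sum(gamma[t][r] for t in range(T - 1))
                         for s in states}
                     for r in states}
        new_emit = {s: {o: sum(g[s] for t, g in enumerate(gamma)
                               if obs[t] == o) /
                           sum(g[s] for g in gamma)
                        for o in set(obs)}          # symbols seen in obs
                    for s in states}
        return new_start, new_trans, new_emit

    states = ['H', 'L']                             # same toy model as above
    start = {'H': 0.5, 'L': 0.5}
    trans = {'H': {'H': 0.7, 'L': 0.3}, 'L': {'H': 0.4, 'L': 0.6}}
    emit = {'H': {'x': 0.8, 'y': 0.2}, 'L': {'x': 0.3, 'y': 0.7}}
    print(baum_welch_step(['x', 'y', 'y', 'x'], states, start, trans, emit))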
Now check that you have filled in your name and SSN. Also,
please check your answers carefully and hand the exam in.
[1] The perplexity computation is the only computation here for which
you might need a calculator; it is OK if you leave the result as an
expression (substitute the appropriate integer numbers, though!).