We assume that you’ve made a local copy of http://www.cs.jhu.edu/~jason/465/hw-parse/ (for example, by downloading and unpacking the zipfile there) and that you are currently in that directory.
First try out the recognizer that we provided.
./recognize.py papa.gr papa.sen
./recognize.py -v papa.gr papa.sen # verbose output
Copy the program recognize.py to parse.py.
Then edit parse.py to transform our unweighted recognizer
into your probabilistic parser. You should be able to call it as
./parse.py papa.gr papa.sen | ./prettyprint
Notice that the docstring for the Agenda class includes
some sample calls and their intended output. The bottom of the script
uses doctest to
check that those calls behave as intended. You may want to use
doctest (or unittest) to
document and test your own classes and methods, too.
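As a minimal sketch of that doctest style, here is a small helper with sample calls in its docstring. The function rule_weight is hypothetical (it is not part of recognize.py), and treating a weight as a negative log2 probability is an assumption made only for illustration.

```python
import doctest
import math


def rule_weight(prob):
    """Convert a rule probability into a weight (negative log2 probability).

    This helper is hypothetical and only illustrates the doctest style.

    >>> rule_weight(0.5)
    1.0
    >>> rule_weight(0.25)
    2.0
    >>> rule_weight(0.125)
    3.0
    """
    return -math.log2(prob)


if __name__ == "__main__":
    # Run every example found in this module's docstrings;
    # doctest prints nothing when all of them pass.
    doctest.testmod()
```

Running the file directly checks the examples. Adding -v to the command line makes doctest print each example as it is tried.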
You are free to ignore part or all of recognize.py and
just write your own solution from scratch, if you prefer. If your
solution is not in Python, you should still submit a
parse.py, which can simply run your program, for
example:
#!/usr/bin/env python3
import subprocess
import sys

# Invoke my parser on the same command-line arguments.
result = subprocess.run(["java", "-jar", "myparser.jar"] + sys.argv[1:])
sys.exit(result.returncode)  # pass the parser's exit status through
Notice that even recognize.py is very slow on a large
grammar, and of course parse.py will be at least as slow:
./recognize.py --progress wallstreet.gr wallstreet.sen
./parse.py --progress wallstreet.gr wallstreet.sen
So copy parse.py to parse2.py and make that
version faster.
(The --progress option displays the fraction of columns
that have been processed so far. In addition, the -v option
reports the number of PREDICT, SCAN, and ATTACH actions for each
sentence. You could enhance it to report more detailed statistics if you
like.)
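One way to collect more detailed statistics is sketched below. The ActionStats class and its record and report methods are hypothetical names, not part of recognize.py. The sketch assumes you call record yourself at each point where your parser performs an action.

```python
from collections import Counter


class ActionStats:
    """Tally parser actions overall and per column (hypothetical helper)."""

    def __init__(self):
        self.totals = Counter()      # action name -> count
        self.per_column = Counter()  # (column index, action name) -> count

    def record(self, action, column):
        """Count one action, e.g. "PREDICT", performed in the given column."""
        self.totals[action] += 1
        self.per_column[(column, action)] += 1

    def report(self):
        """Return one summary line per action type, sorted by action name."""
        return [f"{name}: {count}" for name, count in sorted(self.totals.items())]


# Example usage with made-up actions for a single short sentence.
stats = ActionStats()
stats.record("PREDICT", 0)
stats.record("SCAN", 0)
stats.record("PREDICT", 1)
print("\n".join(stats.report()))
```

Keeping the per-column counts separate makes it easy to see which columns dominate the running time on long sentences.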