We released a set of parallel corpora between English and six
languages from the Indian subcontinent, which you
can download here.
I wrote a jQuery stack decoder to help visualize word-based MT
for an MT class. You can play with
the live online demo
or get the code on GitHub.
You can find data (including the
grammar) and code for extracting TSG feature
sets on
GitHub. The release also includes a version
of Mark
Johnson's exhaustive CKY parser, modified to parse with
grammars whose rules intermingle terminals and
nonterminals, and extended with a number of convenient
command-line options.
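To illustrate why TSG parsing needs rules that mix terminals and nonterminals, here is a minimal Python sketch that flattens a tree fragment into a single rule by reading off its frontier. It is not taken from the released code: the function names (`tokenize`, `parse`, `frontier`, `fragment_to_rule`) and the example fragment are hypothetical, and the sketch does not mark which frontier symbols are words and which are nonterminals.

```python
def tokenize(s):
    """Split a bracketed fragment into parentheses and symbols."""
    return s.replace("(", " ( ").replace(")", " ) ").split()


def parse(tokens, pos=0):
    """Parse one node starting at tokens[pos]; return (node, next_pos).

    A bare symbol is a leaf (a word or an unexpanded nonterminal).
    A parenthesized node becomes a (label, children) pair.
    """
    if tokens[pos] != "(":
        return tokens[pos], pos + 1
    label = tokens[pos + 1]
    pos += 2
    children = []
    while tokens[pos] != ")":
        child, pos = parse(tokens, pos)
        children.append(child)
    return (label, children), pos + 1


def frontier(node):
    """Return the left-to-right leaves of a parsed fragment."""
    if isinstance(node, str):
        return [node]
    _label, children = node
    leaves = []
    for child in children:
        leaves.extend(frontier(child))
    return leaves


def fragment_to_rule(fragment):
    """Flatten a fragment into (root label, right-hand side)."""
    tree, _ = parse(tokenize(fragment))
    label, _children = tree
    return label, frontier(tree)


if __name__ == "__main__":
    # The word "the" and the open nonterminal NN end up side by side.
    print(fragment_to_rule("(NP (DT the) NN)"))  # ('NP', ['the', 'NN'])
```

A grammar of such flattened rules cannot be handled by a parser that expects words to appear only under preterminals, which is the restriction the modified parser lifts.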
The code for the experiments in our
2009 paper on inferring tree substitution
grammars is
available on GitHub. It is small, modular, and
well documented, and although it is written in Perl, I have
been told that it is easy to understand. It includes a patch
to Mark
Johnson's CKY
parser that allows it to be used with TSGs.
Charniak and Johnson's reranking code
(from their 2005 ACL paper)
extracts a large set of syntactic features from parse trees. One obstacle to reusing
their feature extractor is that it is integrated into their reranking framework and requires
fairly specialized file formats. I modified their
extract-spfeatures program
so that it can extract their feature set from a single parse tree in standard bracketed
format, e.g.,
$ echo "(S (NP (DT The) (NN child)) (VP (VBD demurred)))" | extract-spfeatures
It is available on GitHub.
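If you are preparing input for the command above, a quick sanity check on the bracketed format can save some confusion. The following Python sketch is independent of extract-spfeatures; the helper names `is_balanced` and `preterminals` are hypothetical, and it assumes one tree per string with part-of-speech tags directly above the words.

```python
import re

# Matches an innermost "(TAG word)" pair, i.e. a preterminal and its word.
PRETERMINAL = re.compile(r"\(([^\s()]+)\s+([^\s()]+)\)")


def is_balanced(tree):
    """Check that parentheses nest properly and all of them are closed."""
    depth = 0
    for ch in tree:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False
    return depth == 0


def preterminals(tree):
    """Return the (tag, word) pairs in left-to-right order."""
    return PRETERMINAL.findall(tree)


if __name__ == "__main__":
    tree = "(S (NP (DT The) (NN child)) (VP (VBD demurred)))"
    print(is_balanced(tree))   # True
    print(preterminals(tree))  # [('DT', 'The'), ('NN', 'child'), ('VBD', 'demurred')]
```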
