This is the future home page of Hopskip (www.hopskip.org), an open-source tool for specifying, training, and sharing probabilistic models of string and sequence data. Applications include text analysis and manipulation, speech recognition, machine translation, information extraction, music, and genomics.
You specify an appropriate probabilistic model using an extended regular expression language. Internally this is compiled into a parameterized finite-state machine. You can then train the free parameters from data. Training can be supervised, unsupervised, or something in between.
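To make the idea of a parameterized finite-state machine concrete, here is a minimal Python sketch. It is not Hopskip's expression language or API. The toy machine for the regular expression a*b, the parameter names "loop" and "exit", and the helper functions are all hypothetical, chosen only for illustration. The sketch covers the simplest case: a deterministic machine trained in a supervised way by counting how often each arc is used.

```python
from collections import Counter

# Hypothetical toy machine for the regular expression a*b.
# State 0 loops on 'a' and moves to state 1 on 'b'; state 1 is final.
# Each arc names a free parameter instead of carrying a fixed probability.
ARCS = {
    (0, "a"): (0, "loop"),  # (state, symbol) -> (next state, parameter name)
    (0, "b"): (1, "exit"),
}
FINAL = {1}


def path_params(s):
    """Return the parameter names used on the path accepting s, or None."""
    state, used = 0, []
    for ch in s:
        if (state, ch) not in ARCS:
            return None  # no arc for this symbol: the string is rejected
        state, name = ARCS[(state, ch)]
        used.append(name)
    return used if state in FINAL else None


def string_prob(s, theta):
    """Probability of s: the product of the parameters along its path."""
    used = path_params(s)
    if used is None:
        return 0.0
    prob = 1.0
    for name in used:
        prob *= theta[name]
    return prob


def train_supervised(corpus):
    """Estimate the parameters as relative frequencies of arc usage.

    Both arcs leave state 0, so normalizing their counts together
    yields a proper distribution over the strings of a*b.
    """
    counts = Counter()
    for s in corpus:
        used = path_params(s)
        if used is not None:
            counts.update(used)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("corpus contains no string accepted by the machine")
    return {name: counts[name] / total for name in ("loop", "exit")}


theta = train_supervised(["ab", "b", "aab"])
print(theta, string_prob("ab", theta))
```

A real system would also have to handle nondeterministic machines and partially observed data, where a string may follow many paths and the counts have to be replaced by expected counts.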
It is easy to specify complex models that are sensitive to linguistically meaningful features or that incorporate resources such as dictionaries and morphological analyzers. You can try your models right away, without writing additional code. The Hopskip code will handle them in a highly optimized way.
We are planning a communal library of useful finite-state machines, such as taggers, parsers, lemmatizers, weighted translation dictionaries, and so on. You can use these directly or incorporate them into your own Hopskip models. You can also retrain them on new data.
For a technical paper and some overview slides, see Eisner (2002). The major technical contributions involve new learning algorithms that are sufficiently general to handle parameterized finite-state machines, and algorithmic tricks to speed them up. See also the related Dyna project, which is providing the underlying infrastructure.
Were [the gossiper's road] as straight as the Appia, and as broad as "that which leadeth to destruction," nevertheless would he be malcontent without a frequent hopskip-and-jump over the hedges, into the tempting pastures of digression beyond. -Edgar Allan Poe (1844)
jason@cs.jhu.edu
Last modified: 2006/01/31 16:06:31 (GMT)