Michael Paul

Markov Modeling Package

This software includes basic implementations of the mixed membership Markov model (M4), introduced and described in the paper below, as well as a block HMM implementation. See the included README for open licensing information as well as usage instructions and input/output formatting guidelines.

M4 and the block HMM are used to model sequences of text blocks, such as paragraphs in an article or messages in a conversation. In the same way that topic models can discover unsupervised clusters of semantically related words, these Markov models can discover word clusters that also respect sequential context. When applied to conversation data, these models have been shown to discover word classes resembling speech/dialog acts.
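As a rough illustration of the idea behind a block HMM, the sketch below samples a toy sequence of text blocks: each block gets one latent class, the class depends only on the previous block's class, and every word in the block is drawn from that class's word distribution. This is not the code shipped in this package; the function name, parameters, and toy vocabulary are hypothetical and chosen only for illustration, and M4 itself (where a block can mix several classes) is not covered here.

```python
import random


def sample_block_hmm(initial, transitions, emissions, n_blocks, words_per_block, seed=0):
    """Sample a sequence of text blocks from a toy block HMM (illustrative only).

    initial[k]        : probability that the first block has class k
    transitions[j][k] : probability that class k follows class j
    emissions[k]      : dict mapping word -> probability under class k
    """
    rng = random.Random(seed)
    classes = list(range(len(initial)))
    blocks = []
    prev = None
    for _ in range(n_blocks):
        # The class of a block depends only on the class of the previous block.
        weights = initial if prev is None else transitions[prev]
        state = rng.choices(classes, weights=weights, k=1)[0]
        # All words in the block share the block's single latent class.
        vocab = list(emissions[state].keys())
        probs = list(emissions[state].values())
        words = rng.choices(vocab, weights=probs, k=words_per_block)
        blocks.append((state, words))
        prev = state
    return blocks


# Toy "conversation": class 0 looks like questions, class 1 like answers,
# and the two classes strictly alternate.
initial = [1.0, 0.0]
transitions = [[0.0, 1.0], [1.0, 0.0]]
emissions = [
    {"what": 0.5, "why": 0.5},
    {"yes": 0.5, "because": 0.5},
]
blocks = sample_block_hmm(initial, transitions, emissions, n_blocks=4, words_per_block=3)
for state, words in blocks:
    print(state, " ".join(words))
```

Inference runs in the opposite direction: given only the observed words of each block, the model estimates the latent classes and their word distributions, which is how word clusters resembling dialog acts can emerge from conversation data.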

Please contact me if you find any bugs or errors. It is worth checking back occasionally for updates, especially bug fixes.


Revision History
  • 9/18/2012 - v0.12 - Fixed some small bugs in displaying the output of the block HMM. Thanks to William Darling for pointing these out!
  • 8/6/2012 - v0.11 - Added a parameter which gives the option of sampling from an approximate distribution. This is much faster, but may lead to poorer results. See section 4.1 of the README for details.
  • 7/12/2012 - v0.1 - First release.

  • Michael J. Paul. Mixed Membership Markov Models for Unsupervised Conversation Modeling. In Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL 2012), pages 94-104, Jeju Island, Korea, July 2012.