Johns Hopkins University

Projects


This page describes my academic projects. Details of my research activities can be found here.


1) NLP, Information Retrieval and Information Extraction

Cross-Lingual Information Retrieval [Report]

  • Developed a cross-lingual information retrieval system for English and Hindi.
  • Implemented a statistical generative bigram model for transliteration (using character alignments).
  • Implemented a Hindi stemmer that removes 51 inflectional suffixes.
  • Implemented a dictionary-based translation model and a TF-IDF-based retrieval scheme.
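
As a rough illustration of the TF-IDF-based retrieval scheme mentioned above, here is a minimal sketch. It is not the project's code: whitespace tokenization, the log(N/df) weighting, cosine ranking, and the names `build_index`, `cosine`, and `rank` are all assumptions made for this example.

```python
import math
from collections import Counter

def build_index(docs):
    """Return one TF-IDF vector (dict) per document plus the IDF table."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))  # document frequency: count each term once per doc
    idf = {term: math.log(n_docs / count) for term, count in df.items()}
    vectors = [
        {term: tf * idf[term] for term, tf in Counter(tokens).items()}
        for tokens in tokenized
    ]
    return vectors, idf

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def rank(query, vectors, idf):
    """Return document indices ordered from most to least similar to the query."""
    q_tf = Counter(query.lower().split())
    q_vec = {term: tf * idf[term] for term, tf in q_tf.items() if term in idf}
    scores = [(cosine(q_vec, vec), i) for i, vec in enumerate(vectors)]
    return [i for _, i in sorted(scores, key=lambda s: (-s[0], s[1]))]
```

In a cross-lingual setting, the query would first be translated or transliterated into the document language before being passed to `rank`.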

Part of Speech Tagger

  • Implemented a semi-supervised Part of Speech tagger for English using a Hidden Markov Model.
  • The tagger used the forward-backward algorithm for EM training.
  • This project was my first implementation and application of Hidden Markov Models.
  • Complete implementation was done in Python.
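
The forward-backward computation used in the E-step of EM training can be sketched as follows. This is an illustrative version, not the original tagger: the function name and the list-of-lists parameter layout are assumptions, and it works with unscaled probabilities, so it is only suitable for short sequences.

```python
def forward_backward(obs, init, trans, emit):
    """Return (gamma, likelihood) for a discrete HMM.

    obs   : list of observation indices
    init  : init[s]      = P(state s at time 0)
    trans : trans[r][s]  = P(next state s | current state r)
    emit  : emit[s][o]   = P(observation o | state s)
    gamma[t][s] is the posterior probability of state s at time t.
    """
    n_states = len(init)
    T = len(obs)
    alpha = [[0.0] * n_states for _ in range(T)]
    beta = [[0.0] * n_states for _ in range(T)]

    # Forward pass: alpha[t][s] = P(obs[0..t], state_t = s)
    for s in range(n_states):
        alpha[0][s] = init[s] * emit[s][obs[0]]
    for t in range(1, T):
        for s in range(n_states):
            incoming = sum(alpha[t - 1][r] * trans[r][s] for r in range(n_states))
            alpha[t][s] = incoming * emit[s][obs[t]]

    # Backward pass: beta[t][s] = P(obs[t+1..T-1] | state_t = s)
    for s in range(n_states):
        beta[T - 1][s] = 1.0
    for t in range(T - 2, -1, -1):
        for s in range(n_states):
            beta[t][s] = sum(
                trans[s][r] * emit[r][obs[t + 1]] * beta[t + 1][r]
                for r in range(n_states)
            )

    likelihood = sum(alpha[T - 1])
    gamma = [
        [alpha[t][s] * beta[t][s] / likelihood for s in range(n_states)]
        for t in range(T)
    ]
    return gamma, likelihood
```

For a tagger, the states would be tags and the observations word indices; the posteriors in `gamma` provide the expected counts that the M-step uses to re-estimate the emission probabilities.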

Earley Parser

  • Implemented an English language parser for probabilistic context-free grammars based on the Earley algorithm.
  • Pruning techniques such as an agenda-based best-first strategy, left-cornering, and an inverted index were used.
  • The Parser was tested on sentences from Wall Street Journal.
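
The core predict/scan/complete loop of the Earley algorithm can be sketched as below. This is a deliberately simplified recognizer, not the project's parser: it ignores rule probabilities and all of the pruning techniques listed above, assumes the grammar has no empty right-hand sides and that preterminal tags are not also grammar nonterminals, and the name `earley_recognize` and the dictionary formats are hypothetical.

```python
def earley_recognize(words, grammar, lexicon, start="S"):
    """Return True if `words` can be derived from `start`.

    grammar : dict mapping a nonterminal to a list of right-hand sides (tuples)
    lexicon : dict mapping a word to the set of preterminal tags it can take
    Chart items are (lhs, rhs, dot, origin).
    Assumes no rule has an empty right-hand side.
    """
    n = len(words)
    chart = [set() for _ in range(n + 1)]
    for rhs in grammar[start]:
        chart[0].add((start, rhs, 0, 0))

    for i in range(n + 1):
        agenda = list(chart[i])
        while agenda:
            lhs, rhs, dot, origin = agenda.pop()
            if dot < len(rhs):
                nxt = rhs[dot]
                if nxt in grammar:
                    # Predict: expand the nonterminal after the dot.
                    for new_rhs in grammar[nxt]:
                        new = (nxt, new_rhs, 0, i)
                        if new not in chart[i]:
                            chart[i].add(new)
                            agenda.append(new)
                elif i < n and nxt in lexicon.get(words[i], ()):
                    # Scan: the next word matches the expected tag.
                    chart[i + 1].add((lhs, rhs, dot + 1, origin))
            else:
                # Complete: advance every item that was waiting for `lhs`.
                for p_lhs, p_rhs, p_dot, p_origin in list(chart[origin]):
                    if p_dot < len(p_rhs) and p_rhs[p_dot] == lhs:
                        new = (p_lhs, p_rhs, p_dot + 1, p_origin)
                        if new not in chart[i]:
                            chart[i].add(new)
                            agenda.append(new)

    return any(
        lhs == start and dot == len(rhs) and origin == 0
        for lhs, rhs, dot, origin in chart[n]
    )
```

A probabilistic version would attach a weight to each item and, for a best-first strategy, replace the plain list agenda with a priority queue.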

Decision Tree Language Model for English Letters

  • Implemented a decision tree as a 4-gram language model with cross-validation.
  • Generated an encoding of English letters into bit-strings using a mutual information criterion. Here is a graphical representation of the encoding.
  • The model used both a restricted question set based on the bit encoding and unrestricted questions based on Chou's algorithm.
  • Entropy and the Gini index were used as goodness criteria.
  • Used Python for implementation.
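
The two goodness criteria named above can be sketched as follows, along with how a candidate yes/no question might be scored by the impurity reduction it achieves. The helper `split_gain` and its argument layout are assumptions for illustration, not the project's interface.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of the empirical label distribution."""
    total = len(labels)
    return -sum(
        (c / total) * math.log2(c / total) for c in Counter(labels).values()
    )

def gini(labels):
    """Gini index: 1 minus the sum of squared label probabilities."""
    total = len(labels)
    return 1.0 - sum((c / total) ** 2 for c in Counter(labels).values())

def split_gain(labels, answers, impurity):
    """Impurity reduction from splitting `labels` by a yes/no question.

    answers[i] is True if example i answers "yes" to the question.
    """
    yes = [label for label, a in zip(labels, answers) if a]
    no = [label for label, a in zip(labels, answers) if not a]
    n = len(labels)
    weighted = sum(len(part) / n * impurity(part) for part in (yes, no) if part)
    return impurity(labels) - weighted
```

In the letter language model, `labels` would be the next letters observed at a tree node and each question would ask about a bit of the encoding of one of the preceding letters.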

Isolated Word Recognizer Based on the Fenonic Baseforms Model

  • Developed an isolated word recognizer for a small vocabulary system, based on Hidden Markov Models.
  • The word recognition system was trained on a sample training dataset using the Baum-Welch algorithm.
  • The model used fenonic baseforms for each word and an approximately 80-state HMM.
  • Developed 2 different HMM models:
    • A 7-state HMM for silence.
    • A 2-state HMM for speech.
  • Implemented the data processing part in Python and the EM training code in C.

Other projects:

  • Web crawler in Perl.
  • TF-IDF-based vector model for Word Sense Disambiguation.
  • Spam classifier using a trigram model and Witten-Bell smoothing.
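
To illustrate Witten-Bell smoothing, here is a minimal sketch for a bigram model. The classifier above used trigrams; the bigram order, the add-one unigram fallback, and the class name `WittenBellBigram` are simplifying assumptions for this example. The smoothed estimate is P(w | h) = (c(h, w) + T(h) * P_uni(w)) / (c(h) + T(h)), where T(h) is the number of distinct word types seen after the history h.

```python
from collections import Counter, defaultdict

class WittenBellBigram:
    """Bigram model with Witten-Bell interpolation over a unigram model."""

    def __init__(self, tokens):
        self.unigrams = Counter(tokens)
        self.total = len(tokens)
        # One extra vocabulary slot stands in for all unseen words.
        self.vocab_size = len(self.unigrams) + 1
        self.followers = defaultdict(Counter)
        for prev, word in zip(tokens, tokens[1:]):
            self.followers[prev][word] += 1

    def unigram_prob(self, word):
        """Add-one smoothed unigram probability (an assumed fallback)."""
        return (self.unigrams[word] + 1) / (self.total + self.vocab_size)

    def prob(self, word, prev):
        """Witten-Bell smoothed P(word | prev)."""
        seen = self.followers.get(prev)
        if not seen:
            # History never observed: fall back to the unigram estimate.
            return self.unigram_prob(word)
        count_h = sum(seen.values())  # c(h): tokens observed after prev
        types_h = len(seen)           # T(h): distinct types observed after prev
        return (seen[word] + types_h * self.unigram_prob(word)) / (count_h + types_h)
```

A spam classifier along these lines would train one such model on spam and one on non-spam text, then label a message by comparing its log-probability under the two models.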

2) Operating Systems and Parallel Programming

Subgraph Isomorphism using MapReduce

  • Implemented a solver for the decision problem: given two input graphs, determine whether one graph, or one of its subgraphs, is isomorphic to the other.
  • The algorithm was based on generating permutation matrices, as described here.
  • Parallelized the generation and testing of permutation matrices using the Hadoop MapReduce framework.
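
The generate-and-test idea behind the permutation approach can be sketched sequentially as follows. This is a brute-force illustration, not the Hadoop implementation: each tuple produced by `permutations` plays the role of one candidate permutation matrix, the function name and the edge-list input format are hypothetical, and the check assumes undirected graphs and a non-induced match (extra edges in the target are allowed).

```python
from itertools import permutations

def is_subgraph_isomorphic(pattern_edges, n_pattern, target_edges, n_target):
    """Return True if the pattern graph embeds in the target graph.

    Vertices are numbered 0..n-1 and edges are undirected (u, v) pairs.
    Each candidate maps pattern vertex u to target vertex mapping[u].
    """
    target = {frozenset(edge) for edge in target_edges}
    for mapping in permutations(range(n_target), n_pattern):
        # Accept the candidate if every pattern edge lands on a target edge.
        if all(
            frozenset((mapping[u], mapping[v])) in target
            for u, v in pattern_edges
        ):
            return True
    return False
```

Because every candidate is tested independently of the others, the candidates can be partitioned across mappers, which is what makes the problem a natural fit for MapReduce.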

Slab Allocator

  • Implemented an object-based slab allocator, closely following the design adopted in the Linux kernel.
  • The reference used for this project was Jeff Bonwick's slab allocator.
  • Features include caching objects in an initialized state, dynamic slab size selection, slab coloring, a self-scaling hash, a small object cache for metadata, and minimal fragmentation (below 12.5%).

Kernel Keyboard Logger

  • Implemented a kernel module to log keystrokes using kprobes, a mechanism for probing kernel functions.
  • This required synchronization between interrupt-driven and non-interrupt-driven code. The module read scan codes, which were made accessible through a debugfs file.