Projects
This page describes my academic projects. Details of my research
activities can be found here.
1) NLP, Information Retrieval and Information Extraction
Cross Lingual Information Retrieval [Report]
- Developed a Cross Lingual Information Retrieval System for
English and Hindi Languages.
- Implemented a statistical generative bi-gram model for transliteration (using
character alignments).
- Implemented a Hindi-language stemmer that removed 51 inflectional suffixes.
- Implemented a dictionary-based translation model and a TF-IDF-based retrieval
scheme.
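For illustration, TF-IDF retrieval of the kind mentioned above can be sketched as follows. This is a minimal sketch, not the project's code: the function names (`build_index`, `cosine`, `rank`), the smoothed IDF formula, and cosine ranking are all assumptions made for the example, and documents are assumed to be already tokenized (and, for a cross-lingual query, already translated).

```python
import math
from collections import Counter

def build_index(docs):
    """Return per-document TF-IDF vectors and the IDF table for tokenized docs."""
    n = len(docs)
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # document frequency: one count per document
    # Assumed IDF variant: log(N / df) + 1, so terms found everywhere keep some weight.
    idf = {term: math.log(n / count) + 1.0 for term, count in df.items()}
    vectors = [
        {term: tf * idf[term] for term, tf in Counter(tokens).items()}
        for tokens in docs
    ]
    return vectors, idf

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def rank(query_tokens, vectors, idf):
    """Return indices of matching documents, best match first."""
    query = {t: tf * idf[t] for t, tf in Counter(query_tokens).items() if t in idf}
    scored = [(cosine(query, vec), i) for i, vec in enumerate(vectors)]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [i for score, i in scored if score > 0]
```

Query terms that never occur in the collection are dropped, and documents with zero similarity are left out of the ranking.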
Part of Speech Tagger
- Implemented a semi-supervised part-of-speech tagger for English using a Hidden
Markov Model.
- The tagger used the forward-backward algorithm for EM training.
- This project was my first implementation and application of Hidden Markov
Models.
- Complete implementation was done in Python.
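The forward-backward computation that drives EM training for an HMM can be sketched as below. This is an illustrative sketch under stated assumptions, not the tagger's code: the function name and the dict-based parameter layout (`start`, `trans`, `emit`) are hypothetical, and probabilities are left unscaled, so it suits short sequences only.

```python
def forward_backward(obs, states, start, trans, emit):
    """Return per-position state posteriors and the sequence likelihood.

    start[s], trans[r][s], and emit[s][o] are probabilities stored in dicts.
    """
    n = len(obs)
    # Forward pass: alpha[t][s] = P(o_1..o_t, state_t = s)
    alpha = [{s: start[s] * emit[s][obs[0]] for s in states}]
    for t in range(1, n):
        alpha.append({
            s: emit[s][obs[t]] * sum(alpha[t - 1][r] * trans[r][s] for r in states)
            for s in states
        })
    # Backward pass: beta[t][s] = P(o_{t+1}..o_n | state_t = s)
    beta = [None] * n
    beta[n - 1] = {s: 1.0 for s in states}
    for t in range(n - 2, -1, -1):
        beta[t] = {
            s: sum(trans[s][r] * emit[r][obs[t + 1]] * beta[t + 1][r] for r in states)
            for s in states
        }
    likelihood = sum(alpha[n - 1][s] for s in states)
    # Posterior gamma[t][s] = P(state_t = s | all observations)
    gamma = [
        {s: alpha[t][s] * beta[t][s] / likelihood for s in states}
        for t in range(n)
    ]
    return gamma, likelihood
```

In an EM loop, these posteriors (together with the analogous transition posteriors) would serve as expected counts for re-estimating the parameters. A practical implementation would normally rescale or work in log space to avoid underflow.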
Earley Parser
- Implemented an English-language parser for probabilistic context-free
grammars, based on the Earley algorithm.
- Pruning techniques such as an agenda-based best-first strategy, left-cornering, and an inverted
index were used.
- The parser was tested on sentences from the Wall Street Journal.
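The core predict, scan, and complete steps of the Earley algorithm can be sketched as a plain recognizer. This is a simplified illustration, not the project's parser: it ignores rule probabilities and all of the pruning techniques listed above, assumes a grammar without empty productions, and uses hypothetical names (`earley_recognize`, `grammar`, `lexicon`).

```python
def earley_recognize(tokens, grammar, lexicon, start="S"):
    """Return True if `tokens` can be derived from `start`.

    grammar: dict mapping a nonterminal to a list of right-hand sides (tuples).
    lexicon: dict mapping a word to the set of preterminal tags it can take.
    Assumes the grammar has no empty productions.
    """
    n = len(tokens)
    # A chart item is (lhs, rhs, dot position, origin position).
    chart = [set() for _ in range(n + 1)]
    for rhs in grammar[start]:
        chart[0].add((start, rhs, 0, 0))

    for i in range(n + 1):
        agenda = list(chart[i])

        def add(item):
            if item not in chart[i]:
                chart[i].add(item)
                agenda.append(item)

        while agenda:
            lhs, rhs, dot, origin = agenda.pop()
            if dot < len(rhs):
                symbol = rhs[dot]
                if symbol in grammar:
                    # Predict: expand the nonterminal expected next.
                    for new_rhs in grammar[symbol]:
                        add((symbol, new_rhs, 0, i))
                elif i < n and symbol in lexicon.get(tokens[i], ()):
                    # Scan: the next word can carry the expected tag.
                    chart[i + 1].add((lhs, rhs, dot + 1, origin))
            else:
                # Complete: advance every item that was waiting for `lhs`.
                for p_lhs, p_rhs, p_dot, p_origin in list(chart[origin]):
                    if p_dot < len(p_rhs) and p_rhs[p_dot] == lhs:
                        add((p_lhs, p_rhs, p_dot + 1, p_origin))

    return any(
        lhs == start and dot == len(rhs) and origin == 0
        for lhs, rhs, dot, origin in chart[n]
    )
```

A probabilistic version would attach a weight to each item and could order the agenda by that weight, which is where a best-first strategy would fit in.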
Decision Tree language model for English Letters
- Implemented decision-tree as a 4-gram language model with cross validation.
- Generated an encoding of English letters into bit-strings using a mutual
information criterion. Here is a graphical representation of the encoding.
- The model used both a restricted question set based on the bit encoding and unrestricted questions based on Chou's algorithm.
- Entropy and the Gini index were used as goodness criteria.
- Used Python for implementation.
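The two goodness criteria can be sketched as impurity functions over the items reaching a tree node, together with the gain from asking one yes/no question. This is a generic illustration rather than the project's code; the function names are hypothetical, and `labels` stands for the next letters observed at a node.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy in bits of the empirical label distribution."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gini(labels):
    """Gini index: 1 minus the sum of squared label probabilities."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def split_gain(labels, answers, impurity=entropy):
    """Impurity reduction from splitting `labels` by a yes/no question.

    answers[i] is True when item i answers the question with "yes".
    """
    yes = [label for label, a in zip(labels, answers) if a]
    no = [label for label, a in zip(labels, answers) if not a]
    n = len(labels)
    weighted = (len(yes) / n) * impurity(yes) + (len(no) / n) * impurity(no)
    return impurity(labels) - weighted
```

When growing a tree, the question with the largest gain at a node would be chosen, with cross validation deciding when to stop splitting.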
Isolated Word recognizer based on fenonic baseforms model
- Developed an isolated word recognizer for a small-vocabulary system, based
on a Hidden Markov Model.
- The word recognition system was trained on a sample training dataset using the Baum-Welch algorithm.
- The model used fenonic baseforms for each word and an HMM of approximately 80 states.
- Developed two different HMMs:
- A 7-state HMM for silence.
- A 2-state HMM for speech.
- Implemented the data processing part in Python and the EM training code in C.
Other projects:
- Web crawler in Perl.
- TF-IDF based vector model for Word Sense Disambiguation.
- Spam classifier using a tri-gram model and Witten-Bell smoothing.
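As an illustration of Witten-Bell smoothing, the sketch below interpolates a bigram estimate with a unigram estimate. It is a simplification under stated assumptions, not the classifier itself: it uses bigrams rather than the tri-grams of the project, backs off to unsmoothed unigram frequencies, and the class name `WittenBellBigram` is hypothetical.

```python
from collections import Counter, defaultdict

class WittenBellBigram:
    """Bigram model with interpolated Witten-Bell smoothing (illustrative sketch)."""

    def __init__(self, tokens):
        self.unigrams = Counter(tokens)
        self.total = len(tokens)
        # followers[h][w] = number of times word w followed history h
        self.followers = defaultdict(Counter)
        for prev, word in zip(tokens, tokens[1:]):
            self.followers[prev][word] += 1

    def unigram_prob(self, word):
        """Relative frequency of `word` in the training tokens."""
        return self.unigrams[word] / self.total

    def prob(self, word, prev):
        """P(word | prev) = (c(prev, word) + T(prev) * P(word)) / (c(prev) + T(prev)).

        T(prev) is the number of distinct word types seen after `prev`.
        """
        seen = self.followers.get(prev)
        if not seen:
            # Unseen history: fall back to the unigram estimate.
            return self.unigram_prob(word)
        count_h = sum(seen.values())
        types_h = len(seen)
        return (seen[word] + types_h * self.unigram_prob(word)) / (count_h + types_h)
```

A classifier built on this idea could train one such model per class (spam and non-spam) and compare the likelihoods each assigns to a message. A history followed by many different words reserves more probability mass for unseen continuations.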
2) Operating Systems and Parallel Programming
Subgraph isomorphism using MapReduce
- Given two input graphs, implemented a solver for the decision problem of whether one graph, or one of its subgraphs, is isomorphic to the
other graph.
- The algorithm was based on generating permutation matrices, as described
here.
- Parallelized the generation and testing of permutation matrices using the Hadoop MapReduce framework.
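The generate-and-test idea can be sketched on a single machine as follows. This is a brute-force illustration, not the Hadoop implementation: each candidate injective vertex mapping plays the role of a (partial) permutation matrix, the function name is hypothetical, and matching is assumed to be non-induced (every pattern edge must map to a target edge, while extra target edges are allowed).

```python
from itertools import permutations

def find_subgraph_isomorphism(pattern_edges, pattern_n, target_edges, target_n):
    """Return a mapping from pattern vertices to target vertices, or None.

    Vertices are numbered 0..n-1 and edges are undirected pairs.
    The result is a tuple `image` where image[u] is the target vertex for u.
    """
    target = {frozenset(edge) for edge in target_edges}
    # Generate: every injective assignment of pattern vertices to target vertices.
    for image in permutations(range(target_n), pattern_n):
        # Test: every pattern edge must land on a target edge.
        if all(frozenset((image[u], image[v])) in target for u, v in pattern_edges):
            return image
    return None
```

Each candidate is tested independently of the others, so the candidate space can be partitioned (for example by the image of vertex 0) and the partitions tested in parallel. The search is factorial in the worst case, so this only suits small graphs.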
Slab Allocator
- Implemented an object-based slab allocator, closely following the design adopted in the Linux kernel.
- The reference used for this project was Jeff Bonwick's Slab allocator.
- Features include caching objects in an initialized state, dynamic slab size selection, slab coloring, a self-scaling hash, a small-object cache for metadata, and minimal fragmentation (below 12.5%).
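One way dynamic slab size selection can keep fragmentation under a bound is sketched below. This is a hypothetical illustration, not the allocator's code: the 4096-byte page size, the 32-page cap, and the function name are assumptions, and per-slab metadata is ignored.

```python
PAGE_SIZE = 4096  # assumed page size in bytes

def choose_slab_pages(obj_size, max_pages=32, max_waste=0.125):
    """Pick the smallest slab (in pages) whose unused tail is below `max_waste`.

    Returns (pages, objects_per_slab, waste_fraction), or None if no slab
    of up to `max_pages` pages meets the bound.
    """
    if obj_size <= 0:
        raise ValueError("obj_size must be positive")
    for pages in range(1, max_pages + 1):
        slab_bytes = pages * PAGE_SIZE
        objects = slab_bytes // obj_size
        if objects == 0:
            continue  # the object does not fit in a slab this small
        waste = (slab_bytes - objects * obj_size) / slab_bytes
        if waste < max_waste:
            return pages, objects, waste
    return None
```

For example, a 3000-byte object wastes over a quarter of a one-page or two-page slab, while a three-page slab holds four objects with little left over. The leftover bytes are also what slab coloring would use to stagger object offsets between slabs.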
Kernel Keyboard Logger
- Implemented a kernel module to log keystrokes using kprobes, a mechanism for probing kernel functions.
- Required synchronization between interrupt-driven and non-interrupt-driven code. The module read scan codes, which were made accessible through a debugfs file.