Spring Semester 2008

January 28, 2008 – May 2, 2008

Assignment 2: Up and Running?

Sample solution by Rebecca Shapiro, Spring 2008.

Problem 1: Building from Tarballs and Reading Code (40%)

This problem explores bmf, a Bayesian spam filter, as an example of a larger application written in C.

bmf.c: Defines the entry point of the bmf program, that is, the code that runs when bmf is executed. It reads the command-line arguments and performs the initialization the program needs before it can work. It also contains the logic that makes the calls to obtain spam levels and to write to the database.

bmfconv.c: Implements the entry point of a program that converts spam databases created by bmf between the database backends it supports. It can also import into an existing database or export a database to another format. It checks that the databases are valid and then calls external functions to do the actual processing.

dbdb.c: Implements the dbh interface using a Berkeley DB database (see dbh.c).

dbg.c: Implements code that makes debugging easier and makes the program more verbose to aid development.

dbh.c: Defines an interface for database handling. All database types implement the methods defined by dbh so that the type of backend database is invisible to the spam analysis code.
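One common way to get this kind of backend independence in C is a struct of function pointers that each backend fills in. The sketch below only illustrates that idea; the names store_t, mem_state_t, mem_store_open, and bump are hypothetical and are not taken from bmf, whose actual dbh declarations may look quite different.

```c
#include <string.h>

/* Hypothetical interface: every backend supplies these two operations. */
typedef struct store store_t;
struct store {
    int  (*get)(store_t *self, const char *word);            /* count, 0 if absent */
    void (*put)(store_t *self, const char *word, int count); /* insert or update */
    void *state;                                             /* backend-private data */
};

/* Toy in-memory backend, standing in for a text, Berkeley DB, or MySQL one. */
#define MEM_MAX 16
#define WORD_MAX 32
typedef struct {
    char words[MEM_MAX][WORD_MAX];
    int  counts[MEM_MAX];
    int  used;
} mem_state_t;

static int mem_get(store_t *self, const char *word) {
    mem_state_t *m = self->state;
    for (int i = 0; i < m->used; i++)
        if (strcmp(m->words[i], word) == 0)
            return m->counts[i];
    return 0;
}

static void mem_put(store_t *self, const char *word, int count) {
    mem_state_t *m = self->state;
    for (int i = 0; i < m->used; i++) {
        if (strcmp(m->words[i], word) == 0) {
            m->counts[i] = count;
            return;
        }
    }
    if (m->used < MEM_MAX) {                 /* silently drops words once full */
        strncpy(m->words[m->used], word, WORD_MAX - 1);
        m->words[m->used][WORD_MAX - 1] = '\0';
        m->counts[m->used] = count;
        m->used++;
    }
}

/* "Open" the in-memory backend by wiring its functions into the interface. */
store_t mem_store_open(mem_state_t *m) {
    store_t s = { mem_get, mem_put, m };
    m->used = 0;
    return s;
}

/* Caller code sees only store_t, so it works with any backend. */
int bump(store_t *db, const char *word) {
    int n = db->get(db, word) + 1;
    db->put(db, word, n);
    return n;
}
```

Code such as bump never names the backend it is talking to, which is the property the dbh description above is pointing at.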

dbmysql.c: Implements the dbh database interface on top of MySQL.

dbtext.c: Implements the dbh interface using a text-file-based database.

filt.c: Functions that determine whether a message is spam. Implements the Bayesian filter.
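To make "Bayesian filter" a little more concrete, here is a minimal sketch of one widely used way to combine per-token spam probabilities into a single score for a message (the naive Bayes combination popularized for spam filtering). It is an illustration under stated assumptions, not a description of what filt.c actually does; the function name combine_spam_probability and the clamping bounds 0.01 and 0.99 are hypothetical choices.

```c
#include <stddef.h>

/* Clamp a probability away from 0 and 1 so one token cannot force the result
   and the denominator below can never become zero. Bounds are an assumption. */
static double clamp_prob(double p) {
    if (p < 0.01) return 0.01;
    if (p > 0.99) return 0.99;
    return p;
}

/* Combine per-token probabilities p[0..n-1], treating tokens as independent:
       P = (p1 * ... * pn) / (p1 * ... * pn + (1 - p1) * ... * (1 - pn))
   With no tokens at all the result is the neutral value 0.5. */
double combine_spam_probability(const double *p, size_t n) {
    double spam = 1.0;   /* running product of p_i */
    double ham  = 1.0;   /* running product of (1 - p_i) */
    for (size_t i = 0; i < n; i++) {
        double q = clamp_prob(p[i]);
        spam *= q;
        ham  *= 1.0 - q;
    }
    return spam / (spam + ham);
}
```

A caller would then compare the returned value against a threshold of its choosing to decide whether to flag the message.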

lex.c: Implements a set of tools for parsing an email. It defines what counts as a "token" in an email and lets the user of the interface scan through the message one meaningful chunk at a time.
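The "one chunk at a time" style of interface can be sketched as a small scanner that hands back the next token on each call. The token rule used here (a maximal run of letters, digits, '-', '\'', or '$') and the names scanner_t and next_token are assumptions made for illustration; lex.c has its own, more elaborate rules for email.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical token rule: letters, digits, and a few punctuation marks. */
static int is_token_char(int c) {
    return isalnum(c) || c == '-' || c == '\'' || c == '$';
}

/* Scanner state: just a cursor into a NUL-terminated buffer. */
typedef struct {
    const char *pos;   /* next unread character */
} scanner_t;

/* Return a pointer to the start of the next token and store its length in
   *len, or return NULL when the input is exhausted. The token is not
   NUL-terminated; it is a slice of the original buffer. */
const char *next_token(scanner_t *s, size_t *len) {
    /* Skip separators. */
    while (*s->pos != '\0' && !is_token_char((unsigned char)*s->pos))
        s->pos++;
    if (*s->pos == '\0')
        return NULL;

    /* Consume one maximal run of token characters. */
    const char *start = s->pos;
    while (is_token_char((unsigned char)*s->pos))
        s->pos++;
    *len = (size_t)(s->pos - start);
    return start;
}
```

With the input "Buy now: $100 off!", successive calls yield the tokens Buy, now, $100, and off, and then NULL.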

str.c: Various string utilities for string manipulation.

vec.c: Implements vectors that hold strings, along with utility functions such as sort, add, find, and equals.

Problem 2: Formatting Text (60%)

You are to implement a highly simplified version of the Unix command fmt that breaks text from standard input into lines of a certain width.

For a sample solution, see code here: fmt.c
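Separately from the linked fmt.c, the basic line-filling idea can be sketched in a few lines. This is a minimal illustration under assumptions of its own, not the sample solution: the function name fmt_stream is hypothetical, the width is passed as a parameter, words are taken to be whitespace-delimited, paragraph breaks are not preserved, a word longer than the width gets a line to itself, and words longer than 1023 characters would be split.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_WORD 1024   /* assumed upper bound on word length, including NUL */

/* Copy words from `in` to `out`, greedily packing as many words as fit
   into each output line of at most `width` columns. */
void fmt_stream(FILE *in, FILE *out, int width) {
    char word[MAX_WORD];
    int col = 0;   /* number of characters already on the current line */

    assert(width > 0);

    /* "%1023s" skips leading whitespace (including newlines) and reads
       one whitespace-delimited word of at most 1023 characters. */
    while (fscanf(in, "%1023s", word) == 1) {
        int len = (int)strlen(word);

        /* Start a new line if the word plus a separating space won't fit. */
        if (col > 0 && col + 1 + len > width) {
            fputc('\n', out);
            col = 0;
        }
        if (col > 0) {          /* separate words on the same line */
            fputc(' ', out);
            col++;
        }
        fputs(word, out);
        col += len;
    }
    if (col > 0)                /* terminate the last line, if any */
        fputc('\n', out);
}
```

A main function would simply call fmt_stream(stdin, stdout, width) with whatever width the assignment specifies. For example, with a width of 10, the input "the quick  brown" followed by a line "fox jumps" comes out as three lines: "the quick", "brown fox", and "jumps".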