Assignment 2: Up and Running?
Out on: February 4, 2008
Due by: February 11, 2008, 3:00 pm (before lecture)
Collaboration: None
Grading: Packaging 10%, Design 10%, Style 10%, Functionality 70%
Overview
The second assignment shows you how to build more complex Unix applications from tarballs and asks you to do some more complex programming on your own.
Problem 1: Building from Tarballs and Reading Code (40%)
You are going to explore
this Bayesian spam filter
as an example of a larger application written in C.
Grab the
tarball
for version 0.9.4 using wget (for example)
and extract it using tar and gzip.
On a Linux system, you should be able to run
./configure followed by
make all
and watch the program get built from
all the separate source files.
The result is an executable called bmf which, when
run, analyzes an email it reads from standard input and
tries to tag it as either spam or ham.
How it does that in detail is actually not that important, but
of course you're free to read around in the program. Actually,
you must read it to do this problem. :-)
Check out all the source code files and, for each, write a short summary of that file's purpose. You do not need to understand every single line of code; rather, concentrate on getting the "big picture" of how the program is structured. An average of two to three sentences per file is probably enough; some files will need only one sentence, others four or five. Finally, write a short paragraph that summarizes your general impression of the code: Did you like reading it? Was it horrible, and if so, why? What needs to be changed?
Make sure you use tools such as ctags
and cscope to your advantage; they make it
much easier to navigate around in unfamiliar code.
Problem 2: Formatting Text (60%)
You are to implement a highly simplified
version of the Unix command fmt that breaks
text from standard input into lines of a certain width.
Consider, for example, the following input text:
Here is some text for you. Use it wisely to find out how fmt works and so on and so forth. There are a few things to keep in mind. For example about indented text and what it means For example about more indented text and what it means And so on, and so forth.
Feel free to try out what the real
fmt program does with this, but it's
not important for this problem.
Your version of fmt should
produce the following output given the above input:
Here is some text for you. Use
it wisely to find out how fmt
works and so on and so forth.
There are a few things to keep
in mind. For example about
indented text and what it
means For example about more
indented text and what it
means And so on, and so forth.
In other words, you break the input text (from standard input) into lines of at most 30 columns and write the result back out (to standard output). You ignore white space in the input except for using it to decide where words start and end; you never break words apart, ever. Words that are longer than 30 columns by themselves are copied to the output on a line by themselves, but in their complete length.
Just for reference, here are two more examples. From
As you can see we don't preserve any spacing either.
we get
As you can see we don't
preserve any spacing either.
and from
sdjafh adlkfjhsadfkj asdfkj asddkjfh asdkdhf kasdhf kjasdhf lasdf asdfjkhasdfjkasd dfas dfalsd fkasjhdf jashdf ksadhf asfl sdhfjkahsdf jasdhfkj asfkj askfhsdk fjhaksljfh aksjdhf jasdhf ka. sadkjfhasdkljfhasdkjfhaskldhfkjasdhfkljasdhfklajsdhf
we get
sdjafh adlkfjhsadfkj asdfkj
asddkjfh asdkdhf kasdhf
kjasdhf lasdf asdfjkhasdfjkasd
dfas dfalsd fkasjhdf jashdf
ksadhf asfl sdhfjkahsdf
jasdhfkj asfkj askfhsdk
fjhaksljfh aksjdhf jasdhf ka.
sadkjfhasdkljfhasdkjfhaskldhfkjasdhfkljasdhfklajsdhf
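The line-breaking rule described above can be sketched as a greedy fill: keep appending words to the current line while they still fit, and start a new line when the next word would not. This is only a sketch under assumptions. The greedy strategy is one reading of the specification, and the name fill_text, its interface, and the choice to work on an in-memory string instead of standard input are hypothetical, made purely for illustration.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Greedy line filling (an assumed strategy; fill_text is a hypothetical name).
 * Returns a newly allocated string that the caller must free,
 * or NULL if memory could not be allocated. */
char *fill_text(const char *in, size_t width)
{
    /* The output is never longer than the input plus a final newline
     * and the terminating NUL: every word is copied once and preceded
     * by at most one separator character. */
    char *out = malloc(strlen(in) + 2);
    size_t n = 0;   /* bytes written to out so far */
    size_t col = 0; /* columns used on the current output line */

    if (out == NULL)
        return NULL;

    while (*in != '\0') {
        const char *start;
        size_t len;

        /* White space only separates words; skip it. */
        while (isspace((unsigned char)*in))
            in++;
        if (*in == '\0')
            break;

        /* Find the end of the next word. */
        start = in;
        while (*in != '\0' && !isspace((unsigned char)*in))
            in++;
        len = (size_t)(in - start);

        /* Start a new line if the word does not fit on this one.
         * An overlong word on an empty line is written unbroken. */
        if (col > 0 && col + 1 + len > width) {
            out[n++] = '\n';
            col = 0;
        }
        if (col > 0) {
            out[n++] = ' ';
            col++;
        }
        memcpy(out + n, start, len);
        n += len;
        col += len;
    }

    if (col > 0)
        out[n++] = '\n'; /* terminate the last line */
    out[n] = '\0';
    return out;
}
```

Under these assumptions, calling fill_text with the second example's text and a width of 30 produces the two lines "As you can see we don't" and "preserve any spacing either.", regardless of how much white space separates the words in the input.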
Make sure you modularize your program into sensible functions! Each function should be relatively small and should perform a single, cohesive part of the overall program. Enjoy! :-)
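As one illustration of a small, single-purpose function, the sketch below reads one whitespace-delimited word of arbitrary length from a stream, which is what allows words longer than 30 columns to be copied out in full. The name read_word, its interface, and the initial buffer size are hypothetical choices and not part of the assignment.

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>

/* Reads the next whitespace-delimited word from `in` (hypothetical helper).
 * Returns a malloc'd, NUL-terminated string that the caller must free,
 * or NULL at end of input or if memory could not be allocated. */
char *read_word(FILE *in)
{
    size_t cap = 16; /* assumed starting capacity; grows as needed */
    size_t len = 0;
    char *word;
    char *bigger;
    int c;

    /* Skip any white space in front of the word. */
    do {
        c = getc(in);
    } while (c != EOF && isspace(c));
    if (c == EOF)
        return NULL;

    word = malloc(cap);
    if (word == NULL)
        return NULL;

    /* Collect characters until white space or end of input. */
    while (c != EOF && !isspace(c)) {
        if (len + 1 >= cap) { /* always keep room for the NUL */
            bigger = realloc(word, cap * 2);
            if (bigger == NULL) {
                free(word);
                return NULL;
            }
            word = bigger;
            cap *= 2;
        }
        word[len++] = (char)c;
        c = getc(in);
    }
    word[len] = '\0';
    return word;
}
```

A main function built on such a helper could call read_word(stdin) in a loop until it returns NULL, leaving the decision of where to break lines to a separate function.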
Deliverables
Please turn in a gzip-compressed tarball of your assignment;
the filename should be
cs120-assign-2-login.tar.gz
with login replaced by your Unix login name
on ugradx.cs.jhu.edu
(so I would use cs120-assign-2-phf.tar.gz).
The tarball should contain no derived files whatsoever
(i.e. no executable files),
but allow building all derived files.
Include a README file that briefly explains what your
programs do and contains any other notes you want us to check out
before grading.
Grading
For reference, here is a short explanation of the grading criteria.
Packaging refers to the proper organization of the
stuff you hand in, following the guidelines for Deliverables above.
Style refers to C programming style, including
things like consistent indentation, appropriate identifiers,
useful comments, suitable documentation, etc.
Simple, clean, readable code is what you should be aiming for.
Performance refers to how fast your program can
produce the required results compared to other submissions.
Design refers to proper modularization and the
proper choice of algorithms and data structures.
Functionality refers to your programs being
able to do what they should according to the specification
given above; if the specification is ambiguous and you had
to make a certain choice, defend that choice in your
README file.
If your programs cannot be built you will get no points whatsoever.
If your programs cannot be built without warnings using
gcc -ansi -pedantic -Wall -Wextra -std=c99 -O
we will take off 10% (except if you document a very good reason).
If your programs cannot be built using make we will
take off 10%.
If your programs fail miserably even once,
i.e. terminate with an exception of any kind or dump core,
we will take off 10%.
Finally, make sure to include your name and email address in
every file you turn in!