Fall Semester 2005: September 8, 2005 - December 12, 2005
Out on: September 12, 2005
Due by: September 18, 2005 by 5:59 pm for full credit (11:59 pm for 10% off, hard deadline)
Collaboration: None
Grading: Packaging 10%, Style 10%, Performance 10%, Design 20%, Functionality 50%
The first assignment for
600.211: Unix Systems Programming
is mostly a warmup exercise.
You will have to (re-)learn the C language,
including some of the trickier features like function pointers.
You will also have to get familiar with
the gcc compiler,
basic file processing using the standard C library,
some Unix system calls for configuration and time,
and several pervasive Unix tools such as
tar, gzip, and make.
Convince us that you're in the right course!
Your task is to develop a small framework for empirically evaluating
the quality of various hash functions.
The set of hash functions is closed ("compiled in") so you won't have
to worry about accepting hash functions as input.
Here is what your program hashtest should do:
"-" is given as the file (that's a minus sign)
/usr/share/dict/words on OS X 10.3
/usr/share/lib/dict/words on SunOS 5.6
--size or -s option, default to 1021
--performance or -p
--collisions or -c
--names or -n
-p if no option given, otherwise produce all requested formats
For example, invoking
./hashtest manifesto.txt -s 128 --nam -p
will use manifesto.txt in the current directory
as input, use a hash table with 128 entries,
and print results sorted by name followed by results sorted
by performance.
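One possible way to handle the options above is sketched below. It assumes getopt_long is available (it is a GNU extension that is also found on the BSDs and OS X, but not part of ANSI C, so it may be missing on older systems). The struct options type, the parse_options function, and the MAX_FORMATS limit are made-up names for illustration only. Since getopt_long accepts unambiguous abbreviations of long options, --nam is treated like --names.

```c
#include <assert.h>
#include <getopt.h>
#include <stdlib.h>

#define MAX_FORMATS 16          /* hypothetical cap on requested formats */

struct options {                /* hypothetical container for the settings */
    const char *file;           /* input file, or NULL if none was given */
    unsigned long size;         /* hash table size */
    char formats[MAX_FORMATS];  /* requested orderings: 'p', 'c' or 'n' */
    int nformats;               /* number of entries used in formats */
};

/* Fill in *opt from the command line.
 * Returns 0 on success and -1 on a usage error. */
int parse_options(int argc, char **argv, struct options *opt)
{
    static const struct option longopts[] = {
        {"size",        required_argument, NULL, 's'},
        {"performance", no_argument,       NULL, 'p'},
        {"collisions",  no_argument,       NULL, 'c'},
        {"names",       no_argument,       NULL, 'n'},
        {NULL, 0, NULL, 0}
    };
    char *end;
    int c;

    assert(argv != NULL && opt != NULL);
    opt->file = NULL;
    opt->size = 1021;           /* default table size from the handout */
    opt->nformats = 0;

    while ((c = getopt_long(argc, argv, "s:pcn", longopts, NULL)) != -1) {
        switch (c) {
        case 's':
            opt->size = strtoul(optarg, &end, 10);
            if (*end != '\0' || opt->size == 0)
                return -1;      /* not a positive decimal number */
            break;
        case 'p':
        case 'c':
        case 'n':
            /* remember the formats in the order they were requested */
            if (opt->nformats < MAX_FORMATS)
                opt->formats[opt->nformats++] = (char)c;
            break;
        default:
            return -1;          /* unknown option or missing argument */
        }
    }

    if (optind < argc)
        opt->file = argv[optind];   /* first non-option argument */
    if (opt->nformats == 0)
        opt->formats[opt->nformats++] = 'p';    /* default format */
    return 0;
}
```

Calling parse_options(argc, argv, &opt) from main with the example command line should leave opt.size at 128 and record the formats 'n' and 'p' in that order. What to do when opt.file is NULL or "-" is left to the caller.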
Note that this is just one particular (read: arbitrary) way we can go about evaluating hash functions; for certain applications, e.g. cryptography, other characteristics might be much more important.
Here are the hash functions you must support; you are free to add more beyond this set if you are interested. Please use the names given here to identify these hash functions in your output.
make and your Makefile.
There is little point in re-implementing hash functions yourself, so just download the relevant code from the sources given and integrate it with your framework. Watch out for license problems, though, and give proper credit.
uint32_t hash(const void* data, size_t length)
or so as the "standard" signature for your hash functions.
Then declare and initialize an array of these and step through
it in some suitable way.
Adding a new hash function should only require (a) code for the
new function, (b) adding an entry to your array.
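The array-of-function-pointers idea could look roughly like the following sketch. The two toy functions and their names ("additive" and "xor") are placeholders invented for this example, not the functions the assignment asks for, and hash_fn, struct hash_entry, hash_count, and find_hash are likewise hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* The "standard" signature shared by all hash functions. */
typedef uint32_t (*hash_fn)(const void *data, size_t length);

struct hash_entry {             /* hypothetical: one row per hash function */
    const char *name;           /* name used in the output */
    hash_fn fn;                 /* the function itself */
};

/* Toy placeholder: sum of all bytes. */
static uint32_t hash_additive(const void *data, size_t length)
{
    const unsigned char *p = data;
    uint32_t h = 0;
    size_t i;

    for (i = 0; i < length; i++)
        h += p[i];
    return h;
}

/* Toy placeholder: exclusive-or of all bytes. */
static uint32_t hash_xor(const void *data, size_t length)
{
    const unsigned char *p = data;
    uint32_t h = 0;
    size_t i;

    for (i = 0; i < length; i++)
        h ^= p[i];
    return h;
}

/* Adding a new hash function means adding one row here. */
static const struct hash_entry hashes[] = {
    {"additive", hash_additive},
    {"xor",      hash_xor}
};

/* Number of compiled-in hash functions. */
size_t hash_count(void)
{
    return sizeof hashes / sizeof hashes[0];
}

/* Step through the array and look up an entry by name;
 * returns NULL if there is no such hash function. */
const struct hash_entry *find_hash(const char *name)
{
    size_t i;

    assert(name != NULL);
    for (i = 0; i < hash_count(); i++)
        if (strcmp(hashes[i].name, name) == 0)
            return &hashes[i];
    return NULL;
}
```

With this layout the rest of the program only needs hash_count() and the array; for instance find_hash("xor")->fn(word, strlen(word)) calls a function through its table entry without knowing which one it is.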
.h
file).
Note that some instrumentation code will only be needed for this
particular program, not for a general hash table; try to address
this somehow.
Also, for the purpose of just collecting statistics, it may not
be necessary to actually "store" the data we're hashing...
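To illustrate the last remark: if all we want is a collision count, one flag per table slot is enough and the words themselves never need to be stored. The sketch below assumes that a "collision" means a key landing in a slot that an earlier key already used; the handout may intend a different definition. The names count_collisions and hash_first_byte are made up, and hash_first_byte is a deliberately poor toy function used only to keep the example self-contained.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef uint32_t (*hash_fn)(const void *data, size_t length);

/* Toy placeholder hash: the value of the first byte (0 if empty). */
uint32_t hash_first_byte(const void *data, size_t length)
{
    const unsigned char *p = data;

    return length > 0 ? p[0] : 0;
}

/* Count how many of the nwords strings hash to a slot that an earlier
 * string already occupied. Only one flag per slot is kept; the strings
 * are not copied anywhere. Returns -1 if memory cannot be allocated. */
long count_collisions(hash_fn fn, const char *const *words,
                      size_t nwords, size_t table_size)
{
    unsigned char *used;
    long collisions = 0;
    size_t i;

    assert(fn != NULL && table_size > 0);
    used = calloc(table_size, 1);   /* all slots start out empty */
    if (used == NULL)
        return -1;

    for (i = 0; i < nwords; i++) {
        size_t slot = fn(words[i], strlen(words[i])) % table_size;

        if (used[slot])
            collisions++;           /* slot already taken */
        else
            used[slot] = 1;         /* first key in this slot */
    }

    free(used);
    return collisions;
}
```

Keeping this counting code separate from any general-purpose hash table is one way to address the point that the instrumentation is specific to this program.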
The indent tool can be quite helpful to ensure that
your code is formatted in a consistent way...
The man command is your friend! Use it liberally
while exploring Unix. Try man man for sure! Maybe
man 2 intro, man 3 intro, and
man fgets are interesting as well? If you feel
like learning a lot, read man gcc. :-)
Please turn in a gzip-compressed tarball
of your assignment (the extension should be .tar.gz).
The tarball should uncompress into a directory
cs211-assignment-1-login
with login replaced by your Unix login name
(so I would use cs211-assignment-1-phf);
uncompressing should not create any other files
in the current directory.
The tarball should contain no derived files whatsoever,
but allow building all derived files with
make.
We expect that your Makefile handles "the usual" targets
like clean and test aside from all
(which is the main way we will build your program).
Include a README file that briefly explains what the
program does and contains any other notes you want us to check out
before grading.
Include other "common" files such as INSTALL describing
how to install your tool, CREDITS to pay your respects
to the people whose code you're reusing, and LICENSE
to describe copyright and distribution terms if you feel like it.
You can look at any number of "famous" open source projects to see
what kind of structure is appropriate;
gif2png
is a relatively small example, but you don't need everything in
there.
Aside from your code, what you really need is a README
and a Makefile that works. :-)
For reference, here is a short explanation of the grading criteria.
Packaging refers to the proper organization of the
stuff you hand in, following the guidelines for Deliverables above.
Style refers to C programming style, including
things like consistent indentation, appropriate identifiers,
useful comments, etc. Simple, clean, readable code is what you
should be aiming for.
Performance refers to how fast your program can
produce the required results compared to other submissions.
Design refers to proper modularization and the
proper choice of algorithms and data structures; often this can
be judged by asking "How hard would it be to add feature X?"
and "How hard is it to replace algorithm X with algorithm Y?".
Functionality refers to your program being
able to do what it should according to the specification
given above; if the specification is ambiguous and you had
to make a certain choice, defend that choice in your
README file.
If your program cannot be built you will get no points whatsoever.
If your program cannot be built using
make
we will take off 10%.
If your program cannot be built without warnings using
gcc -W -Wall -O -ansi -pedantic
we will take off 10% (except if you document a very good reason).
If your program fails miserably even once,
e.g. segfaults or runs forever,
we will take off 10%.
If you really want to impress us, add some kind of "cool" bonus
feature to your program.
We won't give you extra points, but we'll give you extra kudos. :-)
Be sure to point out any bonus features you have in your
README file.