The first project was mostly a warmup; this second project has a lot more meat on its proverbial bones. With that in mind, here’s a friendly piece of advice: Start as early as possible! If not on the actual programming, then at least on trying to understand all the pieces…
Your job is to build something sorta kinda like a web cache. It’s going to be pretty far from an actual web cache, but at least some of the pieces are the same. The basic job of your web cache is to read a URL from standard input, check if it has a cached copy of that URL, and if so write the cached copy to standard output. Simple in principle, but all the dirty details are below.
There are two parts to this project:
one in which you build a persistent key-value store,
and one in which you build the actual caching program.
The two parts are connected through the interface defined by the
header file posted on Piazza.
You’ll want to carefully read through that file first and ask plenty
of questions about it: If you don’t fully understand the API, you won’t
be able to complete either part.
Note that you’re expected to also develop (and submit) a useful testing
infrastructure for your project.
This will involve shell scripting as before to test the web cache as a
whole, but it will also involve writing unit tests for your
implementation of the persistent key-value store.
The biggest challenge for this project will be coming up with a good conceptual model for how your key-value store will work. There are a number of “moving parts”, all of which have to work together properly for a decent solution. To complicate things further, you have to invent most of them yourself. Try to go for the simplest possible approach that won’t embarrass you later!
Let’s start with an example:
    $ ./webber samplecache
    http://www.cs.jhu.edu/
    404 Not Found
    http://fake.url.com/
    200 OK
    <html>Fake HTML document</html>
    http://cs220.rocks/
    200 OK
    <html>Fake HTML document</html>
    $
The webber program is started with the name of a “database” that holds
the content of the web cache,
samplecache in this case.
Once the program is started, it waits for a line of input spelling out
a URL to look up.
If the program doesn’t have a copy of that URL in its database, it prints
404 Not Found;
if the program does have a copy of the URL in its database, it prints
200 OK followed by the cached content (the two fake HTML documents in
the example above).
In either case, it then waits for another URL to look up.
As per usual
CTRL-D ends the program.
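The read-and-respond loop described above could be sketched roughly as follows. This is only an illustration under stated assumptions: `serve`, `lookup_fn`, and `demo_lookup` are made-up names, the lookup callback stands in for whatever the key-value store interface actually provides, and the 4096-byte line buffer and the newline printed after the cached content are guesses, not requirements from the handout.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical lookup callback: returns the cached content for url,
 * or NULL if there is no cached copy. In the real program this would
 * be backed by the key-value store; the signature here is made up. */
typedef const char *(*lookup_fn)(const char *url);

/* Read one URL per line from in until end of file (CTRL-D on a
 * terminal) and write a response for each URL to out. */
void serve(FILE *in, FILE *out, lookup_fn lookup)
{
    char line[4096]; /* assumed maximum URL length, for illustration */

    while (fgets(line, sizeof line, in) != NULL) {
        line[strcspn(line, "\n")] = '\0'; /* strip the trailing newline */

        const char *content = lookup(line);
        if (content == NULL) {
            fprintf(out, "404 Not Found\n");
        } else {
            fprintf(out, "200 OK\n%s\n", content);
        }
    }
}

/* Stand-in lookup with one hard-coded entry, for demonstration only. */
const char *demo_lookup(const char *url)
{
    if (strcmp(url, "http://fake.url.com/") == 0) {
        return "<html>Fake HTML document</html>";
    }
    return NULL;
}
```

A `main` function would call something like `serve(stdin, stdout, demo_lookup)` after opening the database. Passing the streams as parameters is a design choice that makes the loop easy to drive from a test.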
Note that all the output above was written to stdout.
Even though you might think of
404 Not Found as an “error message” at
first, it’s not in fact an error message:
nothing went wrong in the program, it simply didn’t have the requested URL.
So this is not an “error message” in the same sense that “can’t open
database” would be.
Here’s what a “real” error message (printed to
stderr) would look like:
    $ ./webber somewrongname
    error: can't open database
    $
All your error messages have to start on a new line with the string
error: but the rest of the error message is up to you; the only other
constraint is that an error message can only be a single line.
If a “real” error occurs, the program is supposed to set
EXIT_FAILURE as an exit status eventually; otherwise it’s supposed to set
EXIT_SUCCESS as per usual.
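One possible way to keep the error-message format and the exit status consistent is to route every “real” error through a single helper. The sketch below is an assumption about how that might look; `report_error`, `exit_status`, and the `had_error` flag are hypothetical names, not part of any provided code.

```c
#include <stdio.h>
#include <stdlib.h>

/* Set once any "real" error has been reported. */
static int had_error = 0;

/* Print a single-line message prefixed with "error: " to err and
 * remember that something went wrong. */
void report_error(FILE *err, const char *message)
{
    fprintf(err, "error: %s\n", message);
    had_error = 1;
}

/* Exit status the program should eventually return from main. */
int exit_status(void)
{
    return had_error ? EXIT_FAILURE : EXIT_SUCCESS;
}
```

With this in place, `main` could call `report_error(stderr, "can't open database")` when opening fails and finish with `return exit_status();`. A lookup that finds nothing would not go through this helper, since a 404 is ordinary output rather than an error.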
Note that we provide a program
mksample.c for you that will use your
key-value store to create an example cache database.
You’ll probably want to write a similar program to create a larger, more
complex cache eventually, for testing purposes.
(Keep in mind that we’re grading for performance this time, so you will
definitely want a larger cache database to see how well your program performs.)
You’ll implement another key-value store for this project.
However, in contrast to the
kvs homework assignment, this key-value
store will be persistent.
Suppose a program A creates a database
X, inserts a bunch of key-value
pairs into X, and then exits.
Later another program
B is started, opens the database
X, and prints
all the key-value pairs in it: program
B will output the stuff
A inserted earlier.
For this to work, the information has to be stored on disk in the form
of one (or more) files.
(Note that in the context of this project you can think of
mksample.c as program A and of
webber.c as program B.)
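To make the idea of persistence concrete, here is a minimal sketch of one way key-value pairs could be written to a file and read back later. The record layout (two 32-bit lengths followed by the key and value bytes) and the names `write_record` and `read_record` are assumptions made purely for illustration; this is not the format discussed in lecture and not necessarily a good one for performance.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Append one record: two 32-bit lengths followed by the key bytes and
 * the value bytes. Returns 0 on success, -1 on failure. */
int write_record(FILE *fp, const char *key, const char *value)
{
    uint32_t klen = (uint32_t)strlen(key);
    uint32_t vlen = (uint32_t)strlen(value);

    if (fwrite(&klen, sizeof klen, 1, fp) != 1) return -1;
    if (fwrite(&vlen, sizeof vlen, 1, fp) != 1) return -1;
    if (klen > 0 && fwrite(key, 1, klen, fp) != klen) return -1;
    if (vlen > 0 && fwrite(value, 1, vlen, fp) != vlen) return -1;
    return 0;
}

/* Read the next record into key and value (each a buffer of cap bytes,
 * NUL-terminated on return). Returns 1 on success, 0 if no further
 * record header can be read (end of file), and -1 on a malformed or
 * oversized record. */
int read_record(FILE *fp, char *key, char *value, size_t cap)
{
    uint32_t klen, vlen;

    if (fread(&klen, sizeof klen, 1, fp) != 1) return 0;
    if (fread(&vlen, sizeof vlen, 1, fp) != 1) return -1;
    if (klen >= cap || vlen >= cap) return -1;
    if (klen > 0 && fread(key, 1, klen, fp) != klen) return -1;
    if (vlen > 0 && fread(value, 1, vlen, fp) != vlen) return -1;
    key[klen] = '\0';
    value[vlen] = '\0';
    return 1;
}
```

The lengths are written in the machine's native byte order, so a file produced this way is only guaranteed to be readable on the same kind of machine. That is probably acceptable here, but it is a trade-off worth knowing about.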
On Piazza you’ll find the
sdbm.h header file that describes
the interface to the key-value store.
Do not change
sdbm.h under any circumstances!
Make sure that you read the interface carefully and understand what
each operation is supposed to do before you start hacking!
Your job is to write
sdbm.c to implement all the operations from sdbm.h.
You’ll have to decide how to store the data which will in turn determine
the code you’ll have to write to access/update it.
(We’ll discuss one implementation option in lecture, but it’s neither the
best one nor the simplest one. Thinking for yourself is highly encouraged.)
You should have a testing infrastructure in place for your project. Since there are two distinct parts, the key-value store and the web cache, there will be two distinct approaches to testing:
For webber, you’ll use shell scripts as before to run system tests on the web cache application as a whole.
For sdbm.c, you’ll run unit tests to ensure that each operation of the
sdbm.h interface works as expected.
We’ll discuss two approaches to unit testing in lecture, one based on
“hand written” test drivers that use C’s
assert macro and one based
on the ct unit testing framework.
It’ll be up to you to decide which approach you want to use: The
former is easier to get started with, the latter provides better
overall support for unit testing.
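As a rough idea of what a “hand written” test driver based on assert might look like, consider the sketch below. The function under test, `chomp`, is a hypothetical helper invented for this example; a real driver would exercise the operations declared in the header file instead.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical function under test: remove one trailing newline. */
void chomp(char *s)
{
    size_t n = strlen(s);
    if (n > 0 && s[n - 1] == '\n') {
        s[n - 1] = '\0';
    }
}

static void test_chomp_strips_newline(void)
{
    char s[] = "http://cs220.rocks/\n";
    chomp(s);
    assert(strcmp(s, "http://cs220.rocks/") == 0);
}

static void test_chomp_leaves_other_strings_alone(void)
{
    char s[] = "no newline";
    chomp(s);
    assert(strcmp(s, "no newline") == 0);

    char empty[] = "";
    chomp(empty);
    assert(strcmp(empty, "") == 0);
}

/* Run every test; assert aborts the program at the first failure.
 * Returns the number of test functions that ran. */
int run_all_tests(void)
{
    int count = 0;
    test_chomp_strips_newline();
    count++;
    test_chomp_leaves_other_strings_alone();
    count++;
    return count;
}
```

A `main` function in the driver would simply call `run_all_tests()` and print the count. Keep in mind that compiling with `NDEBUG` defined turns every assert into a no-op, so test drivers should be built without it.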
Another choice that will be up to you is whether you want to keep using
valgrind to guard against memory-related bugs in your code
or if you prefer to use the more general “sanitizers” available in
gcc that we covered in lecture.
Finally, you’ll have to perform coverage analysis for your tests.
That is, you’ll have to use
gcov to determine what parts of your
code base (in sdbm.c and
webber.c) your test cases actually exercise.
Your goal is to get 100% line coverage for both files, but it may
not be realistic to achieve that; try to get as close as possible and
defend any missing coverage in your
Declare internal helper functions static to avoid exporting them!
Error messages go to stderr and not to
stdout. Your program’s exit status should reflect whether there was an error or not.
sdbm.c should only perform input/output operations on files it opened, not from/to
standard streams such as stderr. The key-value store is a library that could be used in many different contexts, not just in this web caching application. If it were being used in a GUI or mobile application, there would not be a
stderr to print to! Libraries don’t make user interface decisions, ever.
If you’re not sure what an operation in sdbm.h is supposed to do, ask about it before you make a bad assumption.
You could take kvs.c from the homework assignment, adapt it to
sdbm.h, and then just add enough code to load the entire thing in
sdbm_open and save the entire thing in
sdbm_close. However, that certainly won’t get full performance credit…
If you load the entire database into memory in sdbm_open you’re going to waste a lot of space. It might be okay to load all the keys because you can use that to ensure fast access, but loading all the values is a really bad idea.
Take a look at qsort and bsearch from the standard library. You can use them to sort an array and perform binary search operations; however, you’re on your own for the details.
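The hints about keeping only the keys in memory and about sorting and binary search can be combined. The following is a minimal sketch under assumptions: `struct entry`, `sort_index`, and `find_offset` are hypothetical names, and the idea of storing a file offset next to each key is just one possible design, not something the handout prescribes.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical in-memory index entry: a key plus the position of its
 * value in the database file. The value itself stays on disk. */
struct entry {
    const char *key;
    long offset;
};

/* Comparison function shared by qsort and bsearch: orders by key. */
static int compare_entries(const void *a, const void *b)
{
    const struct entry *ea = a;
    const struct entry *eb = b;
    return strcmp(ea->key, eb->key);
}

/* Sort the index by key so that it can be searched with bsearch. */
void sort_index(struct entry *index, size_t n)
{
    qsort(index, n, sizeof index[0], compare_entries);
}

/* Return the offset stored for key, or -1 if the key is not present.
 * The index must already be sorted with sort_index. */
long find_offset(const struct entry *index, size_t n, const char *key)
{
    struct entry probe = { key, 0 };
    const struct entry *hit =
        bsearch(&probe, index, n, sizeof index[0], compare_entries);
    return hit != NULL ? hit->offset : -1;
}
```

Once an offset is found, the value could be fetched with `fseek` and `fread` on the database file. Using the same comparison function for both calls matters: bsearch only works if the array is ordered by exactly the criterion it searches with.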
All your core C code for the web cache should be in webber.c.
Please don’t write additional modules that complicate how your program must be built.
Please follow the submission instructions as detailed on Piazza.
Make sure that your tarball
contains no derived files whatsoever (i.e. no executable files), but
allows building all required derived files.
Also, be sure to include a
Makefile that sets the appropriate
compiler flags and builds all programs by default.
The Makefile should also have the usual targets, including
test; the
test target should run both system and unit tests; ideally it also
runs coverage analysis for you.
Finally, make sure to include your name and email address in every file
you turn in (well, in every file for which it makes sense to do so anyway)!
For reference, here is a short explanation of the grading criteria; some of the criteria don’t apply to all problems, and not all of the criteria are used on all assignments.
Packaging refers to the proper organization of the stuff you hand in, following both the guidelines for Deliverables above as well as the general submission instructions for assignments on Piazza.
Style refers to C programming style, including things like consistent indentation, appropriate identifier names, useful comments, suitable documentation, etc. Simple, clean, readable code is what you should be aiming for. Make sure you follow the style guide posted on Piazza!
Design refers to proper modularization (functions, modules, etc.) and an appropriate choice of algorithms and data structures.
Performance refers to how fast/with how little memory your programs can produce the required results compared to other submissions.
Functionality refers to your programs being able to do what they should according to the specification given above; if the specification is ambiguous, ask for clarification! (It also refers to you simply doing the required work, which may not be programming alone.)
If your programs cannot be built you will get no points whatsoever.
If your programs cannot be built without warnings using the required
compiler options given on Piazza we will take off 10%
(except if you document a very good reason).
If your programs cannot be built using
make we will take off 10%.
If your programs fail miserably even once, i.e. terminate with an
exception of any kind or dump core, we will take off 10% (for each such