Assignment 4: All your Data in our Base!
Out on: February 18, 2008
Due by: February 25, 2008, 3:00 pm (before lecture)
Collaboration: None
Grading: Packaging 10%, Design 10%, Style 20%, Functionality 60%
Overview
The fourth assignment asks you to implement a very simple database system that can store keys and their associated values persistently on disk. It is the most programming-intensive assignment so far, and I highly recommend that you start working on it as soon as you can. You won't be able to finish it if you wait until the last day before it's due!
The Interface
This is where you can get the latest version of the interface you need. Please watch the mailing list carefully for changes to this!
- Latest version: sdbm.h with md5sum 69801a25ec93880f8cef3107e014d777
The Examples
This is where you can get some simple example programs using the interface. Feel free to ask questions about these on the mailing list!
Problem 1: Interface Critique (20%)
The simple database you are implementing needs to conform to the well-defined interface provided above. This problem asks you to write a short critique of that interface.
It's probably easier to do this problem after you have already tried to implement the interface. Are there operations we missed? Could we have made the interface simpler to use or implement? Do you think the way the interface is defined could be improved? Does the interface design cause performance problems? These are the kinds of questions you want to address, but you will certainly be able to think of more.
If you have a better way to define the interface, make sure to include a draft of your suggested better_sdbm.h file.
Problem 2: Database Implementation (80%)
The simple database you are implementing needs to conform to the well-defined interface provided above. Do not change the interface for any reason! If an error needs to be corrected, we will post a new version of the interface with a new checksum!
It's easy to describe your task: Implement the interface! :-) Of course there are lots of decisions you'll need to make in order to do that, most importantly how you will actually store the database on disk. A relatively simple and popular way is to use two files:
- One file contains all the keys, the other file contains all the values.
- Inside the key file you also store the position and length of the corresponding value in the value file.
- When the database is opened, you read the key file into memory and build a simple data structure for it (often called the "index" of the database).
- When you're asked to get the value for a key, you look the key up in your index data structure; if you find it, you open the value file, move to the right position, read the value, close the file, and finally return the value.
- You proceed in a similar manner for all other operations.
We just included this description as an example; you can of course choose a completely different way to organize things. The whole point of defining an interface is that clients of the database don't have to worry about how the database works internally, so in that sense implementation details are not really important. What is important is that you actually store data on disk and not just in memory: if one program writes a new key/value pair to the database and exits, and another program started later looks up the same key, it should find the value stored there by the previous program.
Please put your implementation into a file sdbm.c and make sure that sdbm.o builds by itself so it can be linked to the separately compiled test programs.
Hints
- It is possible that there are errors in the interface and that we need to change the interface over the course of the assignment. It's in your interest to get started and ask questions about the interface as soon as possible; it's also in your interest to keep track of the mailing list every single day.
- For those who have already taken data structures: Do not worry too much about efficiency at this point! If you have no issues whatsoever with your C coding, feel free to go ahead and implement some fancy tree structure to make things fast. But it's probably a better idea to keep things very simple and instead concentrate on making your code clean, easy to understand, and most of all correct.
- For the curious: Peter's current implementation uses the database name as a directory and keeps each key/value pair in a separate file under that directory. But that's probably not the best way to go either. :-)
- Technically you have to worry about what happens when two programs access the database concurrently. However, this turns out to be quite involved to get right, so you can simply assume that it won't happen for now.
Deliverables
Please turn in a gzip-compressed tarball of your assignment; the filename should be cs120-assign-4-login.tar.gz with login replaced by your Unix login name on ugradx.cs.jhu.edu (so I would use cs120-assign-4-phf.tar.gz). The tarball should contain no derived files whatsoever (i.e. no executable files), but it should allow building all derived files. Include a README file that briefly explains what your programs do and contains any other notes you want us to check out before grading.
Grading
For reference, here is a short explanation of the grading criteria.
Packaging refers to the proper organization of the stuff you hand in, following the guidelines for Deliverables above.
Style refers to C programming style, including things like consistent indentation, appropriate identifiers, useful comments, suitable documentation, etc. Simple, clean, readable code is what you should be aiming for.
Performance refers to how fast your program can produce the required results compared to other submissions.
Design refers to proper modularization and the proper choice of algorithms and data structures.
Functionality refers to your programs being able to do what they should according to the specification given above; if the specification is ambiguous and you had to make a certain choice, defend that choice in your README file.
If your programs cannot be built you will get no points whatsoever.
If your programs cannot be built without warnings using gcc -ansi -pedantic -Wall -Wextra -std=c99 -O, we will take off 10% (except if you document a very good reason).
If your programs cannot be built using make, we will take off 10%.
If your programs fail miserably even once, i.e. terminate with an exception of any kind or dump core, we will take off 10%.
Finally, make sure to include your name and email address in every file you turn in!