Project 1: Magic Hexdumps

Overview

This first project tests your basic C programming skills. You’ll also learn what’s actually in a file, an experience that will hopefully demystify the concept further.

You should start by reading up on what “hexdumps” are and what they’re used for. For this project, you will write your own hex program that produces a hexdump on standard output for data read from standard input.

Note that you’re expected to develop a basic testing infrastructure using shell scripting in addition to the actual program. (Also note that the existing xxd tool provides a useful test oracle for your program; see below.)

If you have real trouble with this project, you’ll likely have even more trouble with future projects.

The Program

Let’s start with an example:

$ ./hex
Hello
00000000: 48 65 6c 6c 6f 0a                                Hello.

The program was started, then the user typed the word “Hello” followed by RETURN, then CTRL-D was used to stop the input. The result shows the ASCII code for each character (in hexadecimal, so it’s guaranteed to be two digits wide for each byte), including the newline character generated by the RETURN key. The formatting may look a bit strange, but the purpose of the large gap becomes apparent if we examine a longer input:

$ ./hex
This is a longer example of a hexdump. Marvel at it's magnificence.
00000000: 54 68 69 73 20 69 73 20 61 20 6c 6f 6e 67 65 72  This is a longer
00000010: 20 65 78 61 6d 70 6c 65 20 6f 66 20 61 20 68 65   example of a he
00000020: 78 64 75 6d 70 2e 20 4d 61 72 76 65 6c 20 61 74  xdump. Marvel at
00000030: 20 69 74 27 73 20 6d 61 67 6e 69 66 69 63 65 6e   it's magnificen
00000040: 63 65 2e 0a                                      ce..

This time the user entered two sentences, then signaled end of input with CTRL-D. Again, we see the ASCII code for each character (including spaces and newlines). The formatting is set up so that regardless of the number of characters, we always have three “columns” of output:

  1. First the overall “position” in the input. Note that this is also a hexadecimal number, formatted to 8 digits.
  2. Then the ASCII values for each character in hexadecimal, at most 16 to a line.
  3. Finally a string-like representation of the data, with printable characters shown but non-printable characters (like newline or tab) replaced with a dot character.
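The second and third columns come down to two per-byte conversions. Here is a minimal sketch of both, assuming the C locale; the helper names byte_to_hex and printable_or_dot are made up for illustration and are not required by the assignment. Passing the byte as an unsigned char matters: on platforms where plain char is signed, a byte such as 0x80 would be sign-extended and printed with more than two digits.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: format one byte as two lowercase hex digits.
 * out must have room for 3 chars (two digits plus the terminating NUL). */
void byte_to_hex(unsigned char byte, char out[3])
{
    /* %02x pads with a leading zero, so 0x0a becomes "0a" rather than "a". */
    snprintf(out, 3, "%02x", (unsigned int)byte);
    assert(strlen(out) == 2);
}

/* Hypothetical helper: the character to show in the third column. */
char printable_or_dot(unsigned char byte)
{
    /* isprint() expects an unsigned char value (or EOF), never a
     * negative char, which is another reason to use unsigned char. */
    return isprint(byte) ? (char)byte : '.';
}
```

With these helpers, the letter H would become “48” in the second column and stay “H” in the third, while a newline would become “0a” and a dot.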

Note that there’s a single space between the colon after the offset and the ASCII values, but there are two spaces between the ASCII values and the string-like representation.
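The layout rules above can be sketched as a single function that formats one line. This is only one possible approach, not a reference solution: the name format_line, the constants, and the choice to build the line in a buffer are all assumptions made for illustration, and the sketch assumes offsets fit into 8 hex digits.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define BYTES_PER_LINE 16
/* offset and colon (9) + 16 slots of " xx" (48) + two spaces (2)
 * + up to 16 text characters + terminating NUL */
#define LINE_SIZE (9 + 3 * BYTES_PER_LINE + 2 + BYTES_PER_LINE + 1)

/* Hypothetical helper: format 1 to 16 bytes as one hexdump line in out,
 * which must hold at least LINE_SIZE chars. No newline is appended. */
void format_line(unsigned long offset, const unsigned char *data,
                 size_t count, char out[LINE_SIZE])
{
    size_t pos = 0;
    size_t i;

    assert(count >= 1 && count <= BYTES_PER_LINE);
    assert(offset <= 0xffffffffUL); /* assumption: 8 hex digits suffice */

    /* Column 1: the offset as 8 hex digits, followed by a colon. */
    pos += (size_t)snprintf(out + pos, LINE_SIZE - pos, "%08lx:", offset);

    /* Column 2: one space plus two hex digits per byte. Unused slots are
     * padded with spaces so column 3 always starts at the same position. */
    for (i = 0; i < BYTES_PER_LINE; i++) {
        if (i < count)
            pos += (size_t)snprintf(out + pos, LINE_SIZE - pos, " %02x",
                                    (unsigned int)data[i]);
        else
            pos += (size_t)snprintf(out + pos, LINE_SIZE - pos, "   ");
    }

    /* Two spaces separate the hex column from the text column. */
    pos += (size_t)snprintf(out + pos, LINE_SIZE - pos, "  ");

    /* Column 3: printable characters as-is, everything else as a dot. */
    for (i = 0; i < count; i++)
        out[pos++] = isprint(data[i]) ? (char)data[i] : '.';
    out[pos] = '\0';

    assert(strlen(out) == pos);
}
```

A caller would typically fill a 16-byte buffer from standard input (for example with fread), pass it to the helper together with the running offset, print the resulting line, and then advance the offset by the number of bytes read. Padding the unused slots, rather than stopping after the last byte, is what produces the large gap on a short final line.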

Testing

You should have a testing infrastructure in place for your program. That is, there should be a shell script testhex.sh that runs your hex program on a set of input files, producing a set of output files. The script should use diff to ensure that the output matches some expected reference output. The script should only print something when a test case fails. It should be possible to run all the test cases using make test alone; a prior make to build the program should not be needed.

Hints

Deliverables

Please follow the submission instructions as detailed on Piazza. Make sure that your tarball contains no derived files whatsoever (e.g. no executable files), but allows building all required derived files. Also, be sure to include a Makefile that sets the appropriate compiler flags and builds all programs by default. Finally, make sure to include your name and email address in every file you turn in (well, in every file for which it makes sense to do so anyway)!

Grading

For reference, here is a short explanation of the grading criteria; some of the criteria don’t apply to all problems, and not all of the criteria are used on all assignments.

Packaging refers to the proper organization of the stuff you hand in, following both the guidelines for Deliverables above as well as the general submission instructions for assignments on Piazza.

Style refers to C programming style, including things like consistent indentation, appropriate identifier names, useful comments, suitable documentation, etc. Simple, clean, readable code is what you should be aiming for. Make sure you follow the style guide posted on Piazza!

Design refers to proper modularization (functions, modules, etc.) and an appropriate choice of algorithms and data structures.

Performance refers to how quickly, and with how little memory, your programs produce the required results compared to other submissions.

Functionality refers to your programs being able to do what they should according to the specification given above; if the specification is ambiguous, ask for clarification! (It also refers to you simply doing the required work, which may not be programming alone.)

If your programs cannot be built, you will get no points whatsoever. If your programs cannot be built without warnings using the required compiler options given on Piazza, we will take off 10% (unless you document a very good reason). If your programs cannot be built using make, we will take off 10%. If your programs fail miserably even once, i.e. terminate with an exception of any kind or dump core, we will take off 10% (for each such case).