Assignment 0: Warmup

Overview

This assignment is mostly a warmup exercise that gives you a chance to review your C programming skills by writing two simple programs. You’ll also learn something useful about what’s actually in a file, which is going to be very handy for Assignment 3.

Both programs are very easy to test because you can compare your output against the output of existing UNIX tools. If you have trouble with this assignment, you’ll likely have even more trouble with future assignments.

Problem 1: Counting Words (40%)

This problem asks you to implement a simplified version of the UNIX tool wc, the so-called “word count” program. Start by reading the manual page for wc to remind yourself what the program does. Note, in particular, the definition of “word” used in the manual page:

A word is a non-zero-length sequence of characters delimited by white space.

Obviously the notion of “white space” is important for this program. Luckily, the C standard library already provides a function for checking whether a character counts as white space: isspace, declared in <ctype.h>.
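As a rough illustration of how isspace is typically called, here is a minimal sketch. The helper name is_white_space is made up for this example and is not part of the assignment or the starter code. It assumes the character comes from getchar or fgetc, which return an int that is either EOF or a value in the range of unsigned char; those are the only values for which isspace is defined.

```c
#include <assert.h>
#include <ctype.h>
#include <limits.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical helper (name invented for this sketch): reports whether c,
 * a value as returned by getchar() or fgetc(), counts as white space. */
static bool is_white_space(int c)
{
    /* isspace is only defined for EOF and for values that fit in an
     * unsigned char, so check that precondition first. */
    assert(c == EOF || (c >= 0 && c <= UCHAR_MAX));

    /* isspace returns some non-zero value for white space, not
     * necessarily 1, so compare against 0 explicitly. */
    return isspace(c) != 0;
}
```

If you ever pass a plain char to isspace instead, cast it to unsigned char first, since char may be signed on your platform.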

So here’s how your version of wc should work: Read input character-by-character from standard input; maintain three counters, one for the total number of characters, one for the total number of words, one for the total number of lines; when EOF is encountered, print the number of lines, words, and characters to standard output in the format detailed below and end the program.
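The steps above could be sketched roughly as follows. This is only one possible structure, not the reference solution: the struct counts type and the function names count_stream and print_counts are invented for this example, and it assumes a line is counted whenever a line feed character is read.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical container for the three counters. */
struct counts {
    unsigned long lines;
    unsigned long words;
    unsigned long chars;
};

/* Reads `in` character by character until EOF and returns the totals. */
static struct counts count_stream(FILE *in)
{
    struct counts c = {0, 0, 0};
    bool in_word = false; /* true while inside a run of non-space characters */
    int ch;               /* int rather than char, so EOF stays distinguishable */

    assert(in != NULL);

    while ((ch = fgetc(in)) != EOF) {
        c.chars++;
        if (ch == '\n')
            c.lines++;
        if (isspace(ch)) {
            in_word = false;     /* white space ends the current word */
        } else if (!in_word) {
            in_word = true;      /* first character of a new word */
            c.words++;
        }
    }
    return c;
}

/* Prints lines, words, and characters separated by single spaces. */
static void print_counts(FILE *out, struct counts c)
{
    fprintf(out, "%lu %lu %lu\n", c.lines, c.words, c.chars);
}
```

A main function would then only need to call print_counts(stdout, count_stream(stdin)) and return 0.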

Here are a few examples of how wc should behave:

$ ./wc
0 0 0
$ ./wc
Peter was here.
1 3 16
$ ./wc
Here is
some more input
     for you to
     play with.
4 10 56

In all of these examples, there are no spaces after the last character on a line (but there’s a line feed character there of course). Note that for the empty input, you should produce all zeros as output. The numbers are separated by a single space. Obviously your source code should be in a file called wc.c.

Hints

Problem 2: Hexdumps (60%)

Start by reading up on what hexdumps are. For this problem, you will write a program hex.c that produces a hexdump on standard output for data read from standard input. Let’s start with an example:

$ ./hex
Hello
00000000: 48 65 6c 6c 6f 0a                                Hello.

The program was started, then the user typed the word “Hello” followed by return/enter, then CTRL-D was used to stop the input. The result shows the ASCII code for each character in hexadecimal, padded with a leading zero where necessary so that every character takes exactly two digits; this includes the newline character (0a) generated by the return/enter key. The formatting may look a bit strange, but the purpose of the large gap becomes apparent if we examine a longer input:
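To see where the two-digit codes come from, here is a small sketch using the standard printf conversion %02x, which prints a value in lowercase hexadecimal and pads it with a leading zero to a width of two. The helper name byte_to_hex is made up for this example.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: writes the two-digit lowercase hex code of byte b
 * into buf, which must have room for two digits plus the terminating '\0'. */
static void byte_to_hex(unsigned char b, char buf[3])
{
    /* %02x: hexadecimal, minimum width 2, padded with zeros on the left. */
    int n = snprintf(buf, 3, "%02x", (unsigned int)b);

    /* Assuming 8-bit bytes, the result is always exactly two digits. */
    assert(n == 2);
    (void)n; /* avoids an unused-variable warning when NDEBUG is defined */
}
```

With this helper, 'H' yields "48" and the newline character yields "0a", matching the example above.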

$ ./hex
This is a longer example of a hexdump. Marvel at it's magnificence.
00000000: 54 68 69 73 20 69 73 20 61 20 6c 6f 6e 67 65 72  This is a longer
00000010: 20 65 78 61 6d 70 6c 65 20 6f 66 20 61 20 68 65   example of a he
00000020: 78 64 75 6d 70 2e 20 4d 61 72 76 65 6c 20 61 74  xdump. Marvel at
00000030: 20 69 74 27 73 20 6d 61 67 6e 69 66 69 63 65 6e   it's magnificen
00000040: 63 65 2e 0a                                      ce..

This time the user entered two sentences, then signaled end of input with CTRL-D. Again, we see the ASCII code for each character (including spaces and newlines). The formatting is set up so that regardless of the number of characters, we always have three “columns” of output:

  1. First the overall “position” in the input. Note that this is also a hexadecimal number, formatted to 8 digits. (This used to say 7 but our reference program xxd has apparently been updated to use 8 recently.)
  2. Then the ASCII values for each character in hexadecimal, at most 16 to a line.
  3. Finally a string-like representation of the data, with printable characters shown but non-printable characters (like newline or tab) replaced with a dot.

Note that there’s a single space between the colon after the offset and the ASCII values, but there are two spaces between the ASCII values and the string-like representation.
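The three columns and their spacing could be produced along the following lines. This is a sketch under assumptions, not the starter code: the name print_hex_line and the constant BYTES_PER_LINE are invented here, and it assumes the caller has already collected up to 16 bytes for the line and tracks the offset itself.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>

#define BYTES_PER_LINE 16

/* Hypothetical helper: writes one hexdump line for the n bytes in data
 * (1 <= n <= BYTES_PER_LINE) that start at position `offset` in the input. */
static void print_hex_line(FILE *out, unsigned long offset,
                           const unsigned char *data, size_t n)
{
    size_t i;

    assert(out != NULL && data != NULL);
    assert(n >= 1 && n <= BYTES_PER_LINE);

    /* Column 1: the offset as 8 hex digits, followed by a colon. */
    fprintf(out, "%08lx:", offset);

    /* Column 2: every slot is one space plus two hex digits. Slots without
     * data (short last line) become three spaces so column 3 stays aligned. */
    for (i = 0; i < BYTES_PER_LINE; i++) {
        if (i < n)
            fprintf(out, " %02x", (unsigned int)data[i]);
        else
            fputs("   ", out);
    }

    /* Two spaces separate the hex values from the text column. */
    fputs("  ", out);

    /* Column 3: printable characters as they are, everything else as a dot. */
    for (i = 0; i < n; i++)
        fputc(isprint(data[i]) ? data[i] : '.', out);

    fputc('\n', out);
}
```

Printing the space before each hex value, rather than after it, gives the single space after the colon for free and leaves exactly two spaces in front of the text column.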

On Piazza you’ll find some starter code for this program. You can of course ignore the starter code and write the entire thing from scratch yourself, but we recommend you use the starter code: It contains a few important hints that you may not want to live without. Good luck!

Hints

Deliverables

Please follow the submission instructions as detailed on Piazza. Make sure that your tarball contains no derived files whatsoever (e.g. no executables or object files), but allows building all required derived files. Also, be sure to include a Makefile that sets the appropriate compiler flags and builds all programs by default. Finally, make sure to include your name and email address in every file you turn in (well, in every file for which it makes sense to do so anyway)!

Grading

For reference, here is a short explanation of the grading criteria; some of the criteria don’t apply to all problems, and not all of the criteria are used on all assignments.

Packaging refers to the proper organization of the stuff you hand in, following both the guidelines for Deliverables above as well as the general submission instructions for assignments on Piazza.

Style refers to C programming style, including things like consistent indentation, appropriate identifier names, useful comments, suitable documentation, etc. Simple, clean, readable code is what you should be aiming for. Make sure you follow the style guide posted on Piazza!

Design refers to proper modularization (functions, modules, etc.) and an appropriate choice of algorithms and data structures.

Performance refers to how quickly and with how little memory your programs produce the required results compared to other submissions.

Functionality refers to your programs being able to do what they should according to the specification given above; if the specification is ambiguous, ask for clarification! (It also refers to you simply doing the required work, which may not be programming alone.)

If your programs cannot be built you will get no points whatsoever. If your programs cannot be built without warnings using the required compiler options given on Piazza we will take off 10% (except if you document a very good reason). If your programs cannot be built using make we will take off 10%. If valgrind detects memory errors in your programs, we will take off 10%. If your programs fail miserably even once, i.e. terminate with an exception of any kind or dump core, we will take off 10% (for each such case).