Homework 1: The grep program

Homework is important practice for you, but it’s not graded so you don’t have to submit it. This particular homework is designed to help you review basic C programming, independent compilation and linking, as well as the make tool.


You will develop a clone of the UNIX grep command. No worries, it’s a simplified clone that doesn’t have to support all the features of the actual grep program; in fact it barely supports one feature. Nevertheless, you should read the man page for grep to see what the real program can do.

The Program

Your grep program reads lines (terminated by a newline character) from standard input. For each line, it checks whether the substring given as a command line argument is contained in the line; if so, the program prints the line to standard output; if not, it moves on to the next line. The grep program stops when it reaches the end of the input. Here are three examples of how this works:

$ ./grep 600.229
$ ./grep 600.229
You need to know C
to succeed in 600.229
to succeed in 600.229
or so they say.
$ ./grep "hi there"
hi there
hi there
there hi

First I started the program and then immediately hit CTRL-D to signal end-of-file; there is no output whatsoever.

Next I started the program and typed three lines (hitting RETURN three times in the process) before hitting CTRL-D; the second time you see “to succeed in 600.229” it’s not me typing the line, it’s grep printing the line because it contains the substring 600.229 we’re searching for.

Finally I started the program with a string in quote marks as the command line argument; you should be able to figure out what this does from the output we’re getting here; the second “hi there” line is not me typing.
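
The read-check-print loop described above could be sketched as follows. This is only a starting point, not the expected solution: the function name grep_stream and the MAX_LINE limit are assumptions made for illustration, and the fixed-size buffer means that a line longer than the buffer is processed in pieces, so a match that straddles two pieces is missed.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical limit; longer lines are split across several reads. */
#define MAX_LINE 1024

/*
 * Copy every line of `in` that contains `pattern` to `out`.
 * Returns the number of matching lines.
 * (grep_stream is an illustrative name, not part of the assignment.)
 */
long grep_stream(FILE *in, FILE *out, const char *pattern)
{
    char line[MAX_LINE];
    long matches = 0;

    /* fgets keeps the trailing newline, so fputs prints the line as-is. */
    while (fgets(line, sizeof line, in) != NULL) {
        if (strstr(line, pattern) != NULL) {
            fputs(line, out);
            matches++;
        }
    }
    return matches;
}
```

A main function would call this as grep_stream(stdin, stdout, argv[1]) once it has checked that argv[1] exists. Passing the streams as parameters, rather than hard-coding stdin and stdout, also makes the function easier to test.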

Exit Status

The grep program indicates whether it was successful or not using the exit status of the main function. Here’s an example:

$ ./grep yes
yes indeed
yes indeed
$ echo $?
0
$ ./grep nope
sadly no
$ echo $?
1

So if grep encounters the substring we’re looking for, it exits with the EXIT_SUCCESS status; if it doesn’t find the substring, it exits with the EXIT_FAILURE status. Sadly, grep cannot use EXIT_FAILURE to indicate errors as well, so you should use the non-standard exit status 2 for that purpose. (Check the man page for grep if you want to make sure that’s how it works.)
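
The three outcomes could be mapped to exit statuses along these lines. The names EXIT_ERROR and exit_status are made up for this sketch, and it assumes that an error takes precedence over any matches found before it occurred.

```c
#include <stdlib.h>

/* Hypothetical name for the non-standard error status described above. */
#define EXIT_ERROR 2

/*
 * Map the outcome of a run to an exit status.
 * `matches` is the number of matching lines; `failed` is nonzero if
 * an error occurred. (Both parameter names are illustrative.)
 */
int exit_status(long matches, int failed)
{
    if (failed) {
        return EXIT_ERROR;
    }
    return matches > 0 ? EXIT_SUCCESS : EXIT_FAILURE;
}
```

The value returned here is what main would return.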


You may wonder what could possibly go wrong in grep since it only reads from standard input and only writes to standard output. One (somewhat far-fetched!) example would be that we run grep bla <data >hits to read from a file data and to write matching lines to a file hits, but that as we write the output our disk fills up.
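
Because standard output is usually buffered, a failed write may only show up when the buffer is flushed. One possible way to catch that, sketched here with a hypothetical helper named finish_output, is to flush explicitly before exiting and to check the stream's error indicator:

```c
#include <stdio.h>

/*
 * Flush `out` and report whether any write to it has failed.
 * Returns 0 on success, -1 on failure.
 * (finish_output is an illustrative name.)
 */
int finish_output(FILE *out)
{
    if (fflush(out) == EOF || ferror(out)) {
        return -1;
    }
    return 0;
}
```

On Linux, running the program with its output redirected to /dev/full is a convenient way to simulate a full disk.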

There is, however, also a more obvious source of errors: the length of a “line” is not limited in any way. Hence, if you use a fixed-size buffer, there will be inputs you cannot process properly. However, if you allocate (and reallocate) the buffer for a line dynamically, you may run out of memory, which also results in an error.
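
A dynamically growing line buffer might look roughly like the sketch below. The function name read_line, the `failed` out-parameter, and the starting capacity of 64 are all assumptions chosen for illustration; the sketch strips the newline, so the caller has to print one after each matching line.

```c
#include <stdio.h>
#include <stdlib.h>

/*
 * Read one line of any length from `in` into a heap buffer.
 * On success, returns the line without its newline; the caller frees it.
 * Returns NULL at end of input or on error; `*failed` is nonzero only
 * in the error case. (read_line is an illustrative name.)
 */
char *read_line(FILE *in, int *failed)
{
    size_t cap = 64;    /* arbitrary starting capacity */
    size_t len = 0;
    char *buf = malloc(cap);
    int c;

    *failed = 0;
    if (buf == NULL) {
        *failed = 1;
        return NULL;
    }
    while ((c = getc(in)) != EOF && c != '\n') {
        if (len + 1 == cap) {   /* keep room for the terminating '\0' */
            char *bigger = realloc(buf, cap * 2);
            if (bigger == NULL) {
                free(buf);
                *failed = 1;
                return NULL;
            }
            buf = bigger;
            cap *= 2;
        }
        buf[len++] = (char)c;
    }
    if (c == EOF && (len == 0 || ferror(in))) {
        /* Nothing left to read, or a read error. */
        *failed = ferror(in) != 0;
        free(buf);
        return NULL;
    }
    buf[len] = '\0';
    return buf;
}
```

Doubling the capacity keeps the number of realloc calls logarithmic in the line length. When realloc fails, the old buffer is still valid, which is why the sketch frees it before reporting the error.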

Here’s one more example error that your grep should detect:

$ ./grep
error: no pattern specified
$ echo $?
2

Your code should detect errors and stop the program gracefully, that is, without crashing, with an informative error message printed to standard error, and with the proper exit status set (that being 2 in the case of grep, see above).
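
The missing-pattern check from the example above could be sketched like this. The helper name get_pattern is made up, and the sketch assumes that any arguments after the first are simply ignored, which the assignment does not specify.

```c
#include <stdio.h>

/*
 * Return the search pattern from the command line, or NULL (after
 * printing a message to `err`) if none was given.
 * (get_pattern is an illustrative name.)
 */
const char *get_pattern(int argc, char *argv[], FILE *err)
{
    if (argc < 2) {
        fprintf(err, "error: no pattern specified\n");
        return NULL;
    }
    return argv[1];
}
```

In main, a NULL result from get_pattern(argc, argv, stderr) would lead straight to returning 2, before any input is read.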



At minimum, you should have a file grep.c with your actual program code and a Makefile that builds the grep program with the correct compiler options (see Piazza). The program should compile without warnings and produce no output other than what is described above. Also, there should probably be more than one function in grep.c!

If you produce a version that can process lines of arbitrary length, you should have lineread.h and lineread.c as well (the name doesn’t really matter, the abstraction does). This will require a slightly more complicated Makefile to build correctly.

Speaking of the Makefile, ideally you have clean and test targets that clean up temporary files and run all test cases respectively. Details of how to test grep are up to you, but one approach would be the use of simple shell scripts.