Goals
In this assignment you will learn about C arrays, C strings and how to make your program interact with the Unix (e.g. Linux but also MacOS) command line. You will then write a simple sorting program. Finally, you will learn some useful strategies for testing code.
Language concepts covered this week
-
arrays
-
strings
-
command-line argument processing
Other concepts covered this week
-
Makefile
s again -
Using
assert
for debugging -
Test scripts again
Reading
Please read this page on command-line argument processing.
Arrays in C
Arrays in C are declared as follows:
int foo[10]; /* Declares an array called 'foo' with space for ten ints. */
and accessed as follows:
int i;
...
i = foo[4]; /* This gets the element from 'foo' at index 4. */
Note that arrays in C start at element 0, not element 1. Therefore, in this
case, the last element in the array would be foo[9]
. Note also that array
elements are not initialized to be anything, so they should be assumed to hold
garbage (arbitrary values) until you assign a value to them. Also, you need to
realize that if you try to access an element "off the end" of the array (e.g.
the 100th element of the ten-element array foo
in the code above), you will
get no compiler warnings, but the program will probably crash when it runs.
This is part of C’s no-error-checking I-assume-you-know-what-you-want
philosophy of programming.
Arrays can also be two-dimensional, three-dimensional, etc. The syntax is analogous:
int foo[10][5];
int i;
...
i = foo[4][2];
In addition, arrays can be initialized when they are declared:
int foo[5] = { 1, 2, 3, 4, 5 };
int bar[2][3] =
{
{ 1, 2, 3 },
{ 4, 5, 6 }
};
You will need to use array initialization in the next assignment. Finally, you can pass arrays to a function like you would pass a normal variable:
void munge(int array[])
{
/* code that uses the values in array[] */
}
/* more code... */
int main(void)
{
int stuff[10];
/* more code... */
/* Pass the 'stuff' array to the function 'munge'. */
munge(stuff);
/* {etc} */
}
However, when you do this, you should be aware that you are not passing a copy of the array to the function but the array itself. That means that if you modify the array in the function it will remain modified when you return from the function. This will be useful in the assignment below. The reason for this behavior is due to the fact that C arrays are actually represented as pointers; we’ll cover this in later lectures and assignments.
Program to write
Write a program called sorter
which behaves as follows:
-
If there are no command-line arguments at all when the program is run, the program should print out instructions on its use (a "usage message"; see this page). There should only be one usage message, and it must follow the standard conventions (see the link). Note that the optional command-line arguments (see below) must be included as part of the usage message.
-
The program will be able to accept up to 32 numbers (integers) on the command line.
-
If there are more than 32 numbers on the command line, or no numbers at all, the program should print out the usage message and exit.
-
If the optional command-line arguments
-b
or-q
are found anywhere in the command line, change the behavior of the program as described below. -
If any of the command-line arguments to the program are not integers or one of the two optional command-line arguments, your program’s response is undefined — it can do anything. (i.e. you shouldn’t worry about having to handle anything but integer arguments or the two command-line options). The way to deal with this is as follows: for each argument, first check to see if it’s one of the command-line options. If so, proceed accordingly. If not, assume it represents an integer and convert it using the
atoi()
function (see below). -
Sort the numbers using either the minimum element sort or the bubble sort algorithm (see below). Do not use a global array to hold the integers; use a locally-defined array in
main
and pass the array to the sorting function. Define separate functions for both sorting algorithms. Useassert
(see below) to check your sorting function for correctness. -
Print out the numbers from smallest to largest, one per line.
The Sample Output
If the above doesn’t make sense, this is what your program should look like
(bold is what you would type, and $
is the Unix prompt; yours may be
different):
$ sorter 5 9 -2 150 -95 23 2 5 80 -95 -2 2 5 5 9 23 80 150 $ sorter usage: sorter [-b] [-q] number1 [number2 ...] (maximum 32 numbers)
Command-line arguments
Your program begins in the main
function, which up until now has looked like
this:
int main(void)
{
/* your code here */
return 0;
}
but will now look like this:
int main(int argc, char *argv[])
{
/* your code here */
return 0;
}
So let’s take the first line apart. There are a few parts to this:
int
-
Declares that this function returns an integer.
main
-
Declares this function’s name,
main
. Recall that C programs always begin executing in themain
function. int argc
-
argc
is equal to the number of elements ofargv
. char *argv[]
-
Okay, this one’s a bit trickier. The second argument,
argv
, is an array of "char *
"s.char *
is C’s way of handling character strings, which are represented as arrays of characters where the last character is ASCII character 0 (often written as'\0'
and sometimes called the "nul" character [1]). This will make more sense when we discuss pointers.argv
contains one string for each of the command line arguments that your program is run with. More on this below.
To give an example of this, let’s take the command line from the example above:
$ sorter 5 9 -2 150 -95 23 2 5 80
This would produce the following values in argc
and
argv
:
argc 10 argv[0] "sorter" argv[1] "5" argv[2] "9" argv[3] "-2" argv[4] "150" argv[5] "-95" argv[6] "23" argv[7] "2" argv[8] "5" argv[9] "80"
Note that argv[0]
holds the name of your program, and that the first
user-supplied argument is in argv[1]
. Remember this! (One thing this
implies is that argc
is 1 greater than the number of user-supplied
arguments.) Notice also that a command-line argument of 5
is not an
integer; it’s a string that can be converted into an integer. Which leads us
to the next topic.
Converting strings to integers
Okay, so how do we turn these argv
strings into integer values? Well,
there’s this handy function called atoi
("ascii to integer"). For
example, atoi("5")
equals 5. In order to use atoi
, you need to put the
following line at the top of your program:
#include <stdlib.h>
atoi
is a pretty dumb function; if you pass it a bogus value it’ll just
return 0 instead of signalling any kind of error. That means that you need to
check for the optional command-line arguments -b
and -q
before trying to
convert a command-line argument to an int
.
The minimum element sort algorithm
Alright, so what is this "minimum element sort" algorithm? The basic idea is that the smallest element in the array will be the zeroth element in the sorted array, the second-smallest will be the first element, etc. Here’s how it works:
Let array
be the array of integers, and num_elements
be the total number of
elements in array
.
-
Start with
start = 0
(for the index of the zeroth element). -
Set
smallest = start
(smallest stores the index of the smallest element encountered so far). -
Run through a loop with the variable
index
going fromstart
tonum_elements
-
If
array[index] < array[smallest]
, setsmallest = index
. -
Once the loop ends, swap
array[start]
andarray[smallest]
(moving the smallest element found to the beginning of the array you searched). -
Increment
start
, and ifstart < num_elements
, go back to step 2. Do not use agoto
statement, however; use another loop. -
If
start >= num_elements
then you’re done and the array is sorted.
Adding command-line options
You will add the ability to process two optional command-line arguments
(usually called "command-line options") to your program. These options will be
-b
and -q
. To test for these, you’ll need to be able to compare strings.
The function strcmp(str1, str2)
returns 0
if str1
and str2
are the
same and nonzero otherwise. So to check if the first argument is -b
, you
would have to do something like:
if (strcmp(argv[1],"-b") == 0)
{
/* put stuff here */
}
To use the strcmp
function, you need to put the following line with the other
#include
s at the top of your file:
#include <string.h>
WARNING! Do not do this:
if (argv[1] == "-b") /* WRONG! */
{
/* put stuff here */
}
This doesn’t work; you can’t compare strings using the ==
operator.
[2] The reason
for this will be made clear when we talk about pointers.
You should add information on any command-line options you allow to the usage information that is printed out when there are no arguments. If you don’t, you will lose marks.
If the user supplies one of the command-line options, you must not treat it
as part of the list of integers. Also, you must not assume that the
command-line options are going to be entered before the numbers; it should be
possible to enter them anywhere after the program name (and the test script
will check for this). Finally, it’s legal to enter the command-line options
more than once, although this won’t make any difference after the first time.
You don’t need complicated code to achieve this! You can process all of the
command-line arguments in a single pass through the argv
array. Many
students write absurdly complicated code to handle the command-line arguments,
and it’s just a waste of effort. In other words, it’s much easier than you
probably think it is.
One thing you do have to make sure of is that you aren’t adding values into the array of numbers at indices that are too large. This array can only be 32 elements long at most, so if you try to assign to indices 32 or greater, it’s an error even if it seems to work properly.
It should be clear why indices greater than 32 are off-limits, but why do you think it’s illegal to assign to index 32 exactly? |
To keep things simple, you are only required to handle integers or the two
specific command-line options -b
and -q
, and any command-line argument (not
counting the program name) that isn’t -b
or -q
can be assumed to be an
integer.
Here are the command-line options and what they mean:
-
The
-b
option means that you should sort using a "bubble sort" instead of a minimum element sort algorithm.
Bubble sort, like minimum element sort, is not a very efficient algorithm.
Much more efficient algorithms (such as quicksort) exist, but they are best
left until after you’ve had some experience with recursion, which is coming up
next week . Also, the C standard library has a qsort
(quicksort)
library function, but to use it you need to understand function pointers, which
we won’t cover in this track (although it’s not that hard).
-
The
-q
option suppresses the output (i.e. nothing gets printed). Why would we want to do this? See the next section.
Information on different sorting algorithms, including bubble sort, can be found all over the web, in algorithms textbooks, etc. The Wikipedia page on bubble sort is a good reference.
More on command-line options
Here are some guidelines (which we expect you to follow) for implementing the command-line option processing in this program. They will make it much easier to get a working program that passes the test script.
-
First off, do not change the elements in the
argv
array. There is no need to. Some people try to shuffle values in theargv
array, as if you need to passargv
to the sorting functions. You don’t. What you need to do is to create a brand new array that can hold all the numbers in theargv
array. Altering theargv
array is legal, but it’s poor programming style. -
Second, this new array has to be large enough to hold all the numbers on the command line, and no larger. Given what you all know now, this means that it should be 32 elements long. You may not use all of them (that’s fine), but they will be there in case someone enters a command line with 32 numbers. Later, I’ll show you how to create arrays that have exactly the right number of elements, based on the input. Also, you aren’t allowed to do this:
int n = 10;
int array[n]; /* Invalid */
This isn’t legal C [3], so if you do this, you’ll get a compiler warning, and our policy is to not allow programs that generate compiler warnings. In addition, you can’t just use a humongous array, like this:
int array[1000];
This kind of thing is just programming to make the test script work, usually for the wrong reason. So 32 is as big an array as we’ll allow.
-
Third, you have to check if there are more than 32 numbers in the command line, which means 32 values that are neither the program name (
argv[0]
) nor-b
nor-q
. If there are more than 32 numbers, the program should exit with a usage message and return a non-zero value frommain()
. The test script will check for this. In addition, make sure that your program doesn’t at any point add an element to the numbers array at an invalid index i.e. an index larger than 31. Even if the program passes the test script, this is a serious error, because you’re writing into memory that isn’t part of the array. -
Fourth, the
-q
or-b
arguments can occur anywhere on the command line except inargv[0]
(you should know why they can’t occur inargv[0]
), and there can even be more than one-b
or-q
arguments in the command line. Multiple-b
or-q
arguments don’t do anything beyond what single ones do. The test script checks for this too. -
Finally, despite all these guidelines, this is actually a pretty easy problem; you can do everything in a single
for
loop. If your solution is very complicated, you are almost certainly doing something wrong. If that’s the case, don’t be shy about asking for help from the TAs or the instructor!
Avoiding magic numbers
A "magic number" is a number that is just plopped into a program with no context, especially if a number like this is repeated several times in a program. Usually, a magic number is some kind of unidentified parameter of the program.
In this assignment, the number 32
hard-coded into the program would be a
magic number. Magic numbers are almost always bad, for two reasons:
-
The significance of the number is unknown, or must be inferred from the code it’s embedded in. That makes the code harder to understand.
-
If you decide to change it to a different value, you have to change it in multiple places, and it’s easy to forget one, leading to hard-to-find bugs.
It’s extremely easy to avoid using magic numbers; just use the #define
preprocessor instruction to define a meaningful symbolic name for the magic
number, and only use the symbolic name in the program. That way, if you want
to change the number’s value, you only have to do it in one place (by changing
the #define
statement).
We will take marks off for using magic numbers in this assignment and in all
subsequent assignments, so make sure you don’t use them. Specifically, the
number "32" should only occur once in your entire program, inside a #define
statement!
Commenting
Don’t forget to write good comments! From now on, we’ll expect your commenting to be top-notch, unless (as in some later assignments) we supply you with a code template that already has the comments (in which case we’ll expect that you’ve read and understood the comments, and that your code is consistent with them).
Specifically, we want to see:
-
A comment at the top of the source code file explaining what the program does.
-
A comment before each function (except for
main
) which explains what the function does, what its arguments mean, and what its return value represents (if it has a return value). -
Comments inside functions explaining how the code works, unless it’s totally obvious. For this assignment, for instance, you’ll want to explain how your sorting functions work. It’s better to put this inside the function than before the function, because knowledge of how it works isn’t usually important for using the function (i.e. it’s not part of the function’s interface, just its implementation).
You may think this is too much work; it isn’t. The amount of time it takes you to write good comments is usually only a fraction of the time it takes you to write good code. In fact, many programmers advocate writing comments before writing code; the comments then serve as a kind of "scaffolding" around which you build your program. Writing the comments first forces you to think clearly about what you want to achieve, instead of just randomly writing code and stopping when it seems to work.
Testing your program
Many programmers can write working code, but fewer programmers test their code systematically to see if it actually does work. It’s important to learn different strategies for doing this, because it can not only improve the quality of your code, but also the quality of your life. If you don’t believe me, that’s because you haven’t spent enough four-hour+ sessions trying to track down a single bug caused by a typo somewhere in your program. [4] What we want to do here is see how we can make it happen less often.
The joy of assert
Using assert
is probably the single easiest way to improve the quality of
your code. The idea behind assert
is to make your program self-checking.
Let’s say you’ve just coded up an incredibly hairy algorithm which will balance
the budget, cure cancer, send mankind to the stars and do other wonderful
things. How do you know that your code doesn’t have a bug? And if it does
have a bug, how do you find it?
Well, there are probably points in your program where you expect certain
things to be true. For instance, in the sorting program described above,
after you’ve completed the sort, you expect that the array of integers is, in
fact, sorted. If you could check this right after you did the sort, then if
your sorting algorithm ever fails on any input, you will know about it right
away. assert
makes this easy. Using assert
looks
like this:
int i;
/* your algorithm goes here... */
assert(i == 10); /* assert that the integer variable i is equal to 10. */
In the above case, if the program reaches the assert
statement and i
is
equal to 10
, the program continues on executing; the assert
statement has
no effect. However, if i
is not equal to 10
, the program aborts (exits),
telling you the exact line in the file where the error occurred. "Whoa!",
you’re probably saying, "that’s a bit extreme, isn’t it?" Well, the point is
to use assert
for situations where you know something has to be true
assuming that your algorithm is correct. If it isn’t true, your program has
failed, and all bets are off. (In actual fact, it’s fairly easy to write
custom versions of assert
that don’t abort but just print a nasty message and
continue.) The assert
statement is referred to as an assertion, logically
enough.
You might also be asking: why don’t I just write this:
int i;
/* your algorithm goes here... */
if (i != 10)
{
abort(); /* or exit(), or some similar function */
}
instead of using that assert
function? Well, aside from the fact
that assert
is more concise, and that assert
prints out
line numbers where the assertion failed, the big advantage of assertions is
that they can be switched off. Even though assertions don’t usually
slow your code down appreciably, there is some cost, which goes up the more
assertions you use. Let’s say you’ve used assertions to debug your code, and
you reach a point where you are highly confident that your algorithm is
correct. At this point you may want your algorithm to run as fast as
possible, and not waste time checking assertions that you are sure will
always be true. In other words, you want to leave the assertions out in the
compiled code, but you don’t want to remove the lines with the
assert
statements in them in case you modify the code later and need
to debug it again.
It turns out that by adding the magic word -DNDEBUG
as
an argument to gcc
when you compile the C code, the assertions
will be removed from the code before compilation, so your code will run at
maximum speed.
Using assert
in your program
To use assert
, make sure you add this to the top of your file:
#include <assert.h>
In your sort function(s), write this right at the end of the function:
/* Check that the array is sorted correctly. */
for (i = 1; i < num_elements; i++)
{
assert(array[i] >= array[i-1]);
}
where num_elements
is the number of elements in the array and array
is the
name of the array. Notice that here we’re putting the assert
statement in a
loop. This bit of code is technically called a postcondition; a
postcondition is what is required to be true after a function has completed.
[5] Here, the postcondition will cause an assertion failure if the
array isn’t sorted at the end of the sorting function, which is what we want.
Notice how the postcondition is just three lines of trivial code, as compared to the sort routines themselves, which will probably be considerably longer and trickier to write. This illustrates the general principle that it’s much easier to check if something is right than to make it right in the first place. Assertions are easy to write; use them!
NOTE: It may seem wasteful to write the same assertion code at the
end of both sorting functions, but that’s where I want you to write it, and
not e.g. in the main()
function. That’s because you will
eventually be writing larger programs which span many files, and functions
you write will be used outside of the file they are written in. It’s
important for functions to be self-contained as much as possible, and if the
assertion checking is built in to a sorting function, then anyone using that
function will automatically get the benefit of the assertion checking.
Running the test script
Now you get to know why you added the -q
option to your
program. The reason for the -q
option is to make the program
only print output if the assertion fails. We can use this to write a test
script to check the program.
We are supplying you with a Makefile and a test script for your program. The test script is written in the Python language, which you remember from CS 1. Python is available on all Linux systems. What you should do is to download the test script into the same directory that your assignment 3 C code is located in. (Use the "save page as" feature of your browser; don’t try to cut and paste the code into a text editor, because it’s very probable that it won’t work properly if you cut and paste it.) When the file is in the correct location, make it executable by typing:
$ chmod +x run_test
where $
is the terminal prompt.
What the test script does is to generate a series of random inputs to the
program and run the program with the -q
option. The test script is invoked
by typing
$ make test
at the terminal prompt. The test script will tell you what it’s doing as it
proceeds. If your sort functions ever sort incorrectly, you will see the
output of assert
, telling you what line the failure occurred at. The test
script will run the program using the minimum element sort as well as the
bubble sort. If something else goes wrong (e.g. your command-line processing
is faulty, or a core dump occurs), the test script will report an error, along
with the program invocation that caused it. It will not tell you exactly what
error occurred, but by re-running the erroneous invocation (by cutting and
pasting into your terminal window) you should be able to figure it out
yourself. If all goes well, the test script will report success.
Note that if there is a bug in your assertion code itself, all bets are off. However, as mentioned above, writing correct assertions is pretty trivial (especially since we’ve done it for you :-)).
Supporting files
-
The Makefile.
Be sure that all the command lines in the Makefile start with tabs (which they will if you save the link directly), or they will not work.
-
The test script, called
run_test
.Once again, in order to get this to work, you have to do
$ chmod +x run_test
after downloading this file, in order to make it executable. From now on, we’ll assume you’ll remember this (and the
Makefile
tab rule above) for future assignments. -
The style checker. To invoke the style checker you can just type
$ make check
at the prompt (assuming that the Makefile is in the same directory).
To hand in
The sorter.c
file. We will check to see that it compiles, passes the style
checker and passes the test script.
References
-
Darnell and Margolis, chapter 7.
-
K&R, chapters 1 and 5.
-
Any algorithms textbook e.g. Sedgewick, Algorithms in C.