Goals
In this assignment you will write a sorting program similar to the program
from assignment 3, except that it uses linked lists instead of arrays to hold the
list of numbers read in off the command line. In the process, you will learn
about one of the fundamental constructs in the C language (struct
)
and more about pointers.
Language concepts covered this week
-
struct
Structs provide a way to bundle together several related pieces of data into a single piece of data. Structs are commonly used to define new data types.
-
typedef
Typedefs are a way to give an alias (alternative name) for a type name. Used correctly, they can make code both more concise and more readable.
-
linked lists
Linked lists are a fundamental data type that is constructed using structs and pointers. Linked lists are very common in real code, so it is important that you become comfortable with using them.
Other concepts covered this week
This assignment covers sorting in a so-called "functional" style, whereby the original data collection to be sorted is not altered by the sorting process. Rather, the sorting process generates a new, sorted data collection. If you know the OCaml or Haskell programming languages (e.g. from CS 4 and CS 115) this style will be familiar to you; if not, it’s well worth learning about.
Structs
A struct is a data object which holds other data objects of arbitrary pre-specified types. These data objects are called the fields of the struct, and they are generally conceptually related to each other. For instance, if you wanted a data object to hold the x-y coordinates of a point on a graphical display screen, you could declare a struct to do this:
struct point
{
int x;
int y;
}
This declaration tells the compiler what a point
is, but
doesn’t create any points. To do that you do this:
struct point p; /* Creates a new point called 'p'. */
However, most C programmers use the following shortcut. In C you can give a type name an alias (alternative name) by using a typedef:
typedef struct point
{
int x;
int y;
} Point;
This defines struct point
as above, and also declares
Point
to be an alias for struct point
. Now you can
create a new Point
like this:
Point p;
Note that Point
and point
are not the same
thing, because you don’t use the struct point
syntax
when you’re declaring a Point
. In fact, writing struct
Point
is invalid; you have to just write Point
. However,
a Point
object is still a struct
.
You can also leave off the point
identifier entirely and write this:
typedef struct
{
int x;
int y;
} Point;
If you do this, you can use Point
s but not struct point
s.
How do you use a struct? You can access the field values in two different
ways. The first way is to use the .
operator:
int z;
Point p;
p.x = 10; /* Set 'x' field of p to 10 */
p.y = 20; /* Set 'y' field of p to 20 */
/* You can use 'p.x' syntax on either side of an assignment statement. */
z = p.x + p.y; /* = 30 */
The second way is used with pointers to structs. Then you would normally
use the arrow (->
) operator:
int z;
Point p;
Point *q; /* pointer to a Point struct */
q = &p; /* Now q points to p. */
q->x = 10; /* Set 'x' field to 10 */
q->y = 20; /* Set 'y' field to 20 */
/* You can use 'q->x' syntax on either side of an assignment statement. */
z = q->x + q->y; /* = 30 */
Note that q->x
is nothing more than syntactic sugar for (*q).x
.
Recursively-defined structs and singly-linked lists
This assignment will feature a new data type called a singly-linked list.
This is a very flexible data type; whole computer languages (e.g. Lisp
and Scheme) have been built around it. We will restrict ourselves to a
singly-linked list of integers. In C we only need one struct for this, which
we’ll call a node
:
typedef
struct _node
{
int data; /* Each node stores a number */
struct _node *next; /* and a pointer to the next node in the list. */
} node;
Actually, we are declaring a struct called _node
and
typedef
ing it to the name node
.
The interesting thing about this struct is that it is recursively defined.
A node
has a data
field which holds an integer and
a pointer to another node
in its next
field.
The way we use this is as follows. We allocate memory for a bunch of
nodes, then we link the next
field of one node to another node,
link the next
field of that node to still another node, and so
on until all of the nodes are linked together. We set the next
field of the last node to be equal to NULL
, which lets us know
when we’ve hit the end. NULL
is a value #define
d
in <stdio.h>
which cannot be the value of a valid pointer
(normally it is defined to be zero).
The code to create the list might look something like this:
int i;
int n_nodes = 10; /* We want ten nodes. */
node *list; /* pointer to the list */
node *temp, *prev; /* temporary pointers to nodes */
/* Make the node at the front of the list first. */
list = (node *)malloc(sizeof(node));
/* Check for malloc failure. */
if (list == NULL) /* malloc failed */
{
fprintf(stderr, "Error: malloc() failed.\n");
exit(1); /* Give up, abort program. */
}
list->data = 1;
list->next = NULL; /* No other nodes yet. */
/* Make the rest of the nodes and link them up. */
prev = list;
for (i = 0; i < (n_nodes - 1); i++)
{
/* Create a new node. */
temp = (node *)malloc(sizeof(node));
/* Check for malloc failure. */
if (temp == NULL) /* malloc failed */
{
fprintf(stderr, "Error: malloc() failed.\n");
exit(1); /* Give up, abort program. */
}
/* Initialize the new node. */
temp->data = prev->data + 1;
temp->next = NULL;
/* Link previous node to the new node. */
prev->next = temp;
/* Set the 'prev' pointer to point to the new node. */
prev = temp;
}
/*
* Now, 'list' points to a list of ten nodes,
* with values 1, 2, 3, ... 10 from front to back.
*/
Another way to do this is to create the list starting from the end node and grow it backwards until you hit the first node. This is actually a lot easier:
int i, data;
int n_nodes = 10; /* We want ten nodes. */
node *list; /* pointer to the list */
node *temp; /* temporary pointer to node */
list = NULL; /* NULL represents the empty list. */
data = 10; /* Starting data value. */
/* Make the nodes and link them up. */
for (i = 0; i < n_nodes; i++)
{
/* Create a new node. */
temp = (node *)malloc(sizeof(node));
/* Check for malloc failure. */
if (temp == NULL) /* malloc failed */
{
fprintf(stderr, "Error: malloc() failed.\n");
exit(1); /* Give up, abort program. */
}
/* Initialize the new node. */
temp->data = data;
temp->next = list;
/* Set the data value for the next node. */
data--;
/* Set the 'list' pointer to point to the new node. */
list = temp;
}
/*
* Now, 'list' points to a list of ten nodes,
* with values 1, 2, 3, ... 10 from front to back.
*/
You should work through both of these examples until you understand why they work. It’s easy to write linked list code which is much more complicated than it needs to be if you’re not clear on what’s really happening when you create nodes and link them up.
In fact, we’ve already written all the linked list functions that you’ll need for this assignment. They are described in the header file linked_list.h and defined in the file linked_list.c. All you have to do is use them. Make sure you do use them; we’ll take points off for not using them and instead writing the equivalent code out by hand. That’s a practice called "reinventing the wheel" which is a bad habit; once someone else has gone to the trouble of writing working, fully-debugged code, it doesn’t make sense to rewrite it yourself if the existing code works fine for your purposes.
Program to write
The program to write for this assignment is a variation on assignment 3. Recall that in that program, you inputted a list of numbers and the program printed them out in sorted order. This time you will be doing the same thing, with these differences:
-
You will use a linked list to store the numbers entered on the command line. This will have the advantage that you can enter an arbitrary number of numbers (i.e. you’re not limited to just 32 numbers).
-
You will use the quicksort algorithm to sort the numbers instead of the minimal insertion sort or bubble sort. Quicksort is much more efficient than those algorithms, and is described below.
Your program will be called quicksorter
and the source code will be in a file
called quicksorter.c
. Here is the specification:
-
The program will accept an arbitrary number of numbers on the command line, as long as there is at least one number.
-
If there are no command-line arguments, or if there are arguments but none that represent numbers to be sorted, the program should print out instructions on its use (i.e. a usage message). By the way, don’t assume that the test script checks this; currently it doesn’t, so check it yourself (we will).
-
If any of the arguments to the program are not integers, your program’s response is undefined — it can do anything (in other words, you shouldn’t worry about handling non-integer arguments). The only exception to this is the optional
-q
argument to suppress output (see below). -
The program will sort the numbers using the quicksort algorithm. The numbers on the command line will be stored in a linked list. You should have a separate
quicksort
function to do the sorting. This function should receive exactly one argument (the list to be sorted) and return a pointer to a newly-allocated list which contains the same integers as the input list, but in sorted (ascending) order. Do not alter the original list in any way. Also, do not pass the length of the list to thequicksort
function (it isn’t necessary anyway). -
As in assignment 3, use
assert
to check that the list is sorted after doing the sort. Use theis_sorted()
function inlinked_list.c
to check the list. Put theassert
check at the end of thequicksort
function, right before it returns the sorted list. -
Print out the numbers from smallest to largest, one per line.
-
Use the memory leak checker as in the previous assignment to check that there are no memory leaks.
Example output
$ ./quicksorter 5 9 -2 150 -95 23 2 5 80 -95 -2 2 5 5 9 23 80 150 $ ./quicksorter usage: ./quicksorter [-q] number1 [number2 ... ]
Command-line options
Add a -q
command-line argument that suppresses output like
you did in assignment 3. Make sure that even when this is in effect, you
still sort the numbers. Also, make sure in all cases that you use the
is_sorted()
function in linked_list.c
(see below)
to test whether the sorted list is in fact sorted. Your program should
accept single or multiple -q
arguments on the command line
(just like assignment 3), where multiple -q
arguments are the same as
a single one, and if there are no numbers on the command line
(e.g. just -q
arguments or no arguments at all)
the program should exit with a usage message.
The quicksort algorithm
Quicksort
is a sorting algorithm invented by the computer scientist
C. A. R. (Tony) Hoare.
It is much more efficient than bubble sort or insertion sort
(technically, it has an \$O(n\ log\ n)\$ average time complexity
as compared to \$O(n^2)\$ for the other two algorithms).
Quicksort is usually used to sort an array in place
(there is even a C library function called qsort
which does this),
but in this case we are sorting a linked list
and we are not modifying the original list,
which changes the details of the algorithm somewhat.
Quicksort is a recursive algorithm,
like the algorithm you used to solve the triangle game in assignment 4.
Here is the algorithm:
-
If the list has zero nodes (i.e. it’s empty) or one node, copy the list as-is and return it (it’s already sorted). Note that this algorithm works fine on empty lists.
-
Otherwise, create a copy of the first node of the list and "put it aside" for now.
-
Create two new lists from the rest of the original list:
-
a list containing all values greater than or equal to the value of the first node;
-
a list containing all values smaller than the value of the first node.
Remember, you have to make these lists yourself and they have to copy the values in the original nodes.
-
-
Sort these two new lists using a recursive call to the
quicksort
function. -
Append everything together in order:
-
The list of nodes with values smaller than the first node,
-
the copy of the first node,
-
and the list of nodes with values greater than or equal to the first node.
Use the
append_lists
function defined inlinked_list.c
to do the appending. -
Memory management
Remember, when you use malloc
or calloc
to allocate memory, that memory
must be explicitly freed (using the free
function) before the end of the
program or you have a memory leak. It can be tricky avoiding memory leaks,
but here is a specific suggestion: don’t ever do this:
node *list;
/* ... some code which assigns values to 'list' ... */
/* Sort the list and set the list pointer to point to the sorted list. */
list = sort_list(list);
In this case, sort_list()
returns a new (sorted) list, and
the list pointer list
is set to the head of that list. This is
a memory leak, because the old list that list
pointed to was
never freed. The right way to do this is as follows:
node *list;
node *sorted_list;
/* ... some code which assigns values to 'list' ... */
/* Sort the list and set the list pointer to point to the sorted list. */
sorted_list = sort_list(list);
free_list(list);
list = sorted_list;
The result is the same, but there’s no memory leak.
Use the same memory leak checker you used last assignment to make sure that
you haven’t inadvertently leaked any memory. Recall that you do this by
putting these lines at the top of the file quicksorter.c
:
#include <stdlib.h>
#include "memcheck.h"
and then putting this line in the main
function before
exiting:
print_memory_leaks();
In addition, if you can’t track down a memory leak you might try changing
the following line in memcheck.c
from
#define DEBUG 0
to
#define DEBUG 1
and recompiling. This will give more verbose output relating to memory allocation.
Code supplied
There are a number of useful utility functions supplied in
linked_list.c and declared in
linked_list.h. Please use these instead of
writing your own versions; it will save you a lot of time. You will need to
write #include "linked_list.h"
in your code in order for this to work.
Testing your program
We’re supplying you with a test script similar to the test script for assignment 3
(though not as rigorous).
Also test your program on some numbers you generate by hand.
If you need to use a debugger, read the gdb page
to learn how to use gdb
(the debugger for gcc
).
Supporting files
-
The Makefile.
Once again, be sure that all the command lines in the Makefile start with tabs, or they will not work.
-
The test script.
Before running it, do
$ chmod +x run_test
to make the test script executable. Make sure your program passes the test script before you hand it in.
-
The memory leak checker: memcheck.c and memcheck.h.
To hand in
The quicksorter.c
program.
References
-
Darnell and Margolis, chapter 9.
-
K&R, chapters 1 and 5.
-
Any algorithms textbook e.g. Sedgewick, Algorithms in C.