CS 11 C track: Processing command-line arguments

Introduction

The C language has fairly standardized conventions about how to process command-line arguments, which I summarize here. I will also give some advice on the most effective ways to do this.

Conventions for command-line arguments

Here are the conventions:

Optional command-line arguments have a dash (-) before them.
Optional command-line arguments are identified by placing a dash (-) before the optional argument’s name. For instance, the ls command in Unix will give a long form output if the command-line argument -l is provided (the $ is the unix prompt):
$ ls -l
total 48
-rw-rw-r-- 1 mvanier cs11 16668 Apr 1 01:23 c_style_guide.html
-rwxr-xr-x 1 mvanier cs11 2296 Apr 1 01:21 c_style_check
-rw-r--r-- 1 mvanier cs11 755 Apr 1 15:46 cmdline_args.html
-rw-r--r-- 1 mvanier cs11 8077 Feb 11 22:23 gdb.html
-rw-rw-r-- 1 mvanier cs11 8290 Sep 25 2001 make.html
In general, if an argument doesn’t have a dash in front of it, it’s not optional unless it’s an argument to another command-line argument (see below). Note that programs in DOS or Windows typically use a forward slash (/) for command-line options. For this course, we will use the dash only (which is the Unix convention).
Optional command-line arguments may be located anywhere in the argument list and in any order.

Don’t assume that your user will always put optional arguments before non-optional arguments, or will put optional arguments in a particular order. Code that relies on such assumptions is invariably convoluted, hard to read, and often doesn’t work. I’ll show you how to do it the right way later in this page.
Optional command-line arguments may themselves have arguments.
Sometimes an optional argument may have arguments of its own. These arguments don’t usually have dashes in front of them, and are often numbers. It’s as if you’re saying "you may not need to do this optional task at all, but if you do, you’ll need to know these other argument values as well". For instance, a sort program may have an optional argument which tells which kind of sort routine to use:
$ sort -method bubble words.txt
Here, the argument bubble is an argument to the -method option and specifies a bubble sort (a particular kind of sorting algorithm). If it weren’t included, the default might be to use quicksort (another sorting algorithm):
$ sort words.txt
Note that here you omit both the -method optional argument and its argument bubble. Fortunately for you, none of the programs in this track have optional arguments that themselves have arguments, but you will certainly see this and/or have to implement this eventually.
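As a rough illustration of the sort example above, here is a minimal sketch of how a program might pick out the argument to -method. It uses the argc and argv parameters of main, which are described later on this page. The function name find_method and the "quicksort" default are assumptions made for this sketch; they are not taken from any real sort program.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: return the sorting method named on the
 * command line, "quicksort" (an assumed default) if "-method" is
 * absent, or NULL if "-method" is the last argument and so has no
 * argument of its own.
 */
const char *find_method(int argc, char *argv[])
{
    const char *method = "quicksort";  /* Assumed default. */
    int i;

    for (i = 1; i < argc; i++)  /* Skip argv[0] (program name). */
    {
        if (strcmp(argv[i], "-method") == 0)
        {
            if (i + 1 > argc - 1)  /* Nothing follows "-method". */
            {
                return NULL;
            }

            i++;  /* Skip over the option's own argument. */
            method = argv[i];
        }
    }

    return method;
}
```

A caller would treat a NULL result as invalid usage. Because the loop looks at every argument, "-method bubble" can appear before or after the file name.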

Exceptions to the conventions

Some programs don’t use the dash in front of optional arguments (the tar program is an example; it’s often invoked as tar xvf <filename> where xvf are optional arguments). This is not recommended. Some programs allow several single-letter options to be combined after a single dash, e.g. ps -elf instead of ps -e -l -f. This is also not recommended, at least not for the programs in this track.

How to process command-line arguments

Command-line arguments are always represented as an array of strings. This array is called argv (conventionally short for "argument vector") and there is also an integer called argc (for "argument count") which is the number of entries in argv, including the program’s name. That’s why the main function looks like this:

int main(int argc, char *argv[])

Here, argc is declared to be an int, whereas argv is an array of char *'s (i.e. strings). Remember that argv[0] is the program’s name, so you normally won’t want to use that except in a usage statement (see below).

Your first task in main is to process the optional argument values, if any. The standard way to walk through the argv array is like this (strcmp is declared in the header file <string.h>):

int i;
int quiet = 0;  /* Value for the "-q" optional argument. */

for (i = 1; i < argc; i++)  /* Skip argv[0] (program name). */
{
    /*
     * Use the 'strcmp' function to compare the argv values
     * to a string of your choice (here, it's the optional
     * argument "-q").  When strcmp returns 0, it means that the
     * two strings are identical.
     */

    if (strcmp(argv[i], "-q") == 0)  /* Process optional arguments. */
    {
        quiet = 1;  /* This is used as a boolean value. */
    }
    else
    {
        /* Process non-optional arguments here. */
    }
}

Note that the "-q" optional argument could be located anywhere on the command line and the program would still work. If the optional argument has arguments of its own, the code is trickier (atoi is declared in the header file <stdlib.h>):

int i;
int opt = 0;
int optarg1 = 0;
int optarg2 = 0;

for (i = 1; i < argc; i++)  /* Skip argv[0] (program name). */
{
    if (strcmp(argv[i], "-opt") == 0)  /* Process optional arguments. */
    {
        opt = 1;  /* This is used as a boolean value. */

        /*
         * The last argument is argv[argc-1].  Make sure there are
         * enough arguments.
         */

        if (i + 2 <= argc - 1)  /* There are enough arguments in argv. */
        {
            /*
             * Increment 'i' twice so that you don't check these
             * arguments the next time through the loop.
             */

            i++;
            optarg1 = atoi(argv[i]);  /* Convert string to int. */
            i++;
            optarg2 = atoi(argv[i]);  /* Ditto. */
        }
        else
        {
            /* Print usage statement and exit (see below). */
        }
    }
    else
    {
        /* Process non-optional arguments here. */
    }
}

In some cases, command-line processing can get quite hairy. Fortunately for you, the above examples are more than sufficient for the kinds of programs we do in this course.
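One limitation of the example above is that atoi cannot report errors: given a string that is not a number, it simply returns 0. If you need to reject such input, one option is the standard strtol function. The following is a minimal sketch; the helper name parse_int is hypothetical and not part of the course materials.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/*
 * Hypothetical helper: convert the string 's' to an int.
 * Returns 1 and stores the result in '*out' on success;
 * returns 0 (leaving '*out' untouched) if 's' is not a valid int.
 */
int parse_int(const char *s, int *out)
{
    char *end;
    long value;

    errno = 0;
    value = strtol(s, &end, 10);  /* Base 10. */

    if (end == s || *end != '\0')  /* No digits, or trailing junk. */
    {
        return 0;
    }

    if (errno == ERANGE || value < INT_MIN || value > INT_MAX)
    {
        return 0;  /* Out of range for an int. */
    }

    *out = (int) value;
    return 1;
}
```

In the loop above you could then replace optarg1 = atoi(argv[i]) with a call such as parse_int(argv[i], &optarg1), and print the usage statement and exit if it returns 0.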

Usage statements

Your program has to be able to handle the case when invalid command-line arguments are provided to it without crashing (core dumping etc.). The correct way to handle this is:

Display a usage statement.
Exit the program.

For instance, let’s say that your program expects exactly three arguments in addition to the program name, and can take another optional argument. You could write this:

if (argc < 4)
{
    fprintf(stderr, "usage: %s filename word count [-w]\n", argv[0]);
    exit(1);
}

There are several parts to this:

The usage message: it always starts with the word usage, followed by the program name and the names of the arguments. Argument names should be descriptive if possible, telling what the arguments refer to, like filename above. Argument names should not contain spaces! Optional arguments are put between square brackets, like -w above. Do not use square brackets for non-optional arguments! Always print to stderr, not to stdout, to indicate that the program has been invoked incorrectly.
The program name: always use argv[0] to refer to the program name rather than writing it out explicitly. This means that if you rename the program (which is common) you won’t have to re-write the code.
Exiting the program: use the exit function, which is declared in the header file <stdlib.h>. By convention, any non-zero argument to exit (e.g. exit(1)) signals an unsuccessful completion of the program, and a zero argument (exit(0)) indicates successful completion, although you rarely need to call exit for that. If you want to be strictly portable you can use EXIT_FAILURE and EXIT_SUCCESS (which are defined in <stdlib.h>) instead of 1 and 0 as arguments to exit.

If you have to write out a usage statement more than once, make it a separate function called (obviously) usage and pass it the program name (argv[0]) as an argument. Then call it from main whenever the program has invalid arguments.
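A usage function along these lines might look like the sketch below. The message text is the one from the example above. Splitting the printing into a separate print_usage helper that takes the output stream is an illustrative choice made for this sketch, not a requirement.

```c
#include <stdio.h>
#include <stdlib.h>

/* Print the usage message to the stream 'out'. */
void print_usage(FILE *out, const char *progname)
{
    fprintf(out, "usage: %s filename word count [-w]\n", progname);
}

/* Print the usage message to stderr and exit unsuccessfully. */
void usage(const char *progname)
{
    print_usage(stderr, progname);
    exit(EXIT_FAILURE);
}
```

In main, each place that detects invalid arguments then becomes a single call: usage(argv[0]).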

Dos and don’ts

Always print a usage message to stderr if the program receives incorrect arguments.
Don’t assume that optional arguments will be located in any particular place in the argument list. (This was discussed above.)
Don’t try to process all the command-line arguments in a single pass if it isn’t convenient to do so.

I’ve seen a lot of C code that tied itself in knots trying to process the entire argument list in one pass. Typically, the code has a dense nest of if statements to handle every possible combination of arguments in every possible order. This is completely unnecessary and is simply bad programming. Most program invocations have very few command-line arguments, so even if you just process one of them per pass through the argument list you still won’t be wasting much time.

Having said that, the command-line argument processing for lab 3 can be done in one pass through the argv array with no difficulty.
Don’t alter the argv array!

Some programmers do strange manipulations to the argv array involving pointer arithmetic, moving arguments around, trying to delete arguments, etc. The usual reason for this is to get rid of the arguments that have already been processed, particularly optional arguments. It’s really easy to screw up when doing this, and it’s never necessary, so don’t do it! It’s OK to copy some of the arguments to a separate array and/or separate variables if you need to.
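The last two points can be combined. Here is a minimal sketch that makes two passes over argv: the first looks for an optional argument, and the second copies the non-optional arguments into a separate array without modifying argv. The function names has_flag and collect_positional are hypothetical. The sketch assumes that every argument starting with a dash is optional (so a negative number such as -5 would be skipped) and that no optional argument has arguments of its own.

```c
#include <string.h>

/* Pass 1: return 1 if 'flag' appears anywhere in argv, else 0. */
int has_flag(int argc, char *argv[], const char *flag)
{
    int i;

    for (i = 1; i < argc; i++)  /* Skip argv[0] (program name). */
    {
        if (strcmp(argv[i], flag) == 0)
        {
            return 1;
        }
    }

    return 0;
}

/*
 * Pass 2: copy pointers to the non-optional arguments into 'out',
 * which has room for 'max' entries.  Returns the number copied,
 * or -1 if there are more than 'max'.  argv itself is not changed.
 */
int collect_positional(int argc, char *argv[], char *out[], int max)
{
    int i;
    int n = 0;

    for (i = 1; i < argc; i++)
    {
        if (argv[i][0] == '-')  /* Assumed to be an optional argument. */
        {
            continue;
        }

        if (n == max)  /* Too many non-optional arguments. */
        {
            return -1;
        }

        out[n] = argv[i];
        n++;
    }

    return n;
}
```

For the usage example earlier on this page, main could call has_flag(argc, argv, "-w"), then collect_positional with an array of three entries, and print the usage statement if the result is not exactly 3.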