It is important to understand that while some computer languages (e.g. Python, Scheme or Basic) are normally used with an interactive interpreter (where you type in commands that are immediately executed), C doesn’t work that way. C source code files are always compiled into binary code by a program called a "compiler" and then executed. This is actually a multi-step process which we describe in some detail here.
The different kinds of files
Compiling C programs requires you to work with four kinds of files:
- Regular source code files. These files contain function definitions, and have names which end in .c by convention.
- Header files. These files contain function declarations (also known as function prototypes) and various preprocessor statements (see below). They are used to allow source code files to access externally-defined functions. Header files end in .h by convention.
- Object files. These files are produced as the output of the compiler. They consist of function definitions in binary form, but they are not executable by themselves. Object files end in .o by convention, although on some operating systems (e.g. Windows, MS-DOS), they often end in .obj.
- Binary executables. These are produced as the output of a program called a "linker". The linker links together a number of object files to produce a binary file which can be directly executed. Binary executables have no special suffix on Unix operating systems, although they generally end in .exe on Windows.
There are other kinds of files as well, notably libraries (.a files), assembly language source code (.s files) and shared libraries (.so files), but you won’t normally need to deal with them directly.
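To make the split between header files and regular source code files concrete, here is a minimal sketch. The file names square.h and square.c and the function square are hypothetical, chosen only for illustration. The two files are shown in a single listing, with comments marking where each one would begin.

```c
/* ---- would live in square.h (hypothetical header file) ---- */
#ifndef SQUARE_H   /* preprocessor "include guard": skip the body if already seen */
#define SQUARE_H

int square(int x); /* function declaration (prototype): no body */

#endif

/* ---- would live in square.c (hypothetical source file) ---- */
/* In a real project this file would begin with: #include "square.h" */

int square(int x)  /* function definition: has a body */
{
    return x * x;
}
```

Compiling square.c would then produce the object file square.o, which holds the binary form of square but cannot be run by itself.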
The preprocessor
Before the C compiler starts compiling a source code file, the file is
processed by a preprocessor. This is in reality a separate program (normally
called cpp, for "C preprocessor"), but it is invoked automatically by the
compiler before compilation proper begins. What the preprocessor does is
convert the source code file you write into another source code file (you can
think of it as a "modified" or "expanded" source code file). That modified
file may exist as a real file in the file system, or it may only be stored in
memory for a short time before being sent to the compiler. Either way, you
don’t have to worry about it, but you do have to know what the preprocessor
commands do.
Preprocessor commands start with the pound sign (#). There are several
preprocessor commands; two of the most important are:
- #define
  This is mainly used to define constants. For instance,
  #define BIGNUM 1000000
  specifies that wherever the name BIGNUM appears as a separate word in the rest of the file (not inside a string literal or a longer identifier), 1000000 should be substituted for it. For instance, the statement:
  int a = BIGNUM;
  becomes
  int a = 1000000;
  #define is used in this way so as to avoid having to explicitly write out some constant value in many different places in a source code file. This is important in case you need to change the constant value later on; it’s much less bug-prone to change it once, in the #define, than to have to change it in multiple places scattered all over the code.
- #include
  This is used to access functions defined outside of a source code file. For instance:
  #include <stdio.h>
  causes the preprocessor to paste the contents of the file stdio.h into the source code file at the location of the #include statement before it gets compiled. #include is almost always used to include header files, which are files which mainly contain function declarations and #define statements. In this case, we use #include in order to be able to use functions such as printf and scanf, whose declarations are located in the file stdio.h. C compilers expect a function to have been declared or defined earlier in the file before you use it; #include statements are thus the way to re-use previously-written code in your C programs.
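The two commands can be seen working together in the short sketch below. The function names bignum_value and format_bignum are hypothetical, introduced only for this example. sprintf is a relative of printf that is also declared in stdio.h and writes into a character buffer instead of to the screen.

```c
#include <stdio.h>       /* pastes in the declaration of sprintf (among others) */

#define BIGNUM 1000000   /* every later BIGNUM token is replaced by 1000000 */

/* After preprocessing, the body of this function reads: return 1000000; */
int bignum_value(void)
{
    return BIGNUM;
}

/* Writes BIGNUM as decimal text into buf and returns the number of
   characters written. The caller must supply at least 8 bytes
   (7 digits plus the terminating '\0'). */
int format_bignum(char *buf)
{
    return sprintf(buf, "%d", BIGNUM);
}
```

If the constant ever has to change, only the #define line needs editing; both functions pick up the new value the next time the file is compiled.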
There are a number of other preprocessor commands as well, but we will deal with them as we need them.
Making the object file: the compiler
After the C preprocessor has included all the header files and expanded out all
the #define and #include statements (as well as any other preprocessor
commands that may be in the original file), the compiler can compile the
program. It does this by turning the C source code into an object code file,
which is a file ending in .o which contains the binary version of the source
code. Object code is not directly executable, though. In order to make an
executable, you also have to add code for all of the library functions that
were #included into the file (this is not the same as including the
declarations, which is what #include does). This is the job of the linker
(see the next section).
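The difference between having a declaration and having the code can be illustrated with a small sketch. Here the declaration of the standard library function atoi is written out by hand instead of being pasted in from stdlib.h, which shows that a declaration is all the compiler needs. The helper name parse_count is hypothetical.

```c
/* Hand-written declaration of atoi, equivalent to the one that
   #include <stdlib.h> would paste in. It tells the compiler the
   function's name and types, but supplies no code for it. */
int atoi(const char *text);

/* parse_count is a hypothetical helper for this example. */
int parse_count(const char *text)
{
    /* This call compiles fine; the code for atoi itself is added
       from the C library later, when the program is linked. */
    return atoi(text);
}
```

The object file made from this source contains the binary code for parse_count plus an unresolved reference to atoi, which is why it cannot be executed by itself.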
In general, the compiler is invoked as follows:
$ gcc -c foo.c
where $ is the Unix prompt. This tells the compiler to run the preprocessor
on the file foo.c and then compile it into the object code file foo.o. The
-c option means to compile the source code file into an object file but not
to invoke the linker. If your entire program is in one source code file, you
can instead do this:
$ gcc foo.c -o foo
This tells the compiler to run the preprocessor on foo.c, compile it and then
link it to create an executable called foo. The -o option states that the
next word on the line is the name of the binary executable file (program). If
you don’t specify the -o, i.e. if you just type gcc foo.c, the executable
will be named a.out for silly historical reasons.
Note also that the name of the compiler we are using is gcc, which stands for
"GNU C compiler" or "GNU compiler collection" depending on who you listen to.
Other C compilers exist; many of them have the name cc, for "C compiler". On
Linux systems cc is usually an alias for gcc.
Putting it all together: the linker
The job of the linker is to link together a bunch of object files (.o files)
into a binary executable. This includes both the object files that the
compiler created from your source code files as well as object files that have
been pre-compiled for you and collected into library files. These files have
names which end in .a or .so, and you normally don’t need to know about
them, as the linker knows where most of them are located and will link them in
automatically as needed.
Like the preprocessor, the linker is a separate program called ld. Also like
the preprocessor, the linker is invoked automatically for you when you use the
compiler. The normal way of using the linker is as follows:
$ gcc foo.o bar.o baz.o -o myprog
This line tells the compiler to link together three object files (foo.o,
bar.o, and baz.o) into a binary executable file named myprog. Now you
have a file called myprog that you can run and which will hopefully do
something cool and/or useful.
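As a rough sketch of what the three object files might have been compiled from, the listing below shows three hypothetical source files in one block, with comments marking the file boundaries. The function names are made up for illustration, and a real program would also need a main function in one of the files, which is omitted here.

```c
/* ---- would live in foo.c (compiled to foo.o) ---- */
int foo_value(void)
{
    return 1;
}

/* ---- would live in bar.c (compiled to bar.o) ---- */
int bar_value(void)
{
    return 2;
}

/* ---- would live in baz.c (compiled to baz.o) ---- */
/* Declarations only: the definitions are in the other two files.
   They are redundant in this single listing, but a separate baz.c
   would need them (usually by way of a header file). */
int foo_value(void);
int bar_value(void);

int combined_value(void)
{
    return foo_value() + bar_value();
}
```

Each file can be compiled separately because baz.c only needs the declarations. It is the linker that connects the calls in baz.o to the definitions sitting in foo.o and bar.o.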
This is all you need to know to begin compiling your own C programs.
Generally, we also recommend that you use the -Wall command-line option:
$ gcc -Wall -c foo.c
The -Wall option causes the compiler to warn you about legal but dubious code
constructs, and will help you catch a lot of bugs very early. If you want to
be even more anal (and who doesn’t?), do this:
$ gcc -Wall -Wstrict-prototypes -ansi -pedantic -c foo.c
The -Wstrict-prototypes option means that the compiler will warn you if you
haven’t written correct prototypes for all your functions. The -ansi and
-pedantic options cause the compiler to warn about any non-portable construct
(e.g. constructs that may be legal in gcc but not in all standard C
compilers; such features should usually be avoided).
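To give a feel for what these options catch, here is a small sketch that compiles cleanly, with comments describing the dubious variants that would trigger a warning. The function names get_answer and is_zero are hypothetical.

```c
/* Writing `int get_answer();` here would leave the arguments
   unspecified, and -Wstrict-prototypes would warn about it.
   Writing (void) makes this a full prototype. */
int get_answer(void);

int get_answer(void)
{
    return 42;
}

/* Writing `if (x = 0)` below would be legal C, but it assigns
   instead of comparing; -Wall reports it. The comparison is
   what was meant. */
int is_zero(int x)
{
    if (x == 0)
        return 1;
    return 0;
}
```

The comments use the /* ... */ form throughout, since // comments are one of the constructs that -ansi -pedantic complains about.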
References
- Kernighan and Ritchie, The C Programming Language, 2nd Ed.
- The gcc online documentation. (This is pretty overwhelming.)
- The man page for gcc. Type man gcc at a terminal prompt.