The Compilation Process

From Text to Binary Executable

This page outlines the basic compilation process and various tools that you will be using. We assume that you are using a Unix-derivative (Linux or other) or a Unix-like environment such as Cygwin. The basics are the same in all of these environments.

When you run the gcc compiler, a lot more is going on than you might think. Figure 1 shows the major components in play.

 

Fig. 1. The main elements of the compilation process and the gcc tool chain.

 

The process starts with the use of a text editor or integrated development environment (IDE) to construct source code files. These are given the extensions .c and .h. These files are text and readable by any text editor. In this course we will not be using an IDE.

The steps are:

  1. Create the program files, one of which should include the main() function.
  2. From the command console (at the Unix shell prompt), run the gcc program, giving the names of the source files (.c only) and the desired target file name (to be explained in class). Like this:
    % gcc file1.c file2.c -o myProg
  3. The gcc program now runs three (or more) sub-programs. The first is the “preprocessor” (sometimes loosely called the precompiler). It takes each source code file, processes directives such as #include and #define, and prepares the expanded text that will be used by the compiler. The compiler takes the prepared text for each module file and generates an output file called an “object file” for each input module file. These object files are then read in by a program called a linker, which combines them to create a single executable file having the name designated in the gcc command above. If no destination name is indicated (no -o filename given), the executable file is automatically given the name a.out in Linux (a.exe in Cygwin).
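
To make the stages in step 3 more concrete, here is a minimal sketch. The file name stages.c, the macro SCALE, and the function scaleValue() are all invented for this illustration. The comments use standard gcc options (-E, -c, -o) that stop the tool chain after a given stage so that you can inspect the intermediate result.

```c
/* stages.c (hypothetical file name, used only for this illustration)
 *
 * Stopping gcc after each stage:
 *   gcc -E stages.c -o stages.i    preprocess only; SCALE is replaced by 3
 *   gcc -c stages.c -o stages.o    preprocess and compile to an object file
 *   gcc stages.o main.o -o myProg  link object files into an executable
 *                                  (main.o would come from a file that
 *                                  defines main(); it is not shown here)
 */

#define SCALE 3   /* handled by the preprocessor, before compilation proper */

/* Returns value multiplied by SCALE. After preprocessing, the compiler
 * sees "return value * 3;" because the macro has been expanded as text. */
int scaleValue(int value) {
    return value * SCALE;
}
```

Opening stages.i in a text editor is a simple way to see exactly what the compiler receives once the preprocessor has finished.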

Later in the course we will see some variations on this basic scheme. GCC can be given different options at the command line to make it behave differently as needed. We will also add another tool called a debugger (gdb). When a program is compiled with debugging information (the -g option), the debugger allows you to see what is going on inside the program when it runs.

Contents and Relations of Files in a C Program

For right now there are just two kinds of files that you will create for your C programs. As you get started you will probably create only one of each for simple programs, but as the programs get more complex you will need to break them up into discrete modules based on a logical grouping of functions.

For every C program there is a single .c source code file given the name of the program, e.g. if “hello” is the final program, the main file is called hello.c. This file is the only one to contain the main() function. This function is the first function to be called in your program and it will be used to control the overall program operation. There will also be a .h file, except for the simplest programs (like hello). This file contains the information that is needed in the .c file, as described below.

In Figure 2 we see a program that has been divided up into a number of files: program.h (where the word program represents any program name), program.c, module1.h, module1.c, module2.h, module2.c, module3.h, and module3.c.

 

Fig. 2. Showing the relations of text files in a typical C program.

 

program.c contains the main() function that drives the entire program. As shown above, one of the functions called in main() is func1(). However, that function is defined (coded) in module1.c and declared (by its prototype) in the header file module1.h. This header file contains the prototypes for all of the functions defined in module1.c. Therefore program.c contains the directive #include "module1.h" so that the prototype for func1() is known when the compiler reaches the call in main(). This is necessary because the C compiler needs to know the function's return type and parameter types in order to compile the call correctly. It does not need the function's code at that point; it assumes that the definition will be supplied at link time (the linker in Figure 1).
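
The split between declaration and definition can be sketched in a single listing, with comments marking which file each part would live in. This is only an illustration: the signature and body of func1() are assumptions (the figure does not specify them), and programEntry() is a hypothetical stand-in for main().

```c
/* ---- module1.h: the interface (prototypes only) ---- */
#ifndef MODULE1_H
#define MODULE1_H

int func1(int value);   /* declaration: tells the compiler the name,
                           parameter types, and return type */

#endif /* MODULE1_H */

/* ---- program.c: would begin with  #include "module1.h"  ---- */

/* Hypothetical stand-in for main(): it can call func1() because the
 * prototype above has already been seen, even though the body has not. */
int programEntry(void) {
    return func1(21);
}

/* ---- module1.c: the definition (the actual code) ---- */

/* Assumed behavior for illustration only: doubles its argument. */
int func1(int value) {
    return value * 2;
}
```

The #ifndef/#define/#endif lines form an include guard, a common convention that keeps a header's contents from being processed twice if it is included more than once in the same source file.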

The other two module files (along with their header files) are similar in that they contain functions that can be called from anywhere in the program (e.g. func1() might call a function in module3.c). Typically, these modules are divided up based on a scheme of logical similarity between the functions in one module. For example, module1.c might contain a number of functions that are called only from main() in program.c. Say that main() contains an endless loop construct that presents a menu of options for a user to choose from. Once the user makes a choice, the code in main() calls a specific function in module1.c. That function, in turn, can call any number of lower-level functions in module2.c. Module module3.c might contain a number of commonly used functions that are used by any of the other modules or the main program.

For example, let's say that the main() function prints the following on the screen.

                          Main Menu

                    1.         Load a File
                    2.         Edit
                    3.         Save
                    4.         Delete a File
                    5.         Exit

Choose a number option: ___

This menu of options is displayed in an endless loop. If the user chooses #5 then the program will terminate normally by leaving the loop. If the user selects any other option, say #1, the code calls a function in module1.c called loadAFile(). However, in order to print the menu on the screen in the first place, suppose there is a function in module3.c called formatALine() that takes some arguments that allow it to produce a formatted line to be printed by main() as it prints the menu. Such a function is called a helper function: it can be called from many different places in the overall program to do a specific job that is needed by many other functions.

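// in program.c
int main(void) {
    int selection = 0;         // menu option chosen by the user
    int errorNo = NO_ERROR;    // NO_ERROR is #defined as 0 in program.h
    // enter endless loop
    for (;;) {
        // print menu
        // get selection
        switch (selection) {
            case 1: errorNo = loadAFile(); break;   // break: do not fall through
            case 2: errorNo = editAFile(); break;
            // etc.
            case 5: return 0;  // Exit: leaves the loop and ends the program normally
        }
    }
}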

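// in module1.c
#include <stdio.h>     // declares printf() and scanf()
#include "program.h"   // defines NO_ERROR
#include "module3.h"   // declares checkFileName()

int loadAFile(void) {
    char filename[91];
    int errorCode = NO_ERROR;    // #defined as 0 in program.h

    printf("Enter a file name: ");
    scanf("%90s", filename); // the width limit keeps the input within the 91-byte array;
                             // remember filename (no &) is already the address of the array!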

    errorCode = checkFileName(filename);
    // do other stuff, like writing the filename to a shared location in memory
    return errorCode;
}

// in module3.c

int checkFileName (char * filename) {  // interesting case! the identifier "filename" here is
                                       // actually different from the same name in the calling
                                       // function above!
    int errorCode = NO_ERROR;          // this errorCode is LOCAL to this function!
    // do stuff to check that the filename is in proper format, exists on the disk, etc.
    return errorCode;
}

In the above code “snippets” we see that main() contains a call to another function, loadAFile(), which will be activated if the user selects #1. This function is actually defined in module1.c. In turn, that function calls a function, checkFileName(), that is defined in module3.c. This latter function might be called from other functions, such as deleteAFile(). It is a helper that serves several other functions, but only needs to be written once.
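
As an illustration of what the "do stuff" comment in checkFileName() might become, here is one possible sketch. It checks only the format of the name, not whether the file exists on the disk. The error codes other than NO_ERROR, the 90-character limit, and the rule about which characters are allowed are all assumptions made for this example. The parameter is also declared const char * rather than char *, since the function only reads the string.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define NO_ERROR          0   /* matches the value described for program.h */
#define ERR_EMPTY_NAME    1   /* hypothetical error codes, invented here */
#define ERR_NAME_TOO_LONG 2
#define ERR_BAD_CHAR      3

#define MAX_NAME_LENGTH  90   /* fits the 91-byte buffer in loadAFile() */

/* Checks only the format of the name; it does not look on the disk. */
int checkFileName(const char *filename) {
    size_t length;
    size_t i;

    if (filename == NULL || filename[0] == '\0') {
        return ERR_EMPTY_NAME;
    }

    length = strlen(filename);
    if (length > MAX_NAME_LENGTH) {
        return ERR_NAME_TOO_LONG;
    }

    for (i = 0; i < length; i++) {
        unsigned char c = (unsigned char)filename[i];
        /* Assumed rule: letters, digits, '.', '_' and '-' are allowed. */
        if (!isalnum(c) && c != '.' && c != '_' && c != '-') {
            return ERR_BAD_CHAR;
        }
    }
    return NO_ERROR;
}
```

Because every caller receives the same set of error codes, loadAFile() and deleteAFile() can both react to a bad name in the same way without duplicating the checks.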

The standard library functions (e.g. standard I/O functions declared in stdio.h, such as printf() and scanf()) are already compiled and are grouped together in special library files (with .a or .so extensions on Unix-like systems, or .lib on Windows). The .h files provide what we call the interface information needed by whatever program uses them. The functions are common across all C programs and so, like checkFileName() above, have been put into files that can be accessed by all other C programs. You will use a mixture of standard library functions (esp. standard I/O) and helper functions that you write specifically for an application.
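
The formatALine() helper mentioned earlier shows this mixture in miniature: a function you write yourself that relies on a standard library function. The sketch below is only one possibility. The parameter list, the return convention, and the layout of the line are assumptions, since the text does not specify them. The only library call used is snprintf(), which is declared in stdio.h.

```c
#include <stdio.h>   /* interface for snprintf(); its code lives in the C library */

/* Writes one menu line such as "  1.  Load a File" into buffer.
 * Returns the number of characters written, or -1 if the line did not
 * fit in the buffer or an error occurred. The signature is an assumption. */
int formatALine(char *buffer, size_t size, int number, const char *label) {
    int written = snprintf(buffer, size, "%3d.  %s", number, label);

    if (written < 0 || (size_t)written >= size) {
        return -1;
    }
    return written;
}
```

In main(), the menu could then be printed by calling formatALine() once per option and passing the filled buffer to printf() or puts().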