Large Programming Projects

To build the operating system, each .c file is compiled into an object file by the C compiler. Object files, which have the suffix .o, contain binary instructions for the target machine. These instructions are later executed directly by the CPU. There is nothing like Java byte code in the C world.

The first pass of the C compiler is called the C preprocessor. As it reads each .c file, every time it hits a #include directive, it fetches the header file named in it and processes it, expanding macros, handling conditional compilation (and certain other things), and passing the results to the next pass of the compiler as if they were physically included.
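As a minimal sketch of what the preprocessor handles, the fragment below uses one #include, two macros, and one conditional-compilation block. The names BLOCK_SIZE, BLOCKS_FOR, LOG, and blocks_needed are hypothetical and chosen only for this illustration.

```c
#include <stdio.h>   /* the preprocessor pastes the contents of stdio.h here */

/* Hypothetical names, chosen for this illustration only. */
#define BLOCK_SIZE 512                                        /* object-like macro */
#define BLOCKS_FOR(n) (((n) + BLOCK_SIZE - 1) / BLOCK_SIZE)   /* function-like macro */

/* Conditional compilation: only one of these two definitions survives. */
#ifdef DEBUG
#define LOG(msg) fprintf(stderr, "debug: %s\n", (msg))
#else
#define LOG(msg) ((void)0)
#endif

/* After preprocessing, the return statement reads:
 *     return (((nbytes) + 512 - 1) / 512);
 */
long blocks_needed(long nbytes)
{
    LOG("computing block count");
    return BLOCKS_FOR(nbytes);
}
```

On most UNIX compilers, running the compiler with the -E option prints the preprocessed text instead of compiling it, which is a convenient way to see the expanded result.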

Since operating systems are very large (five million lines of code is not unusual), having to recompile the entire thing every time one file is changed would be unbearable. On the other hand, changing a key header file that is included in thousands of other files does require recompiling those files. Keeping track of which object files depend on which header files is completely unmanageable without help.

Luckily, computers are very good at precisely this sort of thing. On UNIX systems, there is a program called make (with variants such as gmake and pmake) that reads the Makefile, which tells it which files depend on which other files. Make determines which object files are required to build the operating system binary needed right now and, for each one, checks whether any of the files it depends on (the code and headers) have been modified since the object file was last created. If so, that object file has to be recompiled. When make has determined which .c files have to be recompiled, it invokes the C compiler to recompile them, thus reducing the number of compilations to the bare minimum. In large projects, creating the Makefile is error-prone, so there are tools that do it automatically.
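The timestamp comparison at the heart of this decision can be sketched in a few lines of C. This is an illustration under stated assumptions, not how any particular make is implemented: needs_rebuild is a hypothetical helper, it relies on the POSIX stat() call, and it ignores details such as chains of dependencies and sub-second timestamps.

```c
#include <sys/stat.h>   /* stat(), struct stat (POSIX) */

/*
 * Hypothetical helper, not part of make.
 * Returns 1 if `target` must be rebuilt, 0 if it is up to date,
 * and -1 if one of the dependencies cannot be examined.
 */
int needs_rebuild(const char *target, const char *const deps[], int ndeps)
{
    struct stat target_info, dep_info;

    if (stat(target, &target_info) != 0)
        return 1;                   /* target does not exist yet: build it */

    for (int i = 0; i < ndeps; i++) {
        if (stat(deps[i], &dep_info) != 0)
            return -1;              /* dependency missing: cannot decide */
        if (dep_info.st_mtime > target_info.st_mtime)
            return 1;               /* dependency modified after target was built */
    }
    return 0;                       /* no dependency is newer than the target */
}
```

For an object file built from one source file and two headers, a caller would pass the .o file as the target and the three other file names as the dependencies, recompiling only when the result is 1.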

Once all the .o files are ready, they are passed to a program called the linker, which combines them into a single executable binary file. Any library functions called are also included at this point, interfunction references are resolved, and machine addresses are relocated as needed. When the linker is finished, the result is an executable program, usually called a.out on UNIX systems. The various components of this process are illustrated in the following figure for a program with three C files and two header files. Although we have been discussing operating system development here, all of this applies to developing any large program.
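To make the idea of an interfunction reference concrete, here is a small sketch. It is written as one file so that it compiles on its own, but the comments mark where each part would live in a multi-file project. The file names util.h, caller.c, and util.c and the function names are hypothetical.

```c
/* ---- would live in a hypothetical header, util.h ---- */
/* Declaration only: it promises that a definition exists somewhere. */
int checksum(const unsigned char *buf, int len);

/* ---- would live in a hypothetical caller.c, compiled to caller.o ---- */
int checksum_of_greeting(void)
{
    static const unsigned char msg[] = { 'h', 'i' };
    /* In caller.o this call is an unresolved reference to the symbol
     * checksum; the linker later fills in the function's address. */
    return checksum(msg, 2);
}

/* ---- would live in a hypothetical util.c, compiled to util.o ---- */
int checksum(const unsigned char *buf, int len)
{
    int sum = 0;
    for (int i = 0; i < len; i++)
        sum += buf[i];              /* add up the byte values */
    return sum;
}
```

With the parts split into separate files, each .c file would typically be compiled with the compiler's -c option to produce its .o file, and the resulting object files would then be handed to the linker together.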

The process of compiling C and header files to make an executable.

