Microkernels

With the layered approach, the designers have a choice about where to draw the kernel-user boundary. Traditionally, all the layers went in the kernel, but that is not necessary. In fact, a strong case can be made for putting as little as possible in kernel mode, because bugs in the kernel can bring down the system instantly. In contrast, user processes can be set up to have less power, so that a bug there may not be fatal.

Many researchers have studied the number of bugs per 1000 lines of code (e.g., Basili and Perricone, 1984; and Ostrand and Weyuker, 2002). Bug density depends on module size, module age, and more, but a ballpark figure for serious industrial systems is ten bugs per thousand lines of code. This means that a monolithic operating system of five million lines of code is likely to contain something like 50,000 kernel bugs. Not all of these are fatal, of course, since some bugs may be things like issuing an incorrect error message in a situation that rarely occurs. Nevertheless, operating systems are sufficiently buggy that computer manufacturers put reset buttons on their machines (often on the front panel), something the manufacturers of TV sets, stereos, and cars do not do, in spite of the large amount of software in these devices.

The main idea behind the microkernel design is to achieve high reliability by splitting the operating system up into small, well-defined modules, only one of which (the microkernel) runs in kernel mode, while the rest run as relatively powerless ordinary user processes. In particular, by running each device driver and file system as a separate user process, a bug in one of these can crash that component, but cannot crash the whole system. Thus a bug in the audio driver will cause the sound to be garbled or stop, but will not crash the computer. In contrast, in a monolithic system with all the drivers in the kernel, a buggy audio driver can easily reference an invalid memory address and bring the system to a grinding halt immediately.
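The isolation argument can be tried out on any POSIX system with ordinary processes. The following is only a minimal sketch, not how any particular microkernel is built: the names buggy_audio_driver and run_isolated are hypothetical, and the fault is simulated by raising SIGSEGV rather than by a real bad pointer.

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for a buggy driver: it dies as if it had
   referenced an invalid memory address. */
void buggy_audio_driver(void)
{
    signal(SIGSEGV, SIG_DFL);  /* make sure the default action (terminate) applies */
    raise(SIGSEGV);            /* simulate the memory fault deterministically */
    _exit(0);                  /* not reached */
}

/* Run a component in its own process and wait for it to finish.
   Returns the number of the signal that killed it, 0 if it exited
   normally, or -1 if the process could not be created or awaited. */
int run_isolated(void (*component)(void))
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {            /* child: this is the "driver" process */
        component();
        _exit(0);
    }

    int status = 0;            /* parent: the rest of the system */
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

Calling run_isolated(buggy_audio_driver) returns SIGSEGV to the caller, which keeps running: the fault took down one process, not the program that started it.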

Many microkernels have been implemented and deployed (Accetta et al., 1986; Haertig et al., 1997; Heiser et al., 2006; Herder et al., 2006; Hildebrand, 1992; Kirsch et al., 2005; Liedtke, 1993, 1995, 1996; Pike et al., 1992; and Zuberi et al., 1999). They are especially common in real-time, industrial, avionics, and military applications that are mission critical and have very high reliability requirements. A few of the better-known microkernels are Integrity, K42, L4, PikeOS, QNX, Symbian, and MINIX 3. We will now give a brief overview of MINIX 3, which has taken the idea of modularity to the limit, breaking most of the operating system up into a number of independent user-mode processes. MINIX 3 is a POSIX-conformant, open-source system freely available at www.minix3.org (Herder et al., 2006a; Herder et al., 2006b).

The MINIX 3 microkernel is only about 3200 lines of C and 800 lines of assembler for very low-level functions such as catching interrupts and switching processes. The C code manages and schedules processes, handles interprocess communication (by passing messages between processes), and offers a set of about 35 kernel calls to allow the rest of the operating system to do its work. These calls perform functions such as hooking handlers to interrupts, moving data between address spaces, and installing new memory maps for newly created processes. The process structure of MINIX 3 is shown in the following figure, with the kernel call handlers labeled Sys. The device driver for the clock is also in the kernel because the scheduler interacts closely with it. All the other device drivers run as separate user processes.

Structure of the MINIX 3 system

Outside the kernel, the system is structured as three layers of processes, all running in user mode. The lowest layer contains the device drivers. Since they run in user mode, they do not have physical access to the I/O port space and cannot issue I/O commands directly. Instead, to program an I/O device, the driver builds a structure telling which values to write to which I/O ports and makes a kernel call telling the kernel to do the write. This approach means that the kernel can check that the driver is writing to (or reading from) I/O ports that it is authorized to use. As a result (and unlike in a monolithic design), a buggy audio driver cannot accidentally write to the disk.
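The checked port write described above might look roughly like the sketch below. It is an illustration under stated assumptions, not the MINIX 3 interface: the types io_request and driver_priv, the function kernel_write_ports, and the array fake_ports (which simulates the port space so the code runs in user mode) are all hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* One requested write: which value goes to which I/O port. */
struct io_request {
    uint16_t port;
    uint8_t  value;
};

/* Per-driver privilege record kept by the kernel (hypothetical layout). */
struct driver_priv {
    uint16_t port_lo;   /* first port the driver may use */
    uint16_t port_hi;   /* last port the driver may use, inclusive */
};

/* Simulated I/O port space; a real kernel would execute port instructions. */
uint8_t fake_ports[65536];

/* Hypothetical kernel call. Every request is checked against the caller's
   privileges before any write is performed, so a rejected batch leaves the
   ports untouched. Returns 0 on success, -1 if any port is not authorized. */
int kernel_write_ports(const struct driver_priv *who,
                       const struct io_request *reqs, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (reqs[i].port < who->port_lo || reqs[i].port > who->port_hi)
            return -1;

    for (size_t i = 0; i < n; i++)
        fake_ports[reqs[i].port] = reqs[i].value;
    return 0;
}
```

With an audio driver authorized for, say, ports 0x220 to 0x22F (illustrative numbers), a request aimed at a disk controller port is refused by the kernel before anything is written.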

Above the drivers is another user-mode layer containing the servers, which do most of the work of the operating system. One or more file servers manage the file system(s), the process manager creates, destroys, and manages processes, and so on. User programs get operating system services by sending short messages to the servers asking for the POSIX system calls. For instance, a process needing to do a read sends a message to one of the file servers telling it what to read.
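To make the message-based read concrete, here is a much simplified sketch. Everything in it is an assumption made for illustration: the message layout, the names user_read and file_server_handle, the single in-memory file, and the descriptor number DEMO_FD. The server is called as a plain function here, standing in for the send and receive that would cross process boundaries in a real system.

```c
#include <stddef.h>
#include <string.h>

enum { MSG_READ = 1, DEMO_FD = 3 };

/* A short, fixed-format request/reply message (hypothetical layout). */
struct message {
    int    type;     /* which system call is being requested */
    int    fd;       /* file descriptor to read from */
    size_t nbytes;   /* how many bytes the caller wants */
    long   result;   /* filled in by the server: byte count or -1 */
};

/* The toy file server's only file and its current position. */
const char file_data[] = "hello from the file server";
size_t file_pos;

/* Server side: handle one request and fill in the reply. */
void file_server_handle(struct message *m, char *reply_buf)
{
    if (m->type != MSG_READ || m->fd != DEMO_FD) {
        m->result = -1;
        return;
    }
    size_t remaining = sizeof file_data - 1 - file_pos;
    size_t n = m->nbytes < remaining ? m->nbytes : remaining;
    memcpy(reply_buf, file_data + file_pos, n);
    file_pos += n;
    m->result = (long)n;
}

/* User side: what a library wrapper for read might do. */
long user_read(int fd, char *buf, size_t nbytes)
{
    struct message m = { .type = MSG_READ, .fd = fd, .nbytes = nbytes, .result = 0 };
    file_server_handle(&m, buf);   /* stands in for sending and awaiting the reply */
    return m.result;
}
```

The point of the sketch is the shape of the interaction: the user program never touches the file system's data structures; it only fills in a small message and waits for the answer.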

One interesting server is the reincarnation server, whose job is to check whether the other servers and drivers are functioning correctly. If a faulty one is detected, it is automatically replaced without any user intervention. In this way the system is self-healing and can achieve high reliability.
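The check-and-replace cycle can be sketched as a loop over a table of components. This is a toy model only: the text does not say how faults are detected, so the probe responds_to_ping, the restart routine, and the component structure are all assumptions, and "restarting" here just flips a flag.

```c
#include <stdbool.h>
#include <stddef.h>

/* One supervised server or driver (hypothetical record). */
struct component {
    const char *name;
    bool        alive;      /* toy stand-in for "is functioning correctly" */
    int         restarts;   /* how many times it has been replaced */
};

/* Hypothetical health probe; a real system would use some form of
   status check or crash notification. */
bool responds_to_ping(const struct component *c)
{
    return c->alive;
}

/* Hypothetical replacement: start a fresh copy of the component. */
void restart(struct component *c)
{
    c->alive = true;
    c->restarts++;
}

/* One pass of the reincarnation server: probe every component and
   replace the faulty ones. Returns how many were replaced. */
int reincarnation_pass(struct component *table, size_t n)
{
    int replaced = 0;
    for (size_t i = 0; i < n; i++) {
        if (!responds_to_ping(&table[i])) {
            restart(&table[i]);
            replaced++;
        }
    }
    return replaced;
}
```

Running reincarnation_pass periodically means a crashed driver is noticed and replaced on the next pass, with no action needed from the user.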

The system has many restrictions limiting the power of each process. As mentioned, drivers can touch only authorized I/O ports, but access to kernel calls is also controlled on a per-process basis, as is the ability to send messages to other processes. Processes can also grant limited permission for other processes to have the kernel access their address spaces. As an example, a file system can grant permission for the disk driver to let the kernel put a newly read-in disk block at a specific address within the file system's address space. The sum total of all these restrictions is that each driver and server has exactly the power to do its work and nothing more, thus greatly limiting the damage a buggy component can do.
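The file system and disk driver example can be sketched as a kernel-side copy that is allowed only inside a granted window. The grant structure, the function kernel_copy_in, and the use of small integers as process identifiers are hypothetical and chosen for illustration; the sketch does not reproduce any real kernel's interface.

```c
#include <stddef.h>
#include <string.h>

/* Permission issued by the owner of an address space (hypothetical layout). */
struct grant {
    int    grantee;    /* the only process allowed to use this grant */
    size_t offset;     /* start of the window inside the owner's space */
    size_t length;     /* size of the window in bytes */
    int    writable;   /* nonzero if data may be written into the window */
};

/* Hypothetical kernel call: copy n bytes from src into the owner's address
   space at dest_off on behalf of caller. The copy happens only if the caller
   holds the grant and the whole destination range lies inside the window.
   Returns 0 on success, -1 if the request is refused. */
int kernel_copy_in(int caller, const struct grant *g,
                   char *owner_space, size_t dest_off,
                   const char *src, size_t n)
{
    if (caller != g->grantee || !g->writable)
        return -1;
    if (dest_off < g->offset)
        return -1;
    /* Written this way to avoid overflow in the range check. */
    if (n > g->length || dest_off - g->offset > g->length - n)
        return -1;

    memcpy(owner_space + dest_off, src, n);
    return 0;
}
```

Here the file system would create a grant naming the disk driver and the buffer for one block. The driver can then have the kernel fill exactly that buffer, and any attempt to write elsewhere in the file system's memory is refused.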

An idea somewhat related to having a minimal kernel is to put the mechanism for doing something in the kernel but not the policy. To make this point clearer, consider the scheduling of processes. A relatively simple scheduling algorithm is to assign a priority to every process and then have the kernel run the highest-priority process that is runnable. The mechanism, in the kernel, is to look for the highest-priority process and run it. The policy, assigning priorities to processes, can be done by user-mode processes. In this way policy and mechanism can be decoupled and the kernel can be made smaller.
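The split between mechanism and policy in the scheduling example can be shown in a few lines. This is a sketch under simple assumptions: the proc structure, the function names, and the rule "larger number means higher priority" are all hypothetical, and the policy shown is just one of many a user-mode process might choose.

```c
#include <stddef.h>

/* One entry in the process table (hypothetical layout). */
struct proc {
    int pid;
    int priority;   /* larger means more urgent; set by the policy */
    int runnable;   /* nonzero if the process is ready to run */
};

/* Mechanism (kernel side): return the index of the highest-priority
   runnable process, or -1 if none is runnable. Ties go to the earliest
   table entry. It knows nothing about how priorities were chosen. */
int pick_next(const struct proc *table, size_t n)
{
    int best = -1;
    for (size_t i = 0; i < n; i++) {
        if (!table[i].runnable)
            continue;
        if (best < 0 || table[i].priority > table[best].priority)
            best = (int)i;
    }
    return best;
}

/* Policy (user-mode side): one possible rule, which favors a single
   interactive process over everything else. */
void policy_favor_interactive(struct proc *table, size_t n, int interactive_pid)
{
    for (size_t i = 0; i < n; i++)
        table[i].priority = (table[i].pid == interactive_pid) ? 10 : 1;
}
```

Replacing policy_favor_interactive with a different rule changes which process runs next without touching pick_next, which is the sense in which the policy can live outside the kernel.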


