In traditional operating systems, each process has an address space and a single thread of control. In fact, that is almost the definition of a process. Nevertheless, there are often situations in which it is desirable to have multiple threads of control in the same address space running in quasi-parallel, as though they were (almost) separate processes (except for the shared address space). In the following sections we will discuss these situations and their implications.

Thread Usage

Why would anyone want to have a kind of process within a process? It turns out there are various reasons for having these miniprocesses, called threads. Let us now look at some of them. The major reason for having threads is that in many applications, multiple activities are going on at once. Some of these may block from time to time. By decomposing such an application into multiple sequential threads that run in quasi-parallel, the programming model becomes simpler.

We have seen this argument before. It is precisely the argument for having processes. Instead of thinking about interrupts, timers, and context switches, we can think about parallel processes. Only now, with threads, we add a new element: the ability of the parallel entities to share an address space and all of its data among themselves. This ability is essential for certain applications, which is why having multiple processes (with their separate address spaces) will not work.

A second argument for having threads is that since they are lighter weight than processes, they are easier (i.e., faster) to create and destroy than processes. In many systems, creating a thread goes 10-100 times faster than creating a process. When the number of threads required changes dynamically and rapidly, this property is useful to have.

A third reason for having threads is also a performance argument. Threads yield no performance gain when all of them are CPU bound, but when there is substantial computing and also substantial I/O, having threads allows these activities to overlap, hence speeding up the application.

Finally, threads are useful on systems with multiple CPUs, where real parallelism is possible. We will come back to this issue in "MULTIPLE PROCESSOR SYSTEMS".

It is easiest to see why threads are helpful by looking at some solid examples. As a first example, consider a word processor. Word processors generally display the document being created on the screen formatted precisely as it will come out on the printed page. In particular, all the line breaks and page breaks are in their correct and final positions, so that the user can examine them and change the document if need be (e.g., to eliminate widows and orphans, that is, incomplete top and bottom lines on a page, which are considered esthetically unpleasing).

Assume that the user is writing a book. From the author's point of view, it is easiest to keep the entire book as a single file to make it easier to search for topics, carry out global substitutions, and so on. Alternatively, each chapter might be a separate file. However, having every section and subsection as a separate file is a real nuisance when global changes have to be made to the entire book, since then hundreds of files have to be individually edited. For instance, if proposed standard xxxx is approved just before the book goes to press, all occurrences of "Draft Standard xxxx" have to be changed to "Standard xxxx" at the last minute. If the entire book is one file, usually a single command can do all the substitutions. In contrast, if the book is spread over 300 files, each one must be edited individually.

Now think what happens when the user suddenly deletes one sentence from page 1 of an 800-page document. After checking the changed page for correctness, he now wants to make another change on page 600 and types in a command telling the word processor to go to that page (possibly by searching for a phrase occurring only there). The word processor is now forced to reformat the whole book up to page 600 on the spot because it does not know what the first line of page 600 will be until it has processed all the previous pages. There may be a considerable delay before page 600 can be displayed, leading to an unhappy user.

Threads can help here. Suppose that the word processor is written as a two-threaded program. One thread interacts with the user and the other handles reformatting in the background. As soon as the sentence is deleted from page 1, the interactive thread tells the reformatting thread to reformat the entire book. Meanwhile, the interactive thread continues to listen to the keyboard and mouse and responds to simple commands like scrolling page 1 while the other thread is computing madly in the background. With a little luck, the reformatting will be completed before the user asks to see page 600, so it can be displayed immediately.

While we are at it, why not add a third thread? Many word processors have a feature that automatically saves the entire file to disk every few minutes to protect the user against losing a day's work in the event of a program crash, system crash, or power failure. The third thread can manage the disk backups without interfering with the other two. The situation with three threads is shown in Fig. 1.

Fig. 1. A word processor with three threads.

If the program were single-threaded, then whenever a disk backup started, commands from the keyboard and mouse would be ignored until the backup was completed. The user would surely perceive this as slow performance. On the other hand, keyboard and mouse events could interrupt the disk backup, allowing good performance but leading to a complex interrupt-driven programming model. With three threads, the programming model is much simpler. The first thread just interacts with the user. The second thread reformats the document when told to. The third thread writes the contents of RAM to disk from time to time.

It should be clear that having three separate processes would not work here because all three threads need to operate on the document. By having three threads instead of three processes, they share a common memory and thus all have access to the document being edited.

A similar situation exists with many other interactive programs. For instance, an electronic spreadsheet is a program that allows a user to maintain a matrix, some of whose elements are data provided by the user. Other elements are computed based on the input data using potentially complicated formulas. When a user changes one element, many other elements may have to be recomputed. By having a background thread do the recomputation, the interactive thread can allow the user to make additional changes while the computation is going on. Likewise, a third thread can handle periodic backups to disk on its own.

Now consider yet another example of where threads are useful: a server for a World Wide Web site. Requests for pages come in and the requested page is sent back to the client. At most Web sites, some pages are more frequently accessed than other pages. For instance, Sony's home page is accessed far more than a page deep in the tree containing the technical specifications of any specific camcorder. Web servers use this fact to improve performance by maintaining a collection of heavily used pages in main memory to eliminate the need to go to disk to get them. Such a collection is called a cache and is used in many other contexts as well. We saw CPU caches in "INTRODUCTION", for example.

One way to organize the Web server is shown in Fig. 2(a). Here one thread, the dispatcher, reads incoming requests for work from the network. After examining the request, it chooses an idle (i.e., blocked) worker thread and hands it the request, possibly by writing a pointer to the message into a special word associated with each thread. The dispatcher then wakes up the sleeping worker, moving it from blocked state to ready state.

Fig. 2. A multithreaded Web server.

When the worker wakes up, it checks to see if the request can be satisfied from the Web page cache, to which all threads have access. If not, it starts a read operation to get the page from the disk and blocks until the disk operation completes. When the thread blocks on the disk operation, another thread is chosen to run, possibly the dispatcher, in order to acquire more work, or possibly another worker that is now ready to run.

This model allows the server to be written as a collection of sequential threads. The dispatcher's program consists of an endless loop for getting a work request and handing it off to a worker. Each worker's code consists of an infinite loop consisting of accepting a request from the dispatcher and checking the Web cache to see if the page is present. If so, it is returned to the client, and the worker blocks waiting for a new request. If not, it gets the page from the disk, returns it to the client, and blocks waiting for a new request.

A rough outline of the code is given in Fig. 3. Here, as in the rest of this blog, TRUE is assumed to be the constant 1. Also, buf and page are structures suitable for holding a work request and a Web page, respectively.

Fig. 3. A rough outline of the code for Fig. 2.

Think how the Web server could be written in the absence of threads. One possibility is to have it operate as a single thread. The main loop of the Web server gets a request, inspects it, and carries it out to completion before getting the next one. While waiting for the disk, the server is idle and does not process any other incoming requests. If the Web server is running on a dedicated machine, as is generally the case, the CPU is simply idle while the Web server is waiting for the disk. The net result is that many fewer requests/sec can be processed. Thus threads gain significant performance, yet each thread is programmed sequentially, in the usual way.

So far we have seen two possible designs: a multithreaded Web server and a single-threaded Web server. Assume that threads are not available but the system designers find the performance loss due to single threading unacceptable. If a nonblocking version of the read system call is available, a third approach is possible. When a request comes in, the one and only thread examines it. If it can be satisfied from the cache, fine, but if not, a nonblocking disk operation is started.

The server records the state of the current request in a table and then goes and gets the next event. The next event may either be a request for new work or a reply from the disk about a previous operation. If it is new work, that work is started. If it is a reply from the disk, the relevant information is fetched from the table and the reply processed. With nonblocking disk I/O, a reply probably will have to take the form of a signal or interrupt.

In this design, the "sequential process" model that we had in the first two cases is lost. The state of the computation must be explicitly saved and restored in the table every time the server switches from working on one request to another. In effect, we are simulating the threads and their stacks the hard way. A design like this, in which each computation has a saved state, and there exists some set of events that can occur to change the state, is called a finite-state machine. This concept is used extensively throughout computer science.

It should now be clear what threads have to offer. They make it possible to retain the idea of sequential processes that make blocking system calls (e.g., for disk I/O) and still achieve parallelism. Blocking system calls make programming easier, and parallelism improves performance. The single-threaded server retains the simplicity of blocking system calls but gives up performance. The third approach achieves high performance through parallelism but uses nonblocking calls and interrupts and is therefore hard to program. These models are summarized in Fig. 4.

Fig. 4. Three ways to construct a server.

A third example where threads are helpful is in applications that must process very large amounts of data. The normal approach is to read in a block of data, process it, and then write it out again. The problem here is that if only blocking system calls are available, the process blocks while data are coming in and data are going out. Having the CPU go idle when there is lots of computing to do is clearly wasteful and should be avoided if possible.

Threads offer a solution. The process could be structured with an input thread, a processing thread, and an output thread. The input thread reads data into an input buffer. The processing thread takes data out of the input buffer, processes them, and puts the results in an output buffer. The output thread writes these results back to disk. Thus, input, output, and processing can all be going on at the same time. Of course, this model works only if a system call blocks only the calling thread, not the whole process.


Tags: address space, threads, system calls