System Calls for Process Management

The first group of calls in figure (a) deals with process management. Fork is a good place to start the discussion. Fork is the traditional way to create a new process in POSIX. It produces an exact duplicate of the original process, including all the file descriptors and registers: everything.

Some of the major POSIX system calls.

After the fork, the original process and the copy (the parent and child) go their separate ways. All the variables have the same values at the time of the fork, but since the parent's data are copied to create the child, subsequent changes in one of them do not affect the other one. (The program text, which is unchangeable, is shared between parent and child.) The fork call returns a value, which is zero in the child and equal to the child's process identifier (PID) in the parent. By checking the returned value, each of the two processes can tell whether it is the parent or the child.

In many cases, after a fork, the child will need to carry out different code from the parent. Think about the case of the shell. It reads a command from the terminal, forks off a child process, waits for the child to execute the command, and then reads the next command when the child terminates. To wait for the child to finish, the parent executes a waitpid system call, which waits until a child terminates. Waitpid can wait for a particular child, or for any child if the first parameter is set to -1. When waitpid completes, the location pointed to by the second parameter, statloc, will be set to the child's exit status (normal or abnormal termination and exit value). Various options are also provided, specified by the third parameter.

Now consider how fork is used by the shell. When a command is typed, the shell forks off a new process. This child process must carry out the user command. It does this by using the execve system call, which causes its complete core image to be replaced by the file named in its first parameter. (In fact, the system call itself is exec, but various library procedures call it with different parameters and slightly different names. We will treat these as system calls here.) A highly simplified shell illustrating the use of fork, waitpid, and execve is shown in figure (b).

A stripped-down shell. TRUE is assumed to be defined as 1.

In the most common case, execve has three parameters: the name of the file to be executed, a pointer to the argument array, and a pointer to the environment array. These will be explained shortly. Many library routines, including execl, execv, execle, and execve, are provided to allow the parameters to be omitted or specified in various ways. Throughout this blog we will use the name exec to represent the system call invoked by all of these. Let us consider the case of a command such as

cp file1 file2

used to copy file1 to file2. After the shell has forked, the child process locates and executes the file cp and passes to it the names of the source and target files. The main program of cp (and the main program of most other C programs) contains the declaration

main(argc, argv, envp)

where argc is a count of the number of items on the command line, including the program name. For the example above, argc is 3.

The second parameter, argv, is a pointer to an array. Element i of that array is a pointer to the i-th string on the command line. In our example, argv[0] would point to the string "cp", argv[1] would point to the string "file1" and argv[2] would point to the string "file2".

The third parameter of main, envp, is a pointer to the environment, an array of strings containing assignments of the form name=value, used to pass information such as the terminal type and home directory name to programs. There are library procedures that programs can call to get the environment variables, which are often used to customize how a user wants to carry out certain tasks (e.g., the default printer to use). In figure (b), no environment is passed to the child, so the third parameter of execve is a zero.

If exec seems complex, do not despair; it is (semantically) the most complicated of all the POSIX system calls. All the other ones are much simpler. As an example of a simple one, consider exit, which processes should use when they are finished executing. It has one parameter, the exit status (0 to 255), which is returned to the parent via statloc in the waitpid system call.

Processes in UNIX have their memory divided up into three segments: the text segment (i.e., the program code), the data segment (i.e., the variables), and the stack segment. The data segment grows upward and the stack grows downward, as shown in figure (c). Between them is a gap of unused address space. The stack grows into the gap automatically, as required, but expansion of the data segment is done explicitly by using a system call, brk, which specifies the new address where the data segment is to end. This call, however, is not defined by the POSIX standard, since programmers are encouraged to use the malloc library procedure for dynamically allocating storage. The underlying implementation of malloc was not thought to be a suitable subject for standardization, since few programmers use it directly, and it is doubtful that anyone even notices that brk is not in POSIX.

Processes have three segments: text, data, and stack

