An Example Program Using File System Calls

In this section we will consider a simple UNIX program that copies a source file to a destination file. It is listed in Figure 1. The program has minimal functionality and even worse error reporting, but it gives a reasonable idea of how some of the file-related system calls work.

The program, copyfile, can be called, for instance, by the command line

copyfile abc xyz

to copy the file abc to xyz. If xyz already exists, it will be overwritten; otherwise, it will be created. The program must be called with exactly two arguments, both legal file names. The first is the source; the second is the output file.

Figure 1: A simple program to copy a file.

The four #include statements near the top of the program cause a large number of definitions and function prototypes to be included in the program. These are needed to make the program conform to the relevant international standards, but will not concern us further. The next line is a function prototype for main, a form of declaration introduced by ANSI C, but also not important for our purposes.

The first #define statement defines BUF_SIZE as a macro that expands to the number 4096. The program will read and write in chunks of 4096 bytes. It is considered good programming practice to give names to constants like this and to use the names instead of the constants. Not only does this convention make programs easier to read, but it also makes them easier to maintain. The second #define statement sets the protection mode given to the output file when it is created, which determines who can access it.
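The text does not give the second macro's name or value. A minimal sketch of the two definitions, assuming the hypothetical name OUTPUT_MODE and the value 0700 (the owner may read, write, and execute the file; nobody else has any access):

```c
#include <sys/stat.h>   /* S_IRWXU, used below as a cross-check */

/* Read and write the file in chunks of this many bytes. */
#define BUF_SIZE 4096

/* Hypothetical name and assumed value: owner gets read, write, and
   execute permission on the output file; group and others get none. */
#define OUTPUT_MODE 0700

/* Same bits spelled with the POSIX symbolic constant. */
#define OUTPUT_MODE_SYMBOLIC S_IRWXU
```

POSIX fixes the numeric values of these permission bits, so `S_IRWXU` from `<sys/stat.h>` is an equivalent, more self-documenting spelling of 0700.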

The main program is called main, and it has two arguments, argc and argv. These are supplied by the operating system when the program is called. The first one tells how many strings were present on the command line that invoked the program, including the program name. It should be 3. The second one is an array of pointers to the arguments. In the example call given above, the elements of this array would contain pointers to the following values:

argv[0] = "copyfile"
argv[1] = "abc"
argv[2] = "xyz"

It is via this array that the program accesses its arguments.
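A minimal sketch of how a program can check argc and pick its two file names out of argv. The helper parse_args is hypothetical; copyfile itself does this work directly in main.

```c
/* Hypothetical helper: validate the argument count and extract the
   source and destination names. Returns 0 on success, or 1 (the status
   copyfile exits with on a bad call) if argc is not 3. */
int parse_args(int argc, char *argv[], const char **src, const char **dst)
{
    if (argc != 3)          /* program name + source + destination */
        return 1;
    *src = argv[1];         /* e.g. "abc" */
    *dst = argv[2];         /* e.g. "xyz" */
    return 0;
}
```

For the call `copyfile abc xyz`, argc is 3 and the helper returns pointers to "abc" and "xyz"; for `copyfile abc` it returns 1 without touching src or dst.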

Five variables are declared. The first two, in_fd and out_fd, will hold the file descriptors, small integers returned when a file is opened. The next two, rd_count and wt_count, are the byte counts returned by the read and write system calls, respectively. The last one, buffer, holds the data that is read and supplies the data to be written.

The first actual statement checks argc to see if it is 3. If not, it exits with status code 1. Any status code other than 0 means that an error has occurred. The status code is the only error reporting present in this program. A production version would normally print error messages as well.
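As an illustration of what a production version might add, here is a sketch of an open that also reports why it failed, using the standard `strerror(errno)` idiom. The helper name open_or_report and the message format are assumptions, not part of copyfile.

```c
#include <errno.h>    /* errno */
#include <fcntl.h>    /* open, O_RDONLY */
#include <stdio.h>    /* fprintf, stderr */
#include <string.h>   /* strerror */

/* Hypothetical helper: open a file for reading and, unlike copyfile,
   print a message such as "copyfile: abc: No such file or directory"
   on failure. Returns the file descriptor, or -1 if open failed. */
int open_or_report(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        /* strerror(errno) is evaluated before fprintf runs, so errno
           still holds the value set by the failed open. */
        fprintf(stderr, "copyfile: %s: %s\n", path, strerror(errno));
    return fd;
}
```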

Then we try to open the source file and create the destination file. If the source file is successfully opened, the system assigns a small integer to in_fd to identify the file. Subsequent calls must pass this integer so that the system knows which file is meant. Likewise, if the destination is successfully created, out_fd is given a value to identify it. The second argument to creat sets the protection mode. If either open or creat fails, it returns -1, which is stored in the corresponding file descriptor, and the program exits with an error code.
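The open/creat step can be sketched as follows. The helper open_pair, the mode 0700, and the status values 2 and 3 are assumptions chosen for illustration; the point is that each call returns -1 on failure and a nonnegative descriptor on success.

```c
#include <fcntl.h>    /* open, creat, O_RDONLY */
#include <unistd.h>   /* close */

#define OUTPUT_MODE 0700   /* assumed protection mode for the new file */

/* Hypothetical helper mirroring the two calls in copyfile.
   Returns 0 on success, or a nonzero status if either call fails. */
int open_pair(const char *src, const char *dst, int *in_fd, int *out_fd)
{
    *in_fd = open(src, O_RDONLY);          /* open the source read-only */
    if (*in_fd < 0)
        return 2;                          /* open returned -1 */

    *out_fd = creat(dst, OUTPUT_MODE);     /* create or truncate the destination */
    if (*out_fd < 0) {
        close(*in_fd);                     /* don't leak the source descriptor */
        return 3;                          /* creat returned -1 */
    }
    return 0;
}
```

Note that creat truncates an existing destination, which is why an existing xyz is overwritten.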

Now comes the copy loop. It starts by trying to read 4 KB of data into buffer. It does this by calling the library procedure read, which in fact invokes the read system call. The first parameter identifies the file, the second gives the buffer, and the third tells how many bytes to read. The value assigned to rd_count gives the number of bytes actually read. Normally, this will be 4096, unless fewer bytes remain in the file. When the end of file has been reached, it will be 0. If rd_count is ever zero or negative, the copying cannot continue, so the break statement is executed to terminate the (otherwise endless) loop.
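The read half of the loop on its own looks roughly like this. The hypothetical function count_bytes only totals the bytes read, so the loop structure (an endless loop left by break when rd_count is zero or negative) is easy to see.

```c
#include <sys/types.h>  /* ssize_t */
#include <unistd.h>     /* read */

#define BUF_SIZE 4096

/* Read in_fd in BUF_SIZE chunks until read returns 0 (end of file)
   or a negative value (error). Returns the total number of bytes
   read, or -1 if the loop stopped because of a read error. */
long count_bytes(int in_fd)
{
    char buffer[BUF_SIZE];
    long total = 0;
    ssize_t rd_count = 0;

    while (1) {
        rd_count = read(in_fd, buffer, BUF_SIZE);
        if (rd_count <= 0)     /* 0 = end of file, < 0 = error */
            break;
        total += rd_count;     /* usually BUF_SIZE, fewer on the last chunk */
    }
    return rd_count < 0 ? -1 : total;
}
```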

The call to write outputs the buffer to the destination file. The first parameter identifies the file, the second gives the buffer, and the third tells how many bytes to write, similar to read. Note that the byte count is the number of bytes actually read, not BUF_SIZE. This point is important because the last read will not return 4096 unless the file just happens to be a multiple of 4 KB.
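Putting read and write together gives a sketch of the whole copy loop. The function name copy_fd is hypothetical. Each write passes rd_count, not BUF_SIZE, so the final partial chunk is copied without trailing garbage. As an assumption of this sketch, a short write is treated as an error; a more robust version would loop until all rd_count bytes had been written.

```c
#include <sys/types.h>  /* ssize_t */
#include <unistd.h>     /* read, write */

#define BUF_SIZE 4096

/* Copy everything from in_fd to out_fd. Returns the number of bytes
   copied, or -1 on a read error or a failed or short write. */
long copy_fd(int in_fd, int out_fd)
{
    char buffer[BUF_SIZE];
    long total = 0;

    while (1) {
        ssize_t rd_count = read(in_fd, buffer, BUF_SIZE);
        if (rd_count == 0)
            return total;              /* end of file: normal exit */
        if (rd_count < 0)
            return -1;                 /* read error */

        /* Write exactly the bytes just read, not BUF_SIZE. */
        ssize_t wt_count = write(out_fd, buffer, (size_t) rd_count);
        if (wt_count != rd_count)
            return -1;                 /* failed or short write */
        total += wt_count;
    }
}
```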

When the entire file has been processed, the first read beyond the end of file returns 0, which is stored in rd_count and causes the loop to exit. At this point the two files are closed and the program exits with a status indicating normal termination.
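The final step can be sketched as a small helper that closes both descriptors and converts the last rd_count into an exit status. The name finish_copy and the error status 5 are assumptions; the text only says that a normal finish yields the success status 0, while a negative rd_count means the loop stopped on an error.

```c
#include <sys/types.h>  /* ssize_t */
#include <unistd.h>     /* close */

/* Hypothetical final step: close both files, then return 0 if the
   loop ended at end of file (rd_count == 0) or an assumed error
   status of 5 if it ended because read failed. */
int finish_copy(int in_fd, int out_fd, ssize_t rd_count)
{
    close(in_fd);
    close(out_fd);
    return rd_count == 0 ? 0 : 5;
}
```

In main, the result would typically be handed straight to the operating system with `exit(finish_copy(in_fd, out_fd, rd_count));`.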

Although the Windows system calls are different from those of UNIX, the general structure of a command-line Windows program to copy a file is moderately similar to that of Figure 1.

