All computer applications need to store and retrieve information. While a process is running, it can store a limited amount of information within its own address space. However, the storage capacity is restricted to the size of the virtual address space. For some applications this size is adequate, but for others, such as airline reservations, banking, or corporate record keeping, it is far too small.

A second problem with keeping information within a process address space is that when the process terminates, the information is lost. For many applications (e.g., databases), the information must be retained for weeks, months, or even forever. Having it vanish when the process using it terminates is unacceptable. Moreover, it must not go away when a computer crash kills the process.

A third problem is that it is often necessary for multiple processes to access (parts of) the information at the same time. If we have an online telephone directory stored inside the address space of a single process, only that process can access it. The way to solve this problem is to make the information itself independent of any one process.

Therefore we have three essential requirements for long-term information storage:

1. It must be possible to store a very large amount of information.

2. The information must survive the termination of the process using it.

3. Multiple processes must be able to access the information concurrently.

Magnetic disks have been used for years for this long-term storage. Tapes and optical disks are also used, but they have much lower performance. We will study disks in more detail in “INPUT/OUTPUT”, but for the moment it is sufficient to think of a disk as a linear sequence of fixed-size blocks that supports two operations:

1. Read block k.
2. Write block k.

In fact there are more, but with these two operations one could, in principle, solve the long-term storage problem.
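The disk model just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class name `BlockDevice`, the 512-byte `BLOCK_SIZE`, and the in-memory `bytearray` standing in for the physical medium are all hypothetical choices made for this example, not part of any real driver interface.

```python
BLOCK_SIZE = 512  # assumed block size in bytes; real disks vary


class BlockDevice:
    """A toy in-memory disk: a linear sequence of fixed-size blocks."""

    def __init__(self, num_blocks: int) -> None:
        self.num_blocks = num_blocks
        self._data = bytearray(num_blocks * BLOCK_SIZE)

    def _check(self, k: int) -> None:
        # Reject block numbers outside the linear sequence 0..num_blocks-1.
        if not 0 <= k < self.num_blocks:
            raise IndexError(f"block {k} out of range")

    def read_block(self, k: int) -> bytes:
        """Read block k."""
        self._check(k)
        start = k * BLOCK_SIZE
        return bytes(self._data[start:start + BLOCK_SIZE])

    def write_block(self, k: int, block: bytes) -> None:
        """Write block k; the caller must supply exactly one full block."""
        self._check(k)
        if len(block) != BLOCK_SIZE:
            raise ValueError("a block must be exactly BLOCK_SIZE bytes")
        start = k * BLOCK_SIZE
        self._data[start:start + BLOCK_SIZE] = block


disk = BlockDevice(num_blocks=8)
disk.write_block(3, b"hello".ljust(BLOCK_SIZE, b"\0"))
print(disk.read_block(3)[:5])  # b'hello'
```

Note that the caller has to remember that the data went into block 3 and has to pad it to a full block by hand. Nothing in this interface records what a block contains or who owns it.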

However, these are very inconvenient operations, particularly on large systems used by many applications and possibly multiple users (e.g., on a server). Just a few of the questions that quickly arise are:

1. How do you find information?
2. How do you keep one user from reading another user's data?
3. How do you know which blocks are free?

and there are many more.

Just as we saw how the operating system abstracted away the concept of the processor to create the abstraction of a process and how it abstracted away the concept of physical memory to offer processes (virtual) address spaces, we can solve this problem with a new abstraction: the file. Together, the abstractions of processes (and threads), address spaces, and files are the most important concepts relating to operating systems. If you really understand these three concepts from beginning to end, you are well on your way to becoming an operating systems expert.

Files are logical units of information created by processes. A disk will generally contain thousands or even millions of them, each one independent of the others. In fact, if you think of each file as a kind of address space, you are not that far off, except that files are used to model the disk instead of modeling the RAM.

Processes can read existing files and create new ones if need be. Information stored in files must be persistent, that is, not be affected by process creation and termination. A file should only disappear when its owner explicitly removes it. Although operations for reading and writing files are the most common ones, there exist many others, some of which we will examine in later articles.
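As a small illustration of these ideas, the sketch below creates a file, reads it back after the writer has closed it, and then removes it explicitly. It uses only Python's standard `open`, `os`, and `tempfile` facilities. The helper names `save` and `load` and the file name `directory.txt` are hypothetical and chosen only for this example.

```python
import os
import tempfile


def save(path: str, data: bytes) -> None:
    """Create (or overwrite) a file and write data to it."""
    with open(path, "wb") as f:
        f.write(data)


def load(path: str) -> bytes:
    """Read back the whole contents of an existing file."""
    with open(path, "rb") as f:
        return f.read()


# Hypothetical file name in a scratch directory, used only for this demo.
workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "directory.txt")

save(path, b"Alice 555-0100\n")
# The writer has closed the file. The data now lives in the file itself,
# so any process that knows the path could open and read it at this point.
print(load(path))            # b'Alice 555-0100\n'

os.remove(path)              # the file goes away only when explicitly removed
print(os.path.exists(path))  # False
os.rmdir(workdir)            # clean up the scratch directory
```

Here the same process does both the writing and the reading, which keeps the example short. The point is that `load` depends only on the path, not on any state held in the writer's address space.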

Files are managed by the operating system. How they are structured, named, accessed, used, protected, implemented, and managed are major topics in operating system design. As a whole, that part of the operating system dealing with files is known as the file system and is the subject of this section.

From the user's standpoint, the most important aspect of a file system is how it appears, that is, what constitutes a file, how files are named and protected, what operations are allowed on files, and so on. The details of whether linked lists or bitmaps are used to keep track of free storage and how many sectors there are in a logical disk block are of no interest, although they are of great importance to the designers of the file system. Therefore, we have structured the section as several parts. The first two are concerned with the user interface to files and directories, respectively. Then comes a detailed discussion of how the file system is implemented and managed. Finally, we give some examples of real file systems.
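To give a feel for the kind of bookkeeping that stays hidden from the user, here is a minimal sketch of tracking free blocks with a bitmap, one of the two techniques mentioned above. The class name `FreeBlockBitmap`, its methods, and the lowest-free-block allocation policy are assumptions made for illustration and do not describe any particular file system.

```python
class FreeBlockBitmap:
    """Track free disk blocks with one bit per block (1 = in use, 0 = free)."""

    def __init__(self, num_blocks: int) -> None:
        self.num_blocks = num_blocks
        # Round up so that every block has a bit, eight blocks per byte.
        self._bits = bytearray((num_blocks + 7) // 8)

    def is_free(self, k: int) -> bool:
        """Return True if block k is currently unallocated."""
        return ((self._bits[k // 8] >> (k % 8)) & 1) == 0

    def allocate(self) -> int:
        """Mark the lowest-numbered free block as used and return its number."""
        for k in range(self.num_blocks):
            if self.is_free(k):
                self._bits[k // 8] |= 1 << (k % 8)
                return k
        raise OSError("no free blocks left")

    def release(self, k: int) -> None:
        """Mark block k as free again."""
        self._bits[k // 8] &= ~(1 << (k % 8)) & 0xFF


bitmap = FreeBlockBitmap(num_blocks=16)
a = bitmap.allocate()      # block 0
b = bitmap.allocate()      # block 1
bitmap.release(a)
print(bitmap.allocate())   # 0, since the lowest free block is reused
```

A bitmap for 16 blocks fits in two bytes, which hints at why this representation is compact. The linear scan in `allocate` is the simplest possible policy and is used here only to keep the sketch short.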

