Pages

Wednesday, 25 September 2013

Two-Level Directory

As we have seen, a single-level directory often leads to confusion of file names among different users. The standard solution is to create a separate directory for each user.
In the two-level directory structure, each user has his own user file directory (LTD). The UFDs have similar structures, but each lists only the files of a single user. When a user job starts or a user logs in, the system's master file directory (MFD) is searched. The MFD is indexed by user name or account number, and each entry points to the UFD for that user.
When a user refers to a particular file, only his own UFD is searched. Thus, different users may have files with the same name, as long as all the file names within each UFD are unique. To create a file for a user, the operating system searches only that user's UFD to ascertain whether another file of that name exists. To delete a file, the operating system confines its search to the local UFD; thus, it cannot accidentally delete another user's file that has the same name.
The user directories themselves must be created and deleted as necessary. A special system program is run with the appropriate user name and account information. The program creates a new UFD and adds an entry or it to the PvlFD. The execution of this program might be restricted to system administrators. Although the two-level directory structure solves the name-collision problem, it still has disadvantages. This structure effectively isolates one user from another. Isolation is an advantage wrhen the users are completely independent but is a disadvantage when the users want to cooperate on some task and to access one another's files. Some systems simply do not allow local user files to be accessed by other users.
If access is to be permitted, one user must have the ability to name a file in another user's directory. To name a particular file uniquely in a two-level directory, we must give both the user name and the file name. A two-level directory can be thought of as a tree, or an inverted tree, of height 2. The root of the tree is the MFD. Its direct descendants are the UFDs. The descendants of the UFDs are the files themselves. The files are the leaves of the tree. Specifying a user name and a file name defines a path in the tree from the root (the MFD)
to a leaf (the specified file). Thus, a user name and a file name define a path name. Every file in the system has a path name. To name a file uniquely, a user must know the path name of the file desired.
For example, if user A wishes to access her own test file named test, she can simply refer to test. To access the file named test of user B (with directory-entry name userb), however, she might have to refer to /userb/test. Every system has its own syntax for naming files in directories other than the user's own.
Additional syntax is needed to specify the volume of a file. For instance, in MS-DOS a volume is specified by a letter followed by a colon. Thus, a file specification might be C:\ userb\test. Some systems go even further and separate the volume, directory name, and file name parts of the specification. For
instance, in VMS, the file login.com might be specified as: u:[sst.jdeckllogin.com;l, where u is the name of the volume, sst is the name of the directory, jdeck is the name of the subdirectory, and 1 is the version number. Other systems simply treat the volume name as part of the directory name. The first name given is
that of the volume, and the rest is the directory and file. For instance, /u/pbg/test might specify volume it, directory pbg, and file test. A special case of this situation occurs with the system files. Programs provided
as part of the system—loaders, assemblers, compilers, utility routines, libraries, and so on—are generally defined as files. When the appropriate commands are given to the operating system, these files are read by the loader and executed. Many command interpreters simply treat such a command as the name of a file to load and execute. As the directory system is defined presently, this file name would be searched for in the current UFD. One solution would be to copy the system files into each UFD. However, copying all the system files would waste an enormous amount of space. (If the system files require 5 MB, then supporting 12 users would require 5 x 12 = 60 MB just for copies of the system files.) The standard, solution is to complicate the search procedure slightly. A special user directory is defined to contain the system files (for example, user 0). Whenever a file name is given to be loaded, the operating system first searches the local UFD. If the file is found, it is used. If it is not found, the system automatically searches the special user directory that contains the system files. The sequence of directories searched when a file is named is called the search path. The search path can be extended to contain an unlimited list of directories
to search when a command name is given. This method is the one most used in UNIX and MS-DOS. Systems can also be designed so that each user has his own search path.

Directory Structure

Up to this point, we have been discussing "a file system." In reality, systems may have zero or more file systems, and the file systems may be of varying types. For example, a typical Solaris system may have a few UFS file systems, a VFS file system, and some NFS file systems. The file systems of computers, then, can be extensive. Some systems store millions of files on terabytes of disk. To manage all these data, we need to organize them. This organization involves the use of directories. In this section, we explore the topic of directory structure. First, though, we explain some basic features of storage structure.

1. Storage Structure
A disk (or any storage device that is large enough) can be used in its entirety for a file system. Sometimes, though, it is desirable to place multiple file systems on a disk or to use parts of a disk for a file system and other parts for other things, such as swap space or unformatted (raw) disk space. These parts are known variously as partitions, slices, or (in the IBM world) minidisks. A file system can be created on each of these parts of the disk. As we shall see in the next chapter, the parts can also be combined to form larger structures known as volumes, and file systems can be created on these as well. For now, for clarity, we simply refer to a chunk of storage that holds a file system as a volume. Each volume can be thought of as a virtual disk. Volumes can also store multiple operating systems, allowing a system to boot and run more than one. Each volume that contains a file system must also contain information about the files in the system. This information is kept in entries in a device directory or volume table of contents. The device directory (more commonly known simply as a directory) records information—such as name, location, size, and type—for all files on that volume. Figure 10.6 shows a typical file-system organization.
2. Directory Overview
The directory can be viewed as a symbol table that translates file names into their directory entries. If we take such a view, we see that the directory itself can be organized in many ways. We want to be able to insert entries, to delete entries, to search for a named entry, and to list all the entries in the directory. In this section, we examine several schemes for defining the logical structure of the directory system. When considering a particular directory structure, we need to keep in mind the operations that are to be performed on a directory: 
  • Search for a file. We need to be able to search a directory structure to find the entry for a particular file. Since files have symbolic names and similar names may indicate a relationship between files, we may want to be able to find all files whose names match a particular pattern. 
  • Create a file. New files need to be created and added to the directory.
  •  Delete a file. When a file is no longer needed, we want to be able to remove it from the directory.
  •  List a directory. We need to be able to list the files in a directory and the contents of the directory entry for each file in the list.
  •  Rename a file. Because the name of a file represents its contents to its users, we must be able to change the name when the contents or use of the file changes. Renaming a file may also allow its position within the directory structure to be changed.
  •  Traverse the file system. We may wish to access every directory and every file within a directory structure. For reliability, it is a good idea to save the contents and structure of the entire file system at regular intervals. Often, we do this by copying all files to magnetic tape. This technique provides a backup copy in case of system failure. In addition, if a file is no longer in use., the file can be copied to tape and the disk space of that file released for reuse by another file.

In the following sections, we describe the most common schemes for defining
the logical structure of a directory.

3. Single-Level Directory

A single-level directory has significant limitations, however, when the number of files increases or when the system has more than one user. Since all files are in the same directory, they must have unique names. If two users call their data file test, then the unique-name rule is violated. For example, in one programming class, 23 students called the program for their second assignment progl; another 11 called i\ assign!. Although file names are generally selected to reflect the content of the file, they are often limited in length, complicating the
task of making file names unique. The MS-DOS operating system allows only 11-character file names; UNIX, in contrast, allows 255 characters.
Even a single user on a single-level directory may find it difficult to remember the names of all the files as the number of files increases. It is not uncommon for a user to have hundreds of files on one computer system and an equal number of additional files on another system. Keeping track of so many
files is a daunting task.


Access Methods

1. Sequential Access
The simplest access method is sequential access. Information in the file is processed in order, one record after the other. This mode of access is by far the most common; for example, editors and compilers usually access files in this fashion. Reads and writes make up the bulk of the operations on a file. A read operation—read next—reads the next portion of the file and automatically advances a file pointer, which tracks the I/O location. Similarly, the write operation—write next—appends to the end of the file and advances to the end of the newly written material (the new end of file). Such a file can be reset to the beginning; and on some systems, a program .may be able to skip forward or backward n records for some integer n—perhaps only for n = 1. Sequential access, which is depicted in Figure 10.3, is based on a tape model of a file and works as well on sequential-access devices as it does on random-access ones.
2. Direct Access
Another method is direct access (or relative access). A file is made up of fixedlength logical records that allow programs to read and write records rapidly in no particular order. The direct-access method is based on a disk model of a file, since disks allow random access to any file block. For direct access, the file is viewed as a numbered sequence of blocks or records. Thus, we may read block 14, then read block 53, and then write block 7. There are no restrictions on the order of reading or writing for a direct-access file. Direct-access files are of great use for immediate access to large amounts of information. Databases are often of this type. When a query concerning a particular subject arrives, we compute which block contains the answer and then read that block directly to provide the desired information.
As a simple example, on an airline-reservation system, we might store all the information about a particular flight (for example, flight 713) in the block identified by the flight number. Thus, the number of available seats for flight 713 is stored in block 713 of the reservation file. To store information about a larger set, such as people, we might compute a hash function on the people's names or search a small in-memory index to determine a block to read and search.
For the direct-access method, the file operations must be modified to include the block number as a parameter. Thus, we have read n, where n is the block number, rather than read next, and write n rather than write next. An alternative approach is to retain read next and write next, as with sequential access, and to add an operation position file to n, where n is the block number. Then, to effect a read n, we would, position to n and then read next.
The block number provided by the user to the operating system is normally a relative block number. A relative block number is an index relative to the beginning of the file. Thvis, the first relative block of the file is 0, the next is 1, and so on, even though the actual absolute disk address of the block may be 14703 for the first block and 3192 for the second. The use of relative block numbers allows the operating system to decide where the file should be placed (called the allocation problem, as discussed in Chapter 11) and helps to prevent the user from accessing portions of the file system that may not be part of her file. Some systems start their relative block numbers at 0; others start at 1.
How then does the system satisfy a request for record N in a file? Assuming we have a logical record length L, the request for record A/ is turned into an I/O request for L bytes starting at location L * (N) within the file (assuming the first record is N - 0). Since logical records are of a fixed size, it is also easy to read, write, or delete a record.
Not all operating systems support both sequential and direct access for files. Some systems allow only sequential file access; others allow only direct access. Some systems require that a file be defined as sequential or direct when it is created; such a file can be accessed only in a manner consistent with its declaration. We can easily simulate sequential access on a direct-access file by simply keeping a variable cp that defines our current position. Simulating a direct-access file on a sequential-access file, however,
is extremely inefficient and clumsy.
3. Other Access Methods
Other access methods can be built on top of a direct-access method. These methods generally involve the construction of an index for the file. The index, like an index in the back of a book, contains pointers to the various blocks. To find a record in the file, we first search the index and then use the pointer to access the file directly and to find the desired record.
For example, a retail-price file might list the universal product codes (UPCs) for items, with the associated prices. Each record consists of a 10-digit UPC and a 6-digit price, for a 16-byte record, if our disk has 1,024 bytes per block, we can store 64 records per block. A file of 120,000 records would occupy about 2,000 blocks (2 million bytes). By keeping the file sorted by UPC, we can define an index consisting of the first UPC in each block. This index would have 2,000 entries of 10 digits each, or 20,000 bytes, and thus could be kept in memory To find the price of a particular item, we can make a binary search of the index.
From this search, we learn exactly which block contains the desired record and access that block. This structure allows us to search a large file doing little I/O.
With large files, the index file itself may become too large to be kept in memory. One solution is to create an index for the index file. The primary index file would contain pointers to secondary index files, which would point to the actual data items. For example, IBM's indexed sequential-access method (ISAM) uses a small
master index that points to disk blocks of a secondary index. The secondary index blocks point to the actual file blocks. The file is kept sorted on a defined key. To find a particular item, we first make a binary search of the master index, which provides the block number of the secondary index. This block is read in, and again a binary search is used to find the block containing the desired record. Finally, this block is searched sequentially. In this way, any record can be located from its key by at most two direct-access reads.

Internal File Structure

Internally, locating an offset within a file can be complicated for the operating system. Disk systems typically have a well-defined block size determined by the size of a sector. All disk I/O is performed in units of one block (physical record), and all blocks are the same size. It is unlikely that the physical record size will exactly match the length of the desired logical record. Logical records may even vary in length. Packing a number of logical records into physical blocks is a common solution to this problem.
For example, the UNIX operating system defines all files to be simply streams of bytes. Each byte is individually addressable by its offset from the beginning (or end) of the file. In this case, the logical record size is 1 byte. The file system automatically packs and unpacks bytes into physical disk blocks—
say, 512 bytes per block—as necessary.
The logical record size, physical block size, and packing technique determine how many logical records are in each physical block. The packing can be done either by the user's application program or by the operating system. In either case, the file may be considered to be a sequence of blocks. All the basic I/O functions operate in terms of blocks. The conversion from logical records to physical blocks is a relatively simple software problem.
Because disk space is always allocated in blocks, some portion of the last block of each file is generally wasted. If each block were 512 bytes, for example, then a file of 1,949 bytes would be allocated four blocks (2,048 bytes); the last 99 bytes would be wasted. The waste incurred to keep everything in units
of blocks (instead of bytes) is internal fragmentation. All file systems suffer from internal fragmentation; the larger the block size, the greater the internal fragmentation.

File Operations

A file is an abstract data type. To define a file properly, we need to consider the operations that can be performed on files. The operating system can provide system calls to create, write, read, reposition, delete, and truncate files. Let's examine what the operating system must do to perform each of these six basic file operations. It should then be easy to see how other, similar operations, such as renaming a file, can be implemented.

» Creating a file. Two steps are necessary to create a file. First, space in the
file system must be found for the file. Second, an entry for the new file must be made in
the directory.
• Writing a file. To write a file, we make a system call specifying both the
name of the file and the information to be written to the file. Given the
name of the file, the system searches the directory to find the file's location.
The system must keep a write pointer to the location in the file where the
next write is to take place. The write pointer must be updated whenever a
write occurs.
• Reading a file. To read from a file, we use a system call that specifies the
name of the file and where (in memory) the next block of the file should
be put. Again, the directory is searched for the associated entry, and the
system needs to keep a read pointer to the location in the file where the
next read is to take place. Once the read has taken place, the read pointer
is updated. Because a process is usually either reading from or writing to
a file, the current operation location can be kept as a per-process currentfile-
position pointer. Both the read and write operations use this same
pointer, saving space and reducing system complexity.
» Repositioning within a file. The directory is searched for the appropriate
entry, and the current-file-position pointer is repositioned to a given value.
Repositioning within a file need not involve any actual I/O. This file
operation is also known as a file seek.
• Deleting a file. To delete a file, we search the directory for the named file.
Having found the associated directory entry, we release all file space, so
that it can be reused bv other files, and erase the directory entry.
• Truncating a file. The user may want to erase the contents of a file but
keep its attributes. Rather than forcing the user to delete the file and then
recreate it, this function allows all attributes to remain unchanged—except
for file length—but lets the tile be reset to length zero and its file space
released.

These six basic operations comprise the minimal set of required file
operations. Other common operations include appending new information
to the end of an existing file and renaming an existing file. These primitive
operations can then be combined to perform other file operations. For instance,
we can create a copy of a file, or copy the file to another I/O device, such as
a printer or a display, by creating a new file and then reading from the old
and writing to the new. We also want to have operations that allow a user to
get and set the various attributes of a file. For example, we may want to have
operations that allow a user to determine the status of a file, such as the file's
length, and to set file attributes, such as the file's owner.
Most of the file operations mentioned involve searching the directory for
the entry associated with the named file. To avoid this constant searching, many
systems require that an openO system call be made before a file is first used
actively. The operating system keeps a small table, called the open-file table,
containing information about all open files. When a file operation is requested,
the file is specified via an index into this table, so no searching is required.
When the file is no longer being actively used, it is closed by the process, and
the operating system removes its entry from the open-file table, create and
delete are system calls that work with closed rather than open files.
Some systems implicitly open a file when the first reference to it is made.
The file is automatically closed when the job or program that opened the
file terminates. Most systems, however, require that the programmer open a
file explicitly with the openO system call before that file can be used. The
openO operation takes a file name and searches the directory, copying the
directory entry into the open-file table. The openO call can also accept accessmode
information—create, read-only, read—write, append-only, and so on.
This mode is checked against the file's permissions. If the request mode is
allowed, the file is opened for the process. The openO system call typically
returns a pointer to the entry in the open-file table. This pointer, not the actual
file name, is used in all I/O operations, avoiding any further searching and
simplifying the system-call interface.
The implementation of the openO and close() operations is more
complicated in an environment where several processes may open the file at
the same time. This may occur in a system where several different applications
open the same file at the same time. Typically, the operating system uses two
levels of internal tables: a per-process table and a system-wide table. The perprocess
table tracks all files that a process has open. Stored in this table is
information regarding the use of the file by the process. For instance, the
current file pointer for each file is found here. Access rights to the file and
accounting information can also be included.
Each entry in the per-process table in turn points to a system-wide open-file
table. The system-wide table contains process-independent information, such
as the location of the file on disk, access dates, and file size. Once a file has been
opened by one process, the system-wide table includes an entry for the file.
When another process executes an openQ call, a new entry is simply added
to the process's open-file table pointing to the appropriate entry in the systemwide
table. Typically., the open-file table also has an open count associated with
each file to indicate how many processes have the file open. Each close 0
decreases this open count, and when the open count reaches zero, the file is no
longer in use, and the file's entry is removed from the open-file table.
In summary, several pieces of information are associated with an open file.
• File pointer. On systems that do not include a file offset as part of the
readO and write () system calls, the system must track the last readwrite
location as a current-file-position pointer. This pointer is unique to
each process operating on the file and therefore must be kept separate from
the on-disk file attributes.
• File-open count. As files are closed, the operating system must reuse its
open-file table entries, or it could run out of space in the table. Because
multiple processes may have opened a file, the system must wait for the
last file to close before removing the open-file table entry. The file-open
counter tracks the number of opens and closes and reaches zero on the last
close. The system can then remove the entry.
• Disk location of the file. Most file operations require the system to modify
data within the file. The information needed to locate the file on disk is
kept in memory so that the system does not have to read it from disk for
each operation.
• Access rights. Each process opens a file in an access mode. This information
is stored on the per-process table so the operating system can allow or deny
subsequent I/O requests.
Some operating systems provide facilities for locking an open file (or
sections of a file). File "locks allow one process to lock a file and prevent other
processes from gaining access to it. File locks are useful for files that are shared
by several processes—for example, a system log file that can be accessed and
modified by a number of processes in the system.

Thursday, 19 September 2013

The Java Virtual Machine

Java is a popular object-oriented programming language introduced by Sun
Microsystems in 1995. In addition to a language specification and a large API
library, Java also provides a specification for a Java virtual machine—or JVM.
Java objects are specified with the class construct; a Java program
consists of one or more classes. For each Java class, the compiler produces
an architecture-neutral bytecode output (.class) file that will run on any
implementation of the JVM.
The JVMis a specification for an abstract computer. It consists of a class
loader and a Java interpreter that executes the architecture-neutral bytecodes,
as diagrammed in Figure 2.17. The class loader loads the compiled . class
files from both the Java program and the Java API for execution by the Java
interpreter. After a class is loaded, the verifier checks that the . class file is
valid Java bytecode and does not overflow or underflow the stack. It also
ensures that the bytecode does not perform pointer arithmetic, which could
provide illegal memory access. If the class passes verification, it is run by the
Java interpreter. The JVM also automatically manages memory by performing
garbage collection—the practice of reclaiming memory from objects no longer
in use and returning it to the system. Much research focuses on garbage
collection algorithms for increasing the performance of Java programs in the
virtual machine.
The JVM may be implemented in software on top of a host operating
system, such as Windows, Linux, or Mac OS X, or as part of a web browser.
Alternatively, the JVM may be implemented in hardware on a chip specifically
designed to run Java programs. If the JVM is implemented in software, the
Java interpreter interprets the bytecode operations one at a time. A faster
software technique is to use a just-in-time (JIT) compiler. Here, the first time a
Java method is invoked, the bytecodes for the method are turned into native
machine language for the host system. These operations are then cached so that
subsequent invocations of a method are performed using the native machine
instructions and the bytecode operations need not be interpreted all over again.
A technique that is potentially even faster is to run the JVM in hardware on a
special Java chip that executes the Java bytecode operations as native code, thus
bypassing the need for either a software interpreter or a just-in-time compiler.


Modules

Perhaps the best current methodology for operating-system design involves
using object-oriented programming techniques to create a modular kernel.
Here, the kernel has a set of core components and dynamically links in
additional services either during boot time or during run time. Such a
strategy uses dynamically loadable modules and is common in modern
implementations of UNIX, such as Solaris, Linux, and Mac OS X. For example, the
Solaris operating system structure, shown in Figure 2.13, is organized around
a core kernel with seven types of loadable kernel modules:
1. Scheduling classes
2. File systems
3. Loadable system calls
4. Executable formats
5. STREAMS modules
6. Miscellaneous
7. Device and bus drivers
Such a design allows the kernel to provide core services yet also allows
certain features to be implemented dynamically. For example, device and
bus drivers for specific hardware can be added to the kernel, and support
for different file systems can be added as loadable modules. The overall
result resembles a layered system in that each kernel section has defined,
protected interfaces; but it is more flexible than a layered system in that any
module can call any other module. Furthermore, the approach is like the
microkernel approach in that the primary module has only core functions
and knowledge of how to load and communicate with other modules; but it
is more efficient, because modules do not need to invoke message passing in
order to communicate.
The Apple Macintosh Mac OS X operating system uses a hybrid structure.
Mac OS X (also known as Danvin) structures the operating system using a
layered technique where one layer consists of the Mach microkernel. The
structure of Mac OS X appears in Figure 2.14.
The top layers include application environments and a set of services
providing a graphical interface to applications. Below these layers is the kernel
environment, which consists primarily of the Mach microkernel and the BSD
kernel. Mach provides memory management; support for remote procedure
calls (RPCs) and interprocess communication (IPC) facilities, including message
passing; and thread scheduling. The BSD component provides a BSD command
line interface, support for networking and file systems, and an implementation
of POSIX APIs, including Pthreads. In addition to Mach and BSD, the kernel
environment provides an I/O kit for development of device drivers and
dynamically loadable modules (which Mac OS X refers to as kernel extensions).
As shown in the figure, applications and common services can make use of
either the Mach or BSD facilities directly.