Tuesday, February 19, 2013

File Attributes

The characteristics common to all files are called the attributes of a file. They are as follows:
  • Name: A filename is a string (i.e., a sequence of characters) that is used to identify a file. File names in Linux can contain any character other than the forward slash ( / ) and the null character. The forward slash is reserved for use as the name of the root directory (i.e., the directory that contains all other directories and files) and as a directory separator. The null character is used to terminate strings. Spaces are permitted, although they are best avoided because they must be quoted in the shell and can be incompatible with legacy software in some cases. Typically, however, file names use only alphanumeric characters (mostly lower case), underscores, hyphens and periods. Other characters, such as dollar signs, percentage signs and brackets, have special meanings to the shell (they are used as shell metacharacters). File names should never begin with a hyphen, because many commands would treat such a name as an option.
    A relatively small number of file names on a system consist only of upper case characters, such as README, INSTALL, NEWS and AUTHORS. They are usually plain text files that come bundled with programs and are for documentation purposes.
    File names were limited to 14 bytes (i.e., 14 characters) in early UNIX systems. However, modern Unix-like systems support long file names, usually up to 255 bytes in length. File names can be as short as a single character.
    A file name is often written in two parts, a user-designated name and an extension that indicates the type of file, separated by a period. Unix-like operating systems generally do not require file extensions, but they can be convenient and useful. In particular, they make it easy to identify file types at a glance and to manipulate groups of files. Files can also have multiple extensions.
    File names must be unique within a directory. However, multiple files and directories with the same name can reside in different directories, because such files have different absolute pathnames (i.e., locations relative to the root directory) and the system can therefore distinguish them. On a Unix-like operating system a file can have multiple names, because the operating system uses inodes instead of names to identify files and directories. Additional names can be provided by using the ln command to create one or more hard links to a file. (Most modern systems do not allow ordinary hard links to directories.)
  • Location: Because every file has a place in the Unix directory tree, a path must be specified to define the location of a file. There are two ways of specifying a path. An absolute path is the sequence of directories from the root to the directory in which the file is housed, with each directory in the path separated by a forward slash (/). A relative path is a sequence of directories that starts from the current working directory instead of the root. In this post a path is distinguished from a pathname: a pathname ends in an ordinary file, whereas a path ends in a directory file. Consider a directory structure in which the directory usr1, directly under the root, contains the directory a1.dir, which in turn contains the file file2. The absolute path for file2 is /usr1/a1.dir and the absolute pathname is /usr1/a1.dir/file2. When the current working directory is usr1, the relative path for file2 is simply a1.dir and the relative pathname is a1.dir/file2.
  • Size of a file: This is given in bytes and is limited by the physical capacity of the disk and by any disk quotas assigned by the superuser.
  • Link Count: The link count is a number stored with each file that records how many directory entries (i.e., different pathnames) refer to the same physical file. Links come in two forms, hard and soft; only hard links are included in the link count. (Links will be discussed in the next post.)
  • i-node number: A number that the kernel assigns to each file for identification; it is unique within a file system. An inode structure is defined for each file. The inode number identifies this structure, which holds attributes such as the size of the file, ownership, permissions, link count and the addresses of the file's disk blocks.
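  • Time stamps: Three time stamps are traditionally attached to each file:
    • Time of last inode (status) change, e.g., a change of permissions or ownership. This is often mistaken for the time of creation, which classic Unix file systems do not record.
    • Time of last modification
    • Time of last access
    These time stamps are updated automatically by the Unix system, but the access and modification times can also be set by the user with commands like 'touch'.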
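Several of these attributes can be read back from a running system. The following is a minimal sketch in Python, assuming a Unix-like system; the file names and the describe() helper are made up for illustration. It creates a small file, adds a hard link to it, and prints the size, link count, inode number and time stamps that os.stat() reports. On Unix, st_ctime is the time of the last inode change, not a creation time.

```python
import os
import stat
import tempfile
import time


def describe(path):
    """Return the attributes the kernel keeps in the file's inode."""
    st = os.stat(path)
    return {
        "size_bytes": st.st_size,
        "link_count": st.st_nlink,
        "inode": st.st_ino,
        "is_regular": stat.S_ISREG(st.st_mode),
        "modified": time.ctime(st.st_mtime),
        "accessed": time.ctime(st.st_atime),
        "inode_changed": time.ctime(st.st_ctime),
    }


with tempfile.TemporaryDirectory() as tmp:
    original = os.path.join(tmp, "notes.txt")     # hypothetical file name
    alias = os.path.join(tmp, "notes-link.txt")   # hypothetical second name

    with open(original, "w") as f:
        f.write("hello\n")

    before = describe(original)
    os.link(original, alias)   # hard link: a second name for the same inode
    after = describe(alias)

for key, value in after.items():
    print(f"{key:14} {value}")
```

Both names report the same inode number, and the link count rises from 1 to 2 once the second name exists.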
These are the file attributes typically associated with each file. We shall discuss them in greater detail, along with the associated commands, when we come to the topic 'Handling File Attributes'. The next post will discuss hard links and soft links.
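To make the path terminology from the Location attribute concrete, here is a small sketch using Python's pathlib. It only manipulates strings, so nothing needs to exist on disk; the directory names come from the example in this post, and the current working directory is assumed to be /usr1.

```python
from pathlib import PurePosixPath

pathname = PurePosixPath("/usr1/a1.dir/file2")   # absolute pathname of the file
cwd = PurePosixPath("/usr1")                     # assumed current working directory

absolute_path = pathname.parent                  # directory that houses the file
relative_pathname = pathname.relative_to(cwd)    # the pathname as seen from cwd
relative_path = relative_pathname.parent         # the directory part of it

print(absolute_path)        # /usr1/a1.dir
print(relative_pathname)    # a1.dir/file2
print(relative_path)        # a1.dir
print(pathname.is_absolute(), relative_pathname.is_absolute())   # True False
```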

Friday, February 15, 2013

Types of files in Unix

There are four types of files in Unix. They are:
  1. Regular or Ordinary Files
  2. Directory Files
  3. Device Files
  4. FIFO Files
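The kernel records the type of each file in its inode, and a program can ask for it. The sketch below, assuming a Unix-like system with /dev/null present, uses Python's os.lstat() and the stat module to sort a few paths into the four types above. The file_type() helper and the temporary file names are made up for illustration.

```python
import os
import stat
import tempfile


def file_type(path):
    """Classify a path by the type bits stored in its inode mode."""
    mode = os.lstat(path).st_mode   # lstat: do not follow symbolic links
    if stat.S_ISREG(mode):
        return "regular"
    if stat.S_ISDIR(mode):
        return "directory"
    if stat.S_ISCHR(mode):
        return "character device"
    if stat.S_ISBLK(mode):
        return "block device"
    if stat.S_ISFIFO(mode):
        return "fifo"
    return "other"   # e.g. symbolic links and sockets, not covered in this post


with tempfile.TemporaryDirectory() as tmp:
    regular = os.path.join(tmp, "plain.txt")   # hypothetical names
    fifo = os.path.join(tmp, "queue")
    open(regular, "w").close()
    os.mkfifo(fifo)

    results = {
        "plain.txt": file_type(regular),
        "tmp dir": file_type(tmp),
        "queue": file_type(fifo),
        "/dev/null": file_type("/dev/null"),
    }

for name, kind in results.items():
    print(f"{name:10} {kind}")
```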

Regular Files:

These files consist of a sequential series of bytes and occupy space on the disk. They always form a leaf node in the tree hierarchy. They have no fixed format; their format and structure are dictated entirely by the utility that creates and accesses them. Regular files can be created using editors like ed, ex and vi, or from the standard input using the 'cat > filename' command. An ordinary file can be a text file from a word processing package, a program written in any language, a shell script, etc.

Directory Files:

A directory is essentially a table that lists files and subdirectories. The contents of a directory can be any number of ordinary files, device files and directory files. A directory file can be thought of as a branch of the Unix file system tree. On early Unix systems a directory name was limited to 14 characters; modern systems usually allow up to 255 bytes. Two or more files can have the same name if they are in different directories.

Only the kernel can write a directory file. It is the kernel that updates the corresponding directory file whenever a user adds a file to a directory or deletes a file from it. A user can only create directories, add files to or delete files from a directory, or delete the directory itself.

Directories created by a user normally reside under the user's home directory. The home directory is assigned to the user when the user is given a recognised login name. The user has complete control over the home directory; no one other than a privileged user can read or write files in it without the user's permission.

The Unix system also maintains several directories for its own use. The structure of these directories is much the same on all Unix systems. These directories, which include several important system directories, are located directly under the root directory. The root directory (designated by /) is the source of the Unix file structure; all directories and files are arranged hierarchically under it.
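The table-like nature of a directory can be seen by listing its entries together with their inode numbers. This is a rough sketch, assuming a Unix-like system; directory_table() and the entry names are invented for the example, and os.scandir() is only one of several ways to read a directory from Python.

```python
import os
import tempfile


def directory_table(path):
    """Return sorted (name, inode number, kind) rows for a directory's entries."""
    rows = []
    with os.scandir(path) as entries:
        for entry in entries:
            kind = "directory" if entry.is_dir(follow_symlinks=False) else "file"
            rows.append((entry.name, entry.inode(), kind))
    return sorted(rows)


with tempfile.TemporaryDirectory() as tmp:
    os.mkdir(os.path.join(tmp, "projects"))            # hypothetical subdirectory
    open(os.path.join(tmp, "todo.txt"), "w").close()   # hypothetical file
    table = directory_table(tmp)

for name, inode, kind in table:
    print(f"{inode:>12}  {kind:9}  {name}")
```

Note that the program never writes the directory file itself. It asks the kernel to create entries (mkdir, open) and the kernel updates the table.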

Device Files:

All devices (i.e., terminals, printers and other hardware) are represented in Unix by files called special files. This is the key to providing device independence. The system reads and writes special files in the same way it does ordinary files. However, read and write requests on a special file do not activate the normal file access mechanisms; instead they activate the device handler associated with the file. The advantage of having devices as files is that you don't need any special commands or functions to access devices. Output directed to such a file is automatically passed to the physical device associated with the filename. The kernel implements this by mapping special filenames to their respective devices. For example, the user could direct the output of a process or a file listing to the printer by using the printer name in a command. Similarly, if the user wants to print a file, the system simply writes the file to the printer device file just as it would write to any other file.

All the Unix device files are stored in a directory called /dev, which resides immediately below the root directory. Devices can also be thought of as leaves in the Unix file system tree. The special files associated with devices are not really files but pointers to the device drivers in the kernel that handle the data flow. Since devices are addressed as files, the protection that applies to files also applies to devices, which allows access to any device to be controlled very effectively.

There are two types of device files: block special files and character special files. Block special files are for disk and tape devices. Block devices are divided into blocks, which are the units of storage on the disk. Some of these blocks hold the i-nodes of files and others hold file data; each i-node records the positions of its file's data blocks, using indirect addressing for larger files.

Character special files are for terminals, printers, RAM etc. A character device is simply a device from which characters are read or to which characters are written. Data is transferred to and from such devices without first being buffered in blocks. /dev/kmem and /dev/mem are character special files for main memory.
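As a small illustration of device independence, the sketch below treats /dev/null (a character special file found on Unix-like systems) exactly like an ordinary file, using the same open, write and read calls. It is a minimal example in Python, not a description of how any particular driver works; the message text is arbitrary.

```python
import os
import stat

DEVICE = "/dev/null"   # character special file on Unix-like systems
message = b"discarded by the driver\n"

st = os.stat(DEVICE)
is_char_device = stat.S_ISCHR(st.st_mode)

# The same calls used for ordinary files work on the device file.
with open(DEVICE, "wb") as sink:
    written = sink.write(message)   # the driver accepts and discards the bytes

with open(DEVICE, "rb") as source:
    data = source.read()            # /dev/null always reports end-of-file

# The device numbers are what the kernel uses to find the driver.
major, minor = os.major(st.st_rdev), os.minor(st.st_rdev)

print("character device:", is_char_device)
print("bytes written:", written)
print("bytes read back:", len(data))
print("device numbers:", major, minor)
```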

FIFO (First In First Out) Files:

FIFO files let unrelated processes communicate with each other. They are typically used in applications where the communication path is one-way only and where a number of processes have to communicate with a single process, often called the daemon process. Each process writes a message to the FIFO file, and the Unix system guarantees that each individual message (up to the system's pipe buffer limit) is written as a unit, so that one process's message is not overwritten by or interleaved with another's.

Even though Unix treats all files similarly, files are categorized into the above four types because the file attributes are interpreted differently for each category. File attributes are common to all files, with slightly different interpretations. In the next post we shall discuss these file attributes.
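The FIFO mechanism described above can be tried out with a few lines of Python. This is only a sketch, assuming a Unix-like system: a thread stands in for the separate writing process to keep the example self-contained, and the FIFO name, the send() helper and the message are all invented for illustration.

```python
import os
import tempfile
import threading


def send(fifo_path, message):
    """Writer side: open the FIFO like a file and write one message."""
    with open(fifo_path, "wb") as fifo:   # blocks until a reader opens the FIFO
        fifo.write(message)


with tempfile.TemporaryDirectory() as tmp:
    fifo_path = os.path.join(tmp, "requests")   # hypothetical FIFO name
    os.mkfifo(fifo_path)

    # In a real application the writer would be a separate process.
    writer = threading.Thread(target=send, args=(fifo_path, b"job 1\n"))
    writer.start()

    with open(fifo_path, "rb") as fifo:   # blocks until a writer opens the FIFO
        received = fifo.read()            # returns once the writer has closed

    writer.join()

print(received)
```

The data flows one way only, from the writer to the reader, and is consumed as it is read; nothing is left behind in the FIFO file.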

Wednesday, February 13, 2013

The Unix File System

The Unix file system is a hierarchical, tree-structured name space that is designed to help users organize and access files. The namespace consists of directories that hold files. The tree structure consists of a root directory and branching subdirectories. Each subdirectory can have its own directories, and each directory may contain its own files. Every type of data in Unix is arranged in files, so it is essential to understand the Unix file system, first on a logical level and then as a structure on the physical device.

Unix treats everything it knows and understands as a file. A file to Unix is an array of bytes, and its size is simply equal to the number of bytes present in it. Unix treats even directories and devices as files. A file can contain text, object code or a directory structure. The dominant file type is text, which contains a series of bytes, sometimes with a suitable delimiter to separate fields. In Unix, all files have places and are collectively arranged as a top-down tree structure. The Unix file system provides a logical method of organising, retrieving and managing information.
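The idea of a file as a plain array of bytes can be checked directly. In this minimal Python sketch (the file name and contents are arbitrary), the reported size equals the number of bytes written, appending simply extends the array, and any byte can be reached by its offset.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.bin")   # hypothetical file name

    with open(path, "wb") as f:
        f.write(b"Unix")
    size_after_write = os.path.getsize(path)    # 4 bytes written, size is 4

    with open(path, "ab") as f:                 # appending grows the file
        f.write(b" files")
    size_after_append = os.path.getsize(path)   # now 10 bytes

    with open(path, "rb") as f:
        f.seek(5)                               # jump to byte offset 5
        tail = f.read()                         # bytes 5..9

print(size_after_write, size_after_append, tail)
```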

Features of a Unix file System:

  • Hierarchical Structure: The Unix system groups all files under another type of file called a directory. The whole structure is organized as an upside-down hierarchical tree. Thus every file has a parent directory, apart from one directory file called the root directory, which is the ancestor of all the files on the system. The greatest advantage of arranging files in a hierarchical structure is the flexibility of adding and deleting files at any level. It can also improve access time, since a lookup only has to search the directories along a file's path. Unix did not invent the hierarchical file structure (the earlier Multics system already had one), but it did much to popularize it.
  • Structureless/featureless files: Unix imposes no format or structure on the contents of a file. Any format is imposed by the utility that creates the file and not by the Unix operating system, which treats every file simply as an array of bytes. Unix does not require a file name to have an extension. For example, Unix itself does not need a C program to be stored in a file with a .c extension; the .c extension is a requirement imposed by the C compiler.
  • Dynamic File Expansion: The size of a file is not restricted by any rule other than the amount of disk storage space available (and any disk quota in force). The sizes of files and the number of files can change dynamically.
  • Security: Unix provides security at various levels. Files are protected using file ownership mechanisms. For each file, users fall into three classes (the owner, the members of the file's group, and all other users), and each class is separately granted or denied read, write and execute permission.
  • Device Independence: Because devices, like directories, are treated as files in Unix, a user is shielded from the details of device-related operations and procedures.
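The permission scheme described under Security can be inspected from a program. The following sketch, assuming a Unix-like system, sets a file's mode to octal 640 (read and write for the owner, read for the group, nothing for others) and decodes it again with Python's stat module. The file name and the chosen mode are just examples.

```python
import os
import stat
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "report.txt")   # hypothetical file name
    open(path, "w").close()

    os.chmod(path, 0o640)                    # owner rw-, group r--, others ---
    mode = os.stat(path).st_mode

    symbolic = stat.filemode(mode)           # same notation as 'ls -l'
    owner_can_write = bool(mode & stat.S_IWUSR)
    group_can_read = bool(mode & stat.S_IRGRP)
    others_can_read = bool(mode & stat.S_IROTH)

print(symbolic)                              # -rw-r-----
print(owner_can_write, group_can_read, others_can_read)   # True True False
```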
In the next post we shall continue with the discussion of the Unix file system. Specifically, we will discuss the four types of files supported in the logical view of the Unix file system.