Chapter 41. Runtime Filesystem Organization

This chapter describes how MMFS is organized at runtime. MMFS is divided into a number of modules, each covering a specific area of functionality. The following sections describe them in detail.

41.1. FILEIO Interface

This module connects MMFS to the FILEIO package so that it presents a standard filesystem interface. It does this by exporting a filesystem table entry for the "mmfs" filesystem type. In fact two filesystems are exported, "mmfs" and "mmfs.format". They behave identically, except that "mmfs.format" causes the filesystem to be reformatted as part of the mount operation.
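The relationship between the two exported names can be pictured as two table entries that differ only in a format flag. The following is a minimal sketch, not the real FILEIO filesystem table: struct fs_entry, fs_table and fs_lookup are hypothetical names, and a real table entry would also carry the function pointers for mount, open and the other operations.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a filesystem table entry. The real FILEIO
 * entry also holds function pointers for mount, open, and so on. */
struct fs_entry {
    const char *name;        /* filesystem type name given at mount time */
    bool format_on_mount;    /* the only difference between the entries */
};

static const struct fs_entry fs_table[] = {
    { "mmfs",        false },
    { "mmfs.format", true  },
};

/* Look up a filesystem type by name; returns NULL if it is not exported. */
const struct fs_entry *fs_lookup(const char *name)
{
    for (size_t i = 0; i < sizeof fs_table / sizeof fs_table[0]; i++) {
        if (strcmp(fs_table[i].name, name) == 0)
            return &fs_table[i];
    }
    return NULL;
}
```

In this sketch a mount routine would call fs_lookup() with the requested type name and reformat the disk first when format_on_mount is set.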

41.2. File and Directory Handling

The directory module supports operations on the directory. It provides support for searching the directory for a given file, and for creating, deleting and renaming entries.

A small cache of directory entries, called dirnodes, is maintained. This allows separate opens of the same file to share the directory entry and other information.

To allow easy location of unused directory entries, and to avoid searching free entries, the module maintains a bitmap of which directory entries are allocated. This map is constructed during the initial scan and maintained as entries are added and removed.
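An allocation bitmap of this kind can be sketched as follows. This is an illustration only: the directory size, the bit layout (one bit per entry, 1 meaning allocated) and all of the dir_map_* names are assumptions, not the module's actual definitions.

```c
#include <stdint.h>

#define DIR_ENTRIES 64   /* assumed directory size, for illustration only */

/* One bit per directory entry: 1 = allocated, 0 = free. */
static uint8_t dir_map[(DIR_ENTRIES + 7) / 8];

void dir_map_set(unsigned idx)
{
    dir_map[idx / 8] |= (uint8_t)(1u << (idx % 8));
}

void dir_map_clear(unsigned idx)
{
    dir_map[idx / 8] &= (uint8_t)~(1u << (idx % 8));
}

int dir_map_test(unsigned idx)
{
    return (dir_map[idx / 8] >> (idx % 8)) & 1;
}

/* Return the index of the first free entry, or -1 if the directory is full. */
int dir_map_find_free(void)
{
    for (unsigned byte = 0; byte < sizeof dir_map; byte++) {
        if (dir_map[byte] == 0xFF)
            continue;                 /* all eight entries are allocated */
        for (unsigned bit = 0; bit < 8; bit++) {
            unsigned idx = byte * 8 + bit;
            if (idx < DIR_ENTRIES && !dir_map_test(idx))
                return (int)idx;
        }
    }
    return -1;
}
```

Skipping whole bytes that are fully allocated is what lets a search for a free entry avoid reading the directory itself.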

The file module supports the creation, deletion, reading and writing of the contents of a file. The standard file IO operations are supported together with streaming access. Each open file is accessed through a file object, which is also maintained by this module.

The block freelist is also managed by the file module, as is a bitmap recording the allocation state of all the BATs.

41.3. Caches

The filesystem has two caches. The metadata cache is used to cache portions of the directory, freelist and BATs. The data cache is used to contain blocks of file data. The two caches are identical except that the metadata cache uses small (typically 4KiB) segments, while the data cache operates in terms of whole filesystem blocks. Disk transfers originating from the two caches are also given different priorities.

The cache module exports a variety of functions for reading and writing directory entries in the directory, block numbers in the freelist and BATs, and for accessing file data. These functions perform the necessary translations into sector addresses and access the appropriate cache.
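The kind of translation involved can be illustrated with a small piece of arithmetic. The sketch below is not the cache module's interface. It assumes 512-byte sectors, 4KiB metadata segments, a table of fixed-size items that starts on a segment boundary, and an item size that divides the segment size evenly. The names meta_loc and meta_locate are hypothetical.

```c
#include <stdint.h>

#define SECTOR_SIZE   512u    /* assumed sector size */
#define SEGMENT_SIZE  4096u   /* metadata cache segment, "typically 4KiB" */

struct meta_loc {
    uint32_t segment_sector;  /* first sector of the segment holding the item */
    uint32_t offset;          /* byte offset of the item within that segment */
};

/* Locate item `index` in a metadata table that starts at `base_sector`
 * and holds fixed-size items of `item_size` bytes. */
struct meta_loc meta_locate(uint32_t base_sector, uint32_t item_size,
                            uint32_t index)
{
    uint32_t byte_off = index * item_size;
    uint32_t segment  = byte_off / SEGMENT_SIZE;
    struct meta_loc loc;

    loc.segment_sector = base_sector + segment * (SEGMENT_SIZE / SECTOR_SIZE);
    loc.offset = byte_off % SEGMENT_SIZE;
    return loc;
}
```

Under these assumptions, the segment's starting sector identifies which cache segment to fetch, and the offset locates the item inside it.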

41.4. Disk Interface

The disk interface module provides support for handling transfers to and from the disk. It consists of a priority ordered queue of block descriptors plus a thread that picks the first descriptor off the queue and submits it to the disk device driver. The block descriptors used by the disk module are the same as those used by the caches.
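A priority ordered queue of this sort might look like the following sketch. The descriptor layout, the convention that a lower value means higher priority, and the disk_queue_* names are all assumptions made for illustration. The locking and thread wake-up that a real implementation needs are left out.

```c
#include <stddef.h>

/* Hypothetical block descriptor. A real one would also describe the
 * buffer, the sector range, and the transfer direction. */
struct blk_desc {
    int priority;              /* assumed: lower value is served first */
    struct blk_desc *next;
};

struct disk_queue {
    struct blk_desc *head;
};

/* Insert in priority order; descriptors of equal priority stay FIFO. */
void disk_queue_put(struct disk_queue *q, struct blk_desc *d)
{
    struct blk_desc **pp = &q->head;

    while (*pp != NULL && (*pp)->priority <= d->priority)
        pp = &(*pp)->next;
    d->next = *pp;
    *pp = d;
}

/* Remove and return the first descriptor, or NULL if the queue is empty. */
struct blk_desc *disk_queue_get(struct disk_queue *q)
{
    struct blk_desc *d = q->head;

    if (d != NULL) {
        q->head = d->next;
        d->next = NULL;
    }
    return d;
}
```

Keeping the list sorted at insertion time means the disk thread only ever has to take the head of the queue.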

41.5. Scan and Format

When a filesystem is mounted, a startup scan determines the format of the disk and repairs any damage left by unexpected failures. The scan goes through the following steps:

  • Scan the freelist looking for the head and tail offsets. Each block seen in the freelist is also recorded as having been seen and as being free.
  • Scan the directory. For each entry, check that its checksum is correct. If not, mark the entry empty and correct the checksum. For each file, if it is in CREATING state, complete the operation by ensuring that each block in the BAT is not also in the freelist and changing its state to CREATED. If it is in DELETING state, complete the operation by returning all the blocks in its BAT to the freelist and deleting the directory entry.
  • Scan the BATs of all files, recording that they have been seen and checking that they are not also in the freelist. Any block that is both in the freelist and a BAT is removed from the freelist.
  • If any blocks have not yet been seen, then these orphaned blocks are inserted into the freelist.
  • If any of the previous steps have updated the freelist, then the on-disk data structure is rebuilt. This has the side effect of sorting the freelist into block order, which improves future performance.
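The block bookkeeping in the steps above can be sketched as follows. This is a simplified model, not the scan code itself: the freelist and each BAT are represented as plain arrays of block numbers, directory checksums and the CREATING and DELETING states are omitted, and scan_state together with the scan_* functions are hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>

#define NBLOCKS 8   /* assumed data-area size, for illustration only */

struct scan_state {
    bool seen[NBLOCKS];     /* block appeared in the freelist or a BAT */
    bool is_free[NBLOCKS];  /* block currently belongs in the freelist */
    bool dirty;             /* on-disk freelist must be rebuilt */
};

/* Freelist scan: record every block as seen and free.
 * Out-of-range block numbers are simply ignored in this sketch. */
void scan_freelist(struct scan_state *s, const unsigned *blocks, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (blocks[i] >= NBLOCKS)
            continue;
        s->seen[blocks[i]] = true;
        s->is_free[blocks[i]] = true;
    }
}

/* BAT scan: record blocks as seen; a block claimed by a BAT is removed
 * from the freelist. */
void scan_bat(struct scan_state *s, const unsigned *blocks, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        unsigned b = blocks[i];
        if (b >= NBLOCKS)
            continue;
        s->seen[b] = true;
        if (s->is_free[b]) {
            s->is_free[b] = false;
            s->dirty = true;
        }
    }
}

/* Orphan pass: any block never seen is added to the freelist.
 * Returns the number of orphans found. */
size_t scan_orphans(struct scan_state *s)
{
    size_t found = 0;

    for (unsigned b = 0; b < NBLOCKS; b++) {
        if (!s->seen[b]) {
            s->seen[b] = true;
            s->is_free[b] = true;
            s->dirty = true;
            found++;
        }
    }
    return found;
}
```

After all three passes, the dirty flag corresponds to the final step: if it is set, the freelist would be rebuilt on disk from the is_free array, which naturally yields block order.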

If the scan finds that the disk is corrupt or unformatted, or the filesystem has been mounted using the "mmfs.format" filesystem, then the disk is reformatted. Formatting consists of zeroing the directory and all the BATs, and building the freelist with all the blocks in the data area. Finally a volume label is written to the first entry in the directory.