Chapter 42. Configuration

42.1. Configuration Options

The format and footprint of the filesystem are controlled by a number of configuration options, described in the following sections.

42.1.1. General Options

The following options define the version and revision of the filesystem.

CYGNUM_FS_MMFS_VERSION

This is the version of the filesystem supported.

Default value: 1

CYGNUM_FS_MMFS_REVISION

This is the revision of the filesystem supported.

Default value: 0

42.1.2. Formatting Options

These options control the formatting of an MMFS disk. They are only used when a filesystem is formatted. Under normal circumstances the filesystem will fetch these values from the disk volume label.

CYGNUM_FS_MMFS_BLOCK_SIZE

This option defines the size of filesystem blocks. The value is defined in KiB and must be a power of 2.

Default value: 256
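
In a built eCos configuration these options appear to C code as preprocessor definitions in the package's generated configuration header. As a minimal sketch, assuming the generated header is named pkgconf/fs_mmfs.h (the exact name depends on the package), the power-of-2 requirement can be checked at compile time:

    /* Sketch: compile-time check of the configured block size.
       The header name pkgconf/fs_mmfs.h is an assumption; eCos
       generates one such header per package, with one #define
       per option. Outside an eCos build tree this header will
       not exist. */
    #include <pkgconf/fs_mmfs.h>

    #if (CYGNUM_FS_MMFS_BLOCK_SIZE & (CYGNUM_FS_MMFS_BLOCK_SIZE - 1)) != 0
    # error "CYGNUM_FS_MMFS_BLOCK_SIZE must be a power of 2"
    #endif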

CYGNUM_FS_MMFS_ROOTDIR_SIZE

This option defines the size of the root directory in blocks. Since all files are contained in this directory, its size places a hard limit on the number of files that the filesystem may contain.

Default value: 1

CYGNUM_FS_MMFS_BAT_SIZE

This option defines the size, in filesystem blocks, of the Block Allocation Tables used to store the addresses of file data blocks. This places a hard upper limit on the size of a file.

Default value: 2

CYGNUM_FS_MMFS_BAT_COUNT

This option defines the number of BATs allocated in the filesystem. Since a BAT is allocated to each file when it is created, this also limits the number of files the filesystem may contain.

Default value: 200

42.1.3. Footprint Options

These options control the memory footprint and other parameters for an active filesystem.

CYGNUM_FS_MMFS_FILE_COUNT

This option defines the maximum number of open files supported by the filesystem. It should be set to the expected number of data streams, plus any random access files, that may be open simultaneously.

Default value: 4

CYGNUM_FS_MMFS_DIRNODE_COUNT

This option defines the maximum number of cached directory entries. At least one is required for each open file, plus a few for handling other filesystem operations such as renaming or deleting.

Default value: CYGNUM_FS_MMFS_FILE_COUNT+4

CYGNUM_FS_MMFS_MULTI_BUFFER

This defines the level of per-file multi-buffering. During streaming the filesystem will read ahead and write behind by this number of data blocks.

Default value: 2

CYGNUM_FS_MMFS_DATA_CACHE_SIZE

This defines the amount of memory occupied by the data cache. The default value is calculated from the multi-buffering level, the number of open files and the block size.

Default value: (CYGNUM_FS_MMFS_MULTI_BUFFER+2) * CYGNUM_FS_MMFS_FILE_COUNT * CYGNUM_FS_MMFS_BLOCK_SIZE

CYGNUM_FS_MMFS_META_CACHE_SIZE

This defines the amount of memory occupied by the metadata cache. The default value is calculated from the number of files plus an overhead to support the freelist and directory scanning.

Default value: (CYGNUM_FS_MMFS_FILE_COUNT+6) * CYGNUM_FS_MMFS_META_BLOCK_SIZE

CYGNUM_FS_MMFS_META_BLOCK_SIZE

This defines the size of a metadata cache block. These are used to contain portions of the directory, freelist and BATs.

Default value: 4

CYGNUM_FS_MMFS_DISKIO_PRIORITY

This defines the priority of the disk IO thread. This thread should generally run at a high priority: it does very little work, but that work is vital to the performance of the filesystem.

Default value: 4

CYGNUM_FS_MMFS_FLUSH_INTERVAL

This defines the interval at which metadata cache blocks are flushed. Each time this interval expires the oldest dirty block in the cache is written to disk. This allows dirty data to be trickled out to disk without severely impacting streaming transfers.

Default value: 10

CYGNUM_FS_MMFS_FLUSH_PERIOD

This defines the cache flush period in multiples of the cache flush interval. Each time this period expires, the entire metadata cache will be flushed to disk. Since the flush interval will usually have written out any dirty blocks already, this operation will generally find little or nothing to do.

Default value: 6

42.2. Configuration Guidelines

This section gives some guidelines on how to configure MMFS and describes the various tradeoffs that can be made.

42.2.1. Block Size

The choice of block size is the most important configuration option. The filesystem uses the large block size to amortize access time across large data transfers. The blocks also provide high locality for the data they contain, avoiding the need to implement complex localizing allocation and access mechanisms in the filesystem. The choice of block size depends on several factors: the access time and data transfer rate of the disk, and the number and data rate of the streams to be sustained.

The important disk performance factors to consider are the worst case access time and the minimum sustained transfer rate. Disk manufacturers generally quote the average access time for disks and keep the worst case figures under wraps, since these are often considerably higher than the average. Access time generally consists of seek time plus settling time plus rotational delay plus command submission overhead. The worst case seek time is generally a move from one edge of the disk to the other. Worst case rotational delay occurs when the target sector has just passed the head as it arrives at the destination cylinder; for a 7200 RPM disk this is 8.3ms. Settling and command time tend to be constant, although if the access includes a head switch then there may be a small contribution from that. As a rule of thumb, worst case access time can be taken to be about three times the manufacturer's quoted average access time.

The sustained transfer rate of a disk varies across its surface with the differences in recording density due to zoning. Most current disks have 10 or more zones. The best data rate comes from the outer zones and the worst from the inner zones; the two may differ by several MiB/s. Large multi-sector transfers will also incur head change and single cylinder seek delays. Another factor contributing to the transfer rate is the speed with which data can be transferred across the disk interface. This depends on things like the DMA modes supported by the disk and the host interface, cable design, and cache and MMU factors. Embedded systems often do not have the kind of high performance interfaces that are common on data-centre servers.

A standard definition TV stream uses a data rate of 4-10Mb/s, and an HDTV stream can run up to 27Mb/s, although current systems only run at 14 to 17Mb/s. These streams are encoded using MPEG-2, which produces a highly variable data rate, typically between 2 and 14Mb/s, depending on the source material and its content.

To see what effect different block sizes have on throughput, let us consider an 8.2Mb/s stream, which conveniently approximates to 1MiB/s. The disk is assumed to spin at 7200 RPM, have a worst case access time of 30ms and a worst case sustained transfer rate of 20MiB/s. If this disk is formatted with 256KiB blocks, then the time to fetch one block is 42.5 ms (30ms worst case access time plus 12.5ms worst case transfer time). One second's worth of data is four blocks, taking 170ms. If the disk is formatted with 64KiB blocks, then the time to fetch one block is 33.125ms (30ms worst case access time plus 3.125ms worst case transfer time). One second's worth of data is sixteen blocks, taking 530ms.
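
To make the arithmetic concrete, the following sketch reproduces these figures in C; the disk parameters and the 1MiB/s stream rate are the example values used above, not measurements of any particular drive.

    /* Worked example: per-block fetch time and disk time needed per
       second of stream, for 256KiB and 64KiB blocks. */
    #include <stdio.h>

    int main(void)
    {
        const double access_ms  = 30.0;    /* worst case access time */
        const double rate_mib_s = 20.0;    /* worst case sustained transfer rate */
        const double stream_kib = 1024.0;  /* one second of a 1MiB/s stream */
        const int    block_kib[] = { 256, 64 };
        int i;

        for (i = 0; i < 2; i++) {
            double xfer_ms  = block_kib[i] * 1000.0 / (rate_mib_s * 1024.0);
            double fetch_ms = access_ms + xfer_ms;        /* one block fetch */
            double blocks   = stream_kib / block_kib[i];  /* blocks per stream-second */
            double load_ms  = blocks * fetch_ms;          /* disk time per stream-second */
            printf("%3dKiB blocks: %.3fms/block, %.0fms per stream-second, ~%.1f streams\n",
                   block_kib[i], fetch_ms, load_ms, 1000.0 / load_ms);
        }
        return 0;
    }

Running this reproduces the 170ms and 530ms figures and the resulting stream counts.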

From this we can see that using 256KiB blocks, we have enough throughput on this disk to run five or six 1MiB/s streams, but with 64KiB blocks there is barely the capacity for running two streams. The figures used here are worst case times, and on average the disk will be able to sustain more streams and higher data rates. However, if guarantees are to be met for glitch-free recording and playback, it is necessary to calculate for the most demanding scenario where seek distance, rotational delay and stream data rate conspire to make things difficult, even if such situations are rare and transient in real life.

42.2.2. BAT Size

The size of the Block Allocation Tables determines the amount of data that can be recorded in a single file. If the disk is formatted with a 256KiB block size, a single BAT block will contain 64Ki block addresses, which, at 1MiB/s, will record 16Ki seconds of data, or about 4.5 hours. This is sufficient for most PVR applications, where most recordings are 30 minutes or an hour; it even accommodates most movies and sporting events. Increasing the BAT size to two or more blocks will allow longer recordings to be made in a single file, but at the expense of wasting space in the common case. An alternative approach would be to record a single stream in multiple files at the application level.
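
As a sketch of this calculation, assuming 4-byte block addresses (which is what 64Ki addresses in a 256KiB block implies):

    /* Worked example: maximum file size and duration for a one-block
       BAT, assuming 4-byte block addresses. */
    #include <stdio.h>

    int main(void)
    {
        const unsigned long long block_bytes = 256 * 1024;  /* 256KiB blocks */
        const unsigned long long addr_bytes  = 4;           /* assumed address size */

        unsigned long long addrs   = block_bytes / addr_bytes;  /* 64Ki addresses */
        unsigned long long bytes   = addrs * block_bytes;       /* 16GiB maximum */
        unsigned long long seconds = bytes / (1024 * 1024);     /* at 1MiB/s */

        printf("%lluGiB maximum, %llus (about %.1f hours) at 1MiB/s\n",
               bytes >> 30, seconds, seconds / 3600.0);
        return 0;
    }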

The number of BATs is also an important factor to consider, and is linked to the directory size. This relationship is described in the next section. However, an important factor in choosing the size and number of BATs is the time taken to format the disk and perform filesystem startup. During formatting all the BATs must be zeroed, something that can take a long time if they are large and numerous. During filesystem startup, all BATs allocated to current files are scanned to detect orphaned blocks. The time taken to do this is proportional to the size of the BATs and the number of files.

42.2.3. Directory Size

The size of the directory provides one of the limits on how many files may be stored in the filesystem. The directory occupies a whole number of blocks, and with 256KiB blocks and 256 byte directory entries, each directory block can contain 1024 entries. This may be more than enough for most purposes: on a 160GB disk this averages to about 160MB per file, or 2m40s at 1MiB/s. Another way of looking at this is that a 160GB disk can contain about 40 hours of recorded TV, or about eighty 30-minute programs. In this context, 1024 entries is more than adequate.
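
The following sketch reproduces this arithmetic; the 160GB disk capacity is the nominal decimal figure used above:

    /* Worked example: directory entries per block and the average
       space per file on a nominal 160GB disk. */
    #include <stdio.h>

    int main(void)
    {
        const unsigned block_bytes = 256 * 1024;      /* 256KiB directory block */
        const unsigned entry_bytes = 256;             /* directory entry size */
        const double   disk_mb     = 160.0 * 1000.0;  /* 160GB, decimal units */

        unsigned entries = block_bytes / entry_bytes; /* 1024 entries */
        printf("%u entries per block, about %.0fMB per file\n",
               entries, disk_mb / entries);
        return 0;
    }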

The other limit on the number of files is the number of BATs. These are allocated dynamically to files as they are created. Running out of BATs will cause file creation to fail, even if there are directory entries free. Having more BATs than directory entries is wasteful. Even having the same number, given the calculation above, can be seen as excessive. For a 160GB disk, about 200 BATs would be a more suitable figure.

42.2.4. Cache Sizes

The filesystem contains two caches: a metadata cache for the directory, freelist and BATs; and a data cache for file contents. The number of blocks in each cache is important to the correct functioning of the filesystem. Too many blocks and the filesystem occupies too much RAM. Too few blocks and data may be evicted from the cache too soon and result in performance problems.

The size of the metadata cache depends on the free list, the number of open files and any directory searches that are being made. The free list requires two cache blocks, one for the head and one for the tail. Each open file needs a block to contain the current read or write position in the BAT and, occasionally, an extra block to handle the prefetch of the next block in the BAT. Concurrent directory searches also consume metadata cache blocks. The default size of the metadata cache is therefore set to use two blocks for the free list, plus one for each possible open file, plus four to absorb BAT prefetches and directory searches.

The size of the data cache depends only on the maximum number of open files. For each file we need a buffer for each level of multi-buffering, plus two to support the read-ahead or write-behind.
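
As a worked example, the following sketch evaluates the default cache size formulas from Section 42.1.3 with the default option values; treating CYGNUM_FS_MMFS_BLOCK_SIZE and CYGNUM_FS_MMFS_META_BLOCK_SIZE as KiB quantities is an assumption based on the block size option's description.

    /* Worked example: default cache sizes. The values mirror the
       defaults above; in a real build they come from the generated
       configuration header rather than being defined here. */
    #include <stdio.h>

    #define FILE_COUNT      4    /* CYGNUM_FS_MMFS_FILE_COUNT */
    #define MULTI_BUFFER    2    /* CYGNUM_FS_MMFS_MULTI_BUFFER */
    #define BLOCK_KIB       256  /* CYGNUM_FS_MMFS_BLOCK_SIZE */
    #define META_BLOCK_KIB  4    /* CYGNUM_FS_MMFS_META_BLOCK_SIZE */

    int main(void)
    {
        unsigned data_kib = (MULTI_BUFFER + 2) * FILE_COUNT * BLOCK_KIB;
        unsigned meta_kib = (FILE_COUNT + 6) * META_BLOCK_KIB;

        printf("data cache %uKiB (%uMiB), metadata cache %uKiB\n",
               data_kib, data_kib / 1024, meta_kib);
        return 0;
    }

With the defaults this gives a 4MiB data cache and a 40KiB metadata cache.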