This section gives some guidelines on how to configure MMFS and describes the tradeoffs that can be made.
The choice of block size is the most important configuration option. The filesystem uses the large block size to amortize access time across large data transfers. The blocks also provide high locality for the data they contain, avoiding the need to implement complex localizing allocation and access mechanisms in the filesystem. The choice of block size depends on several factors: the access time and data transfer rate of the disk, and the number and data rate of the streams to be sustained.
The important disk performance factors to consider are the worst-case access time and the minimum sustained transfer rate. Disk manufacturers generally quote the average access time for disks and keep the worst case figures under wraps since they are often considerably higher than the average. Access time generally consists of seek time plus settling time plus rotational delay plus command submission overhead. The worst case seek time is generally a move from one edge of the disk to the other. Worst case rotational delay occurs when the target sector has just passed the head as the head reaches the destination cylinder, so the disk must make almost a full revolution; for a 7200RPM disk, this is 8.3ms. Settling and command time tend to be constant, although if the access includes a head switch then there may be a small contribution from that. As a rule of thumb, worst case access time can be taken to be about 3 times the manufacturer's quoted average access time.
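The two figures above can be reproduced with a short calculation. This is a minimal sketch: the function names are made up for illustration, and the 10ms quoted average access time is an assumed example value, not a figure from any particular disk.

```python
def worst_case_rotational_delay_ms(rpm):
    """Time for one full revolution, in milliseconds."""
    return 60_000.0 / rpm


def rule_of_thumb_worst_case_access_ms(quoted_average_ms):
    """Rule of thumb from the text: about 3 times the quoted average."""
    return 3.0 * quoted_average_ms


# 7200RPM disk: one revolution takes about 8.3ms.
print(round(worst_case_rotational_delay_ms(7200), 1))

# Assumed example: a disk quoted at 10ms average access time.
print(rule_of_thumb_worst_case_access_ms(10.0))
```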
The sustained transfer rate for a disk varies across the disk with the differences in recording density due to zoning. Most current disks have 10 or more zones. The best data rate comes from the outer zones and the worst from the inner zones, and the two may differ by several MiB/s. Large multi-sector transfers to or from the disk will also incur head change and single cylinder seek delays. Another factor that contributes to the transfer rate is the speed with which data can be transferred across the disk interface. This will depend on things like the DMA modes supported by the disk and the host interface, cable design, cache and MMU factors. Embedded systems often do not have the kind of high performance interfaces that are common on data-centre servers.
A standard definition TV stream uses a data rate of 4 to 10Mb/s. An HDTV stream can run up to 27Mb/s, although current systems only run at 14 to 17Mb/s. These streams are encoded using MPEG-2, which produces a highly variable data rate, between 2 and 14Mb/s depending on source and content.
To see what effect different block sizes have on throughput, let us consider an 8.2Mb/s stream, which conveniently approximates to 1MiB/s. The disk is assumed to spin at 7200 RPM, have a worst case access time of 30ms and a worst case sustained transfer rate of 20MiB/s. If this disk is formatted with 256KiB blocks, then the time to fetch one block is 42.5 ms (30ms worst case access time plus 12.5ms worst case transfer time). One second's worth of data is four blocks, taking 170ms. If the disk is formatted with 64KiB blocks, then the time to fetch one block is 33.125ms (30ms worst case access time plus 3.125ms worst case transfer time). One second's worth of data is sixteen blocks, taking 530ms.
From this we can see that using 256KiB blocks, we have enough throughput on this disk to run five 1MiB/s streams (six would need 1020ms of disk time per second), but with 64KiB blocks there is not quite enough capacity for two streams (1060ms per second). The figures used here are worst case times, and on average the disk will be able to sustain more streams and higher data rates. However, if guarantees are to be met for glitch-free recording and playback, it is necessary to calculate for the most demanding scenario where seek distance, rotational delay and stream data rate conspire to make things difficult, even if such situations are rare and transient in real life.
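The block size comparison can be repeated for other disks and stream rates with a small calculation. The sketch below uses the example figures from the text (30ms worst case access time, 20MiB/s worst case transfer rate, 1MiB/s streams); the function names are illustrative only, and the stream count is rounded down because a partial stream cannot be guaranteed.

```python
KIB = 1024
MIB = 1024 * KIB


def block_fetch_ms(block_bytes, access_ms, transfer_bytes_per_s):
    """Worst case time to fetch one block: access time plus transfer time."""
    return access_ms + 1000.0 * block_bytes / transfer_bytes_per_s


def disk_ms_per_stream_second(block_bytes, stream_bytes_per_s,
                              access_ms, transfer_bytes_per_s):
    """Disk time needed to fetch one second's worth of one stream."""
    blocks_per_second = stream_bytes_per_s / block_bytes
    return blocks_per_second * block_fetch_ms(
        block_bytes, access_ms, transfer_bytes_per_s)


def guaranteed_streams(block_bytes, stream_bytes_per_s,
                       access_ms, transfer_bytes_per_s):
    """Whole number of streams that fit in 1000ms of disk time."""
    per_stream = disk_ms_per_stream_second(
        block_bytes, stream_bytes_per_s, access_ms, transfer_bytes_per_s)
    return int(1000.0 // per_stream)


for block_kib in (256, 64):
    ms = disk_ms_per_stream_second(block_kib * KIB, 1 * MIB, 30.0, 20 * MIB)
    n = guaranteed_streams(block_kib * KIB, 1 * MIB, 30.0, 20 * MIB)
    print(f"{block_kib}KiB blocks: {ms}ms per stream-second, {n} stream(s)")
```

Trying intermediate block sizes in the same loop shows how quickly the fixed access time comes to dominate as blocks get smaller.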
The size of the Block Allocation Tables determines the amount of data that can be recorded in a single file. If the disk is formatted with a 256KiB block size a single block will contain 64Ki block addresses, which, at 1MiB/s, will record 16Ki seconds of data, or about 4.5 hours. This is sufficient for most PVR applications where most recordings are 30 minutes or an hour. It even accommodates most movies and sporting events. Increasing the BAT size to two or more blocks will allow longer recordings to be made in a single file, but at the expense of wasting space in the common case. An alternative approach would be to record a single stream in multiple files at the application level.
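The recording limit for a given BAT size follows directly from the block size. The sketch below assumes 4-byte block addresses, which is inferred from the figure of 64Ki addresses in a 256KiB block rather than stated explicitly; the function name is illustrative.

```python
KIB = 1024
MIB = 1024 * KIB


def bat_capacity(block_bytes, bat_blocks=1, address_bytes=4):
    """Return (addresses per BAT, maximum file size in bytes).

    address_bytes=4 is an assumption inferred from the text.
    """
    addresses = bat_blocks * block_bytes // address_bytes
    return addresses, addresses * block_bytes


addresses, max_file_bytes = bat_capacity(256 * KIB)
seconds = max_file_bytes / (1 * MIB)      # recording time at 1MiB/s
print(addresses, seconds, round(seconds / 3600, 2))

# A two-block BAT doubles the maximum recording length.
print(bat_capacity(256 * KIB, bat_blocks=2)[1] / (1 * MIB) / 3600)
```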
The number of BATs is also an important factor to consider, and is linked to the directory size. This relationship will be described in the next section. However, an important factor in choosing the size and number of BATs is the time taken to format the disk and perform filesystem startup. During formatting all the BATs must be zeroed, something that can take a long time if they are large and numerous. During filesystem startup, all BATs allocated to current files are scanned to detect orphaned blocks. The time taken to do this is proportional to the size of the BATs and the number of files.
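A rough lower bound on the zeroing time at format can be estimated from the total BAT space and the disk's write rate. This sketch assumes the BATs are written sequentially at the worst case sustained transfer rate and ignores access time and any other formatting work; the function name and the example BAT counts are illustrative.

```python
KIB = 1024
MIB = 1024 * KIB


def bat_zeroing_seconds(num_bats, bat_blocks, block_bytes, write_bytes_per_s):
    """Time to write zeros over every BAT, transfer time only."""
    return num_bats * bat_blocks * block_bytes / write_bytes_per_s


# One-block BATs of 256KiB on a disk sustaining 20MiB/s.
for num_bats in (1024, 200):
    print(num_bats, bat_zeroing_seconds(num_bats, 1, 256 * KIB, 20 * MIB))
```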
The size of the directory provides one of the limits on how many files may be stored in the filesystem. The directory occupies a whole number of blocks, and with 256KiB blocks and 256-byte directory entries, each directory block can contain 1024 entries. This may be more than enough for most purposes: on a 160GB disk this averages to about 160MB per file, or roughly 2m40s at 1MiB/s. Another way of looking at this is that a 160GB disk can contain about 40 hours of recorded TV, or about eighty 30-minute programs. In this context, 1024 entries is more than adequate.
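The directory sizing arithmetic can be checked as follows. This is a sketch using the figures from the text (256KiB blocks, 256-byte entries, a 160GB disk taken as 160 x 10^9 bytes, 1MiB/s streams); the exact results differ slightly from the rounded figures quoted above, and the function names are illustrative.

```python
KIB = 1024
MIB = 1024 * KIB
GB = 10**9


def directory_entries(block_bytes, dir_blocks=1, entry_bytes=256):
    """Number of directory entries in a whole number of blocks."""
    return dir_blocks * block_bytes // entry_bytes


def programs_on_disk(disk_bytes, stream_bytes_per_s, program_seconds):
    """Whole programs of a given length that fit on the disk."""
    return int(disk_bytes // (stream_bytes_per_s * program_seconds))


entries = directory_entries(256 * KIB)
average_file_bytes = 160 * GB / entries
print(entries, round(average_file_bytes / 1e6, 1))

# Number of 30-minute programs at 1MiB/s on a 160GB disk.
print(programs_on_disk(160 * GB, 1 * MIB, 30 * 60))
```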
The other limit on the number of files is the number of BATs. These are allocated dynamically to files as they are created. Running out of BATs will cause file creation to fail, even if there are directory entries free. Having more BATs than directory entries is wasteful. Even having the same number, given the calculation above, can be seen as excessive. For a 160GB disk, about 200 BATs would be a more suitable figure.
The filesystem contains two caches: a metadata cache for the directory, freelist and BATs; and a data cache for file contents. The number of blocks in each cache is important to the correct functioning of the filesystem. Too many blocks and the filesystem occupies too much RAM. Too few blocks and data may be evicted from the cache too soon and result in performance problems.
The size of the metadata cache depends on the free list, the number of open files and any directory searches that are being made. The free list requires two cache blocks, one for the head and one for the tail. Each open file needs a block to contain the current read or write position in the BAT and, occasionally, an extra block to handle the prefetch of the next block in the BAT. Concurrent directory searches also consume metadata cache blocks. The default size of the metadata cache is therefore set to use two blocks for the free list, plus one for each possible open file, plus four to take up the prefetches and searches.
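The default sizing rule described above can be written as a one-line formula. The function and constant names here are illustrative, not taken from the MMFS sources, and the value of 8 open files is an assumed example.

```python
FREE_LIST_BLOCKS = 2      # head and tail of the free list
SPARE_BLOCKS = 4          # BAT prefetches and directory searches


def metadata_cache_blocks(max_open_files):
    """Default metadata cache size, in blocks."""
    return FREE_LIST_BLOCKS + max_open_files + SPARE_BLOCKS


# Assumed example: a system allowing 8 simultaneously open files.
print(metadata_cache_blocks(8))
```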
The size of the data cache depends only on the maximum number of open files. For each file we need a buffer for each level of multi-buffering, plus two to support the read-ahead or write-behind.
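The data cache rule can be sketched the same way, together with the RAM it implies. The names are illustrative, and the example of 4 open files with double buffering and 256KiB blocks is an assumption chosen only to show the scale of the memory cost.

```python
KIB = 1024
MIB = 1024 * KIB


def data_cache_blocks(max_open_files, multi_buffer_levels):
    """One buffer per multi-buffering level, plus two per file for
    read-ahead or write-behind."""
    return max_open_files * (multi_buffer_levels + 2)


def cache_ram_bytes(cache_blocks, block_bytes):
    """RAM occupied by a cache of the given number of blocks."""
    return cache_blocks * block_bytes


# Assumed example: 4 open files, double buffering, 256KiB blocks.
blocks = data_cache_blocks(4, 2)
print(blocks, cache_ram_bytes(blocks, 256 * KIB) / MIB, "MiB")
```

Because every cache block is a full filesystem block, a larger block size raises the RAM cost of both caches in direct proportion.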