Name

tm_basic — Measure the performance of the eCos kernel

Description

When building a real-time system, care must be taken to ensure that the system will be able to perform properly within its constraints. One of these constraints may be how fast certain operations can be performed. Another might be how deterministic the overall behavior of the system is. Lastly, the memory footprint (size) and unit cost may be important.

One of the major problems encountered while evaluating a system will be how to compare it with possible alternatives. Most manufacturers of real-time systems publish performance numbers, ostensibly so that users can compare the different offerings. However, what these numbers mean and how they were gathered is often not clear. The values are typically measured on a particular piece of hardware, so a true comparison requires measurements, gathered in a similar fashion, for exactly the same hardware.

Two major items need to be present in any given set of measurements. First, the raw values for the various operations; these are typically quite easy to measure and will be available for most systems. Second, the determinacy of the numbers; in other words how much the value might change depending on other factors within the system. This value is affected by a number of factors: how long interrupts might be masked, whether or not the function can be interrupted, even very hardware-specific effects such as cache locality and pipeline usage. It is very difficult to measure the determinacy of any given operation, but that determinacy is fundamentally important to proper overall characterization of a system.

In the discussion and numbers that follow, three key measurements are provided. The first measurement is an estimate of the interrupt latency: this is the length of time from when a hardware interrupt occurs until its Interrupt Service Routine (ISR) is called. The second measurement is an estimate of overall interrupt overhead: this is the average length of time taken to process an interrupt, as measured using the real-time clock interrupt (other interrupt sources will certainly take a different amount of time, but this data cannot be easily gathered). The third measurement consists of the timings for the various kernel primitives.

Methodology

Key operations in the kernel were measured by using a simple test program which exercises the various kernel primitive operations. A hardware timer, normally the one used to drive the real-time clock, was used for these measurements. In most cases this timer can be read with quite high resolution, typically in the range of a few microseconds. For each measurement, the operation was repeated a number of times. Time stamps were obtained directly before and after the operation was performed. The data gathered for the entire set of operations was then analyzed, generating average (mean), maximum and minimum values. The sample variance (a measure of how close most samples are to the mean) was also calculated. The cost of obtaining the real-time clock timer values was also measured, and was subtracted from all other times.
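
To make the methodology concrete, here is a minimal sketch of such a measurement loop. It is not the actual test program: timer_read() is a hypothetical stand-in for the platform's high-resolution clock read (on eCos targets this would typically be done via the HAL's HAL_CLOCK_READ() macro), and operation_under_test() stands for whichever kernel primitive is being timed.

    /* Sketch only: timer_read() and operation_under_test() are
     * placeholders, not part of the real tm_basic sources. */
    #define NSAMPLES 32

    static unsigned long timer_read(void);          /* hypothetical */
    static void operation_under_test(int i);        /* hypothetical */

    void measure(void)
    {
        unsigned long t0, t1, sample[NSAMPLES];
        unsigned long overhead, sum = 0, min = ~0UL, max = 0;
        int i;

        /* Measure the cost of reading the clock itself; this is
         * subtracted from every subsequent sample. */
        t0 = timer_read();
        t1 = timer_read();
        overhead = t1 - t0;

        for (i = 0; i < NSAMPLES; i++) {
            t0 = timer_read();
            operation_under_test(i);   /* one iteration, one object */
            t1 = timer_read();
            sample[i] = (t1 - t0) - overhead;

            sum += sample[i];
            if (sample[i] < min) min = sample[i];
            if (sample[i] > max) max = sample[i];
        }

        /* mean = sum / NSAMPLES; the sample variance is computed
         * from sample[] in a second pass. */
    }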

Most kernel functions can be measured separately. In each case, a reasonable number of iterations are performed. Where the test case involves a kernel object, for example creating a task, each iteration is performed on a different object. There is also a set of tests which measures the interactions between multiple tasks and certain kernel primitives. Most functions are tested in such a way as to determine the variations introduced by varying numbers of objects in the system. For example, the mailbox tests measure the cost of a 'peek' operation when the mailbox is empty, has a single item, and has multiple items present. In this way, any effects of the state of the object or how many items it contains can be determined.

There are a few things to consider about these measurements. Firstly, they are quite micro in scale and only measure the operation in question. These measurements do not adequately describe how the timings would be perturbed in a real system with multiple interrupting sources. Secondly, the possible aberration incurred by the real-time clock (system heartbeat tick) is explicitly avoided. Virtually all kernel functions have been designed to be interruptible, so the times presented are typical, but best case, since any particular function may be interrupted by the clock tick processing. This overhead is explicitly calculated so that the value may be included in any deadline calculations required by the end user. Lastly, the reported measurements were obtained from a system built with all options at their default values; kernel instrumentation and asserts are also disabled for these measurements. Any number of configuration options can change the measured results, sometimes quite dramatically. For example, mutexes use priority inheritance in these measurements; the numbers will change if the system is built with priority inheritance on mutex variables turned off.

The final value that is measured is an estimate of interrupt latency. This particular value is not explicitly calculated in the test program used, but rather by instrumenting the kernel itself. The raw number of timer ticks that elapse between the time the timer generates an interrupt and the start of the timer ISR is kept in the kernel. These values are printed by the test program after all other operations have been tested. Thus this should be a reasonable estimate of the interrupt latency over time.

Using these Measurements

These measurements can be used in a number of ways. The most typical use will be to compare different real-time kernel offerings on similar hardware. Another will be to estimate the cost of implementing a task using eCos: applications can be examined to see what effect the kernel operations will have on the total execution time. A third use would be to observe how tuning the kernel affects overall operation.

Influences on Performance

A number of factors can affect real-time performance in a system. One of the most common factors, yet most difficult to characterize, is the effect of device drivers and interrupts on system timings. Different device drivers will have differing requirements as to how long interrupts are suppressed, for example. The eCos system has been designed with this in mind, by separating the management of interrupts (ISR handlers) from the processing required by the interrupt (DSR, or Deferred Service Routine, handlers). However, since there is so much variability here, and indeed most device drivers will come from the end users themselves, these effects cannot be reliably measured. Attempts have been made to measure the overhead of the single interrupt that eCos relies on, the real-time clock timer. This should give you a reasonable idea of the cost of executing interrupt handling for devices.
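
As an illustration of the ISR/DSR split, the following sketch shows how an interrupt handler pair might be attached using the kernel C API. The vector number, priority and device handling are invented for illustration and are not taken from any real driver.

    #include <cyg/kernel/kapi.h>

    #define MY_DEV_VECTOR   10      /* hypothetical interrupt vector */
    #define MY_DEV_PRIORITY  1      /* hypothetical priority */

    static cyg_interrupt intr_obj;
    static cyg_handle_t  intr_handle;

    /* ISR: keep this short; interrupts from this source stay masked
     * until the DSR has finished the real work. */
    static cyg_uint32 my_isr(cyg_vector_t vector, cyg_addrword_t data)
    {
        cyg_interrupt_mask(vector);
        cyg_interrupt_acknowledge(vector);
        return CYG_ISR_HANDLED | CYG_ISR_CALL_DSR;   /* request the DSR */
    }

    /* DSR: runs later, with interrupts enabled, and may safely
     * interact with the scheduler (e.g. post a semaphore). */
    static void my_dsr(cyg_vector_t vector, cyg_ucount32 count,
                       cyg_addrword_t data)
    {
        /* ... process the device event ... */
        cyg_interrupt_unmask(vector);
    }

    void attach_my_interrupt(void)
    {
        cyg_interrupt_create(MY_DEV_VECTOR, MY_DEV_PRIORITY, 0,
                             my_isr, my_dsr, &intr_handle, &intr_obj);
        cyg_interrupt_attach(intr_handle);
        cyg_interrupt_unmask(MY_DEV_VECTOR);
    }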

Measured Items

This section describes the various tests and the numbers presented. All tests use the C kernel API (available by way of cyg/kernel/kapi.h). There is a single main thread in the system that performs the various tests. Additional threads may be created as part of the testing, but these are short-lived and are destroyed between tests unless otherwise noted. The terminology “lower priority” means a priority that is less important, not necessarily lower in numerical value. A higher priority thread will run in preference to a lower priority thread even though the priority value of the higher priority thread may be numerically less than that of the lower priority thread.

Thread Primitives

Create thread
This test measures the cyg_thread_create() call. Each call creates a totally new thread. The set of threads created by this test will be reused in the subsequent thread primitive tests.
Yield thread
This test measures the cyg_thread_yield() call. For this test, there are no other runnable threads, thus the test should just measure the overhead of trying to give up the CPU.
Suspend [suspended] thread
This test measures the cyg_thread_suspend() call. A thread may be suspended multiple times; each thread is already suspended from its initial creation, and is suspended again.
Resume thread
This test measures the cyg_thread_resume() call. All of the threads have a suspend count of 2, thus this call does not make them runnable. This test just measures the overhead of resuming a thread.
Set priority
This test measures the cyg_thread_set_priority() call. Each thread, currently suspended, has its priority set to a new value.
Get priority
This test measures the cyg_thread_get_priority() call.
Kill [suspended] thread
This test measures the cyg_thread_kill() call. Each thread in the set is killed. All threads are known to be suspended before being killed.
Yield [no other] thread
This test measures the cyg_thread_yield() call again. This is to demonstrate that the cyg_thread_yield() call has a fixed overhead, regardless of whether there are other threads in the system.
Resume [suspended low priority] thread
This test measures the cyg_thread_resume() call again. In this case, the thread being resumed is lower priority than the main thread, thus it will simply become ready to run but not be granted the CPU. This test measures the cost of making a thread ready to run.
Resume [runnable low priority] thread
This test measures the cyg_thread_resume() call again. In this case, the thread being resumed is lower priority than the main thread and has already been made runnable, so in fact the resume call has no effect.
Suspend [runnable] thread
This test measures the cyg_thread_suspend() call again. In this case, each thread has already been made runnable (by previous tests).
Yield [only low priority] thread
This test measures the cyg_thread_yield() call. In this case, there are many other runnable threads, but they are all lower priority than the main thread, thus no thread switches will take place.
Suspend [runnable->not runnable] thread
This test measures the cyg_thread_suspend() call again. The thread being suspended will become non-runnable by this action.
Kill [runnable] thread
This test measures the cyg_thread_kill() call again. In this case, the thread being killed is currently runnable, but lower priority than the main thread.
Resume [high priority] thread
This test measures the cyg_thread_resume() call. The thread being resumed is higher priority than the main thread, thus a thread switch will take place on each call. In fact there will be two thread switches: one to the new higher priority thread and a second back to the test thread. The resumed thread exits immediately, which is what causes the switch back.
Thread switch
This test attempts to measure the cost of switching from one thread to another. Two equal priority threads are started and they will each yield to the other for a number of iterations. A time stamp is gathered in one thread before the cyg_thread_yield() call and after the call in the other thread.
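
As a rough illustration of the primitives exercised above (this is not the test code; the priority values, stack size and thread body are arbitrary), a thread might be created, resumed and manipulated like this:

    #include <cyg/kernel/kapi.h>

    #define STACK_SIZE 4096                 /* illustrative size */

    static cyg_thread   thread_obj;
    static cyg_handle_t thread_handle;
    static char         stack[STACK_SIZE];

    static void entry(cyg_addrword_t data)
    {
        cyg_thread_yield();                 /* offer to give up the CPU */
        cyg_thread_exit();
    }

    void thread_demo(void)
    {
        /* Threads are created suspended; priority 10 is arbitrary. */
        cyg_thread_create(10, entry, 0, "demo",
                          stack, STACK_SIZE, &thread_handle, &thread_obj);

        cyg_thread_resume(thread_handle);           /* make it runnable */
        cyg_thread_set_priority(thread_handle, 12); /* numerically higher,
                                                       so less important */
    }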

Scheduler Primitives

Scheduler lock
This test measures the cyg_scheduler_lock() call.
Scheduler unlock [0 threads]
This test measures the cyg_scheduler_unlock() call. There are no other threads in the system and the unlock happens immediately after a lock so there will be no pending DSRs to run.
Scheduler unlock [1 suspended thread]
This test measures the cyg_scheduler_unlock() call. There is one other thread in the system which is currently suspended.
Scheduler unlock [many suspended threads]
This test measures the cyg_scheduler_unlock() call. There are many other threads in the system which are currently suspended. The purpose of this test is to determine the cost of having additional threads in the system when the scheduler is activated by way of cyg_scheduler_unlock().
Scheduler unlock [many low priority threads]
This test measures the cyg_scheduler_unlock() call. There are many other threads in the system which are runnable but are lower priority than the main thread. The purpose of this test is to determine the cost of having additional threads in the system when the scheduler is activated by way of cyg_scheduler_unlock().
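
For illustration, a minimal critical region built from these two calls might look as follows; the shared counter is an invented example, not part of the test code:

    #include <cyg/kernel/kapi.h>

    static int shared_count;

    void bump_shared_count(void)
    {
        /* While the scheduler is locked, no thread switch can occur
         * and DSRs are deferred, so this update is safe against
         * preemption by other threads. */
        cyg_scheduler_lock();
        shared_count++;
        cyg_scheduler_unlock();   /* any deferred DSRs run here */
    }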

Mutex Primitives

Init mutex
This test measures the cyg_mutex_init() call. A number of separate mutex variables are created. The purpose of this test is to measure the cost of creating a new mutex and introducing it to the system.
Lock [unlocked] mutex
This test measures the cyg_mutex_lock() call. The purpose of this test is to measure the cost of locking a mutex which is currently unlocked. There are no other threads executing in the system while this test runs.
Unlock [locked] mutex
This test measures the cyg_mutex_unlock() call. The purpose of this test is to measure the cost of unlocking a mutex which is currently locked. There are no other threads executing in the system while this test runs.
Trylock [unlocked] mutex
This test measures the cyg_mutex_trylock() call. The purpose of this test is to measure the cost of locking a mutex which is currently unlocked. There are no other threads executing in the system while this test runs.
Trylock [locked] mutex
This test measures the cyg_mutex_trylock() call. The purpose of this test is to measure the cost of locking a mutex which is currently locked. There are no other threads executing in the system while this test runs.
Destroy mutex
This test measures the cyg_mutex_destroy() call. The purpose of this test is to measure the cost of deleting a mutex from the system. There are no other threads executing in the system while this test runs.
Unlock/Lock mutex
This test attempts to measure the cost of unlocking a mutex for which there is another higher priority thread waiting. When the mutex is unlocked, the higher priority waiting thread will immediately take the lock. The time from when the unlock is issued until after the lock succeeds in the second thread is measured, thus giving the round-trip or circuit time for this type of synchronizer.
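
A minimal sketch of the mutex calls measured above (uncontended, single thread, invented names; not the test code itself):

    #include <cyg/kernel/kapi.h>

    static cyg_mutex_t mutex;

    void mutex_demo(void)
    {
        cyg_mutex_init(&mutex);

        if (cyg_mutex_trylock(&mutex)) {    /* non-blocking; succeeds here */
            /* ... critical section ... */
            cyg_mutex_unlock(&mutex);
        }

        cyg_mutex_lock(&mutex);             /* blocking; uncontended here */
        cyg_mutex_unlock(&mutex);

        cyg_mutex_destroy(&mutex);
    }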

Mailbox Primitives

Create mbox
This test measures the cyg_mbox_create() call. A number of separate mailboxes are created. The purpose of this test is to measure the cost of creating a new mailbox and introducing it to the system.
Peek [empty] mbox
This test measures the cyg_mbox_peek() call. An attempt is made to peek the value in each mailbox, which is currently empty. The purpose of this test is to measure the cost of checking a mailbox for a value without blocking.
Put [first] mbox
This test measures the cyg_mbox_put() call. One item is added to a currently empty mailbox. The purpose of this test is to measure the cost of adding an item to a mailbox. There are no other threads currently waiting for mailbox items to arrive.
Peek [1 msg] mbox
This test measures the cyg_mbox_peek() call. An attempt is made to peek the value in each mailbox, which contains a single item. The purpose of this test is to measure the cost of checking a mailbox which has data to deliver.
Put [second] mbox
This test measures the cyg_mbox_put() call. A second item is added to a mailbox. The purpose of this test is to measure the cost of adding an additional item to a mailbox. There are no other threads currently waiting for mailbox items to arrive.
Peek [2 msgs] mbox
This test measures the cyg_mbox_peek() call. An attempt is made to peek the value in each mailbox, which contains two items. The purpose of this test is to measure the cost of checking a mailbox which has data to deliver.
Get [first] mbox
This test measures the cyg_mbox_get() call. The first item is removed from a mailbox that currently contains two items. The purpose of this test is to measure the cost of obtaining an item from a mailbox without blocking.
Get [second] mbox
This test measures the cyg_mbox_get() call. The last item is removed from a mailbox that currently contains one item. The purpose of this test is to measure the cost of obtaining an item from a mailbox without blocking.
Tryput [first] mbox
This test measures the cyg_mbox_tryput() call. A single item is added to a currently empty mailbox. The purpose of this test is to measure the cost of adding an item to a mailbox.
Peek item [non-empty] mbox
This test measures the cyg_mbox_peek_item() call. A single item is fetched from a mailbox that contains a single item. The purpose of this test is to measure the cost of obtaining an item without disturbing the mailbox.
Tryget [non-empty] mbox
This test measures the cyg_mbox_tryget() call. A single item is removed from a mailbox that contains exactly one item. The purpose of this test is to measure the cost of obtaining one item from a non-empty mailbox.
Peek item [empty] mbox
This test measures the cyg_mbox_peek_item() call. An attempt is made to fetch an item from a mailbox that is empty. The purpose of this test is to measure the cost of trying to obtain an item when the mailbox is empty.
Tryget [empty] mbox
This test measures the cyg_mbox_tryget() call. An attempt is made to fetch an item from a mailbox that is empty. The purpose of this test is to measure the cost of trying to obtain an item when the mailbox is empty.
Waiting to get mbox
This test measures the cyg_mbox_waiting_to_get() call. The purpose of this test is to measure the cost of determining how many threads are waiting to obtain a message from this mailbox.
Waiting to put mbox
This test measures the cyg_mbox_waiting_to_put() call. The purpose of this test is to measure the cost of determining how many threads are waiting to put a message into this mailbox.
Delete mbox
This test measures the cyg_mbox_delete() call. The purpose of this test is to measure the cost of destroying a mailbox and removing it from the system.
Put/Get mbox
In this round-trip test, one thread is sending data to a mailbox that is being consumed by another thread. The time from when the data is put into the mailbox until it has been delivered to the waiting thread is measured. Note that this time will contain a thread switch.
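
The put/get round trip can be pictured with the following sketch; it is not the test code, and consumer(), producer() and the payload are invented for illustration:

    #include <cyg/kernel/kapi.h>

    static cyg_mbox     mbox_obj;
    static cyg_handle_t mbox;

    /* Consumer thread body: blocks until a message arrives. */
    static void consumer(cyg_addrword_t data)
    {
        void *item = cyg_mbox_get(mbox);   /* blocking get */
        (void)item;  /* in the round-trip test, a timestamp is taken here */
    }

    void producer(void)
    {
        static int message = 42;           /* illustrative payload */

        cyg_mbox_create(&mbox, &mbox_obj);
        cyg_mbox_put(mbox, &message);      /* wakes the waiting consumer;
                                              this is where the thread
                                              switch occurs */
    }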

Semaphore Primitives

Init semaphore
This test measures the cyg_semaphore_init() call. A number of separate semaphore objects are created and introduced to the system. The purpose of this test is to measure the cost of creating a new semaphore.
Post [0] semaphore
This test measures the cyg_semaphore_post() call. Each semaphore currently has a value of 0 and there are no other threads in the system. The purpose of this test is to measure the overhead cost of posting to a semaphore. This cost will differ if there is a thread waiting for the semaphore.
Wait [1] semaphore
This test measures the cyg_semaphore_wait() call. The semaphore has a current value of 1 so the call is non-blocking. The purpose of the test is to measure the overhead of “taking” a semaphore.
Trywait [0] semaphore
This test measures the cyg_semaphore_trywait() call. The semaphore has a value of 0 when the call is made. The purpose of this test is to measure the cost of seeing if a semaphore can be “taken” without blocking. In this case, the answer would be no.
Trywait [1] semaphore
This test measures the cyg_semaphore_trywait() call. The semaphore has a value of 1 when the call is made. The purpose of this test is to measure the cost of seeing if a semaphore can be “taken” without blocking. In this case, the answer would be yes.
Peek semaphore
This test measures the cyg_semaphore_peek() call. The purpose of this test is to measure the cost of obtaining the current semaphore count value.
Destroy semaphore
This test measures the cyg_semaphore_destroy() call. The purpose of this test is to measure the cost of deleting a semaphore from the system.
Post/Wait semaphore
In this round-trip test, two threads are passing control back and forth by using a semaphore. The time from when one thread calls cyg_semaphore_post() until the other thread completes its cyg_semaphore_wait() is measured. Note that each iteration of this test will involve a thread switch.
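
A sketch of the post/wait pairing (invented names, not the test code):

    #include <cyg/kernel/kapi.h>

    static cyg_sem_t sem;

    /* Waiter thread body: blocks while the count is zero. */
    static void waiter(cyg_addrword_t data)
    {
        cyg_semaphore_wait(&sem);
        /* in the round-trip test, a timestamp is taken here */
    }

    void poster(void)
    {
        cyg_semaphore_init(&sem, 0);   /* count of 0, so the waiter blocks */
        cyg_semaphore_post(&sem);      /* wakes the waiter; a thread switch
                                          follows if it is higher priority */
    }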

Counters

Create counter
This test measures the cyg_counter_create() call. A number of separate counters are created. The purpose of this test is to measure the cost of creating a new counter and introducing it to the system.
Get counter value
This test measures the cyg_counter_current_value() call. The current value of each counter is obtained.
Set counter value
This test measures the cyg_counter_set_value() call. Each counter is set to a new value.
Tick counter
This test measures the cyg_counter_tick() call. Each counter is “ticked” once.
Delete counter
This test measures the cyg_counter_delete() call. Each counter is deleted from the system. The purpose of this test is to measure the cost of deleting a counter object.
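
The counter calls measured above can be sketched as follows; the values used are arbitrary:

    #include <cyg/kernel/kapi.h>

    static cyg_counter  counter_obj;
    static cyg_handle_t counter;

    void counter_demo(void)
    {
        cyg_tick_count_t now;

        cyg_counter_create(&counter, &counter_obj);

        cyg_counter_tick(counter);            /* advance the count by one */
        cyg_counter_set_value(counter, 100);  /* arbitrary new value */
        now = cyg_counter_current_value(counter);
        (void)now;

        cyg_counter_delete(counter);
    }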

Alarms

Create alarm
This test measures the cyg_alarm_create() call. A number of separate alarms are created, all attached to the same counter object. The purpose of this test is to measure the cost of creating a new alarm and introducing it to the system.
Initialize alarm
This test measures the cyg_alarm_initialize() call. Each alarm is initialized to a small value.
Disable alarm
This test measures the cyg_alarm_disable() call. Each alarm is explicitly disabled.
Enable alarm
This test measures the cyg_alarm_enable() call. Each alarm is explicitly enabled.
Delete alarm
This test measures the cyg_alarm_delete() call. Each alarm is destroyed. The purpose of this test is to measure the cost of deleting an alarm and removing it from the system.
Tick counter [1 alarm]
This test measures the cyg_counter_tick() call. A counter is created that has a single alarm attached to it. The purpose of this test is to measure the cost of “ticking” a counter when it has a single attached alarm. In this test, the alarm is not activated (fired).
Tick counter [many alarms]
This test measures the cyg_counter_tick() call. A counter is created that has multiple alarms attached to it. The purpose of this test is to measure the cost of “ticking” a counter when it has many attached alarms. In this test, the alarms are not activated (fired).
Tick & fire counter [1 alarm]
This test measures the cyg_counter_tick() call. A counter is created that has a single alarm attached to it. The purpose of this test is to measure the cost of “ticking” a counter when it has a single attached alarm. In this test, the alarm is activated (fired). Thus the measured time will include the overhead of calling the alarm callback function.
Tick & fire counter [many alarms]
This test measures the cyg_counter_tick() call. A counter is created that has multiple alarms attached to it. The purpose of this test is to measure the cost of “ticking” a counter when it has many attached alarms. In this test, the alarms are activated (fired). Thus the measured time will include the overhead of calling the alarm callback function.
Alarm latency [0 threads]
This test attempts to measure the latency in calling an alarm callback function. The time from the clock interrupt until the alarm function is called is measured. In this test, there are no threads that can be run, other than the system idle thread, when the clock interrupt occurs (all threads are suspended).
Alarm latency [2 threads]
This test attempts to measure the latency in calling an alarm callback function. The time from the clock interrupt until the alarm function is called is measured. In this test, there are exactly two threads running when the clock interrupt occurs. They simply pass control back and forth by way of the cyg_thread_yield() call. The purpose of this test is to measure the variations in the latency when there are executing threads.
Alarm latency [many threads]
This test attempts to measure the latency in calling an alarm callback function. The time from the clock interrupt until the alarm function is called is measured. In this test, there are a number of threads running when the clock interrupt occurs. They simply pass control back and forth by way of the cyg_thread_yield() call. The purpose of this test is to measure the variations in the latency when there are many executing threads.
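
To illustrate how the alarm primitives fit together, here is a sketch with arbitrary trigger values; it is not the latency test itself:

    #include <cyg/kernel/kapi.h>

    static cyg_counter  counter_obj;
    static cyg_handle_t counter;
    static cyg_alarm    alarm_obj;
    static cyg_handle_t alarm;

    /* Callback invoked when the attached counter reaches the trigger;
     * the latency tests above measure the delay before this runs. */
    static void alarm_fn(cyg_handle_t alarm_h, cyg_addrword_t data)
    {
        /* ... */
    }

    void alarm_demo(void)
    {
        cyg_counter_create(&counter, &counter_obj);
        cyg_alarm_create(counter, alarm_fn, 0, &alarm, &alarm_obj);

        /* Fire 10 ticks from now, then every 10 ticks thereafter
         * (both values are illustrative). */
        cyg_alarm_initialize(alarm,
                             cyg_counter_current_value(counter) + 10, 10);
        cyg_alarm_enable(alarm);

        /* ... later ... */
        cyg_alarm_disable(alarm);
        cyg_alarm_delete(alarm);
    }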