To allow a reasonable implementation of SMP, and to reduce the
disruption to the existing source base, a number of assumptions have
been made about the features of the target hardware.
Modest multiprocessing. The typical number of CPUs supported is two
to four, with an upper limit around eight. While there are no
inherent limits in the code, hardware and algorithmic limitations
will probably become significant beyond this point.
SMP synchronization support. The hardware must supply a mechanism to
allow software on two CPUs to synchronize. This is normally provided
as part of the instruction set in the form of test-and-set,
compare-and-swap or load-link/store-conditional instructions. An
alternative approach is the provision of hardware semaphore
registers which can be used to serialize implementations of these
operations. Whatever hardware facilities are available, they are
used in eCos to implement spinlocks.
Coherent caches. It is assumed that no extra effort will be required
to access shared memory from any processor. This means that either
there are no caches, they are shared by all processors, or are
maintained in a coherent state by the hardware. It would be too
disruptive to the eCos sources if every memory access had to be
bracketed by cache load/flush operations. Any hardware that requires
this is not supported.
Uniform addressing. It is assumed that all memory that is
shared between CPUs is addressed at the same location from all
CPUs. Like non-coherent caches, dealing with CPU-specific address
translation is considered too disruptive to the eCos source
base. This does not, however, preclude systems with non-uniform
access costs for different CPUs.
Uniform device addressing. As with access to memory, it is assumed
that all devices are equally accessible to all CPUs. Since device
access is often made from thread contexts, it is not possible to
restrict access to device control registers to certain CPUs, since
there is currently no support for binding or migrating threads to CPUs.
Interrupt routing. The target hardware must have an interrupt
controller that can route interrupts to specific CPUs. It is
acceptable for all interrupts to be delivered to just one CPU, or
for some interrupts to be bound to specific CPUs, or for some
interrupts to be local to each CPU. At present dynamic routing,
where a different CPU may be chosen each time an interrupt is
delivered, is not supported. ECos cannot support hardware where all
interrupts are delivered to all CPUs simultaneously with the
expectation that software will resolve any conflicts.
Inter-CPU interrupts. A mechanism to allow one CPU to interrupt
another is needed. This is necessary so that events on one CPU can
cause rescheduling on other CPUs.
CPU Identifiers. Code running on a CPU must be able to determine
which CPU it is running on. The CPU Id is usually provided either in
a CPU status register, or in a register associated with the
inter-CPU interrupt delivery subsystem. ECos expects CPU Ids to be
small positive integers, although alternative representations, such
as bitmaps, can be converted relatively easily. Complex mechanisms
for getting the CPU Id cannot be supported. Getting the CPU Id must
be a cheap operation, since it is done often, and in performance
critical places such as interrupt handlers and the scheduler.
SMP support in any platform depends on the HAL supplying the
appropriate operations. All HAL SMP support is defined in the
cyg/hal/hal_smp.h header. Variant and platform
specific definitions will be in cyg/hal/var_smp.h
and cyg/hal/plf_smp.h respectively. These files
are include automatically by this header, so need not be included
explicitly.
SMP support falls into a number of functional groups.
This group consists of descriptive and control macros for managing the
CPUs in an SMP system.
HAL_SMP_CPU_TYPE
A type that can contain a CPU id. A CPU id is
usually a small integer that is used to index
arrays of variables that are managed on an
per-CPU basis.
HAL_SMP_CPU_MAX
The maximum number of CPUs that can be
supported. This is used to provide the size of
any arrays that have an element per CPU.
HAL_SMP_CPU_COUNT()
Returns the number of CPUs currently
operational. This may differ from
HAL_SMP_CPU_MAX depending on the runtime
environment.
HAL_SMP_CPU_THIS()
Returns the CPU id of the current CPU.
HAL_SMP_CPU_NONE
A value that does not match any real CPU
id. This is uses where a CPU type variable
must be set to a null value.
HAL_SMP_CPU_START( cpu )
Starts the given CPU executing at a defined
HAL entry point. After performing any HAL
level initialization, the CPU calls up into
the kernel at cyg_kernel_cpu_startup().
HAL_SMP_CPU_RESCHEDULE_INTERRUPT( cpu, wait )
Sends the CPU a reschedule interrupt, and if
wait is non-zero, waits for an
acknowledgment. The interrupted CPU should call
cyg_scheduler_set_need_reschedule() in its DSR to
cause the reschedule to occur.
HAL_SMP_CPU_TIMESLICE_INTERRUPT( cpu, wait )
Sends the CPU a timeslice interrupt, and if
wait is non-zero, waits for an
acknowledgment. The interrupted CPU should call
cyg_scheduler_timeslice_cpu() to cause the
timeslice event to be processed.
Test-and-set is the foundation of the SMP synchronization
mechanisms.
HAL_TAS_TYPE
The type for all test-and-set variables. The
test-and-set macros only support operations on
a single bit (usually the least significant
bit) of this location. This allows for maximum
flexibility in the implementation.
HAL_TAS_SET( tas, oldb )
Performs a test and set operation on the
location tas. oldb will contain true if
the location was already set, and false if
it was clear.
HAL_TAS_CLEAR( tas, oldb )
Performs a test and clear operation on the
location tas. oldb will contain true if
the location was already set, and false if
it was clear.
Spinlocks provide inter-CPU locking. Normally they will be implemented
on top of the test-and-set mechanism above, but may also be
implemented by other means if, for example, the hardware has more
direct support for spinlocks.
HAL_SPINLOCK_TYPE
The type for all spinlock variables.
HAL_SPINLOCK_INIT_CLEAR
A value that may be assigned to a spinlock
variable to initialize it to clear.
HAL_SPINLOCK_INIT_SET
A value that may be assigned to a spinlock
variable to initialize it to set.
HAL_SPINLOCK_SPIN( lock )
The caller spins in a busy loop waiting for
the lock to become clear. It then sets it and
continues. This is all handled atomically, so
that there are no race conditions between CPUs.
HAL_SPINLOCK_CLEAR( lock )
The caller clears the lock. One of any waiting
spinners will then be able to proceed.
HAL_SPINLOCK_TRY( lock, val )
Attempts to set the lock. The value put in
val will be true if the lock was
claimed successfully, and false if it was
not.
HAL_SPINLOCK_TEST( lock, val )
Tests the current value of the lock. The value
put in val will be true if the lock is
claimed and false of it is clear.
The scheduler lock is the main protection for all kernel data
structures. By default the kernel implements the scheduler lock itself
using a spinlock. However, if spinlocks cannot be supported by the
hardware, or there is a more efficient implementation available, the
HAL may provide macros to implement the scheduler lock.
HAL_SMP_SCHEDLOCK_DATA_TYPE
A data type, possibly a structure, that
contains any data items needed by the
scheduler lock implementation. A variable of
this type will be instantiated as a static
member of the Cyg_Scheduler_SchedLock class
and passed to all the following macros.
HAL_SMP_SCHEDLOCK_INIT( lock, data )
Initialize the scheduler lock. The lock
argument is the scheduler lock counter and the
data argument is a variable of
HAL_SMP_SCHEDLOCK_DATA_TYPE type.
HAL_SMP_SCHEDLOCK_INC( lock, data )
Increment the scheduler lock. The first
increment of the lock from zero to one for any
CPU may cause it to wait until the lock is
zeroed by another CPU. Subsequent increments
should be less expensive since this CPU
already holds the lock.
HAL_SMP_SCHEDLOCK_ZERO( lock, data )
Zero the scheduler lock. This operation will
also clear the lock so that other CPUs may
claim it.
HAL_SMP_SCHEDLOCK_SET( lock, data, new )
Set the lock to a different value, in
new. This is only called when the lock is
already known to be owned by the current CPU. It is never called to
zero the lock, or to increment it from zero.
The routing of interrupts to different CPUs is supported by two new
interfaces in hal_intr.h.
Once an interrupt has been routed to a new CPU, the existing vector
masking and configuration operations should take account of the CPU
routing. For example, if the operation is not invoked on the
destination CPU itself, then the HAL may need to arrange to transfer
the operation to the destination CPU for correct application.
HAL_INTERRUPT_SET_CPU( vector, cpu )
Route the interrupt for the given vector to
the given cpu.
HAL_INTERRUPT_GET_CPU( vector, cpu )
Set cpu to the id of the CPU to which this
vector is routed.