This section of the documentation gives an outline description of the most important parts of the architectural HAL package, and especially how the eCos HAL specification has been mapped on to the TILE-Gx hardware.
The TILE-Gx HAL is organized somewhat differently from HALs for other targets. Typically an eCos port involves three or four separate HAL packages: the architectural HAL handles features that are common to every chip within an architecture; a variant HAL copes with different families within an architecture, for example one family may support only on-chip memory and no MMU while another family is designed for use with external memory; within a family there may be different processors supporting different sets of peripherals; finally a platform HAL handles anything specific to the circuit board rather than to the chip, for example the amount of external memory. The TILE-Gx port does not require these complications. Most of the hardware variations will be handled by the hypervisor or within the Linux partition and do not affect the eCos side of things.
The eCos port to the TILE-Gx architecture only supports 32-bit mode. int, long and pointers are all 32 bits, and long long is 64 bits. The chip always runs in little-endian mode.
The architectural HAL package provides the standard HAL header files cyg/hal/hal_arch.h, cyg/hal/hal_intr.h, cyg/hal/hal_io.h, and so on. However there is one important difference between the TILE-Gx versions of these headers and their equivalents for other architectures. Typically hal_intr.h provides definitions of all the interrupt and exception vectors, directly or indirectly. Similarly hal_io.h provides definitions for some or all of the on-chip peripherals. These definitions are required by some parts of eCos, for example the kernel needs to know which interrupt corresponds to the system clock, but they are specific to individual processors or variants. The gcc toolchain supports the architecture as a whole so cannot supply these definitions.
For TILE-Gx the situation is different. The multicore development environment comes with a full set of definitions in various headers below the arch subdirectory, for example arch/interrupts.h defines all the interrupt vectors, and arch/spr_def.h defines the special purpose registers. tile-gcc will find these header files automatically. The headers are intended for use by the Linux kernel but are mostly usable by eCos and eCos applications. There may be some instances where these header files reference Linux-specific functionality, or where they assume a 64-bit build. Duplicating all this information in the eCos header files would serve no purpose.
Memory Layout and the Linker Script
The memory layout for a ROM startup application or for gdbstubs is as follows:
| Address    | Contents                                   |
|------------|--------------------------------------------|
| 0x00010000 | Code (.text section, read-only data)       |
| 0x10000000 | Data (.data and .bss sections, eCos heap)  |
| 0x6C000000 | Startup stack and hypervisor data          |
The amount of memory allocated for the ROM region is just large enough to hold the application's code and read-only data, plus enough for a shadow copy of the .data initialized static data section. That shadow copy is needed to allow for platform restarts. The hypervisor will round up this amount to a suitable page boundary.
eCos memory accesses go through the MMU so the above memory locations are translated to physical addresses. Therefore one eCos tile's location 0x10000000 will correspond to a different physical memory address from another tile's location 0x10000000, and these memory regions are not shared between tiles.
The amount of memory allocated for the RAM region is determined by the
CYGNUM_HAL_TILEGX_RAM_SIZE configuration option,
but can be overridden by passing a suitable memory size
to tile-ecos-32to64 when the executable is
converted into a format suitable for inclusion in a hypervisor boot
image. The first 4K is reserved for a system data structure,
hal_tilegx_global_state, which contains
information such as the virtual vector table, interrupt-related data,
and MMU settings. In a debug system this data structure must be shared
between the ROM-startup gdbstubs and the RAM-startup application,
which is most conveniently done by placing it at a well-known location.
In a debug system the next 60K of the RAM region, up to location 0x10010000, is reserved for use by gdbstubs. The remaining memory holds the application code, data, and heap.
The hypervisor allocates some additional memory at location 0x6C000000
(strictly, at the pertile va location specified in the hypervisor
configuration file, but that location should be 0x6C000000). The
hypervisor assumes that the application is an ordinary BME application
which will need memory for its stack and heap, and which will also
need information from the hypervisor such as the tile's CPU speed. The
eCos requirements are different, and there is no easy way to return
this memory to the hypervisor. However eCos can still make good use of
it for the startup and interrupt stack, and it does need some
of the same information from the hypervisor. If an application wishes
to access this hypervisor information it can do so via
hal_tilegx_global_state.hv_global_info.
When an interrupt occurs the CPU branches to a location determined by the current CPU protection level and the interrupt vector number. eCos always runs at protection level 2, which means that the relevant locations occupy approximately 16K starting at location 0xFFFF_FFFF_FE00_0000. eCos must supply the code at these vectors, so an executable contains a section for them. Strictly speaking the location can be changed by manipulating the INTERRUPT_VECTOR_BASE_2 special purpose register, but there is no good reason for doing so.
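The vector arithmetic can be sketched in C. This is an illustrative helper only: the name tilegx_vector_address() is hypothetical, the 0x100-byte spacing between vectors comes from the description of interrupt dispatch later in this section, and the limit of 64 vectors is an assumption derived from the approximate 16K region size.

```c
#include <assert.h>
#include <stdint.h>

/* Base of the protection level 2 vector region, as described above. */
#define TILEGX_PL2_VECTOR_BASE 0xFFFFFFFFFE000000ULL
/* Spacing between consecutive vectors: 0x100 bytes per vector. */
#define TILEGX_VECTOR_STRIDE   0x100ULL
/* Assumption: 64 vectors, giving the ~16K region (64 * 0x100 = 0x4000). */
#define TILEGX_NUM_VECTORS     64u

/* Hypothetical helper: the address the CPU branches to for a given
   interrupt or exception number while running at protection level 2. */
static uint64_t tilegx_vector_address(unsigned int vector)
{
    assert(vector < TILEGX_NUM_VECTORS);
    return TILEGX_PL2_VECTOR_BASE + TILEGX_VECTOR_STRIDE * vector;
}
```

With these assumptions vector 1 lands at 0xFFFF_FFFF_FE00_0100, and the last vector ends exactly 16K above the base.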
The linker script src/tilegx.ld defines all the above, in conjunction with pkgconf/hal_tilegx.h and pkgconf/mlt_tilegx.h.
For a RAM startup application running on top of gdbstubs, the
application's code will be placed at location 0x10010000 onwards,
immediately after the memory reserved for
hal_tilegx_global_state and gdbstubs. The
application's static data will follow immediately after the code. The
rest of the memory will be allocated to the system heap for dynamic
memory allocation. A RAM startup application runs in the memory
map set up by the hypervisor, so the RAM size is determined by the
CYGNUM_HAL_TILEGX_RAM_SIZE option used when
gdbstubs was configured, or alternatively by the memory size passed to
tile-ecos-32to64. The RAM startup initialization
code will determine the actual amount of RAM and size the system heap
accordingly.
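The RAM-region arithmetic can be sketched as follows. This is a minimal illustration, not the HAL's actual code: heap_region, compute_heap() and the 8-byte rounding of the heap start are hypothetical, and the real initialization code determines the RAM size and the end of static data in its own way. The constants are the ones given in the text: 4K for hal_tilegx_global_state followed by 60K for gdbstubs.

```c
#include <assert.h>
#include <stdint.h>

/* Layout constants from the description above. */
#define RAM_BASE          0x10000000u
#define GLOBAL_STATE_SIZE 0x1000u   /* 4K: hal_tilegx_global_state */
#define GDBSTUBS_SIZE     0xF000u   /* next 60K: reserved for gdbstubs */
#define RAM_APP_START     (RAM_BASE + GLOBAL_STATE_SIZE + GDBSTUBS_SIZE)

_Static_assert(RAM_APP_START == 0x10010000u, "RAM application starts at 0x10010000");

/* Hypothetical description of the heap that fills the rest of RAM. */
typedef struct {
    uint32_t start;
    uint32_t size;
} heap_region;

/* ram_size: as configured, or as passed to tile-ecos-32to64.
   end_of_data: first address after the application's static data.
   Assumption: the heap start is rounded up to an 8-byte boundary. */
static heap_region compute_heap(uint32_t ram_size, uint32_t end_of_data)
{
    heap_region heap = { 0u, 0u };
    uint32_t ram_top = RAM_BASE + ram_size;
    uint32_t start   = (end_of_data + 7u) & ~7u;

    assert(end_of_data >= RAM_APP_START);
    if (start < ram_top) {
        heap.start = start;
        heap.size  = ram_top - start;
    }
    return heap;
}
```

For example, with 16MB of RAM and static data ending at 0x10050004, this sketch places the heap from 0x10050008 up to 0x11000000.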
During bootstrap the hypervisor will initialize all BME tiles as per the configuration file, allocating memory as per the executable's memory map, then jumping to the executable's entry point. The hypervisor runs at protection level 2, and starts the executable with the same protection level. There is never any need for eCos to change this protection level, and doing so would introduce various complications especially in the interrupt handling code.
For ROM startup the application entry point is
hal_tilegx_start in src/vectors.S.
If the application has been started by the hypervisor then that will
have already taken care of much of the low-level initialization, for
example zeroing the .bss uninitialized static data region. However it
is also possible for an application to perform a restart. This happens
most commonly when a maintenance packet r command
is issued from inside
tile-gdb and the debug
session is then terminated, but a restart can also be caused by a
double fault exception or by using the
HAL_PLATFORM_RESET() macro defined in
cyg/hal/hal_intr.h. After a
restart the assembler initialization code needs to do rather more
work, including restoring all initialized static data to their
original values, zeroing all uninitialized static data, and switching
to an appropriate stack.
If eCos has been built with the configuration option
CYGHWR_HAL_TILEGX_SIMULATOR enabled and if
it is actually running inside the simulator then the application will
halt at this point, allowing the user to
Once the assembler initialization code has finished it jumps
to a C function defined in src/tilegx.c. This performs
initialization or reinitialization of various other subsystems
including the memory management unit's translation lookaside buffers
(TLBs), interrupt handling, virtual vectors, and gdbstubs as
appropriate. Finally it runs through any C++ static constructors,
including those for other eCos packages like the eCos kernel, and
calls the generic eCos startup code.
For RAM startup the application entry point is also
in src/vectors.S, but the code executed is
somewhat different from that for ROM startup. Again there is a jump to
a C function in src/tilegx.c, and from there to the generic eCos startup code.
The HAL_SavedRegisters structure
in cyg/hal/hal_arch.h defines
the storage needed for saving and restoring a thread context during
context switches and interrupt handling. Mostly it consists of the
registers r0-r53, but there is some additional state which overlaps
the stack frames defined by the TILE-Gx ABI. The details are generally
of no interest to application developers.
There is one piece of system state which is not held in the saved
context structure and which arguably should be: the special purpose
register SPR_CMPEXCH_VALUE. This register is not
used in ordinary code. It serves only to help implement shared memory
synchronization such as spinlocks, for example:
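    int result;
    /* Load the value that the lock is expected to hold. */
    __insn_mtspr(SPR_CMPEXCH_VALUE, oldval);
    /* Store newval only if spinlock still equals SPR_CMPEXCH_VALUE;
       the previous contents of spinlock are returned. */
    result = __insn_cmpexch4(&spinlock, newval);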
It is possible for an interrupt to occur between setting
SPR_CMPEXCH_VALUE and executing
the cmpexch4 or cmpexch
instruction, and the register may get overwritten before the code
resumes. The Linux kernel saves and restores this special purpose
register during interrupt handling, adding several cycles to the
interrupt latency. The eCos HAL does not save this register, on the
assumption that it is rarely used; instead the spinlock code has to disable interrupts
around the above pair of instructions:
    CYG_INTERRUPT_STATE ints_state;
    int result;
    HAL_DISABLE_INTERRUPTS(ints_state);
    /* No interrupt can now overwrite SPR_CMPEXCH_VALUE between these two steps. */
    __insn_mtspr(SPR_CMPEXCH_VALUE, oldval);
    result = __insn_cmpexch4(&spinlock, newval);
    HAL_RESTORE_INTERRUPTS(ints_state);
This makes interrupt handling more efficient but spinlocks more expensive. Since eCos does not support SMP operation, spinlocks are unlikely to be used often, and this approach is expected to give a net performance gain.
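The protected compare-and-exchange sequence can be modelled on a host to make the ordering explicit. Everything below is a hypothetical stand-in: emu_mtspr_cmpexch_value() and emu_cmpexch4() model __insn_mtspr(SPR_CMPEXCH_VALUE, ...) and __insn_cmpexch4(), and emu_disable()/emu_restore() model HAL_DISABLE_INTERRUPTS and HAL_RESTORE_INTERRUPTS. On a single-threaded host this shows the logic only, not real atomicity.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical host-side stand-ins for the on-target primitives. */
static uint32_t cmpexch_value_spr;    /* models SPR_CMPEXCH_VALUE */
static int      interrupts_disabled;  /* models the HAL interrupt state */

static void emu_mtspr_cmpexch_value(uint32_t value)
{
    cmpexch_value_spr = value;
}

/* Models cmpexch4: store newval only if *addr equals the SPR value,
   and always return the previous contents of *addr. */
static uint32_t emu_cmpexch4(volatile uint32_t *addr, uint32_t newval)
{
    uint32_t old = *addr;
    if (old == cmpexch_value_spr)
        *addr = newval;
    return old;
}

static int emu_disable(void)
{
    int old = interrupts_disabled;
    interrupts_disabled = 1;
    return old;
}

static void emu_restore(int old)
{
    interrupts_disabled = old;
}

/* One attempt to move the lock from 0 (free) to 1 (held). Interrupts are
   disabled so nothing can overwrite the SPR between the two steps. */
static int spinlock_try_acquire(volatile uint32_t *lock)
{
    int state = emu_disable();
    assert(interrupts_disabled);
    emu_mtspr_cmpexch_value(0u);
    uint32_t prev = emu_cmpexch4(lock, 1u);
    emu_restore(state);
    return prev == 0u;
}

/* A real release may also need a memory fence on the target. */
static void spinlock_release(volatile uint32_t *lock)
{
    *lock = 0u;
}
```

A caller would loop on spinlock_try_acquire() until it succeeds. Restoring the previous interrupt state, rather than unconditionally reenabling interrupts, keeps the sequence safe to use from code that already runs with interrupts disabled.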
Interrupt management requires several pieces of functionality. First
it must be possible to disable and reenable interrupts, so that
critical code sections can run atomically. Second it must be possible
to mask and unmask individual interrupt sources. Third, if support for
nested interrupts is enabled via the configuration option
CYGSEM_HAL_COMMON_INTERRUPTS_ALLOW_NESTING then it
should be possible to assign priorities to the various interrupts,
such that inside an interrupt handler lower priority interrupts are
masked and higher priority interrupts are unmasked. Finally other eCos
code including the kernel and any device drivers must be able to
register their own interrupt handling functions. There are two
versions of such handlers: a low-level VSR must be written in
assembler, but is called very early after an interrupt triggers; a
higher-level ISR can be written in C, but the system needs to do more
work before the ISR can be called.
Each TILE-Gx tile has two special purpose registers or SPRs which control how interrupts are handled. INTERRUPT_MASK_2 can be used to mask or unmask the various interrupt sources (there are other registers for protection levels 0, 1, and 3 but those are irrelevant to the eCos port). INTERRUPT_CRITICAL_SECTION can be used to block all maskable interrupts. At first glance this second register could be used to implement the disable/reenable functionality. Unfortunately that does not quite work. If a CPU exception occurs while INTERRUPT_CRITICAL_SECTION is set then that is treated as a non-recoverable double fault. Since gdbstubs depends on CPU exceptions for some of the debug functionality, the implementation takes a different approach.
During normal execution INTERRUPT_MASK_2 holds the
set of all interrupts that are currently masked, as expected. A shadow
copy of this set, the global interrupt mask, is held in memory.
Disabling interrupts involves
setting INTERRUPT_MASK_2 to 0xFFFF_FFFF_FFFF_FFFF,
and reenabling interrupts involves
restoring INTERRUPT_MASK_2 from the shadow copy.
The above explanation is actually oversimplified. Implementing prioritized nested interrupts requires some additional complications. Associated with each interrupt source is an interrupt mask holding the set of all interrupts with equal or lower priorities. There are also two pseudo-interrupt sources, none and disabled, with associated masks 0 and 0xFFFF_FFFF_FFFF_FFFF. At any time the value of the INTERRUPT_MASK_2 SPR is the union of the global interrupt mask and the current interrupt's mask. During normal execution the current interrupt is none so INTERRUPT_MASK_2 holds the same value as the global interrupt mask. When interrupts are disabled the current interrupt is disabled so INTERRUPT_MASK_2 holds 0xFFFF_FFFF_FFFF_FFFF. While processing an interrupt INTERRUPT_MASK_2 holds all globally masked interrupts and all interrupts masked for the current interrupt. Keeping everything up to date in the right order requires considerable care, but achieves the desired functionality.
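The mask rules above can be captured in a small model. This is a sketch under stated assumptions, not the HAL's implementation: irq_model, build_prio_masks() and set_current() are hypothetical, a set bit means the source is masked, and a larger priority number is assumed to mean a higher priority.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_SOURCES   64
#define MASK_NONE     0x0000000000000000ULL  /* pseudo-interrupt "none" */
#define MASK_DISABLED 0xFFFFFFFFFFFFFFFFULL  /* pseudo-interrupt "disabled" */

/* Hypothetical model of the interrupt state; a set bit means masked. */
typedef struct {
    uint64_t global_mask;             /* shadow copy of the masked set */
    uint64_t prio_mask[NUM_SOURCES];  /* per source: equal or lower priority */
    uint64_t current_mask;            /* mask of the current (pseudo-)interrupt */
} irq_model;

/* The value INTERRUPT_MASK_2 should hold at any time. */
static uint64_t interrupt_mask_2(const irq_model *m)
{
    return m->global_mask | m->current_mask;
}

/* Assumption for this sketch: a larger number means a higher priority. */
static void build_prio_masks(irq_model *m, const unsigned prio[NUM_SOURCES])
{
    for (int i = 0; i < NUM_SOURCES; i++) {
        uint64_t mask = 0;
        for (int j = 0; j < NUM_SOURCES; j++)
            if (prio[j] <= prio[i])
                mask |= 1ULL << j;
        m->prio_mask[i] = mask;
    }
}

/* Switch the current (pseudo-)interrupt and return the previous mask so
   that it can be restored later, which is what allows nesting. */
static uint64_t set_current(irq_model *m, uint64_t new_mask)
{
    uint64_t old = m->current_mask;
    m->current_mask = new_mask;
    return old;
}
```

Disabling interrupts corresponds to set_current(m, MASK_DISABLED), and entering the handler for source n corresponds to set_current(m, m->prio_mask[n]). In both cases the saved value is passed back to set_current() afterwards.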
Assuming an unmasked interrupt triggers, the hardware jumps to
location 0xFFFF_FFFF_FE00_0000 + (0x100 * interrupt_number). The ROM
startup executable or gdbstubs provides the code that resides at that
location, as defined by a macro in
src/vectors.S. This initial code allocates space for a
HAL_SavedRegisters structure on the
stack, saves a small number of registers, loads a per-interrupt VSR
function pointer from the VSR table, and
jumps to that VSR. Usually that VSR will be
hal_default_interrupt_vsr, again in
src/vectors.S, but applications can install their
own VSR functions if interrupt latency is particularly critical for an
interrupt source. Any such VSR is likely to be based at least in part
on the default one.
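The table-driven dispatch can be modelled in C to show how a custom VSR replaces the default for one vector. The model is hypothetical throughout: vsr_table, install_vsr(), vector_entry() and fast_custom_vsr() are illustrative names, default_interrupt_vsr() merely stands in for hal_default_interrupt_vsr, and the 64-entry table size is an assumption. On a real system the standard eCos HAL_VSR_SET() macro would normally be used to install a VSR.

```c
#include <assert.h>

#define NUM_VECTORS 64u   /* assumption, matching the ~16K vector region */

typedef void (*vsr_fn)(unsigned int vector);

/* Recorded only so the behaviour can be observed in this sketch. */
static unsigned int last_default_vector;
static unsigned int last_custom_vector;

/* Stands in for hal_default_interrupt_vsr. */
static void default_interrupt_vsr(unsigned int vector)
{
    last_default_vector = vector;
}

/* A hypothetical low-latency VSR that an application might install. */
static void fast_custom_vsr(unsigned int vector)
{
    last_custom_vector = vector;
}

static vsr_fn vsr_table[NUM_VECTORS];

static void vsr_table_init(void)
{
    for (unsigned int i = 0; i < NUM_VECTORS; i++)
        vsr_table[i] = default_interrupt_vsr;
}

/* Install a VSR and return the previous one, so that a custom VSR can
   fall back to it. */
static vsr_fn install_vsr(unsigned int vector, vsr_fn vsr)
{
    assert(vector < NUM_VECTORS);
    vsr_fn old = vsr_table[vector];
    vsr_table[vector] = vsr;
    return old;
}

/* Models the per-vector entry code: after saving a few registers it
   loads the VSR pointer for this vector and jumps to it. */
static void vector_entry(unsigned int vector)
{
    vsr_table[vector](vector);
}
```

Returning the previous VSR from install_vsr() mirrors the common pattern where a specialised VSR handles the fast path itself and defers everything else to the default one.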
The default VSR saves additional registers, updates the current
interrupt field and the INTERRUPT_MASK_2 SPR, synchronizes with the
kernel, and enables nested interrupts. It then calls the ISR
associated with the current interrupt. ISRs can be written in C but
there are constraints on what they are allowed to do. More information
on this is provided in the kernel documentation. When the ISR returns
the VSR performs additional processing, possibly including a context
switch to a higher-priority thread that is now runnable, before
eventually returning to the interrupted code.
On TILE-Gx, exceptions such as SIGILL, an illegal instruction
exception, are implemented in much the same way as interrupts.
However exceptions cannot be masked. The CPU jumps to a location near
0xFFFF_FFFF_FE00_0000, where the ROM startup executable or gdbstubs
will have placed suitable code. That code jumps to a VSR, which this
time will usually be
hal_default_interrupt_vsr. This in turn calls C code in
src/tilegx.c which will usually deliver the
exception to the kernel. There are various special cases, for example
a SIGILL exception may be the result of hitting
a tile-gdb breakpoint.
One of the exceptions is special: double fault. This occurs when
a CPU exception occurs while the
INTERRUPT_CRITICAL_SECTION SPR is set.
Double faults are not recoverable: critical information held in other
SPRs will have been overwritten. If running in the simulator and
CYGHWR_HAL_TILEGX_SIMULATOR is set then the
simulation will be halted. Otherwise, in the absence of a better
solution, an attempt will be made to restart the system. The field
hal_tilegx_global_state.started_by will be
set to hal_tilegx_started_by_double_fault, allowing
application code to detect this after the restart and take any action
that might be appropriate. Double faults should be rare, but
application developers should be aware of the possibility.
The Idle Thread
The kernel's idle thread will execute the nap instruction, causing the tile to sleep until the next interrupt occurs.
The System Clock
The kernel clock has been implemented using the TILE_TIMER_CONTROL special purpose register, so that timer is not available for use by application code. The auxiliary tile timer is available for use by the application, but note that the simulator does not implement that timer.
The counter value programmed into the TILE_TIMER_CONTROL register is
determined from the system clock frequency, which is information
provided by the hypervisor, and from configuration options including
CYGNUM_HAL_RTC_DENOMINATOR. The hardware does
not support automatic reloading of the counter when a clock interrupt
occurs, so the interrupt handler has to reload it explicitly. That
code attempts to compensate for the time taken from the interrupt
triggering to the counter being reloaded. The accuracy cannot be
completely guaranteed, especially in a debug system, so a small amount
of clock drift may occur.
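The counter arithmetic and the reload compensation can be sketched as follows. This assumes the usual eCos convention that CYGNUM_HAL_RTC_NUMERATOR/CYGNUM_HAL_RTC_DENOMINATOR gives the tick period in nanoseconds. The helpers tick_period_cycles() and compensated_reload() are hypothetical and are not the HAL's actual clock code, which measures the elapsed time in its own way.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: timer count for one clock tick.
   numerator/denominator is the tick period in nanoseconds; cpu_hz would
   come from the hypervisor. cpu_hz * numerator must fit in 64 bits,
   which holds for GHz clocks with a numerator of 1000000000. */
static uint32_t tick_period_cycles(uint64_t cpu_hz, uint64_t numerator,
                                   uint64_t denominator)
{
    assert(denominator != 0);
    return (uint32_t)(cpu_hz * numerator / denominator / 1000000000ULL);
}

/* Hypothetical reload: subtract the cycles that elapsed between the
   interrupt triggering and the reload, never going below 1. */
static uint32_t compensated_reload(uint32_t period, uint32_t elapsed)
{
    return elapsed < period ? period - elapsed : 1u;
}
```

For instance, a 1.2GHz clock with a 10ms tick (numerator 1000000000, denominator 100) gives 12,000,000 cycles per tick. If 500 cycles passed before the handler reloaded the counter, it would reload with 11,999,500. Any error in measuring that elapsed time is what shows up as the small clock drift mentioned above.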
The TILE-Gx architecture has a complicated caching system including
per-tile primary and secondary caches and a distributed tertiary
cache. The hardware maintains data cache coherency so there is very
rarely any need for code to exercise fine-grained control over the
cache such as flushing cachelines. Therefore the various cache-related
eCos macros like
HAL_DCACHE_SYNC() are defined as
no-ops. The hardware does not maintain coherency between the
instruction and data caches, so eCos does define a number of instruction
cache macros. These
are needed by the gdbstubs code to implement breakpoints, and are
unlikely to be of any interest to application developers.
The port supports two destinations for diagnostic output. Applications
built for ROM startup will send their diagnostic output to the
hypervisor over the IDN bus, and the hypervisor will output the text
on the system console. For a RAM startup application diagnostic output
will normally be sent to tile-gdb via LittleBoPeep,
but the output can be redirected to the hypervisor if desired. This
behaviour is controlled by
The TILE-Gx architectural HAL provides two non-standard functions which can be used to manage the MMU settings:
    #include <cyg/hal/hal_tilegx.h>
    int  hal_tilegx_mmap(unsigned long long virtual_address,
                         unsigned long long physical_address,
                         unsigned long long dtlb_attributes);
    void hal_tilegx_munmap(unsigned long long virtual_address);
hal_tilegx_mmap() can be used to map a physical
address into the tile's virtual address space with the specified
attributes. The physical address can correspond to real memory.
Typically this will be allocated by a process running in the Linux
partition, and the details can then be forwarded to an eCos
application which will map it into its address space. Alternatively
the physical address can correspond to a memory-mapped device.
The virtual address can be anywhere in the address space that is not
already used, but preferably in the range 0x0000_0000 to 0x7FFF_FFFF
to avoid problems with 32-bit pointers. Low memory is normally used
for the system's ROM and RAM regions and 0x6C00_0000 is used for the
hypervisor data, but anywhere between 0x4000_0000 and 0x6800_0000 or
0x7000_0000 and 0x7FFF_FFFF is normally fine. The address should be
aligned to a boundary suitable for the block size. The final argument
will be written to the DTLB_CURRENT_ATTR SPR and
consists of numerous fields. The Tilera documentation should be
consulted for more information.
The function returns 0 if the operation fails, typically because all of the data
TLBs are already in use, or 1 on success.
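The guidance on choosing a virtual address can be encoded as a pre-check. The helper va_is_suitable() is hypothetical and not part of the HAL. The address windows come from the text above, while the requirements that the block size be a power of two and that the whole mapping fit inside one window are assumptions of this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: the mapping must fit in 0x4000_0000..0x6800_0000
   or 0x7000_0000..0x7FFF_FFFF, and the virtual address must be aligned
   to the block size. Assumption: the block size is a power of two. */
static int va_is_suitable(uint64_t va, uint64_t block_size)
{
    assert(block_size != 0 && (block_size & (block_size - 1)) == 0);

    int in_low_window  = va >= 0x40000000ULL && va + block_size <= 0x68000000ULL;
    int in_high_window = va >= 0x70000000ULL && va + block_size <= 0x80000000ULL;
    int aligned        = (va & (block_size - 1)) == 0;

    return (in_low_window || in_high_window) && aligned;
}
```

An application would only pass an address that passes such a check to hal_tilegx_mmap(), and would still treat a return value of 0 as failure, for example when all the data TLBs are already in use.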
hal_tilegx_munmap() can be used to undo a