Floating Point support

Overview

The Cortex-M architectural HAL provides support for a hardware Floating Point Unit (FPU) if one is present, to provide accelerated floating point math operations.

Support is currently provided for the FPU designs found on the Cortex-M4 and Cortex-M7 architectural variants. However, the FPU is an optional feature even on these variants; a more specific designation such as Cortex-M4F indicates that the FPU is present.

Furthermore, even if a platform HAL package indicates that an FPU is present, it is not required to be used. The default is software floating point, so the developer must take the step of enabling hardware FPU support if it is desired. Equally, the developer may keep using software floating point, which in some cases can simplify the system and reduce code size and stack use, since hardware FP requires larger register contexts. Software floating point is provided by the compiler (GCC) runtime, based on the compiler flags in use.

Configuration

As described earlier, in order to enable hardware floating point support in this HAL package you must enable the configuration option CYGHWR_HAL_CORTEXM_FPU (Use hardware FPU) which can be found within the CYGPKG_HAL_CORTEXM_FPU (Floating Point Support) CDL component.

Configuration of the FPU support is an important step. Use of the FPU affects code generation and requires some initialization. It also requires an understanding of whether multiple kernel threads in the application may be using FP operations, in which case the method of saving and restoring the FPU register bank on context switches must be set appropriately.

Compile and link flags

Both the application and eCos must be built and linked with matching compiler/linker flags appropriate to the configuration selected for FPU support. It is usually easiest to examine the CYGBLD_GLOBAL_CFLAGS configuration option, or simply the build output, to see the relevant flags in use. These are the flags to look for, and a brief summary of their purpose:

-mcpu=cortex-m4: No Cortex-M3 core supports FPU operations, so -mcpu=cortex-m4 is required to allow the correct instructions to be generated. For the moment, use of this option also applies when using the Cortex-M7 although this will likely change in a future compiler update.
-mfloat-abi=hard: This directs the compiler to generate FPU instructions for floating point operations. If this option is absent, the default of -mfloat-abi=soft (software FP) is used.
-mfpu=…: This option indicates which hardware FPU is present, covering the number of registers, their sizes, and so on. For the moment, only -mfpu=fpv4-sp-d16, as used on the Cortex-M4F and M7F, is supported. This corresponds to the VFPv4 specification with 32 single precision registers, also usable as 16 double precision registers.
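
As a cross-check on the flags listed above, the compiler's predefined macros can be inspected from C. The following is a minimal sketch, assuming GCC's ARM predefines __ARM_PCS_VFP, __SOFTFP__ and __arm__. The enum and function names are illustrative only and are not part of eCos. The sketch also distinguishes -mfloat-abi=softfp, a GCC setting not covered in the list above.

```c
#include <stdio.h>

/* Illustrative names only; none of these are part of eCos. */
enum fp_build_mode {
    FP_BUILD_HARD,     /* -mfloat-abi=hard: FPU instructions, FP register ABI */
    FP_BUILD_SOFTFP,   /* -mfloat-abi=softfp: FPU instructions, integer ABI   */
    FP_BUILD_SOFT,     /* -mfloat-abi=soft: no FPU instructions generated     */
    FP_BUILD_NOT_ARM   /* e.g. when trying this sketch on a host PC           */
};

/* Classify this translation unit using GCC's predefined macros. */
enum fp_build_mode fp_detect_build_mode(void)
{
#if defined(__ARM_PCS_VFP)
    return FP_BUILD_HARD;
#elif defined(__SOFTFP__)
    return FP_BUILD_SOFT;
#elif defined(__arm__)
    return FP_BUILD_SOFTFP;
#else
    return FP_BUILD_NOT_ARM;
#endif
}

/* Map the classification to a printable name. */
const char *fp_build_mode_name(enum fp_build_mode mode)
{
    switch (mode) {
    case FP_BUILD_HARD:   return "hard";
    case FP_BUILD_SOFTFP: return "softfp";
    case FP_BUILD_SOFT:   return "soft";
    default:              return "not an ARM build";
    }
}

/* Print the result, for example from an application start-up routine. */
void fp_report_build_mode(void)
{
    printf("FP build mode: %s\n", fp_build_mode_name(fp_detect_build_mode()));
}
```

Because the application and eCos must be built with matching flags, a result other than "hard" in an application built against a hardware FPU configuration suggests that the application's flags differ from those of the eCos build.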

Threads and context switching with FP

With the hardware FPU support enabled, it is then possible to configure the CYGHWR_HAL_CORTEXM_FPU_SWITCH (FPU context switch) configuration option in order to control how FPU registers are saved/restored in context switches. There are three settings: ALL, LAZY, and NONE.

ALL

This mode is the most straightforward, and means that on every context switch, all FPU registers are saved and restored between threads.

This mode makes the most sense if you need determinism and/or most or all of your threads will use FP. However if few threads use FP, it can result in a lot of overhead due to saves and restores of unchanged registers.

Enabling the ALL mode also takes advantage of the Cortex-M lazy exception stacking feature in order to reduce interrupt/exception latency. This means that after an exception or interrupt, the core reserves space for the FPU register context on the stack, but does not actually save the FPU register contents onto the stack unless needed.
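
At the architecture level, lazy exception stacking is controlled and reported through the Floating Point Context Control Register (FPCCR). The sketch below decodes a value read from that register. The register address and bit positions are taken from the ARMv7-M architecture; the helper names are hypothetical, and how the HAL itself programs the register is not described here.

```c
#include <stdbool.h>
#include <stdint.h>

/* ARMv7-M Floating Point Context Control Register and its relevant bits. */
#define FPCCR_ADDR    0xE000EF34u
#define FPCCR_ASPEN   (1u << 31)   /* automatic FP state preservation enabled */
#define FPCCR_LSPEN   (1u << 30)   /* lazy FP state preservation enabled      */
#define FPCCR_LSPACT  (1u << 0)    /* frame space reserved, save still deferred */

/* Hypothetical helper: true if lazy exception stacking is switched on. */
bool fpccr_lazy_stacking_enabled(uint32_t fpccr)
{
    const uint32_t both = FPCCR_ASPEN | FPCCR_LSPEN;
    return (fpccr & both) == both;
}

/* Hypothetical helper: true if stack space has been reserved for the FPU
 * context but the register contents have not yet been written to it. */
bool fpccr_lazy_save_pending(uint32_t fpccr)
{
    return (fpccr & FPCCR_LSPACT) != 0u;
}
```

On the target, the value would be read with an access such as *(volatile uint32_t *)FPCCR_ADDR. On a host, the helpers can be exercised with literal values, for example 0xC0000000 for a register with both ASPEN and LSPEN set.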

LAZY

In this mode, if a thread has not used the FPU, the FPU context will not be saved or restored for it. The HAL installs exception handlers on the Cortex-M UsageFault and HardFault exceptions in order to detect the first time the FPU is accessed by that thread. Once the FPU is accessed, the fault handler enables the FPU for that thread, and from then on, the FPU context will be saved and restored when switching from or to that thread.
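
The first-use detection described above relies on two architectural facts: access to the FPU is granted through the CP10 and CP11 fields of the Coprocessor Access Control Register (CPACR), and an FP instruction executed while access is denied raises a UsageFault with the NOCP bit set in the Configurable Fault Status Register (CFSR). The sketch below shows the bit manipulation involved. The register addresses and bit positions are architectural; the helper names are hypothetical, and this is not the HAL's actual handler code.

```c
#include <stdbool.h>
#include <stdint.h>

/* ARMv7-M system control registers involved in FPU access control. */
#define CPACR_ADDR       0xE000ED88u    /* Coprocessor Access Control Register  */
#define CPACR_CP10_CP11  (0xFu << 20)   /* full access to CP10 and CP11 (FPU)   */
#define CFSR_ADDR        0xE000ED28u    /* Configurable Fault Status Register   */
#define CFSR_NOCP        (1u << 19)     /* UsageFault caused by "no coprocessor" */

/* Hypothetical helper: true if the CPACR value grants full FPU access. */
bool cpacr_fpu_enabled(uint32_t cpacr)
{
    return (cpacr & CPACR_CP10_CP11) == CPACR_CP10_CP11;
}

/* Hypothetical helper: the CPACR value with full FPU access granted. */
uint32_t cpacr_with_fpu_enabled(uint32_t cpacr)
{
    return cpacr | CPACR_CP10_CP11;
}

/* Hypothetical helper: true if the fault status indicates an access to a
 * disabled or absent coprocessor, which on these cores means the FPU. */
bool cfsr_is_nocp_fault(uint32_t cfsr)
{
    return (cfsr & CFSR_NOCP) != 0u;
}
```

A handler following this scheme would check cfsr_is_nocp_fault(), grant access with cpacr_with_fpu_enabled(), record that the current thread now uses the FPU, and return so that the faulting instruction is retried.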

In a system where some or many threads do not use the FPU, this can greatly improve context switch time. However, if the system spends most of its time switching between two or more threads which all use the FPU, there may be additional overhead compared to the ALL mode, because each switch must check whether the FPU was enabled for the thread. The worst case context switch time is therefore longer than with ALL mode. LAZY mode also reduces determinism: there is an unavoidable latency at the point a thread first accesses the FPU, while the fault handler executes to enable the FPU, and context switch time depends on whether the threads involved use the FPU.
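
The trade-off between the modes can be illustrated with a simplified cost model that counts the FPU register words copied on one context switch. This is only a sketch: the figure of 33 words assumes the 32 single precision registers of -mfpu=fpv4-sp-d16 plus the FPSCR, all names are hypothetical, and the model ignores both the per-switch check and the first-use fault latency of LAZY mode. It does not reflect the HAL's actual context layout.

```c
#include <stddef.h>

/* Hypothetical names mirroring the three settings of the
 * CYGHWR_HAL_CORTEXM_FPU_SWITCH option. */
enum fpu_switch_mode { FPU_SWITCH_ALL, FPU_SWITCH_LAZY, FPU_SWITCH_NONE };

/* Assumption: S0-S31 plus FPSCR make up the full FPU context. */
#define FPU_CONTEXT_WORDS 33u

/* Minimal model of a thread: has it ever used the FPU? */
struct model_thread {
    int uses_fpu;
};

/* Words saved for the outgoing thread plus words restored for the
 * incoming thread on one switch from `from` to `to`. */
unsigned fpu_words_moved(enum fpu_switch_mode mode,
                         const struct model_thread *from,
                         const struct model_thread *to)
{
    switch (mode) {
    case FPU_SWITCH_ALL:
        /* Always save and restore the complete context. */
        return 2u * FPU_CONTEXT_WORDS;
    case FPU_SWITCH_LAZY:
        /* Only threads that have used the FPU carry a context. */
        return (from->uses_fpu ? FPU_CONTEXT_WORDS : 0u)
             + (to->uses_fpu ? FPU_CONTEXT_WORDS : 0u);
    case FPU_SWITCH_NONE:
    default:
        /* No FPU context is stored at any point. */
        return 0u;
    }
}
```

In this model LAZY never copies more words than ALL, and copies the same number when both threads use the FPU. The extra worst case cost of LAZY therefore comes from the work the model leaves out, namely the check made on each switch.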

The LAZY mode does not save on stack usage, as the number of registers which might need to be saved remains the same.

Unlike the ALL mode, the LAZY mode does not yet support lazy exception stacking for those threads which have the FPU enabled. This means that if the FPU is enabled at the time of an interrupt or exception, much of the FPU register context (the FPSCR and half the data registers) will be saved. Please contact eCosCentric if it would be of interest to enhance eCos by adding lazy exception stacking to the LAZY context switching mode.

NONE

In this mode, the FPU is enabled, but no floating point context is stored at any point, which naturally means there is no overhead on context switch. However this means that only one thread or context may use the FPU at a time.

If using this mode, either all FP operations must be constrained to a single thread, or there must be locking to ensure that multiple threads do not access the FPU registers simultaneously. If you rely on locking, great care must be taken, as the compiler may reorder floating point accesses outside the critical region if they are in the same function. The HAL_REORDER_BARRIER() call from the <cyg/hal/hal_arch.h> HAL header can be useful to prevent reordering across a particular point in the code.
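
The locking approach might be sketched as follows, assuming a kernel mutex guards every use of the FPU in the application. The FP work is placed in a separate function marked noinline (a GCC attribute) so that it cannot share a function body with the lock calls, and operands and results are passed through memory so that no value sits in an FPU register outside the locked region. The function names and the FP_SKETCH_ON_ECOS macro are hypothetical. The host stand-ins exist only so the sketch compiles off-target and are not the eCos definitions.

```c
#ifdef FP_SKETCH_ON_ECOS              /* hypothetical: define when building for eCos */
#include <cyg/kernel/kapi.h>          /* cyg_mutex_t, cyg_mutex_lock(), ...          */
#include <cyg/hal/hal_arch.h>         /* HAL_REORDER_BARRIER()                       */
#else
/* Host stand-ins so the sketch can be compiled and tried off-target. */
typedef struct { int locked; } cyg_mutex_t;
static void cyg_mutex_init(cyg_mutex_t *m)   { m->locked = 0; }
static int  cyg_mutex_lock(cyg_mutex_t *m)   { m->locked = 1; return 1; }
static void cyg_mutex_unlock(cyg_mutex_t *m) { m->locked = 0; }
#define HAL_REORDER_BARRIER() __asm__ volatile ("" : : : "memory")
#endif

/* Guards every use of the FPU in the application. */
static cyg_mutex_t fpu_lock;

void fpu_lock_init(void)
{
    cyg_mutex_init(&fpu_lock);
}

/* All FP arithmetic lives here, in a function of its own. */
__attribute__((noinline))
static void scaled_average_worker(const float *samples, unsigned count,
                                  const float *scale, float *result)
{
    float sum = 0.0f;
    unsigned i;

    for (i = 0; i < count; i++)
        sum += samples[i];
    *result = (count != 0u) ? (sum / (float)count) * (*scale) : 0.0f;
}

/* Wrapper performing no FP work itself: lock, barrier, call, barrier, unlock. */
void locked_scaled_average(const float *samples, unsigned count,
                           const float *scale, float *result)
{
    cyg_mutex_lock(&fpu_lock);
    HAL_REORDER_BARRIER();            /* nothing moves above the lock   */
    scaled_average_worker(samples, count, scale, result);
    HAL_REORDER_BARRIER();            /* nothing moves below the unlock */
    cyg_mutex_unlock(&fpu_lock);
}
```

Passing float arguments or return values directly would defeat the purpose under -mfloat-abi=hard, since they would travel in FPU registers before the lock is taken or after it is released.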

Floating point specific tests

The kernel package has a number of tests to exercise floating point operations, especially when switching threads. Some of these tests take particular account of the Cortex-M features in determining what to test. The relevant test names in the kernel package all have the prefix "fp".

FP in exception contexts

Floating point must not be used in an ISR or from kernel exception handlers. If used, FPU registers will not be restored correctly on the return from the ISR/exception.

However, if the ALL context switch mode is in use, it is permitted to use floating point in DSRs, including kernel alarm functions. Floating point may also be used in DSRs in NONE mode, but only while no thread is using floating point. Threads can ensure this by taking the kernel scheduler lock around their FP operations, which temporarily prevents DSRs from running, although clearly that has an impact on real-time behaviour. As mentioned earlier, it is also advisable to combine the lock with use of HAL_REORDER_BARRIER().

Do not use floating point operations in DSRs when the LAZY context switch mode is used. There is no guarantee of which thread context will be current when the DSR runs, so if the interrupt occurred while a thread that does not use FP was running, the DSR would cause the FPU to be enabled for that thread from then on.
