There are many configuration changes that can affect performance: for example, the number and size of buffers, or how checksum calculations are implemented.
The CYGDBG_LWIP_STATS option can be enabled to allow a
variety of statistics counts to be gathered during execution. The various
options are all prefixed with
CYGDBG_LWIP_STATS_, followed by a
sub-system specific suffix.
These statistics can help with tuning lwIP during development, since monitoring the minimum and maximum usage counts of resources, along with the error counts, can indicate resource starvation issues. Note: Some error counts indicate a temporary inability to claim a resource; these are not necessarily fatal errors for the stack, just potential slowdowns.
To determine the number of resources used in practice, it is recommended that testing is performed during development under the maximum load the system is expected to handle, in order to understand the resource requirements at that load. To get useful information, temporarily configure lwIP with more resources than are expected to be needed, memory permitting. Then test the application under the expected network load and, at the end, inspect the statistics, paying particular attention to the "max" fields, which show the maximum number of each resource used in practice in that sample scenario. This information can then inform decisions on the appropriate, reduced, resource allocations set in the lwIP configuration for the final product, without unduly compromising performance.
If CYGDBG_LWIP_STATS is enabled then the gathered statistics can be
output via the
LWIP_PLATFORM_DIAG function wrapper (currently defined to use
diag_printf() in the eCos-specific arch/cc.h header file).
See the Section called Memory Footprint for more information about tuning the lwIP memory footprint.
If the CYGPKG_LWIP_TCP option is configured then various
TCP-specific options are available for tuning performance. The main options
are covered in the subsections below.
The CYGNUM_LWIP_TCP_WND option defines the maximum TCP
receive window size. This size is advertised to remote peers to indicate how
much data they can send. While larger values are faster, you should not
advertise more than you can receive, which means you must have sufficient
capacity in the pbuf pool used for received data for all connections.
The CYGNUM_LWIP_TCP_MSS option defines the Maximum Segment
Size (MSS) advertised to peers to constrain the amount of TCP data they send in
each packet. This should not be more than the interface MTU less 40
bytes. The 40 bytes are the sum of a TCP header and an IP header, neither with any
options. If any options are used regularly, this value should be reduced accordingly.
If the MSS has been set too large, it will result in IP fragmentation and
consequently inefficient network operation. If the MSS is too large and IP
fragmentation has been disabled,
incorrect stack operation will likely result, including oversize packets never
getting sent, or even a failure in the ethernet driver. The most common MTU size
is 1500 bytes (leading to a recommended MSS of up to 1460 bytes) but this is
certainly not universal: some routers, and especially VPNs, can have lower MTUs
and will in turn fragment packets, leading to lower efficiency. For best resource
utilisation by lwIP, it is a good idea for the MSS to be set so that incoming
packets fit into a whole number of pbufs from the packet buffer pool. As
such the default MSS is the pbuf pool packet buffer size
(CYGNUM_LWIP_PBUF_POOL_BUFSIZE), less 40 bytes to allow
room for TCP and IP headers without options.
The CYGNUM_LWIP_TCP_SND_BUF option defines the amount of
buffer space in bytes allowed for outstanding (unacked) sent data on each TCP
connection. This option is complementary to
CYGNUM_LWIP_TCP_SND_QUEUELEN, which defines the number of
packet buffers allowed for outstanding (unacked) sent data on each TCP
connection. The TCP layer will refuse to queue a buffer to be sent if either the
total quantity of data in bytes waiting to be sent would then exceed
CYGNUM_LWIP_TCP_SND_BUF, or there are already at least
CYGNUM_LWIP_TCP_SND_QUEUELEN buffers in the queue
waiting to be sent.
The following sections detail some optimization hints that could be useful on certain target platforms to maximise lwIP data throughput.
A major performance bottleneck for lwIP is the software checksum code, since it
is executed frequently. If the underlying ethernet device driver provides
hardware checksum support then the
CHECKSUM_CHECK_* options can be disabled. However, if
software checksums are needed then you may want to override the standard
checksum implementation. This can be achieved by adding a
LWIP_CHKSUM definition to a header file included by lwIP,
e.g. adding the following to lwipopts.h:
#define LWIP_CHKSUM your_checksum_routine
The lwip_standard_chksum() implementations in src/core/inet_chksum.c provide some C examples, though you might want to craft an assembly function for this specific case.
RFC 1071 is a good introduction to this subject. A highly optimized assembler routine will provide the greatest improvement in overall lwIP performance for software-checksum-based systems.
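For reference, a portable C starting point (a sketch, not the tuned routine you would ship) implementing the RFC 1071 one's-complement sum might look like the following; the function name matches the hypothetical your_checksum_routine above, and a real replacement must also honour lwIP's alignment, byte-order and return conventions, as in src/core/inet_chksum.c:

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

/* Sketch of an RFC 1071 checksum: sum the data as big-endian 16-bit
 * words, fold the carries, and return the 16-bit one's-complement sum.
 * Illustrative only; check inet_chksum.c before using as LWIP_CHKSUM. */
static uint16_t your_checksum_routine(const void *dataptr, int len)
{
    const uint8_t *p = (const uint8_t *)dataptr;
    uint32_t sum = 0;

    while (len > 1) {                 /* sum complete 16-bit words */
        sum += ((uint32_t)p[0] << 8) | p[1];
        p += 2;
        len -= 2;
    }
    if (len > 0)                      /* zero-pad a trailing odd byte */
        sum += (uint32_t)p[0] << 8;

    while (sum >> 16)                 /* fold carries into low 16 bits */
        sum = (sum & 0xFFFFu) + (sum >> 16);

    return (uint16_t)sum;             /* caller inverts for the wire */
}
```

An optimised version would typically process a word at a time with the architecture's add-with-carry instruction, which is where an assembly implementation pays off.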
If the CYGIMP_LWIP_CHECKSUM_ON_COPY functionality is
enabled then checksums are calculated as data is copied into the
stack (from application buffers into packet buffers). This can result in fewer
checksum calculations if a packet buffer is going to be used multiple times, or
if pre-calculated checksums are available for pre-built packets.
The TF_SEG_DATA_CHECKSUMMED flag is used internally by the lwIP TCP support to track whether a checksum has been set on the payload data.
Since network byte order is big-endian, other significant improvements can be made by supplying assembly or inline replacements for htons() and htonl() on little-endian architectures. For example:
#define LWIP_PLATFORM_BYTESWAP 1
#define LWIP_PLATFORM_HTONS(x) your_htons(x)
#define LWIP_PLATFORM_HTONL(x) your_htonl(x)
If the lwIP
CYGIMP_LWIP_HAL_BYTESWAP option is enabled then lwIP will use the HAL-supplied
byte-swap implementations. The CYGIMP_LWIP_HAL_BYTESWAP option is
enabled by default if the architecture indicates that optimised
byte-swap implementations are available; otherwise the option is
disabled by default and, for little-endian architectures, lwIP will
provide its own byte-swap functions.
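On a little-endian target the replacement helpers might look like the following sketch (my_htons and my_htonl are hypothetical names, not the eCos HAL implementations; in practice an architecture-specific instruction or a compiler builtin such as __builtin_bswap32 will usually generate better code):

```c
#include <assert.h>
#include <stdint.h>

/* Little-endian sketch: htons/htonl reduce to plain byte swaps.
 * Hypothetical helpers for illustration only. */
static uint16_t my_htons(uint16_t x)
{
    return (uint16_t)((x << 8) | (x >> 8));
}

static uint32_t my_htonl(uint32_t x)
{
    return ((x & 0x000000FFu) << 24) |
           ((x & 0x0000FF00u) << 8)  |
           ((x & 0x00FF0000u) >> 8)  |
           ((x & 0xFF000000u) >> 24);
}
```

On a big-endian architecture no swap is needed, which is why lwIP only supplies its own byte-swap functions for little-endian targets.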
The ethernet MAC device driver should ideally use interrupts and DMA to avoid busy loops wherever possible. Hardware support for scatter-gather DMA should be used if available, since multiple packet buffers can then be used to hold the different sections of a frame, allowing for zero-copy of payload data.
For a production release it is highly recommended to disable the statistics gathering and debugging support described above, as both carry code size and runtime costs.
The settings of the lwIP configuration options, including the memory
configuration options and those described
in the Section called Performance, will all affect the
overall RAM footprint required by lwIP.
However, as long as the option to use the standard run-time allocator
(CYGFUN_LWIP_MEM_LIB_MALLOC) is NOT
enabled, the memory footprint of lwIP is deterministic and fixed by the selected
configuration.
The major memory configuration options are listed below. Setting these configuration values is usually a compromise between the amount of physical RAM available on the target platform, and the lwIP throughput (performance) requirements.
- Heap size
This option defines the size of the heap that lwIP maintains separately from the system heap, so that the resource requirements of one do not affect the other. It is primarily (although not exclusively) used as the memory pool from which packet buffers for transmission are allocated when the data to be sent needs to be copied (type PBUF_RAM). It is also used to allocate space for dynamically created message boxes and semaphores. This option can be increased to improve performance when sending large amounts of data.
- Packet buffer size
This option specifies the maximum size of data which a single packet buffer (pbuf) allocated from the packet buffer pool for incoming packets can contain. The overall memory footprint of each packet buffer is slightly larger to account for metadata. Incoming packets larger than this size are chained together, using additional packet buffers. If only short packets are usually received, memory efficiency may be improved by reducing the packet buffer size, even if this is accompanied by an increase in the number of packets in the pool using the
CYGNUM_LWIP_PBUF_POOL_SIZE option. If larger packets tend to be received, the converse is true.
Note: Some network drivers set constraints on the value of this option, in order to better integrate with hardware properties.
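To see how the buffer size interacts with chaining, this standalone sketch (a hypothetical helper, ignoring the per-pbuf metadata overhead mentioned above) computes how many pool pbufs an incoming packet occupies:

```c
#include <assert.h>

/* Illustrative only: number of pool pbufs a received packet occupies,
 * ignoring per-pbuf metadata overhead (simple ceiling division). */
static int pbufs_for_packet(int packet_len, int pool_bufsize)
{
    return (packet_len + pool_bufsize - 1) / pool_bufsize;
}
```

For example, a full-size 1514-byte ethernet frame in 512-byte pool buffers chains three pbufs, while a short 60-byte packet still consumes a whole buffer; this trade-off is the motivation for matching the buffer size to the expected traffic.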
- Incoming packet messages (CYGNUM_LWIP_MEMP_NUM_TCPIP_MSG) and API messages
When using the sequential API these options define, respectively, the number of simultaneous packet input messages and API messages. These messages are used for communicating between external threads and the core lwIP network stack.
- Netbufs
This option defines the maximum number of netbuf structures which may be in use simultaneously with the sequential API (which in turn are used by the BSD sockets API). Each netbuf structure corresponds to a chain of packet buffers to be used for sending or receiving data. This option may be set to 0 if the application will only be using the raw API.
- Netconns
This option defines the maximum number of netconn structures which may be in use simultaneously with the sequential API. Each netconn structure corresponds to a connection, whether active or inactive. This option may be set to 0 if the application will only be using the raw API.
- Packet buffer pool size
This option specifies the number of packet buffers (pbufs) present in the packet buffer pool. This pool is used to provide space for incoming data packets, and so this option limits the number of incoming data packets being processed, or pending (including those not yet read out from the stack by the application). It is also used to hold packet fragments if the option
CYGFUN_LWIP_IP_REASS is enabled, and so must be large enough to cover the
CYGNUM_LWIP_IP_REASS_MAX_PBUFS requirement. Note that additional buffers are used in a chain when incoming packets are received which exceed the maximum size of each packet buffer. This option may be adjusted depending on the anticipated peak network traffic. Incoming packets are dropped when the pool is depleted.
- Number of memp packet buffers
The lwIP API allows packets to be transmitted which only contain a reference to the data being sent, instead of copying the data into a separate buffer. This can be useful when sending a lot of data out of ROM (or other static memory). This option specifies the number of such packets that can be used simultaneously. You may wish to increase the value of this option if the application sends a lot of such data, or reduce it if no data is sent this way. These buffers are also used when IP fragmentation support is enabled but a static buffer is not used
(CYGIMP_LWIP_IP_FRAG_USES_STATIC_BUF disabled), so the value may also need increasing if fragmentation is common.
- RAW protocol control blocks
This option defines the number of RAW protocol control blocks that may be used simultaneously. One is required for each active RAW “connection”.
- UDP control blocks
This option defines the number of UDP protocol control blocks that may be used simultaneously. One is required for each active UDP “connection”.
- TCP control blocks
This option defines the number of TCP protocol control blocks that may be used simultaneously. One is required for each TCP connection. Hence this option defines the maximum number of TCP connections that may be open simultaneously. Increase the value of this option if more simultaneous TCP connections are required.
- Listening TCP control blocks
This option defines the number of protocol control blocks dedicated to listening for incoming TCP connection requests. This corresponds to the maximum number of TCP ports which may be simultaneously listened on.
- Queued TCP segments
This option defines the maximum number of TCP segments which may be simultaneously queued. This option may need to be adjusted if the stack reports memory failure errors when attempting to send large quantities of data through TCP connections simultaneously, or when individual TCP writes are so large that the number of MSS-sized segments exceeds the value of this option. If the option to allow out-of-order incoming packets (
CYGIMP_LWIP_TCP_QUEUE_OOSEQ) is enabled, then such segments may also be dropped if the maximum number of TCP segments specified in this option has been reached.
- Queued packets for ARP resolve
The number of simultaneously queued outgoing packet buffers that are waiting for an ARP request to finish to resolve their destination address.
- Queued IP reassembly packets (CYGNUM_LWIP_MEMP_NUM_REASSDATA) and simultaneous IP fragments
These options provide respectively the number of packets that can simultaneously be queued for reassembly, and the number of fragments (not packets) that can be simultaneously queued for sending.
- System timeouts: internal timeouts (CYGNUM_LWIP_MEMP_NUM_INTERNAL_TIMEOUTS) and user timeouts
The INTERNAL value is the number of timeout objects required to support the configured lwIP features. The
USER value defines the maximum number of user timeouts that may be pending simultaneously. The value of this option may need to be increased if there are more threads using the raw API, or more threads calling the
select() BSD compatibility function.
- Multicast group members
This option defines the number of multicast groups of which the network interfaces can be members at the same time. This value must be at least twice the number of active network interfaces in the configuration.
- SNMP agent: Leaf nodes (CYGNUM_LWIP_MEMP_NUM_SNMP_NODE), Root Node branches (CYGNUM_LWIP_MEMP_NUM_SNMP_ROOTNODE), Variable bindings (CYGNUM_LWIP_MEMP_NUM_SNMP_VARBIND) and OIDs
These options control the size and number of the SNMP agent related memory allocations.
- Netdb uses (CYGNUM_LWIP_MEMP_NUM_NETDB) and local host list entries
If DNS support is enabled then these options respectively control the number of concurrent
lwip_addrinfo() calls supported, and the number of host entries in the dynamic local host list.
- Simultaneous PPP connections (CYGNUM_LWIP_MEMP_NUM_PPP_PCB) and concurrent PPPoE interfaces
These options respectively control the number of simultaneously active PPP connections, and the number of concurrently active PPPoE connections.
The following size information was gathered from a CortexM3-targeted
configuration built using the eCosCentric
GNU tools (version 4.4.5c). The byte sizes are provided to give an example
overview of the lwIP footprint that can be expected, and are purely
illustrative.
In the following builds “Basic” refers to a sequential API configuration with UDP and TCP support, but with most options disabled (no fragmentation or reassembly support, static address, no SNMP agent, no IGMP, etc.). The builds marked “Reassembly” refer to the addition of fragmented packet reassembly code to the “Basic” builds. The “Full” entry is a configuration with all the lwIP ethernet features enabled (excluding SNMP, SLIP and PPP) to give an idea of the upper footprint for a fully-featured ethernet build.
The values given are for the complete lwIP “library” package, so
specific application linkage (due to the eCos use of
-ffunction-sections) means that not all of the code
and data measured in the sizes given below may actually be included in the final
executable. The footprint can be made even smaller by explicit use of the raw
API. Note: The bss values below do NOT
include the stack requirement for the sequential API thread, nor the main
configurable lwIP “heap” space. This is because the aim is to
present an example of the base lwIP requirement, independent of the configured
heap and stack space required for a particular application or target.
| CortexM3 (STM32F2xx)   | text + rodata | data | bss  |
|------------------------|---------------|------|------|
| Basic IPv4 static      | 40224         | 16   | 516  |
| Basic IPv4 AutoIP      | 41660         | 16   | 516  |
| Basic IPv4 DHCP        | 46712         | 16   | 520  |
| Basic IPv4 & IPv6      | 58680         | 24   | 613  |
| Reassembly IPv4 static | 41928         | 16   | 526  |
| Reassembly IPv4 & IPv6 | 60488         | 24   | 627  |
| Full IPv4 & IPv6       | 80512         | 24   | 1843 |
Note: Configurations built with
CYGDBG_LWIP_STATS enabled will have a significantly
larger code footprint. Similarly, building with the
CYGPKG_INFRA_DEBUG option enabled, or with the
-O0 optimisation flag, will also have a
significant effect on the footprint.
The example described in this section targets the STM3220G-EVAL platform, but similar figures have also been obtained for other platforms (e.g. AT91SAM7XEK).
With careful tuning it is possible to implement a simple raw API webserver using the httpd2 test example in ~32K of ROM and ~10K of RAM. This is for the complete application, thread stacks, network buffers, etc.
Even though httpd2 is a simple application,
it does provide a useful real-world data point for a minimal
footprint system. Note: For this example build
the httpd2.c source was modified to use the
The small_rom_stm3220g_httpd2.ecm example template used is provided in the lwIP package doc directory. The steps needed to build the minimal example binary are:
$ mkdir small_httpd2
$ cd small_httpd2
$ ecosconfig new stm3220g_eval
[ … ecosconfig output elided … ]
$ ecosconfig import $ECOS_REPOSITORY/net/lwip_tcpip/VERSION/doc/small_rom_stm3220g_httpd2.ecm
$ ecosconfig resolve
$ ecosconfig tree
$ make tests
[ … make output elided … ]
$ arm-eabi-objcopy -O binary install/tests/net/lwip_tcpip/VERSION/tests/httpd2 httpd2.bin
The produced httpd2.bin binary can then be loaded
into the flash of the STM3220G-EVAL at