11.4 Pointers Are Not Always Addresses

On almost all 32-bit architectures, the representation of a pointer is indistinguishable from the representation of some fixed-length number whose value is the byte address of the object pointed to. On such machines, the words “pointer” and “address” can be used interchangeably. However, architectures with smaller word sizes are often cramped for address space, so they may choose a pointer representation that breaks this identity in order to allow a larger code address space.

For example, the Renesas D10V is a 16-bit VLIW processor whose instructions are 32 bits long[1]. If the D10V used ordinary byte addresses to refer to code locations, then the processor would only be able to address 64 KB of instructions. However, since instructions must be aligned on four-byte boundaries, the low two bits of any valid instruction's byte address are always zero, so byte addresses waste two bits. Instead of byte addresses, the D10V therefore uses word addresses (byte addresses shifted right two bits) to refer to code. Thus, the D10V can use 16-bit words to address 256 KB of code space.

However, this means that code pointers and data pointers have different forms on the D10V. The 16-bit word 0xC020 refers to byte address 0xC020 when used as a data address, but refers to byte address 0x30080 when used as a code address.
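The arithmetic behind this example can be written out as a minimal sketch. The function names and the byte_addr_t type below are hypothetical, chosen only for illustration; they are not part of gdb. The sketch also ignores the separate code and data address spaces.

```c
#include <stdint.h>

/* Hypothetical type: wide enough to hold any byte address in a
   256 KB code space.  Not a gdb type.  */
typedef uint32_t byte_addr_t;

/* A data pointer already holds a byte address, so no conversion
   is needed.  */
byte_addr_t
d10v_data_pointer_to_address (uint16_t ptr)
{
  return ptr;
}

/* A code pointer holds a word address: shift left two bits
   (multiply by four) to recover the byte address.  */
byte_addr_t
d10v_code_pointer_to_address (uint16_t ptr)
{
  return (byte_addr_t) ptr << 2;
}

/* Inverse of the above: drop the two low bits, which are always
   zero for a four-byte-aligned instruction.  */
uint16_t
d10v_code_address_to_pointer (byte_addr_t addr)
{
  return (uint16_t) (addr >> 2);
}
```

With these definitions, the 16-bit word 0xC020 maps to byte address 0xC020 as a data pointer and to 0x30080 as a code pointer, and the largest code pointer, 0xFFFF, maps to 0x3FFFC, the last instruction slot in a 256 KB code space.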

(The D10V also uses separate code and data address spaces, which also affects the correspondence between pointers and addresses, but we're going to ignore that here; this example is already too long.)

To cope with architectures like this—the D10V is not the only one!—gdb tries to distinguish between addresses, which are byte numbers, and pointers, which are the target's representation of an address of a particular type of data. In the example above, 0xC020 is the pointer, which refers to one of the addresses 0xC020 or 0x30080, depending on the type imposed upon it. gdb provides functions for turning a pointer into an address and vice versa, in the appropriate way for the current architecture.

Unfortunately, since addresses and pointers are identical on almost all processors, this distinction tends to bit-rot pretty quickly. Thus, each time you port gdb to an architecture which does distinguish between pointers and addresses, you'll probably need to clean up some architecture-independent code.

Here are functions which convert between pointers and addresses:

— Function: CORE_ADDR extract_typed_address (void *buf, struct type *type)

Treat the bytes at buf as a pointer or reference of type type, and return the address it represents, in a manner appropriate for the current architecture. This yields an address gdb can use to read target memory, disassemble, etc. Note that buf refers to a buffer in gdb's memory, not the inferior's.

For example, if the current architecture is the Intel x86, this function extracts a little-endian integer of the appropriate length from buf and returns it. However, if the current architecture is the D10V, this function will return a 16-bit integer extracted from buf, multiplied by four if type is a pointer to a function.

If type is not a pointer or reference type, then this function will signal an internal error.

— Function: void store_typed_address (void *buf, struct type *type, CORE_ADDR addr)

Store the address addr in buf, in the proper format for a pointer of type type in the current architecture. Note that buf refers to a buffer in gdb's memory, not the inferior's.

For example, if the current architecture is the Intel x86, this function stores addr unmodified as a little-endian integer of the appropriate length in buf. However, if the current architecture is the D10V, this function divides addr by four if type is a pointer to a function, and then stores it in buf.

If type is not a pointer or reference type, then this function will signal an internal error.
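The behavior described for the D10V can be modeled outside gdb with a small stand-alone sketch. Everything here is a mock: the mock_ names, the mock_ptr_kind enum (standing in for struct type), and the choice of little-endian byte order are assumptions made for illustration, and none of it is gdb's actual implementation.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for "struct type": only records whether the
   pointer targets data or a function.  */
enum mock_ptr_kind { MOCK_PTR_DATA, MOCK_PTR_FUNC };

/* Read a LEN-byte little-endian unsigned integer from BUF.  */
uint64_t
mock_extract_unsigned (const unsigned char *buf, size_t len)
{
  uint64_t v = 0;
  for (size_t i = len; i > 0; i--)
    v = (v << 8) | buf[i - 1];
  return v;
}

/* Write V into BUF as a LEN-byte little-endian unsigned integer.  */
void
mock_store_unsigned (unsigned char *buf, size_t len, uint64_t v)
{
  for (size_t i = 0; i < len; i++)
    {
      buf[i] = (unsigned char) (v & 0xff);
      v >>= 8;
    }
}

/* D10V-like extraction: read a 16-bit pointer and multiply it by
   four if it points to a function.  */
uint64_t
mock_d10v_extract_typed_address (const unsigned char *buf,
                                 enum mock_ptr_kind kind)
{
  uint64_t raw = mock_extract_unsigned (buf, 2);
  return kind == MOCK_PTR_FUNC ? raw * 4 : raw;
}

/* D10V-like store: divide the address by four if the pointer type
   is a pointer to a function, then write 16 bits.  */
void
mock_d10v_store_typed_address (unsigned char *buf,
                               enum mock_ptr_kind kind, uint64_t addr)
{
  if (kind == MOCK_PTR_FUNC)
    addr /= 4;
  mock_store_unsigned (buf, 2, addr);
}
```

Storing the code address 0x30080 as a function pointer leaves the 16-bit value 0xC020 in the buffer. Extracting those same bytes as a function pointer returns 0x30080, while extracting them as a data pointer returns 0xC020.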

— Function: CORE_ADDR value_as_address (struct value *val)

Assuming that val is a pointer, return the address it represents, as appropriate for the current architecture.

This function actually works on integral values, as well as pointers. For pointers, it performs architecture-specific conversions as described above for extract_typed_address.

— Function: CORE_ADDR value_from_pointer (struct type *type, CORE_ADDR addr)

Create and return a value representing a pointer of type type to the address addr, as appropriate for the current architecture. This function performs architecture-specific conversions as described above for store_typed_address.

Here are two functions which architectures can define to indicate the relationship between pointers and addresses. These have default definitions, appropriate for architectures on which all pointers are simple unsigned byte addresses.

— Function: CORE_ADDR gdbarch_pointer_to_address (struct gdbarch *gdbarch, struct type *type, char *buf)

Assume that buf holds a pointer of type type, in the appropriate format for the current architecture. Return the byte address the pointer refers to.

This function may safely assume that type is either a pointer or a C++ reference type.

— Function: void gdbarch_address_to_pointer (struct gdbarch *gdbarch, struct type *type, char *buf, CORE_ADDR addr)

Store in buf a pointer of type type representing the address addr, in the appropriate format for the current architecture.

This function may safely assume that type is either a pointer or a C++ reference type.
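As a rough illustration of how a default definition and an architecture-specific override might sit side by side, the following stand-alone sketch models the two hooks as function pointers. The mock_gdbarch and mock_type structures, all mock_ names, and the little-endian byte order are assumptions for this sketch only; gdb's real struct gdbarch and registration mechanism are not shown.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t mock_core_addr;    /* stand-in for CORE_ADDR */

enum mock_type_code { MOCK_TYPE_DATA_PTR, MOCK_TYPE_FUNC_PTR };
struct mock_type { enum mock_type_code code; size_t length; };

struct mock_gdbarch
{
  mock_core_addr (*pointer_to_address) (struct mock_gdbarch *,
                                        const struct mock_type *,
                                        const unsigned char *);
  void (*address_to_pointer) (struct mock_gdbarch *,
                              const struct mock_type *,
                              unsigned char *, mock_core_addr);
};

/* Little-endian helpers (byte order is an assumption).  */
static mock_core_addr
read_le (const unsigned char *buf, size_t len)
{
  mock_core_addr v = 0;
  for (size_t i = len; i > 0; i--)
    v = (v << 8) | buf[i - 1];
  return v;
}

static void
write_le (unsigned char *buf, size_t len, mock_core_addr v)
{
  for (size_t i = 0; i < len; i++, v >>= 8)
    buf[i] = (unsigned char) (v & 0xff);
}

/* Default hooks: every pointer is a plain unsigned byte address.  */
mock_core_addr
mock_default_pointer_to_address (struct mock_gdbarch *arch,
                                 const struct mock_type *type,
                                 const unsigned char *buf)
{
  (void) arch;
  return read_le (buf, type->length);
}

void
mock_default_address_to_pointer (struct mock_gdbarch *arch,
                                 const struct mock_type *type,
                                 unsigned char *buf, mock_core_addr addr)
{
  (void) arch;
  write_le (buf, type->length, addr);
}

/* Overrides for a D10V-like target: function pointers are word
   addresses, data pointers are byte addresses.  */
mock_core_addr
mock_word_code_pointer_to_address (struct mock_gdbarch *arch,
                                   const struct mock_type *type,
                                   const unsigned char *buf)
{
  mock_core_addr raw = read_le (buf, type->length);
  (void) arch;
  return type->code == MOCK_TYPE_FUNC_PTR ? raw << 2 : raw;
}

void
mock_word_code_address_to_pointer (struct mock_gdbarch *arch,
                                   const struct mock_type *type,
                                   unsigned char *buf, mock_core_addr addr)
{
  (void) arch;
  if (type->code == MOCK_TYPE_FUNC_PTR)
    addr >>= 2;
  write_le (buf, type->length, addr);
}

struct mock_gdbarch mock_flat_arch =
  { mock_default_pointer_to_address, mock_default_address_to_pointer };
struct mock_gdbarch mock_word_code_arch =
  { mock_word_code_pointer_to_address, mock_word_code_address_to_pointer };
```

Architecture-independent code in this model always calls through the hook, for example arch->pointer_to_address (arch, type, buf), so the same bytes yield 0xC020 on the flat mock architecture and 0x30080 on the word-addressed one when type describes a function pointer.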


Footnotes

[1] Some D10V instructions are actually pairs of 16-bit sub-instructions. However, since you can't jump into the middle of such a pair, code addresses can only refer to full 32-bit instructions, which is what matters in this explanation.