ABI Interface
Argument Passing
Arguments are passed in both registers and memory. The first six incoming arguments are stored in registers a2 through a7, and additional arguments are stored on the stack starting at the current stack pointer a1. Because Xtensa uses register windows that rotate during a function call, outgoing arguments that will become the callee's incoming arguments must be stored in different register numbers. Depending on the call instruction and, thus, the rotation of the register window, the arguments are passed starting with register a(2+N), where N is the size of the window rotation. Therefore, the first argument is placed into a6 for a call4 instruction and into a10 for a call8 instruction. Large (8-byte) arguments are always passed in an even/odd register pair, even if that means skipping a register for alignment. Return values are stored in a2 through a5. With call12, only the callee's a2 and a3 map to registers the caller can see (a14 and a15), so a function whose return value occupies more than two registers may not be called with call12.
Call | Return addr | Stack ptr | arg0 | arg1 | arg2 | arg3 | arg4 | arg5 |
---|---|---|---|---|---|---|---|---|
(incoming) | a0 | a1 | a2 | a3 | a4 | a5 | a6 | a7 |
call4 | a4 | a5 | a6 | a7 | a8 | a9 | a10 | a11 |
call8 | a8 | a9 | a10 | a11 | a12 | a13 | a14 | a15 |
call12 | a12 | a13 | a14 | a15 | --- | --- | --- | --- |
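The a(2+N) rule can be sketched in C as follows. This is only an illustration of the register numbering described above, not code from any toolchain; the helper names `outgoing_arg_reg` and `align_pair_index` are hypothetical, and a result of -1 simply mirrors the `---` entries for call12.

```c
#include <assert.h>

/* Highest address register visible in the caller's window (a0..a15). */
enum { LAST_VISIBLE_REG = 15 };

/*
 * Hypothetical helper: caller-side register number for 32-bit argument
 * `idx` (0-based) when the call rotates the window by `rotation`
 * registers (0 = callee's own view, 4 = call4, 8 = call8, 12 = call12).
 * Returns -1 when the argument has no caller-visible register.
 */
int outgoing_arg_reg(int rotation, int idx)
{
    assert(rotation == 0 || rotation == 4 || rotation == 8 || rotation == 12);
    assert(idx >= 0);

    if (idx >= 6)
        return -1;                  /* only six arguments use registers */

    int reg = 2 + rotation + idx;   /* a(2+N) plus the argument index */
    return reg <= LAST_VISIBLE_REG ? reg : -1;
}

/*
 * Hypothetical helper: argument slot at which an 8-byte argument starts
 * if `idx` is the next free slot. Slot 0 is a(2+N), an even register,
 * so an even/odd pair must begin on an even slot.
 */
int align_pair_index(int idx)
{
    assert(idx >= 0);
    return (idx + 1) & ~1;          /* round up to the next even slot */
}
```

For example, `outgoing_arg_reg(8, 0)` gives 10, matching the call8 row, and an 8-byte argument following a single 32-bit argument starts at slot `align_pair_index(1)`, that is slot 2, leaving slot 1 unused.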
Syscall ABI
Linux takes system-call arguments in registers. The ABI and Xtensa software conventions require the system-call number in a2. For efficiency, the parameters are not all shifted up by one register to preserve their original order. Instead, only the displaced registers move: the argument that would be in a2 is moved to a6, the one in a6 to a8, and the one in a7 to a9, if the system call takes these arguments.
Syscall number | arg0 | arg1 | arg2 | arg3 | arg4 | arg5 |
---|---|---|---|---|---|---|
a2 | a6 | a3 | a4 | a5 | a8 | a9 |
Upon return, a2 contains the return value or error code. All other registers are preserved.
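As a small illustration of the register assignment above, the mapping can be written as a lookup table in C. The names `SYSCALL_NR_REG` and `syscall_arg_reg` are hypothetical and exist only for this sketch.

```c
#include <assert.h>

/* Register that carries the system-call number (and the result on return). */
enum { SYSCALL_NR_REG = 2 };

/*
 * Hypothetical helper: register number that holds system-call argument
 * `idx` (0-based, at most six arguments).
 */
int syscall_arg_reg(int idx)
{
    /* arg0 moves from a2 to a6; arg4 and arg5 move from a6/a7 to a8/a9. */
    static const int reg_for_arg[6] = { 6, 3, 4, 5, 8, 9 };

    assert(idx >= 0 && idx < 6);
    return reg_for_arg[idx];
}
```

Arguments 1 through 3 stay in a3 through a5, exactly where the function-call convention puts them, which is the point of not shifting every parameter.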
TLS and NPTL implementation
This is a description of the Xtensa-specific aspects of supporting thread-local storage (TLS). Originally developed by Sun, the implementation concept for TLS has been modified and generalized by Ulrich Drepper for Linux for a variety of platforms. "ELF Handling for Thread-Local Storage" provides background information and descriptions of the machine-independent aspects. Alex Oliva later developed an alternative and heavily optimized version for a Linux port to the FR-V processor and subsequently ported it to x86, x86_64, and ARM Linux systems. See also Oliva's implementation details for x86 and ARM. The current status of this alternative approach is somewhat ambiguous: it has been accepted into GCC and Binutils, but the glibc patches have not yet made it into the mainline tree. As a new port, however, Xtensa can use this optimized implementation without affecting legacy software. Although it would be convenient to share some of the platform-independent code, sharing is not required, and the Xtensa implementation will work regardless of whether those patches are ever accepted.
Run-Time Handling of TLS
The __tls_get_addr function has a different prototype with this approach:
extern void *__tls_get_addr (tls_index *ti);
(This is actually the same prototype for __tls_get_addr that is used for most processors with the standard approach.)
TLS Access Models
The Initial Exec and Local Exec models are the same as with the standard approach and are not shown here.
General Dynamic Model
For the general dynamic model, the code sequence loads a function pointer and a single argument from the GOT and calls the function with that argument. If the thread-local symbol is in the static TLS, the runtime linker will set the argument to the TLS offset and the function to a short routine that returns the offset plus the thread pointer. Otherwise, the function is set to __tls_get_addr and the argument is a pointer to the tls_index structure holding the offset and module values. This tls_index structure is dynamically allocated in a hash table by the runtime linker.
This approach allows lazy relocation processing for TLS references, which should improve the start-up times. The runtime linker can initialize the TLS function pointer in the GOT to a resolver function. The details of this are presumably similar to other processors and are not yet specified for Xtensa.
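The two cases described above can be modeled in C roughly as follows. This is a sketch under stated assumptions, not the runtime linker's code: the `tls_descriptor` type, the resolver names, `sketch_tls_get_addr`, the `tls_index` field types, and the byte arrays standing in for per-thread memory are all hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Module and offset pair, as described for the tls_index structure. */
typedef struct {
    unsigned long module;
    unsigned long offset;
} tls_index;

/* Hypothetical model of the two GOT words loaded by the code sequence. */
typedef struct {
    void *(*fn)(uintptr_t arg);   /* word holding the function pointer */
    uintptr_t arg;                /* word holding the single argument  */
} tls_descriptor;

/* Stand-ins for per-thread memory; a real runtime derives these from
 * the thread pointer and the thread's dynamic TLS bookkeeping. */
static unsigned char static_tls_area[64];
static unsigned char dynamic_module_block[64];

/* Static TLS case: the argument is the TLS offset, and the short
 * routine returns that offset plus the thread pointer. */
void *resolve_static(uintptr_t arg)
{
    unsigned char *thread_pointer = static_tls_area;
    return thread_pointer + arg;
}

/* Stand-in for __tls_get_addr; this sketch knows a single module. */
void *sketch_tls_get_addr(tls_index *ti)
{
    assert(ti->module == 1);
    return dynamic_module_block + ti->offset;
}

/* Dynamic case: the argument is a pointer to a tls_index structure. */
void *resolve_dynamic(uintptr_t arg)
{
    return sketch_tls_get_addr((tls_index *)arg);
}

/* What the compiled sequence does: load fn, load arg, call fn(arg). */
void *tls_address(const tls_descriptor *desc)
{
    return desc->fn(desc->arg);
}
```

The calling code is identical in both cases. Only the two words written by the runtime linker differ, which is also what would let a resolver function be installed first and replaced on the first call.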
Location | Code Sequence | Initial Relocations | Dynamic Relocations |
---|---|---|---|
.literal | .LC0 | R_XTENSA_TLSDESC_FN (x) | R_XTENSA_TLSDESC_FN (x) |
.literal | .LC1 | R_XTENSA_TLSDESC_ARG (x) | R_XTENSA_TLSDESC_ARG (x) |
.text | 0x00 l32r a8, .LC0 | R_XTENSA_TLS_FUNC (x) | |
.text | 0x03 l32r a10, .LC1 | R_XTENSA_TLS_ARG (x) | |
.text | 0x06 callx8 a8 | R_XTENSA_TLS_CALL (x) | |
The assembly syntax for this instruction sequence is:

    movi a8, x@TLSFUNC
    movi a10, x@TLSARG
    callx8.tls a8, x@TLSCALL

This relies on the assembler to relax the movi instructions to l32r instructions, so that both the literals and the instructions get the appropriate relocations. The callx8.tls assembler macro generates a callx8 instruction with an extra relocation specified by its second operand.
Local Dynamic Model
This model uses the same relocations and functions as the general dynamic model except that it references a special linker-defined _TLS_MODULE_BASE_ symbol which is set to the start of the local TLS space.
Location | Code Sequence | Initial Relocations | Dynamic Relocations |
---|---|---|---|
.literal | .LC0 | R_XTENSA_TLSDESC_FN (_TLS_MODULE_BASE_) | R_XTENSA_TLSDESC_FN (_TLS_MODULE_BASE_) |
.literal | .LC1 | R_XTENSA_TLSDESC_ARG (_TLS_MODULE_BASE_) | R_XTENSA_TLSDESC_ARG (_TLS_MODULE_BASE_) |
.literal | .LC2 | R_XTENSA_TLS_DTPOFF (x) | |
.literal | .LC3 | R_XTENSA_TLS_DTPOFF (y) | |
.text | 0x00 l32r a8, .LC0 | R_XTENSA_TLS_FUNC (_TLS_MODULE_BASE_) | |
.text | 0x03 l32r a10, .LC1 | R_XTENSA_TLS_ARG (_TLS_MODULE_BASE_) | |
.text | 0x06 callx8 a8 | R_XTENSA_TLS_CALL (_TLS_MODULE_BASE_) | |
.text | ... | | |
.text | 0x09 l32r a12, .LC2 | | |
.text | 0x0c add a12, a12, a10 | | |
.text | ... | | |
.text | 0x0f l32r a13, .LC3 | | |
.text | 0x12 add a13, a13, a10 | | |
The assembly syntax for this code sequence is:

    movi a8, _TLS_MODULE_BASE_@TLSFUNC
    movi a10, _TLS_MODULE_BASE_@TLSARG
    callx8.tls a8, _TLS_MODULE_BASE_@TLSCALL
    ...
    movi a12, x@DTPOFF
    add a12, a12, a10
    ...
    movi a13, y@DTPOFF
    add a13, a13, a10
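The benefit of the local dynamic model is that the call happens once per module and each variable then costs only a literal load and an add. A minimal C sketch of that arithmetic follows, assuming a hypothetical module whose TLS block holds two `int` variables x and y. The names `module_tls`, `this_thread_block`, `tls_module_base`, and `addr_from_base` are illustrative only.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical layout of one module's TLS block holding x and y. */
struct module_tls {
    int x;
    int y;
};

/* Stand-in for the calling thread's copy of the module's TLS block. */
static struct module_tls this_thread_block;

/* Stand-in for the single descriptor call made for _TLS_MODULE_BASE_;
 * its result corresponds to the value left in a10. */
unsigned char *tls_module_base(void)
{
    return (unsigned char *)&this_thread_block;
}

/* Per-variable step: add the variable's offset within the module's
 * TLS block (the value a DTPOFF literal would hold) to the base. */
int *addr_from_base(unsigned char *base, size_t dtpoff)
{
    assert(dtpoff + sizeof(int) <= sizeof(struct module_tls));
    return (int *)(base + dtpoff);
}
```

A caller would obtain the base once and reuse it, for example `addr_from_base(base, offsetof(struct module_tls, x))` followed by the same call with the offset of y.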
Linker Optimizations
General Dynamic -> Initial Exec
Code Sequence | Relocations |
---|---|
.literal .LC0 | R_XTENSA_TLSDESC_FN (x) |
.literal .LC1 | R_XTENSA_TLSDESC_ARG (x) |
0x00 l32r a8, .LC0 | R_XTENSA_TLS_FUNC (x) |
0x03 l32r a10, .LC1 | R_XTENSA_TLS_ARG (x) |
0x06 callx8 a8 | R_XTENSA_TLS_CALL (x) |

->

Code Sequence | Relocations |
---|---|
.literal .LC1 | R_XTENSA_TLS_TPOFF (x) |
0x00 rur a8, THREADPTR | |
0x03 l32r a10, .LC1 | |
0x06 add a10, a10, a8 | |
Local Dynamic -> Local Exec
Code Sequence | Relocations |
---|---|
.literal .LC0 | R_XTENSA_TLSDESC_FN (_TLS_MODULE_BASE_) |
.literal .LC1 | R_XTENSA_TLSDESC_ARG (_TLS_MODULE_BASE_) |
.literal .LC2 | R_XTENSA_TLS_DTPOFF (x) |
0x00 l32r a8, .LC0 | R_XTENSA_TLS_FUNC (_TLS_MODULE_BASE_) |
0x03 l32r a10, .LC1 | R_XTENSA_TLS_ARG (_TLS_MODULE_BASE_) |
0x06 callx8 a8 | R_XTENSA_TLS_CALL (_TLS_MODULE_BASE_) |
... | |
0x09 l32r a12, .LC2 | |
0x0c add a12, a12, a10 | |

->

Code Sequence | Relocations |
---|---|
.literal .LC2 | R_XTENSA_TLS_TPOFF (x) |
0x00 nop | |
0x03 nop | |
0x06 rur a10, THREADPTR | |
... | |
0x09 l32r a12, .LC2 | |
0x0c add a12, a12, a10 | |
New Xtensa ELF Definitions
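Location | Relocation |
---|---|
.literal | R_XTENSA_TLSDESC_FN |
.literal | R_XTENSA_TLSDESC_ARG |
.literal | R_XTENSA_TLS_DTPOFF |
.literal | R_XTENSA_TLS_TPOFF |
.text | R_XTENSA_TLS_FUNC |
.text | R_XTENSA_TLS_ARG |
.text | R_XTENSA_TLS_CALL |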