Author: Siro Mugabi

Category: ELF Support

Summary:

This entry discusses various issues with respect to a program's C Run-time on a GNU/Linux system. The development platform used was Ubuntu 12.04 AMD64. This material applies to ELF objects.

Tags: gnu/linux toolchain gnu linux s/w development elf support debugging/tracing

Overview

Userland C programs on a GNU/Linux system need to be linked with the system's C library. In addition to the standard C library functions, the C library also provides a standard run-time environment. This run-time includes a block of startup code that runs before the C program begins, and another block of shutdown code that runs after the program exits normally. The C program's main gets called by the startup code.

Typically, during the lifetime of the C program, the C library will perform system calls on behalf of the program. This happens when the program invokes the system call wrapper API, or certain library functions, exported by the C library.

Eventually, if the C program terminates normally (i.e. by returning from main or by calling exit(3)), control will be passed to the C run-time shutdown code which will do some cleanup before returning control to Linux.

This startup and shutdown code is embedded in the final executable binary of the C program during the static linking phase. In the semantics of C run-time, the C program proper is, technically, a procedure:

[Figure: c runtime]

(Diagram based on Assembly Language Step by Step: Programming with Linux, 3rd Edition, Jeff Duntemann, Wiley Publishing Inc)

Static linking

Consider the following hello world C program:

$ cat vipi.c

#include <stdio.h>
int main(void)
{
    printf("Dunia, vipi?\n");
    return 0;
}

Running the following gcc command:

$ gcc -Wall -O2 vipi.c -Wl,-v
collect2 version 4.6.3 (x86-64 Linux/ELF)
/usr/bin/ld --sysroot=/ --build-id --no-add-needed --as-needed --eh-frame-hdr -m elf_x86_64 --hash-style=gnu -dynamic-linker /lib64/ld-linux-x86-64.so.2 -z relro /usr/lib/gcc/x86_64-linux-gnu/4.6/../../../x86_64-linux-gnu/crt1.o /usr/lib/gcc/x86_64-linux-gnu/4.6/../../../x86_64-linux-gnu/crti.o /usr/lib/gcc/x86_64-linux-gnu/4.6/crtbegin.o -L/usr/lib/gcc/x86_64-linux-gnu/4.6 -L/usr/lib/gcc/x86_64-linux-gnu/4.6/../../../x86_64-linux-gnu -L/usr/lib/gcc/x86_64-linux-gnu/4.6/../../../../lib -L/lib/x86_64-linux-gnu -L/lib/../lib -L/usr/lib/x86_64-linux-gnu -L/usr/lib/../lib -L/usr/lib/gcc/x86_64-linux-gnu/4.6/../../.. /tmp/ccy9irfI.o -v -lgcc --as-needed -lgcc_s --no-as-needed -lc -lgcc --as-needed -lgcc_s --no-as-needed /usr/lib/gcc/x86_64-linux-gnu/4.6/crtend.o /usr/lib/gcc/x86_64-linux-gnu/4.6/../../../x86_64-linux-gnu/crtn.o
GNU ld (GNU Binutils for Ubuntu) 2.22

reveals the link order and inclusion of the C run-time into the program to create an executable. This process can be generalized as:

$ ld crt1.o crti.o crtbegin.o [-L paths] [user objects] [gcc libs] [C libs] [gcc libs] crtend.o crtn.o

where crt stands for "C Run-time". The crtbegin.o and crtend.o files are part of the GNU Compiler Collection (GCC) package support libraries while the rest of the crt?.o files are part of the standard C library (eglibc in this case).

The following is a description of the C run-time relocatable object files. Their disassembly can be viewed by way of, say, objdump(1).

  • crt1.o

    This object file defines the _start symbol. The manner in which this code handles program bootstrap is highly dependent on the particular C library implementation. Some systems use crt0.o while others may even specify crt2.o or higher. Ultimately, whatever gcc has encoded should correspond to the C library in use.

  • crti.o and crtn.o

    crti.o defines the _init and _fini function prologs for the .init and .fini sections, respectively. crtn.o defines the corresponding function epilogs. When the static linker eventually merges all .init and .fini sections of its input object files, the DT_INIT and DT_FINI tags in the dynamic section of its output object file will correspond to the addresses of the complete _init and _fini symbols, respectively.

    During run-time, _start sets up some way that the _init and _fini symbols will get invoked e.g. via the __libc_csu_init and __libc_csu_fini symbols, respectively, of the C library.

  • crtbegin.o and crtend.o

    The details of the symbols and sections defined in these files vary among architectures. With the Ubuntu 12.04 AMD64 toolchain, these include legacy code that GCC used to find the constructors and destructors i.e. __do_global_dtors_aux and __do_global_ctors_aux.

Also see dev.gentoo.org/~vapier/crt.txt.

Execution time

Running [1]:

$ ./a.out

causes the shell to invoke execve(2) with "./a.out" as the filename argument, after preliminary checks such as whether the file exists at the specified location and whether the user has permission to execute it.

For statically linked applications [2], the load process only requires the kernel to make the binary available at its fixed load address before initializing the Program Counter (PC) for the process with the address of the _start symbol. On the other hand, for dynamically linked applications, the kernel first transfers control to the dynamic linker. In turn, the dynamic linker loads the required shared object dependencies and performs any immediate relocations (by default, lazy relocations for function references are performed later on when the symbols are actually referenced). It then methodically runs the initialization code for the loaded shared objects before handing control over to the executable's _start.

Entering the executable's _start concludes the application's load process and control proceeds to the executable's C run-time code before eventually reaching main.

Constructors, Destructors and the main Function

The main function is the usual starting point in the programmer's C code. See The main Function for a discussion of main(). The gcc extension, __attribute__((constructor)), may be applied to a function to have its code execute prior to main. Similarly, __attribute__((destructor)) may be used to specify code that will run after the program's main returns, or when the program exits normally via exit(3). Generally, these constructor and destructor functions should return void and should not be exported, e.g. they should be declared with the static keyword [3]. Section Examples includes a few examples of real world applications which rely on these constructor and destructor call mechanisms.

Consider the following program in /tmp:

$ cd /tmp

$ cat prog.c

#include <stdio.h>
#include <unistd.h>     /* getpid() */

static void __attribute__((constructor)) 
ctor(void)
{
    printf("%s:: Constructor funk.\n", __func__);
}

static void __attribute__((destructor))
dtor(void)
{
    printf("%s:: Destructor funk.\n", __func__);
}

int main(void)
{
    printf("%s:: PID: %d\n", __func__, getpid());
    puts("\n\n\tHit \"ENTER\" to quit.\n\n");
    getchar();
    return 0;
}

To simplify illustration, 32-bit code will be considered - but the general flow also applies for x86-64:

$ gcc -Wall -O2 -m32 prog.c

Note that the __attribute__((constructor)) and __attribute__((destructor)) extensions result in the inclusion of the static linker generated DT_INIT_ARRAY, DT_INIT_ARRAYSZ and DT_FINI_ARRAY, DT_FINI_ARRAYSZ tags, respectively, in the dynamic section of the a.out executable:

$ readelf -d a.out | grep ARRAY
 0x00000019 (INIT_ARRAY)                 0x8049eec
 0x0000001b (INIT_ARRAYSZ)               4 (bytes)
 0x0000001a (FINI_ARRAY)                 0x8049ef0
 0x0000001c (FINI_ARRAYSZ)               4 (bytes)

These tags correspond to the address and size of the .init_array and .fini_array sections, respectively. Each of these sections holds a table of pointers (in little endian format) to the constructor and destructor functions, respectively. In other words:

Disassembly of section .init_array:
08049eec <__init_array_start>:
 8049eec:       30                      .byte 0x30
 8049eed:       84 04 08                test   %al,(%eax,%ecx,1)

Disassembly of section .fini_array:
08049ef0 <.fini_array>:
 8049ef0:       00                      .byte 0x0
 8049ef1:       84 04 08                test   %al,(%eax,%ecx,1)

which point to:

$ objdump -D a.out | egrep '(\<ctor\>|\<dtor\>)'
08048400 <dtor>:
08048430 <ctor>:

Tracing C run-time with ltrace

The ltrace(1) library call tracer utility can be used to obtain a trace of function calls. The ltrace(1) command line accepts several options including:

  • -x extern This switch traces the function symbol extern.

  • -n N or --indent N This switch indents trace output by N number of spaces for each new nested call.

  • -i Prints the value of the processor's instruction pointer - or, more generally, the program counter (PC) - at the time of the library call.

Now, since the PC indicates the address in memory of the next instruction to be executed, this value can be used to identify the call instruction (from the disassembly of the executable) that invoked the symbol specified against -x, or the code segment of a shared object from which the symbol was called. For details of the display format in the ltrace(1) output, see /etc/ltrace.conf.

$ ltrace --version
ltrace version 0.5.3.
Copyright (C) 1997-2009 Juan Cespedes <cespedes@debian.org>.
This is free software; see the GNU General Public Licence
version 2 or later for copying conditions.  There is NO warranty.

$ ltrace -i -n 4 -x _start -x __libc_csu_init -x _init  \
                 -x __do_global_ctors_aux -x ctor -x main -x __libc_csu_fini -x _fini \
                 -x __do_global_dtors_aux -x dtor /tmp/a.out

[0x1] _start(0xffe0a494, 0, 0xffe0a49f, 0xffe0a4b2, 0xffe0a4de <unfinished ...>
[0x80484cd]     __libc_start_main(0x8048460, 1, 0xffe08c04, 0x8048560, 0x80485d0 <unfinished ...>
[0xf760944a]         __libc_csu_init(1, 0xffe08c04, 0xffe08c0c, 0xf7797000, 0 <unfinished ...>
[0x8048581]             _init(-1, 0xf7623116, 0xf7792ff4, 0xf76231a5, 0xf77ca660 <unfinished ...>
[0x8048385]                 __do_global_ctors_aux(0xf77933e4, 32768, 0x8049ff4, 0x8048581, -1)                    = -1
[0x8048581]             <... _init resumed> )                                                                     = -1
[0x80485b2]             ctor(1, 0xffe08c04, 0xffe08c0c, 0xf76231a5, 0xf77ca660 <unfinished ...>
[0x804844f]                 __printf_chk(1, 0x8048630, 0x8048687, 0x8048385, 0xf77933e4ctor:: Constructor funk.
)                          = 25
[0x80485b2]             <... ctor resumed> )                                                                      = 25
[0xf760944a]         <... __libc_csu_init resumed> )                                                              = 25
[0xf76094b3]         main(1, 0xffe08c04, 0xffe08c0c, 0xf7797000, 0 <unfinished ...>
[0x804846e]             getpid()                                                                                  = 19595
[0x804848e]             __printf_chk(1, 0x804865f, 0x8048691, 19595, 0x8048560main:: PID: 19595
)                                   = 18
[0x804849a]             puts("\n\n\tHit "ENTER" to quit.\n\n"

    Hit "ENTER" to quit.

At this point, run cat /proc/PID/maps to view the mappings of the memory areas in the process' virtual address space. In this particular instance, cat /proc/19595/maps was executed on some other terminal. Then press ENTER on the current terminal to complete program execution.

)                                                    = 26
[0x80484a7]             _IO_getc(0xf7793ac0
)                                                                      = '\n'
[0xf76094b3]         <... main resumed> )                                                                         = 0
[0xf77ca82c]         dtor(0xf77dc4e4, 0xf77dbff4, 0xf7793a20, 0xf7662495, 0xf77dc918 <unfinished ...>
[0x804841f]             __printf_chk(1, 0x8048648, 0x804868c, 0xf77dbff4, 0dtor:: Destructor funk.
)                                      = 24
[0xf77ca82c]         <... dtor resumed> )                                                                         = 24
[0xf77ca845]         _fini(0xf77dc4e4, 0xf77dbff4, 0xf7793a20, 0xf7662495, 0xf77dc918 <unfinished ...>
[0x8048621]             __do_global_dtors_aux(0, 0, 0xf77dbff4, 0xf77ca845, 0xf77dc4e4)                           = 0
[0xf77ca845]         <... _fini resumed> )                                                                        = 0
[0xffffffff] +++ exited (status 0) +++

The /proc/19595/maps gave:

$ cat /proc/19595/maps
08048000-08049000 r-xp 00000000 08:08 299486                             /tmp/a.out
08049000-0804a000 r--p 00000000 08:08 299486                             /tmp/a.out
0804a000-0804b000 rw-p 00001000 08:08 299486                             /tmp/a.out
f75ef000-f75f0000 rw-p 00000000 00:00 0 
f75f0000-f7791000 r-xp 00000000 08:08 283173                             /lib32/libc-2.15.so
f7791000-f7793000 r--p 001a1000 08:08 283173                             /lib32/libc-2.15.so
f7793000-f7794000 rw-p 001a3000 08:08 283173                             /lib32/libc-2.15.so
f7794000-f7798000 rw-p 00000000 00:00 0 
f77b7000-f77ba000 rw-p 00000000 00:00 0 
f77ba000-f77bb000 r-xp 00000000 00:00 0                                  [vdso]
f77bb000-f77db000 r-xp 00000000 08:08 283196                             /lib32/ld-2.15.so
f77db000-f77dc000 r--p 0001f000 08:08 283196                             /lib32/ld-2.15.so
f77dc000-f77dd000 rw-p 00020000 08:08 283196                             /lib32/ld-2.15.so
ffdea000-ffe0b000 rw-p 00000000 00:00 0                                  [stack]

As shown in the ltrace(1) output above, _start calls __libc_start_main (defined in /lib32/libc-2.15.so) with a list of arguments. The objdump(1) output below shows how _start prepares the stack with these arguments in accordance with the x86 cdecl calling conventions:

$ objdump -D a.out | less
[...]
080484ac <_start>:
 [...]
 80484b7:       68 d0 85 04 08          push   $0x80485d0
 80484bc:       68 60 85 04 08          push   $0x8048560
 80484c1:       51                      push   %ecx
 80484c2:       56                      push   %esi
 80484c3:       68 60 84 04 08          push   $0x8048460
 80484c8:       e8 13 ff ff ff          call   80483e0 <__libc_start_main@plt>
 80484cd:       f4                      hlt

where the immediate values $0x80485d0, $0x8048560 and $0x8048460 pushed to the stack are the address locations of the following symbols in a.out:

08048460 <main>:
[...]
08048560 <__libc_csu_init>:
[...]
080485d0 <__libc_csu_fini>:
[...]

The ltrace(1) output reveals the order that these functions get called. Cross-referencing the respective PC values with the /proc/PID/maps output reveals the code segment from where these functions are invoked.

Constructor function invocation

From the ltrace(1) output above, it can be seen that the ctor constructor function gets called directly by way of __libc_csu_init rather than the legacy __do_global_ctors_aux mechanism by gcc's C run-time. Looking at the corresponding PC value, it is easy to infer that the ctor function was invoked from:

08048560 <__libc_csu_init>:
[...]
 80485ab:       ff 94 b3 f8 fe ff ff    call   *-0x108(%ebx,%esi,4)
 80485b2:       83 c6 01                add    $0x1,%esi
[...]

This indirect call references an address in the __init_array_start table by way of the obscure scaled indexed operand value. In this particular example, the table:

Disassembly of section .init_array:

08049eec <__init_array_start>:
 8049eec:       30                      .byte 0x30
 8049eed:       84 04 08                test   %al,(%eax,%ecx,1)

held the address 0x08048430 (in little endian format) which was the address of the ctor constructor function.

Destructor function invocation

Now, from the PC value in the ltrace(1) output, the dtor destructor function (and even the _fini symbol) was apparently invoked by a call instruction in the dynamic linker's code segment ... eh!? Moreover, the __libc_csu_fini symbol did not even appear in the trace ... and here is what this symbol's definition looked like in the dynamically linked a.out executable:

080485d0 <__libc_csu_fini>:
 80485d0:       f3 c3                   repz ret

A code inspection of a statically linked version of prog.c (apparently) gives a better illustration:

$ gcc -Wall -O2 -m32 --static prog.c -o prog_static

$ objdump -D prog_static | less
[...]
08048bb0 <dtor>:
[...]
08048be0 <fini>:
[...]
08049690 <__libc_csu_fini>:
 8049690:       53                      push   %ebx
 8049691:       bb a8 ef 0e 08          mov    $0x80eefa8,%ebx
 8049696:       81 eb a0 ef 0e 08       sub    $0x80eefa0,%ebx
 804969c:       83 ec 08                sub    $0x8,%esp
 804969f:       c1 fb 02                sar    $0x2,%ebx
 80496a2:       85 db                   test   %ebx,%ebx
 80496a4:       74 0e                   je     80496b4 <__libc_csu_fini+0x24>
 80496a6:       66 90                   xchg   %ax,%ax
 80496a8:       ff 14 9d 9c ef 0e 08    call   *0x80eef9c(,%ebx,4)
 80496af:       83 eb 01                sub    $0x1,%ebx
 80496b2:       75 f4                   jne    80496a8 <__libc_csu_fini+0x18>
 80496b4:       83 c4 08                add    $0x8,%esp
 80496b7:       5b                      pop    %ebx
 80496b8:       e9 b3 bf 07 00          jmp    80c5670 <_fini>
 80496bd:       90                      nop
 80496be:       90                      nop
 80496bf:       90                      nop
[...]
080c5670 <_fini>:
[...]
080eefa0 <__fini_array_start>:
 80eefa0:       b0 8b                   mov    $0x8b,%al
 80eefa2:       04 08                   add    $0x8,%al
 80eefa4:       e0 8b                   loopne 80eef31 <__FRAME_END__+0x1fcd>
 80eefa6:       04 08                   add    $0x8,%al

In other words, __libc_csu_fini starts by looping over the __fini_array_start table of addresses, calling:

  • the C library symbol fini (at 0x08048be0) then,
  • the dtor destructor function (at 0x08048bb0)

before completing with a jmp to _fini. Also note that the __do_global_dtors_aux mechanism by the gcc C run-time is obsolete and no longer used for invoking destructor functions.

Examples

A few examples of real world applications which rely on the C runtime constructor and destructor call mechanisms:

Resources and Further Reading

Footnotes

1. The file a.out here is actually an ELF executable. It is the default output file name of the gcc vipi.c command.

2. Application here refers to the executable object module and its (shared) object dependencies.

3. See How To Write Shared Libraries, Ulrich Drepper (Available online)