Author: Siro Mugabi

Category: ELF Support

Summary:

An introduction to the ELF object file format for the GNU/Linux system. It is written from the programmer's perspective rather than that of the developer of utilities/tools that actually process/generate ELF object files. The C language is mainly assumed. Development platform used was Ubuntu 12.04 AMD64.

Tags: gnu/linux toolchain gnu linux s/w development elf support introduction

Introduction

The Executable and Linking Format (ELF), originally developed and published by UNIX System Laboratories (USL), is the de-facto binary object file format on a GNU/Linux system. It replaced the older and simpler formats, notably a.out, which became obsolete because they didn't include elegant support for dynamic linking, cross-compilation, C++, etc. It is for such historical reasons that, for example, the default gcc(1) output for an ELF executable is still named a.out. The UNIX System V standard also included support for the Common Object File Format (COFF) prior to ELF and after a.out. ELF is a versatile format that addresses the limitations of its predecessors by including better support for C++, dynamic linking, and cross-build by containing enough information in each ELF object module to identify the target architecture and byte order.

An object file format is part of the system's Application Binary Interface (ABI). Object files are binary representations of programs intended to execute directly on a processor - unlike programs that require other abstract mechanisms. As far as program linking is concerned, there are three main categories of ELF object files: relocatable, executable and shared object files. These are described more fully in ELF Object File Types. There also exists a type of ELF object files known as core dump files which get generated by the operating system upon certain conditions of abnormal process termination. These disk files are an image of (portions of) the process' memory at the time of its termination. Core dump files are typically used with a debugger (e.g. gdb(1)) for inspection of the process' state at the time of its termination.

For developers of "ordinary" applications, generation of ELF object files (from source code in text files) and/or their processing is usually abstracted by the toolchain: The GNU Compiler Collection (GCC) package is a suite of compilers for several major programming languages including C (gcc(1)), C++ (g++(1)), etc. The binutils package contains the GNU Assembler (as(1)), the GNU link editor (ld(1)) - a.k.a. static linker - and other useful tools e.g. readelf(1), objdump(1) and nm(1). Several other binary utilities, such as hd(1)/hexdump(1), are available from other packages. The program interpreter (ld-linux.so, ld-uClibc.so, etc) - a.k.a. dynamic linker/loader - is part of the C library package. Check out GNU/Linux Toolchain Intro for more detailed background coverage of these tools and utilities.

For programmatic access/control of ELF object modules, there exist the <elf.h>, <gelf.h>, <link.h> etc header files, and libraries such as libelf.*. These development files are normally of interest to developers of tools that process or generate ELF files. Nevertheless, references to ELF structures and constants throughout this entry will be made with respect to elf(5) which adheres to the definitions in <elf.h>.

An object file format facilitates program linking and loading. The Linux virtual memory subsystem and the ELF object file format (abstracted by the toolchain) are tightly coupled. For example, the (UNIX) concept of a separate virtual address space of uniform format (e.g. the same first virtual load address) for each process is reflected in the organization of executable ELF object files. This format simplifies program linking by allowing the generation of fully linked executables that are independent of the ultimate location of the code and data in physical memory. The organization of an ELF object file, in conjunction with mechanisms such as memory mapping, facilitates program loading and file sharing between processes.

In a loadable object file, the link editor isolates program instructions and read-only data into parts of the file that may be mapped read-only, and read-write data into another part that may be mapped copy-on-write (COW). At load time, the operating system, guided by header tables in the object file, memory maps these parts into the address space of the process, creating distinct and page-aligned virtual memory areas (VMAs). Attributes of these VMAs (i.e. read-only, read-write, or executable) generally have a direct relationship with the organization of the program in the object file. For example, see ELF Sections & Segments and Linux VMA Mappings. Multiple instances of the same program can map and share a single copy of the physical pages marked read-only and/or executable into their address space. These pages typically hold the parts of the object file containing the program's instructions and read-only data. Similarly, execution instances of different programs can map a single copy of the physical pages of a shared object that are marked read-only and/or executable. Fortunately, these intricacies remain largely transparent to the (applications) programmer.

The following hello world program will be referred to throughout this entry:

#include <stdio.h>

int g;

int main(void)
{
    printf("%d:Dunia, vipi?.\n", g);
    return 0;
}

Generate the following object files:

  • NOTE: The build platform used here was Ubuntu 12.04 AMD64: gcc(1)'s -m switches for x86-64 userspace support three code models via -mcmodel=. The default, -mcmodel=small, is used here. Basically, the program and its symbols must be linked in the lower 2 GB of the address space. For more details, see gcc(1) and other (online) documentation such as the System V Application Binary Interface: AMD64 Architecture Processor Supplement.


$ gcc -Wall -O2 vipi.c -c

$ gcc vipi.o -o vipi

$ ls 
vipi vipi.c vipi.o

ELF Header

All ELF object files include an ELF header which resides at the beginning of the file. It holds a "road map" describing the object file's organization. The definition of the ELF header structure, ElfN_Ehdr, (where N is either 32 or 64) can be seen in elf(5). In an object file, the ELF header can be viewed via, say:

$ readelf -h vipi
ELF Header:
    Magic:   7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00 
    Class:                             ELF64
    Data:                              2's complement, little endian
    Version:                           1 (current)
    OS/ABI:                            UNIX - System V
    ABI Version:                       0
    Type:                              EXEC (Executable file)
    Machine:                           Advanced Micro Devices X86-64
    Version:                           0x1
    Entry point address:               0x400464
    Start of program headers:          64 (bytes into file)
    Start of section headers:          4424 (bytes into file)
    Flags:                             0x0
    Size of this header:               64 (bytes)
    Size of program headers:           56 (bytes)
    Number of program headers:         9
    Size of section headers:           64 (bytes)
    Number of section headers:         30
    Section header string table index: 27

The first entry, e_ident[16] (i.e. Magic), displays the first 16 bytes of the object file. The first 4 bytes contain the magic number: 0x7f followed by the ASCII codes for E, L and F. The details of the e_ident[16] field are described in elf(5). Notice that the more interesting members of this array, i.e. EI_CLASS, EI_DATA, EI_VERSION and EI_OSABI, also have their values conveniently listed in the entries following the Magic entry. EI_CLASS and EI_DATA make the ELF file decodable on machines whose word size or byte order differs from that of the file's target architecture.

The e_type field identifies the object file type. From <elf.h>:

#define ET_REL      1       /* Relocatable file */
#define ET_EXEC     2       /* Executable file */
#define ET_DYN      3       /* Shared object file */

and these get displayed by readelf as:

$ readelf -h vipi.o | grep Type
Type:                              REL (Relocatable file)

$ readelf -h /lib/x86_64-linux-gnu/libc-2.15.so | grep Type
Type:                              DYN (Shared object file)

The e_machine field describes the target architecture for the ELF object file, e.g. AMD x86-64 (i.e. EM_X86_64) in this case. The e_entry field (i.e. Entry point address) gives the virtual address to which the system first transfers control when starting the program. This field is mainly relevant for executables. For instance,

$ readelf -h vipi | grep Entry
    Entry point address:               0x400464

$ objdump -d vipi | grep '\<_start\>'
0000000000400464 <_start>:

where _start is the C run-time entry point of the program: control is transferred to it by the kernel upon execve(2) return (statically linked executables) or by the program interpreter, e.g. ld-linux.so (dynamically linked executables).

Now, an application consists of several programs: an executable file and zero or more shared object files. Executables and shared objects contain segments which are groupings of one or more sections. The loadable segments contribute to the program's process image and, thus, provide an execution view of the object file. On the other hand, all object file types contain sections which hold the bulk of object file information for the linking view [1]: instructions, data, symbol tables, relocation information etc. These concepts are illustrated in the following generalized diagram:

[Figure: ELF: Linking View vs. Execution View]

A section header table contains information describing an object file's sections. It is a contiguous array of section headers. Generally, only relocatable object files are required to have a section header table. Likewise, a program (or segment) header table is a contiguous array of program headers which provide information describing the file's segments. Generally, executables and shared objects must have a program header table.

The e_shentsize and e_phentsize fields indicate the size in bytes of a section header and a program header, respectively. Note that these headers are of fixed size. The e_shnum and e_phnum fields yield the number of section headers contained in the section header table and program headers contained in the program header table, respectively. Finally, the e_shoff and e_phoff fields give the offset in bytes (i.e. the start or position) of the section header table and program header table, respectively, into the object file. Note that only the ELF header has a fixed position i.e. at the beginning of the file.

Due to, say, alignment restrictions, an object file may have inactive space (gaps). The various headers and sections might not "cover" every byte in an object file.

Sections

Sections are an important aspect of ELF. They provide the linking view of the object file. Each section contains specific info: for example, the text section contains instructions, the string table section contains a table of strings used for program symbols, the relocation tables contain relocation information for the linkers, etc.

Generally:

  • Every section in an object file has exactly one section header table entry (i.e. a section header) describing it. Nevertheless, section headers may exist that do not have a section.

  • Each section occupies a contiguous sequence of bytes within a file. A section may be empty.

  • Sections in a file may not overlap. No byte in a file resides in more than one section.

For a given ELF object file, the readelf -S command can be used to view its section header table:

$ readelf -S vipi.o
There are 14 section headers, starting at offset 0x158:

Section Headers:
    [Nr] Name              Type             Address           Offset
         Size              EntSize          Flags  Link  Info  Align
    [ 0]                   NULL             0000000000000000  00000000
         0000000000000000  0000000000000000           0     0     0
    [ 1] .text             PROGBITS         0000000000000000  00000040
         0000000000000000  0000000000000000  AX       0     0     4
    [ 2] .data             PROGBITS         0000000000000000  00000040
         0000000000000000  0000000000000000  WA       0     0     4
    [ 3] .bss              NOBITS           0000000000000000  00000040
         0000000000000000  0000000000000000  WA       0     0     4
    [ 4] .rodata.str1.1    PROGBITS         0000000000000000  00000040
         0000000000000012  0000000000000001 AMS       0     0     1
             ...               ...              ...            ...
    [11] .shstrtab         STRTAB           0000000000000000  000000e0
         0000000000000076  0000000000000000           0     0     1
    [12] .symtab           SYMTAB           0000000000000000  000004d8
         0000000000000138  0000000000000018          13    10     8
    [13] .strtab           STRTAB           0000000000000000  00000610
         000000000000001c  0000000000000000           0     0     1
Key to Flags:
    W (write), A (alloc), X (execute), M (merge), S (strings), l (large)
    I (info), L (link order), G (group), T (TLS), E (exclude), x (unknown)
    O (extra OS processing required) o (OS specific), p (processor specific)

A detailed description of these fields and the definition of the section header structure (Elf32_Shdr and Elf64_Shdr) can be found in elf(5). The first section header table entry is always an all-zero entry. Section names are variable-length strings whereas a section header is of fixed size, so sh_name does not store the name itself: it holds an index into an object-wide section header string table section, .shstrtab, where the actual section name strings reside. The e_shstrndx field of the ELF header holds the section header table index of .shstrtab. The Name field of the readelf -S output abstracts this indirection. Also notice that the section names in this field (for this simple program) are the standard compiler-generated section names. The GNU/Linux toolchain also includes extensions for ELF that allow the programmer to specify arbitrary section names in the source code in which to place code or data. For example, see Placing Functions or Data in Arbitrary Sections.

Also notice how a description of the values of the sh_flags field is conveniently included at the end of the readelf -S command's output.

The interpretation of the sh_addr value depends on the type of object file. For executable object files, it holds the address at which the section's first byte should reside in the process' linear virtual address space. For instance, try running readelf -S vipi. For shared object files, this value may get relocated to the execution time load address. For relocatable object files, this value is set to zero - as shown in the listing above.

The first field in the readelf -S output (labelled Nr) denotes the section header table indices. This information is useful when, say, identifying the sections in which the various symbols defined in the program reside. For instance, if index 13 corresponds to the .text section header table entry:

$ readelf -S vipi | grep '.text'
    [13] .text             PROGBITS         0000000000400440  00000440

then function symbols residing in the .text section will have their associated section header table index set to 13 in the symbol table, for instance:

$ readelf --syms vipi | awk '$7 == 13 && $4 != "SECTION" { print $0 }'
        27: 0000000000400490     0 FUNC    LOCAL  DEFAULT   13 call_gmon_start
        32: 00000000004004b0     0 FUNC    LOCAL  DEFAULT   13 __do_global_dtors_aux
        35: 0000000000400520     0 FUNC    LOCAL  DEFAULT   13 frame_dummy
        40: 00000000004005f0     0 FUNC    LOCAL  DEFAULT   13 __do_global_ctors_aux
        46: 00000000004005e0     2 FUNC    GLOBAL DEFAULT   13 __libc_csu_fini
        56: 0000000000400550   137 FUNC    GLOBAL DEFAULT   13 __libc_csu_init
        58: 0000000000400464     0 FUNC    GLOBAL DEFAULT   13 _start
        61: 0000000000400440    34 FUNC    GLOBAL DEFAULT   13 main

where most of the function symbols displayed here, other than main, are part of the C run-time.

Now, there exist special section header table indices (and, thus, section headers) for which no sections exist. These indices do not necessarily get displayed in the readelf -S output. They include:

  • SHN_ABS A symbol defined relative to this section number has an absolute value and is not affected by relocation. A symbol corresponding to the name of the C source file is a typical candidate.

  • SHN_COMMON Symbols defined relative to this section are common symbols such as uninitialized and non-statically declared global variables defined in the C source file (e.g. int g;). COMMON means that the symbol belongs to a common block that has not yet been allocated. Common symbols are pertinent to relocatable object files. COMMON has its roots in UNIX hackery predating ELF and introduces an element of "non-determinism" during the merging of several relocatable object files that present more than one instance of a given common symbol. Consult --warn-common in ld(1) for the details. Generally, it is advisable to avoid instances of common symbols. Use gcc(1)'s -fno-common switch to prevent/detect occurrences of these symbols.

  • SHN_UNDEF Symbols listed against this value are undefined, missing, etc. Typical symbols include externally defined symbols e.g. library symbols such as printf(3).

The elf(5) manpage describes the special section header table indices in detail. For illustration, consider:

$ readelf -s vipi.o | egrep '(ABS|COM|UND)'
         0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND 
         1: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS vipi.c
        11: 0000000000000004     4 OBJECT  GLOBAL DEFAULT  COM g
        12: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT  UND __printf_chk

For the curious, check out the following page for an illustration of how various C function and variable declarations result in the generation of symbols with varying (compiler generated) section header table indices (or in their symbol definition placement in the different sections of the generated ELF object file).

References to common and undefined symbols in relocatable object files are subject to symbol resolution (in addition to relocation) during the static linking phase with the link editor: each symbol reference is associated with exactly one symbol definition (an entry in one of the symbol tables of the input relocatable object files). On the other hand, references to undefined symbols at execution time trigger symbol lookup and binding by the program interpreter: each symbol reference is associated with the first symbol encountered in a lookup scope.

Recall that sections hold information for the linking phases. Depending on the ELF object file type, or even the method of build of an object file type (statically vs dynamically linked), certain sections (e.g. those holding relocation info, dynamic linking info, etc) may or may not be present. This topic is covered in more detail in the ELF Object File Types entry.

Finally, to view the disassembly of the contents of all sections in an object file, a variation of the objdump -D command may be used.

Symbol and String Tables

Function or variable definitions and references in C/C++ code translate to symbol definitions and references in the object file. Other symbols get generated during text-to-object file translation, e.g. section names in the ELF object file and name of the corresponding C source file, while others get included by the link editor, e.g. symbols that mark the end of various program segments (see end(3)) or even symbols that delimit programmer-defined sections [2].

Most symbols in an object file are defined relative to an existing section. For example, and as illustrated in the previous section, function symbols (i.e. procedures) are defined relative to the text section that contains the function symbol definition (i.e. the procedure's code). Recall that symbols may be defined relative to special section header table indices (e.g. SHN_ABS, SHN_COMMON, SHN_UNDEF, etc) for which no sections exist.

The compiler proper, cc1, processes the C source file input and exports symbols for the assembler, as(1), which, in turn, builds a symbol table in the relocatable object file output. In turn, the link editor generates one or more symbol tables in its output object file based on information in the symbol tables of its input object files. Symbol table information is involved in the location and relocation of an object file's symbolic definitions and references.

An object file can contain one or more symbol tables. For example, the dynamic linking symbol table, .dynsym, in dynamically linked object files holds a minimal set of symbols for the dynamic linking process at execution time by the program interpreter. On the other hand, the .symtab section is used primarily by the link editor and holds the complete symbol table (including the symbol entries in .dynsym).

While readelf --dyn-syms only displays the entries in .dynsym, the readelf --syms command displays the entries of all the symbol tables of an object file i.e. both .symtab and .dynsym. The fields in the output of these commands correspond to the members of the Elf32_Sym and Elf64_Sym structs in elf(5). Notably:

  • st_shndx Gives the index into the section header table; this indicates where the symbol is defined. Recall that there exist special section header table entries for which no sections exist.

  • st_value For relocatable files gives either the symbol's position offset from the beginning of its section (indicated by st_shndx), or holds alignment constraints if the symbol is marked with SHN_COMMON. For executable and shared object files, this value holds a virtual address.

  • st_info Specifies symbol type and binding. Entries of type SECTION exist primarily for relocation. Binding is discussed in symbol binding.

String tables are sections that hold null-terminated strings which represent symbols and section names. Recall that the section header table and various sections, e.g. .symtab and .dynsym, are comprised of fixed size entries. To account for the fact that section names and symbol names are not of fixed length, the fields of the structure members that are associated with these strings instead hold an index into the respective string table. Strings are, therefore, referenced as an index into the string table. The first and last index correspond to a null byte (\0) entry.

An object file can have more than one section of type SHT_STRTAB. For instance,

$ readelf -S vipi | grep STRTAB
    [ 6] .dynstr           STRTAB           0000000000400318  00000318
    [27] .shstrtab         STRTAB           0000000000000000  0000104a
    [29] .strtab           STRTAB           0000000000000000  00001ee0

where .dynstr holds names of symbols in .dynsym, .shstrtab holds section names, and .strtab holds names of symbols in .symtab. Nevertheless, a string table may also contain miscellaneous info: .strtab is a complete table that, in addition to containing symbols used for the (static) linking process, contains a few more symbols such as the names of the source file modules involved in the generation of the object file. This table can be viewed via readelf -p '.strtab'. The .dynstr section holds symbol names in .dynsym and other miscellaneous information e.g. the sonames of needed shared libraries. This table can be viewed via readelf -p '.dynstr'. Recall that string constants defined in the source code, e.g. "%d:Dunia, vipi?.\n" in this example, are located in the .rodata section and do not appear in the string tables:

$ readelf -p '.rodata' vipi

String dump of section '.rodata':
    [     4]  %d:Dunia, vipi?.

Unlike .symtab, and hence .strtab, the .dynsym, and hence .dynstr, sections are not discarded by strip(1) since they are required for the dynamic linking process.

Segments

Executable and shared object files contain segments. A segment is a grouping of one or more sections in the object file. For example, the code segment includes the .text and .rodata sections while the data segment contains the .bss and .data sections.

A segment is described by a program header. The definition of the program header structure can be found in elf(5) as Elf32_Phdr and Elf64_Phdr. Every executable and shared object file contains a program header table (a.k.a segment header table) which is a contiguous array of program headers that describe the program's segments and other information that the system needs to prepare the program for execution.

The readelf -l command can be used to display an ELF object file's program headers:

$ readelf -l vipi

Elf file type is EXEC (Executable file)
Entry point 0x400464
There are 9 program headers, starting at offset 64

Program Headers:
    Type           Offset             VirtAddr           PhysAddr
                   FileSiz            MemSiz              Flags  Align
    PHDR           0x0000000000000040 0x0000000000400040 0x0000000000400040
                   0x00000000000001f8 0x00000000000001f8  R E    8
    INTERP         0x0000000000000238 0x0000000000400238 0x0000000000400238
                   0x000000000000001c 0x000000000000001c  R      1
        [Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]
    LOAD           0x0000000000000000 0x0000000000400000 0x0000000000400000
                   0x000000000000071c 0x000000000000071c  R E    200000
    LOAD           0x0000000000000e28 0x0000000000600e28 0x0000000000600e28
                   0x00000000000001f8 0x0000000000000210  RW     200000
    DYNAMIC        0x0000000000000e50 0x0000000000600e50 0x0000000000600e50
                   0x0000000000000190 0x0000000000000190  RW     8
    NOTE           0x0000000000000254 0x0000000000400254 0x0000000000400254
                   0x0000000000000044 0x0000000000000044  R      4
    GNU_EH_FRAME   0x0000000000000650 0x0000000000400650 0x0000000000400650
                   0x000000000000002c 0x000000000000002c  R      4
    GNU_STACK      0x0000000000000000 0x0000000000000000 0x0000000000000000
                   0x0000000000000000 0x0000000000000000  RW     8
    GNU_RELRO      0x0000000000000e28 0x0000000000600e28 0x0000000000600e28
                   0x00000000000001d8 0x00000000000001d8  R      1

 Section to Segment mapping:
    Segment Sections...
     00     
     01     .interp 
     02     .interp .note.ABI-tag .note.gnu.build-id .gnu.hash .dynsym .dynstr .gnu.version .gnu.version_r .rela.dyn .rela.plt .init .plt .text .fini .rodata .eh_frame_hdr .eh_frame 
     03     .ctors .dtors .jcr .dynamic .got .got.plt .data .bss 
     04     .dynamic 
     05     .note.ABI-tag .note.gnu.build-id 
     06     .eh_frame_hdr 
     07     
     08     .ctors .dtors .jcr .dynamic .got

The p_type field is encoded via one of the PT_* constants (see elf(5)) and indicates what kind of segment the program header describes. For example:

  • PT_PHDR Indicates that the program header describes the location and size of the program header table itself. Presence of this entry means that the program header table is part of the memory image of the program.

  • PT_INTERP The p_offset and p_filesz values of this program header correspond to the file offset and size, respectively, of section .interp in the object file. Contents of this section yield the null-terminated pathname of the required program interpreter.

  • PT_LOAD Indicates that the program header describes a segment that is loadable into memory. The loadable code segment is always followed by the loadable data segment, both physically in the disk file and logically in the process' virtual address space. For executable object files, the link editor, ld(1), in compliance with the UNIX System V standard, ensures that the loadable code segment starts at a certain architecture-dependent virtual load address: for example, 0x400000 for x86-64 and 0x08048000 for x86 IA-32.

  • PT_DYNAMIC Indicates that the program header table entry describes a segment that holds dynamic linking information.

  • PT_NOTE Indicates that the program header describes a segment that contains auxiliary information. This information can be extracted via readelf --notes and includes ABI info such as minimum Linux kernel version that can support execution of the userspace object file.

Note that some section entries appear in more than one segment mapping, although the section entries of the two loadable segments are mutually exclusive. The PT_PHDR (00), PT_INTERP (01), PT_NOTE (05) and PT_GNU_EH_FRAME (06) segments are contained in the loadable code segment PT_LOAD (02), so their section entries also appear in its mapping. The loadable code segment contains sections holding machine instructions (e.g. .init, .plt, .text, and .fini) as well as sections holding certain read-only data including relocation information (.rela.dyn, .rela.plt), dynamic linking information (.interp, .gnu.hash, .dynsym, .dynstr, etc) and string constants in .rodata. The PT_DYNAMIC (04) and PT_GNU_RELRO (08) segments, on the other hand, are contained in the loadable data segment PT_LOAD (03). Also notice how the various segments have different memory attributes (alignment and rwx flags) e.g. the data sections in the PT_GNU_RELRO (08) segment get mapped into a separate read-only VMA (after load time symbol relocation of .got entries), unlike the rest of the read-write data sections of the loadable data segment [4].

The ELF specification reserves p_type values from 0x60000000 (PT_LOOS) to 0x6fffffff (PT_HIOS) for OS-specific information (see <elf.h>). In this instance:

  • PT_GNU_EH_FRAME The GCC .eh_frame_hdr segment. This value is encoded as 0x6474e550.
  • PT_GNU_STACK Indicates stack executability and is encoded as 0x6474e551.
  • PT_GNU_RELRO Read-only after relocation and is encoded as 0x6474e552.

The ELF specification also reserves p_type values from 0x70000000 (PT_LOPROC) to 0x7fffffff (PT_HIPROC) for processor-specific information (see <elf.h>).

For executables, the p_vaddr field gives the virtual address the segment should be loaded at. This value may be relocatable for shared objects. Note that the value of p_paddr is meaningless in a system that employs virtual memory. The p_align field indicates the alignment requirements of the segment both in memory and in the file. The p_align value for loadable segments indicates their load time alignment requirements. Recall that the loadable segments are actually mutually exclusive supersets of (most of) the other segments. For these other segments, notice that they can start and end at arbitrary physical file offsets, p_offset, to save disk space. Their p_vaddr value equals their p_offset modulo p_align of their respective load segment. For the p_align values for these segments, values of zero and one mean no alignment is required. Otherwise, p_align should be a positive, integral power of two.

The p_memsz and p_filesz values yield the number of bytes the segment occupies in memory and in the file, respectively. p_memsz can be larger than p_filesz if, say, the segment contains a .bss section (which doesn't occupy space in the object file) [5]. Finally, the p_flags field specifies the segment attributes: PF_R for a readable segment, PF_W for a writable segment, and PF_X for an executable segment (displayed as E by readelf -l).

Also See

  • See ELF Object File Types for a more detailed discussion on ELF object file types, and for the details of various sections holding relocation information and dynamic linking information.

  • See ELF Sections & Segments and Linux VMA Mappings for a case study on the relationship between sections, segments and virtual memory areas in the process' virtual address space.

Resources and Further Reading

  • The readelf(1), elf(5), gcc(1) and ld(1) man pages.

  • ELF: Executable and Linkable Format, Portable Formats Specification, Version 1.1, Tool Interface Standards (TIS). (Available Online)

  • ELF: From The Programmer's Perspective, Hongjiu Lu. (Available Online)

  • The ELF Object File Format: Introduction, Eric Youngdale

  • Using LD, The GNU linker (Available Online)

  • libelf by Example, Joseph Koshy

  • Books

    • Linkers and Loaders, John Levine

    • Computer Systems: A Programmer's Perspective, Randal E. Bryant, David R. O'Hallaron, 2011, Prentice Hall. Written with the student in mind, this is not (strictly) a book on GNU/Linux development but it includes a few chapters that methodically introduce the concepts behind program linking and virtual memory on a GNU/Linux system.

Footnotes

1. Either the static linking phase with the link editor, ld(1), or the dynamic linking phase with the program interpreter e.g. ld-linux.so

2. The GCC extension, __attribute__((section("NAME"))), enables the programmer to define arbitrary sections in which to place code or data. This extension causes ld(1) to add symbols of the form __start_NAME and __stop_NAME in its output object file that delimit the programmer-defined NAME section. See Placing Functions or Data in Arbitrary Sections

4. See ELF Sections & Segments and Linux VMA Mappings for an illustration.

5. See ELF Sections & Segments and Linux VMA Mappings for one such instance.