Author: Mugabi Siro

Category: Device Drivers

Summary:

A tutorial on using Ftrace for identifying the kernel control path taken from a device driver's Interrupt Service Routine (ISR) to the userland process. A blocking read(2) on stdin (local keyboard) is explored here as a case study. Linux 3.x used.

Tags: linux ftrace debugging/tracing

Preliminaries

An introduction to the function call tracing capabilities of the Ftrace framework can be found here.

The following program, ne_ftrace_read_char.c, performs some Ftrace initialization then waits for user keyboard interaction by blocking on stdin via read(2). Hitting RETURN will then cause the program to run to completion.

Note that in order to obtain a trace of the keyboard interrupt servicing, this program must be run from a local terminal i.e. a text /dev/ttyN or an x-term. Since serial and remote terminals do not use the local keyboard for stdin, no keyboard ISR (for this program instance) will be recorded by Ftrace.

Function filtering

To help produce a readable trace, some function filtering will be applied here. Since what is of most interest is tracing the control path from the ISR to the blocking sys_read, the following filtering command is applied:

$ sudo sh -c "echo i8042_interrupt > /sys/kernel/debug/tracing/set_graph_function"

The set_graph_function control file sets a "trigger" function where tracing should start with the function graph tracer. The machine on which this trace was performed (an aging quad i3-2350M/4GB laptop) had an i8042 keyboard controller:

$ less /proc/interrupts

              CPU0       CPU1       CPU2       CPU3       
     0:        486          0          0          0   IO-APIC-edge      timer
     1:         24          0     141378          0   IO-APIC-edge      i8042
     8:          1          0          0          0   IO-APIC-edge      rtc0
     9:       9615          0          0          0   IO-APIC-fasteoi   acpi
    12:     528355          0          0          0   IO-APIC-edge      i8042
    [...]               [...]               [...]               [...]


$ cat /sys/devices/platform/i8042/serio0/description 
i8042 KBD port

$ cat /sys/devices/platform/i8042/serio0/uevent 
DRIVER=atkbd
[...]

$ cat /sys/devices/platform/i8042/serio1/description 
i8042 AUX port

$ cat /sys/devices/platform/i8042/serio1/uevent 
DRIVER=psmouse
[...]

Specify the ISR for the keyboard controller of your machine. Alternatively, since the QEMU PC1 machine emulator also emulates this keyboard controller, you may run these tests on it.

Finally, to give context to this ISR-to-process trace, function tracing for the path taken by sys_read to the point where it blocks in the TTY line discipline layer, waiting for user keyboard interaction, is also included:

$ sudo sh -c "echo sys_read >> /sys/kernel/debug/tracing/set_graph_function"

When done with the Ftrace examples presented in this guide, do not forget restore set_graph_function to its default setting:

$ sudo sh -c "echo > /sys/kernel/debug/tracing/set_graph_function"

$ sudo cat /sys/kernel/debug/tracing/set_graph_function
#### all functions enabled ####

CPU affinity for processes

To further enhance readability of the trace, hard CPU affinity may be included in the mix. Generally, the scheduler strives to maintain CPU affinity for a process but may reschedule it on another CPU if, say, the original CPU is busy (i.e. soft CPU affinity). This can introduce an element of confusion if filtering out trace events based on the CPU number. One native Ftrace solution to the CPU affinity problem is function tracing based on a process' PID. However, since tracing the local keyboard interrupt servicing is also of interest here, then this approach cannot be taken i.e. the keyboard interrupt event will be excluded from the trace if PID based function tracing is configured.

A straightforward and non intrusive way of effecting CPU affinity for processes is by way of taskset(1). This utility employs sched_setaffinty(2) to enforce a process' CPU affinity. Alternatively, a direct call to sched_setaffinity(2) could be included in a program's sources.

But note that even with the function filtering and CPU affinity syscalls combination, any sys_read events by any other process intermittently scheduled on the same CPU while tracing is in effect will also get recorded in the trace output, thereby introducing another source of confusion. This problem primarily depends on the system's load. To make matters worse, given heavy system load, and since this particular program waits for user keyboard interaction, the interval between the sys_read block in the TTY line discipline layer and key press can easily result in Ftrace ring buffer overflows. Increasing the size of the ring buffer - in this case - might only translate to more strain on the brain (more info to audit) rather than help alleviate matters. One solution to this problem could be to run a very a basic, lightweight root filesystem configuration e.g. Busybox or Buildroot. Alternatively, the isolcpus kernel boot option could be employed in combination with the CPU affinity syscalls so that the process gets scheduled exclusively on an isolated CPU. This second approach will be considered here.

Test run

The system was rebooted with the following isolcpus option2:

$ cat /proc/cmdline 
BOOT_IMAGE=/vmlinuz-3.2.0-35-generic root=UUID=57[...]bdb ro quiet splash vt.handoff=7 isolcpus=3

The following function graph filtering was then specified:

$ sudo sh -c "echo i8042_interrupt > /sys/kernel/debug/tracing/set_graph_function"

$ sudo sh -c "echo sys_read >> /sys/kernel/debug/tracing/set_graph_function"

The program was then compiled and run with hard CPU affinity via taskset(1) on the isolated CPU:

$ gcc -O2 -Wall -Wno-unused-result ne_ftrace_read_char.c -o ne_ftrace_read_char

$ sudo taskset 0x8 ./ne_ftrace_read_char
kernel.ftrace_enabled = 1
main:137:: Strike ENTER to continue...

    <hit-RETURN>

This test run produced a trace of the following size:

$ wc -l ftrace_output.txt 
4172 ftrace_output.txt

Manually editing the trace to exclude all process events on a logical CPU other than CPU 3 resulted in the following trace.

Notice that the keyboard interrupt servicing occurred on a different logical CPU (i.e. CPU 0). Linux also provides mechanisms to effect IRQ to CPU affinity. Since Ftrace features per CPU traces under /sys/kernel/debug/tracing/per_cpu, in addition to the general trace in /sys/kernel/debug/tracing/trace, configuring IRQ to CPU affinity can also be exploited to provide a clean trace that requires no need for manual editing.

IRQ to CPU Affinity

This feature of the kernel enables one to "hook" an IRQ to one or a set of CPUs. It can also exclude a CPU, or a set of CPUs, from handling an IRQ. Using this feature, in combination with the isolcpus option and the CPU affinity syscalls, is exploited here such that both process execution and keyboard IRQ servicing occur on one (isolated) CPU. This way, a clean per_cpu trace is obtained for the entire trip, eliminating the need for manual editing.

Test run

The initial state of the system was:

$ less /proc/interrupts 
             CPU0       CPU1       CPU2       CPU3       
    0:         55          0          0          0   IO-APIC-edge      timer
    1:      19084          0          0          0   IO-APIC-edge      i8042
[...]

Applying the following IRQ to CPU affinity mask:

$ sudo sh -c "echo 8 > /proc/irq/1/smp_affinity"

to correspond to the isolcpus setting:

$ cat /proc/cmdline 
BOOT_IMAGE=/vmlinuz-3.2.0-35-generic root=UUID=57[...]bdb ro quiet splash vt.handoff=7 isolcpus=3

and checking:

$ less /proc/interrupts
             CPU0       CPU1       CPU2       CPU3       
    0:         55          0          0          0   IO-APIC-edge      timer
    1:      19259          0          0        143   IO-APIC-edge      i8042
  [...]

then typing a few things, including this very line on the VIM editor, and re-checking:

$ less /proc/interrupts
             CPU0       CPU1       CPU2       CPU3       
    0:         55          0          0          0   IO-APIC-edge      timer
    1:      19259          0          0        305   IO-APIC-edge      i8042
  [...]

Specifying Ftrace function filtering:

$ sudo sh -c "echo i8042_interrupt > /sys/kernel/debug/tracing/set_graph_function"

$ sudo sh -c "echo sys_read >> /sys/kernel/debug/tracing/set_graph_function"

and recompiling the program with:

$ gcc -Wall -O2 -Wno-unused-result ne_ftrace_read_char.c -o ne_ftrace_read_char -DKBD_IRQ_CPU3_AFFINITY

then executing:

$ sudo taskset 0x8 ./ne_ftrace_read_char
kernel.ftrace_enabled = 1
main:137:: Strike ENTER to continue...

        <hit-RETURN>

resulted in the following trace in ./ftrace_cpu3_output.txt:

$ wc -l ftrace_cpu3_output.txt 
482 ftrace_cpu3_output.txt

which can be viewed here.

Resources and Further Reading

There exist a spectrum of options and frameworks for kernel hacking. Identifying and using them in the right combination can leverage the learning curve or hacking process. This entry only touched on a few features of the respective options and frameworks.

Also See

Footnotes

1. See QEMU Intro [go back]

2. See Linux CPU Affinity [go back]