Author: Mugabi Siro

Category: Device Drivers

Summary:

A jump-start tutorial on writing a User I/O (PCI) device driver. QEMU-PC's ivshmem virtual device is used here as the hardware platform.

Tags: linux qemu pci

Table Of Contents

Preliminaries

This entry builds upon the PCI device driver tutorial. The QEMU ivshmem virtual PCI device is also used here as the hardware platform. For QEMU version compatibility issues, check out the QEMU Version Compatibility Note.

UIO Background

User I/O (UIO) is a device driver model for Linux that enables most of the driver operations to be moved to userspace, with only a small kernel stub required for device initialization and low-level control. A few important implications of this approach include:

  • Floating-point operations and other facilities provided by userspace libraries/utilities can now be used within the device driver framework.

  • Bugs in the userspace part of the driver won't crash the kernel.

  • The userspace part of the device driver need not be GPL licensed as it is not linked with the Linux kernel.

  • Updates in the userspace part of the driver can be done without the need for kernel recompilation.

A common class of device drivers that benefit from this model are PCI I/O cards for industrial or scientific/engineering prototyping. Typically, these (custom) device drivers are tightly coupled with the application framework, which may include, among other things, numerical computations with floating point.

UIO is inspired by the fact that most (PCI) devices share similar runtime operations: reading and writing to a few registers and accessing the device's MMIO data regions. These parts of device driver logic can be implemented in userspace. Logic that handles device initialization, interrupt handling registration and configuration, and I/O enumeration is then implemented in the kernel stub.

Naturally, there are tradeoffs with this scheme of device driver implementation. Notable among these is the increased latency with respect to interrupt handling in userspace. Other considerations include memory locking the user mode part of the device driver to prevent it from being swapped out and the fact that it remains possible for a user space process (and hence the user mode part of the driver) to be killed at any moment by the kernel.

Also note that the UIO model is not universal. Devices that are already handled well by other kernel subsystems (e.g. networking, serial, USB, etc) are not candidates for a UIO driver. Generally, hardware that is well suited to a UIO driver satisfies the following criteria:

  • It has MMIO regions that can be memory mapped.

  • It usually generates interrupts.

  • It does not fit squarely in one of the standard kernel subsystems.

Once the kernel stub loads successfully, the corresponding UIO device will be accessible from userspace via /dev/uioN.

UIO Mechanism

Each UIO device is accessed through a device file and several sysfs attribute files. The device file will be called /dev/uio0 for the first device, /dev/uio1 for the second, etc.

Memory Mapping device MMIO regions to userspace

Once the device file is successfully opened, its file descriptor is used with mmap(2) to memory map the device's MMIO region to the process' virtual address space. A UIO device can make one or more MMIO regions available for memory mapping. From userspace, a given MMIO region is memory mapped by specifying the respective offset value in mmap(2): to memory map MMIO region N, specify N * getpagesize(2) for offset.
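The offset arithmetic above can be sketched in C. The helper names below are illustrative, not part of any UIO API; only the offset convention (region N at N * page-size) comes from the UIO framework:

```c
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* The UIO framework dispatches mmap(2) requests on offset / PAGE_SIZE,
 * so memory mapping region N is selected with offset N * page-size. */
static off_t uio_map_offset(int region)
{
    return (off_t)region * sysconf(_SC_PAGESIZE);
}

/* Map UIO memory region 'region' of the device open on 'fd'.
 * Returns MAP_FAILED on error. */
static void *uio_map_region(int fd, int region, size_t len)
{
    return mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED,
                fd, uio_map_offset(region));
}
```

For example, after `int fd = open("/dev/uio0", O_RDWR);`, the ivshmem register region would be reached with `uio_map_region(fd, 0, 0x100)` and the data region with `uio_map_region(fd, 1, 0x100000)`.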

Interrupt handling in userspace

Interrupts are handled by blocking on the open /dev/uioN file descriptor. When an interrupt occurs, the blocking function returns. Blocking can be done via read(2) or by way of select(2), poll(2), epoll(7), etc. According to Documentation/DocBook/uio-howto.tmpl, when using read(2), the buffer must be a signed 32-bit integer and the count parameter must be its size, i.e. 4 bytes; any other count causes read(2) to fail. The signed 32-bit value read in is the interrupt count of the device. In the best case, successive interrupts result in a difference of one between the current and previously read count values. A difference greater than one between two successive reads indicates missed interrupts; a current value less than the previous value indicates that an error occurred.
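A minimal sketch of the read(2)-based wait, assuming the UIO device file is already open (the helper name is illustrative):

```c
#include <stdint.h>
#include <unistd.h>

/* Block until the device raises an interrupt.  The UIO framework
 * returns the device's interrupt count as a signed 32-bit integer,
 * so the buffer is an int32_t and count is exactly 4 bytes.
 * Returns 0 and stores the count on success, -1 on error. */
static int uio_wait_irq(int fd, int32_t *count)
{
    int32_t n;

    if (read(fd, &n, sizeof(n)) != sizeof(n))
        return -1;
    *count = n;
    return 0;
}
```

Comparing successive counts detects missed interrupts: `cur - prev - 1` gives the number missed since the last read.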

Note that part of the interrupt handling may still have to be performed in kernel space. In this case, the kernel stub should provide its own interrupt handler, which the UIO framework automatically invokes upon a device IRQ. This scenario applies, for example, if some operation has to be performed immediately upon reception of the hardware IRQ. For instance, if the kernel-space interrupt handler needs to immediately read data from the device for later processing by the userspace interrupt handler, that data could first be buffered in memory. This way, even if the userspace program misses one or more interrupts, loss of data can still be avoided. Another reason proper (low-level) interrupt handling may have to be done in kernel space is that a userspace process can be terminated at any time by the OS; there thus exists the likelihood that the userspace interrupt handler leaves the hardware in a state pending completion of low-level interrupt handling.

Certain corner-cases, such as devices with more than one interrupt source and no separate IRQ mask and status registers, require the use of write(2) to explicitly disable or enable interrupts. See Documentation/DocBook/uio-howto.tmpl for more.
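Where the kernel stub supports interrupt control via write(2), enabling or disabling interrupts from userspace is a 4-byte write of a signed 32-bit integer. A hedged sketch (the helper name is illustrative):

```c
#include <stdint.h>
#include <unistd.h>

/* Enable (1) or disable (0) the device's interrupt by writing a
 * signed 32-bit integer to the open UIO device file.  The value
 * reaches the kernel stub's irqcontrol() callback, if implemented.
 * Returns 0 on success, -1 on error. */
static int uio_irq_control(int fd, int enable)
{
    int32_t val = enable ? 1 : 0;

    return write(fd, &val, sizeof(val)) == sizeof(val) ? 0 : -1;
}
```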

Implementing the UIO device driver kernel stub

struct uio_info

This structure is defined in include/linux/uio_driver.h and ties the driver to the UIO framework. The more commonly used members are described here:

  • const char *name

    The device name.

  • const char *version

    The device driver version. This string appears in /sys/class/uio/uioX/version.

  • struct uio_mem mem[MAX_UIO_MAPS]

    This member is a list of mappable memory regions. MAX_UIO_MAPS is defined as 5, even though a PCI device can have up to six Base Address Registers (BARs). The struct uio_mem structure is a description of a UIO memory region.

  • struct uio_port port[MAX_UIO_PORT_REGIONS]

    This member is a list of port regions. MAX_UIO_PORT_REGIONS is defined as 5, even though a PCI device can have up to six BARs. The struct uio_port structure is a description of a UIO port region.

  • long irq

    This member is initialized to the IRQ value assigned to the device. If the device lacks a hardware-generated interrupt but the driver still wishes to register an interrupt handler, set this member to UIO_IRQ_CUSTOM.

  • unsigned long irq_flags

    flags for request_irq().

  • irqreturn_t (*handler)(int irq, struct uio_info *dev_info);

    The interrupt handler.

  • int (*mmap)(struct uio_info *info, struct vm_area_struct *vma);

    If specified, this mmap method will be used instead of the mmap method implemented by the UIO framework.

  • int (*open)(struct uio_info *info, struct inode *inode);

    Specify this to override the open method provided by the UIO framework.

  • int (*release)(struct uio_info *info, struct inode *inode);

    Usually specified when a custom open method is specified.

  • int (*irqcontrol)(struct uio_info *info, s32 irq_on);

    Specify this if the userspace code will write to /dev/uioX to enable/disable interrupts. An irq_on value of 0 disables interrupts while 1 enables them.

struct uio_mem

This structure is a description of a UIO memory region and is defined in <linux/uio_driver.h>. A struct uio_mem must be initialized for every MMIO region a device makes available for memory mapping to userspace.

The QEMU VM instance used here will feature an ivshmem device that exports two MMIO regions: a "register memory" region for its control and status registers, and a data region that maps to a POSIX SHM region on the VM host.

Below is a description of members of struct uio_mem most relevant to the device driver writer:

  • const char *name

    Name of the memory region for identification

  • phys_addr_t addr

    Address of the device's memory. The value specified here could be a logical, virtual or physical address.

  • unsigned long size

    Size of the memory region addr points to. For unused mappings this member must be set to 0.

  • int memtype

    The type of memory addr points to. Legal values are UIO_MEM_PHYS (a physical/bus address), UIO_MEM_LOGICAL (a kernel logical address; ZONE_NORMAL), UIO_MEM_VIRTUAL (a kernel virtual address; ZONE_HIGHMEM) and UIO_MEM_NONE.

  • void __iomem *internal_addr

    This is the ioremap'd version of addr. This address is meant for use from within the kernel stub and cannot be memory mapped to userspace.
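The members described above come together in the kernel stub's PCI probe routine. The following is only a sketch modeled on typical UIO PCI drivers (BAR layout per the ivshmem device: BAR0 for registers, BAR2 for the SHM-backed data region); error handling is abbreviated, and the actual ne_ivshmem_uio_krn.c should be consulted for the working version:

```c
#include <linux/interrupt.h>
#include <linux/module.h>
#include <linux/pci.h>
#include <linux/slab.h>
#include <linux/uio_driver.h>

static irqreturn_t ivshmem_handler(int irq, struct uio_info *info)
{
    /* read/clear the device's interrupt status register here */
    return IRQ_HANDLED;
}

static int ivshmem_pci_probe(struct pci_dev *dev,
                             const struct pci_device_id *id)
{
    struct uio_info *info;

    info = kzalloc(sizeof(*info), GFP_KERNEL);
    if (!info)
        return -ENOMEM;

    if (pci_enable_device(dev))
        goto out_free;
    if (pci_request_regions(dev, "ivshmem"))
        goto out_disable;

    /* BAR0: "register memory" region (control/status registers) */
    info->mem[0].name = "ivshmem_regs";
    info->mem[0].addr = pci_resource_start(dev, 0);
    info->mem[0].size = pci_resource_len(dev, 0);
    info->mem[0].memtype = UIO_MEM_PHYS;
    info->mem[0].internal_addr = pci_ioremap_bar(dev, 0);

    /* BAR2: data region backed by POSIX SHM on the VM host */
    info->mem[1].name = "ivshmem_shm";
    info->mem[1].addr = pci_resource_start(dev, 2);
    info->mem[1].size = pci_resource_len(dev, 2);
    info->mem[1].memtype = UIO_MEM_PHYS;

    info->name = "ivshmem";
    info->version = "0.1";
    info->irq = dev->irq;
    info->irq_flags = IRQF_SHARED;
    info->handler = ivshmem_handler;

    if (uio_register_device(&dev->dev, info))
        goto out_release;

    pci_set_drvdata(dev, info);
    return 0;

out_release:
    pci_release_regions(dev);
out_disable:
    pci_disable_device(dev);
out_free:
    kfree(info);
    return -ENODEV;
}
```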

Demo Programs

The file ne_ivshmem_uio_krn.c is a skeleton version of the original uio_ivshmem.c by Cam Macdonell.

UIO Test

Start up the ivshmem server1. Then boot up a QEMU VM instance with:

$ qemu-system-x86_64 ... \
    -chardev socket,path=/tmp/ivshmem_socket,id=ivshmemid \
    -device ivshmem,chardev=ivshmemid,size=1,msi=off

Log in to the VM and load the UIO device driver kernel stub:

$ sudo modprobe uio
$ sudo insmod ne_ivshmem_uio_krn.ko

Perform a few preliminary checks:

$ cat /sys/class/uio/uio0/name 
ivshmem

$ cat /sys/class/uio/uio0/version 
0.1

$ ls -l /dev/uio0
crw------- 1 root root 250, 0 /dev/uio0

or create the device file if it does not exist:

$ cat /proc/devices | grep uio
250 uio

$ sudo mknod -m 600 /dev/uio0 c 250 0

Each device driver UIO mapping has a corresponding entry in sysfs. The ivshmem kernel stub exports two:

$ cat /sys/class/uio/uio0/maps/map0/addr 
0xfeb22000

$ cat /sys/class/uio/uio0/maps/map0/size
0x100

$ cat /sys/class/uio/uio0/maps/map1/addr 
0xfea00000

$ cat /sys/class/uio/uio0/maps/map1/size
0x100000
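A userspace driver can read these attributes to size its mmap(2) calls. A small sketch (the function name is illustrative) that parses the hexadecimal value from such a sysfs file:

```c
#include <stdio.h>

/* Parse a hexadecimal value (e.g. "0x100000") from a sysfs
 * attribute file such as /sys/class/uio/uio0/maps/map1/size.
 * Returns the value, or 0 on error. */
static unsigned long long read_sysfs_hex(const char *path)
{
    FILE *f = fopen(path, "r");
    unsigned long long val = 0;

    if (!f)
        return 0;
    if (fscanf(f, "%llx", &val) != 1)
        val = 0;
    fclose(f);
    return val;
}
```

For instance, `read_sysfs_hex("/sys/class/uio/uio0/maps/map1/size")` would yield the length to pass to mmap(2) for the data region.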

There also exists a lsuio utility.

Now, compile ne_ivshmem_uio_usr.c, the userspace part of the ivshmem UIO device driver, and execute it:

$ gcc -Wall -O2 ne_ivshmem_uio_usr.c -o ne_ivshmem_uio_usr

$ sudo ./ne_ivshmem_uio_usr

Then, from the QEMU VM host environment, run2:

  • ne_ivshmem_send_qeventfd (pre-v2.5 QEMU)

  • ivshmem-client (post-v2.5 QEMU)

to send (periodic) eventfd(2) notifications to the QEMU guest (raising IRQs on its ivshmem device). In the guest, the following output should now get displayed on the ne_ivshmem_uio_usr launch terminal:

main:209:: IRQ count : 1
main:209:: IRQ count : 2
main:209:: IRQ count : 3
main:209:: IRQ count : 4
main:209:: IRQ count : 5
main:209:: IRQ count : 6
main:209:: IRQ count : 7

At this point, you may also (repeatedly) run ne_ivshmem_shm_host_usr3 from a separate x-term on the host to view the POSIX SHM region updates by the guest:

$ ./ne_ivshmem_shm_host_usr
main:171:: read "IRQ count : 16"

$ ./ne_ivshmem_shm_host_usr
main:171:: read "IRQ count : 17"

$ ./ne_ivshmem_shm_host_usr
main:171:: read "IRQ count : 18"

$ ./ne_ivshmem_shm_host_usr
main:171:: read "IRQ count : 19"
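For reference, the host-side reader boils down to shm_open(3) plus mmap(2). A minimal sketch, with the caveat that the SHM object name used by the ivshmem server is configuration-dependent and the function name is illustrative:

```c
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Copy a NUL-terminated string out of a POSIX SHM object.
 * Returns 0 on success, -1 on error. */
static int shm_read_str(const char *name, char *buf, size_t len)
{
    int fd = shm_open(name, O_RDONLY, 0);
    void *p;

    if (fd < 0)
        return -1;
    p = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);
    if (p == MAP_FAILED)
        return -1;
    strncpy(buf, p, len - 1);
    buf[len - 1] = '\0';
    munmap(p, len);
    return 0;
}
```

With the server's default object name (check the ivshmem server configuration), the call would look like `shm_read_str("/ivshmem", buf, sizeof buf)`.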

Also See

Resources

  • Linux 3.x sources and the official UIO HOWTO by Hans-Jürgen Koch (Documentation/DocBook/uio-howto.tmpl).
  • Cam Macdonell's Nahanni code base.

Footnotes