Author: Mugabi Siro
Category: Device Drivers
A jump-start tutorial on writing a User I/O (PCI) device driver. QEMU-PC's ivshmem virtual device is used here as the hardware platform.
Tags: linux qemu pci
This entry builds upon the PCI device driver tutorial. The QEMU ivshmem virtual PCI device is also used here as the hardware platform. For QEMU version compatibility issues, see the QEMU Version Compatibility Note.
User I/O (UIO) is a device driver model for Linux that enables most of the driver operations to be moved to userspace, with only a small kernel stub required for device initialization and low-level control. A few important implications of this approach include:
Floating-point operations and other facilities provided by userspace libraries/utilities can now be used within the device driver framework.
Bugs in the userspace part of the driver won't crash the kernel.
The userspace part of the device driver need not be GPL licensed as it is not linked with the Linux kernel.
Updates in the userspace part of the driver can be done without the need for kernel recompilation.
A common class of device drivers that benefit from this model are PCI I/O cards for industrial or scientific/engineering prototyping. Typically, these (custom) device drivers are tightly coupled with the application framework which may include - among other things - numerical computations with floating point.
UIO is inspired by the fact that most (PCI) devices share similar runtime operations: reading and writing to a few registers and accessing the device's MMIO data regions. These parts of device driver logic can be implemented in userspace. Logic that handles device initialization, interrupt handling registration and configuration, and I/O enumeration is then implemented in the kernel stub.
Naturally, there are tradeoffs with this scheme of device driver implementation. Notable among these is the increased latency with respect to interrupt handling in userspace. Other considerations include memory locking the user mode part of the device driver to prevent it from being swapped out and the fact that it remains possible for a user space process (and hence the user mode part of the driver) to be killed at any moment by the kernel.
Also note that the UIO model is not universal. Devices that are already handled well by other kernel subsystems (e.g. networking, serial, USB) are not candidates for a UIO driver. Generally, hardware that is ideally suited for a UIO driver satisfies the following criteria:
It has MMIO regions that can be memory mapped.
It usually generates interrupts.
It does not fit squarely in one of the standard kernel subsystems.
Once the kernel stub loads successfully, the corresponding UIO device will be accessible from userspace. Each UIO device is accessed through a device file and several sysfs attribute files. The device file will be called /dev/uio0 for the first device, /dev/uio1 for the second, etc.
Once the device file is successfully opened, its file descriptor is used with mmap(2) to memory map the device's MMIO region into the process' virtual address space. A UIO device can make one or more MMIO regions available for memory mapping. From userspace, a given MMIO region is memory mapped by specifying the respective offset value in mmap(2): to memory map MMIO region N, specify an offset of N * getpagesize(2).
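The steps above can be sketched in C as follows. This is a minimal illustration, not the actual ne_ivshmem_uio_usr.c source: the device path and the 4096-byte mapping length are assumptions, and a real driver would obtain the region size from the map's sysfs size attribute.

```c
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
	int fd = open("/dev/uio0", O_RDWR);	/* first UIO device */
	if (fd < 0) {
		perror("open(/dev/uio0)");
		return EXIT_FAILURE;
	}

	/* Memory map MMIO region N by passing N * page-size as the
	 * mmap(2) offset. Region 0 is mapped here. The 4096-byte length
	 * is an assumption; read the real value from
	 * /sys/class/uio/uio0/maps/map0/size. */
	void *regs = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
			  MAP_SHARED, fd, 0 * getpagesize());
	if (regs == MAP_FAILED) {
		perror("mmap");
		close(fd);
		return EXIT_FAILURE;
	}

	/* ... access the device's registers via "regs" ... */

	munmap(regs, 4096);
	close(fd);
	return EXIT_SUCCESS;
}
```

To map the second region instead, the offset would be 1 * getpagesize().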
Interrupts are handled by blocking on the /dev/uioN file descriptor. When an interrupt occurs, the blocking function returns. Blocking can be done via read(2), or by way of poll(2), epoll(7), etc. According to Documentation/DocBook/uio-howto.tmpl, when using read(2), the count parameter must be the size of a signed 32-bit integer; specifying any other size will cause read(2) to fail. The signed 32-bit integer value read in is the interrupt count of the device. In the best case scenario, successive interrupts will result in a difference of one between the current and previously read values. A difference of greater than one between two successive reads indicates missed interrupts. Otherwise, if the current value is less than the previous value, then an error occurred.
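A minimal interrupt-wait loop based on read(2) might look like the sketch below. The device path is an assumption; the interpretation of the counter follows the rules just described.

```c
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
	int fd = open("/dev/uio0", O_RDONLY);	/* path assumed */
	if (fd < 0) {
		perror("open(/dev/uio0)");
		return 1;
	}

	int32_t prev = 0;
	for (;;) {
		int32_t count;

		/* Blocks until the next interrupt. The buffer must be
		 * exactly the size of a signed 32-bit integer. */
		if (read(fd, &count, sizeof(count)) != sizeof(count)) {
			perror("read");
			break;
		}

		if (prev && count - prev > 1)
			fprintf(stderr, "missed %d interrupt(s)\n",
				count - prev - 1);
		else if (prev && count < prev)
			fprintf(stderr, "error: counter went backwards\n");

		printf("IRQ count : %d\n", count);
		prev = count;
	}

	close(fd);
	return 0;
}
```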
Note that part of the interrupt handling may still have to be performed within kernel space. In this case, the kernel stub should provide its own interrupt handler, which the UIO framework will automatically invoke upon a device IRQ. This scenario applies, for example, if some operation has to be performed immediately upon reception of the hardware IRQ. For instance, if the kernel space interrupt handler needs to immediately read data from the device for later processing by the userspace interrupt handler, then this data could first be buffered in memory. This way, even if the userspace program misses one or more interrupts, loss of data can still be avoided. Another reason why proper (low-level) interrupt handling may also have to be done from kernel space is the fact that a userspace process can be terminated at any time by the OS. In other words, there exists the likelihood that the userspace interrupt handler might leave the hardware in a state pending completion of low-level interrupt handling.
Certain corner cases, such as devices with more than one interrupt source but no separate IRQ mask and status registers, require writing to /dev/uioN to explicitly disable or enable interrupts. See Documentation/DocBook/uio-howto.tmpl for more.
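When the kernel stub implements an irqcontrol method (described below among the struct uio_info members), interrupts can be toggled from userspace with a plain write(2) of a signed 32-bit value. A minimal sketch, with the device path assumed:

```c
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

/* Write a signed 32-bit value to /dev/uioN: 1 enables interrupts,
 * 0 disables them. Returns 0 on success, -1 on failure. */
static int uio_irq_set(int fd, int32_t enable)
{
	if (write(fd, &enable, sizeof(enable)) != sizeof(enable))
		return -1;
	return 0;
}

int main(void)
{
	int fd = open("/dev/uio0", O_RDWR);	/* path assumed */
	if (fd < 0) {
		perror("open(/dev/uio0)");
		return 1;
	}

	if (uio_irq_set(fd, 0))		/* mask the device's interrupts */
		perror("disable");
	if (uio_irq_set(fd, 1))		/* unmask them again */
		perror("enable");

	close(fd);
	return 0;
}
```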
The struct uio_info structure is defined in include/linux/uio_driver.h and ties the driver to the UIO framework. Its more commonly used members are:
const char *name
The device name.
const char *version
The device driver version. This string appears in /sys/class/uio/uioX/version.
struct uio_mem mem[MAX_UIO_MAPS]
This member is a list of mappable memory regions.
MAX_UIO_MAPS is defined as 5, one less than the maximum number of Base Address Registers (BARs) of a PCI device, i.e. six. The
struct uio_mem structure is a description of a UIO memory region.
struct uio_port port[MAX_UIO_PORT_REGIONS]
This member is a list of port regions.
MAX_UIO_PORT_REGIONS is likewise defined as 5, one less than the six BARs a PCI device can have. The
struct uio_port structure is a description of a UIO port region.
long irq
This member is initialized to the IRQ value assigned to the device. If the device lacks a hardware generated interrupt, and still wishes to register an interrupt handler, set this member to UIO_IRQ_CUSTOM.
unsigned long irq_flags
Flags to be passed to request_irq() when the hardware interrupt handler is registered.
irqreturn_t (*handler)(int irq, struct uio_info *dev_info);
The interrupt handler.
int (*mmap)(struct uio_info *info, struct vm_area_struct *vma);
If specified, then this mmap method will be used instead of the default mmap method implemented by the UIO framework.
int (*open)(struct uio_info *info, struct inode *inode);
Specify this to override the
open method provided by the UIO framework.
int (*release)(struct uio_info *info, struct inode *inode);
Usually specified when a custom
open method is specified.
int (*irqcontrol)(struct uio_info *info, s32 irq_on);
Specify this if the userspace code will write to /dev/uioX to enable/disable interrupts from userspace. A written value of 0 disables interrupts while 1 enables them.
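Putting the members above together, a kernel stub's PCI probe routine might fill in struct uio_info roughly as follows. This is a hedged sketch, not the actual ne_ivshmem_uio_krn.ko source; names such as my_handler and my_pci_probe are placeholders.

```c
#include <linux/interrupt.h>
#include <linux/module.h>
#include <linux/pci.h>
#include <linux/uio_driver.h>

static irqreturn_t my_handler(int irq, struct uio_info *info)
{
	/* Acknowledge/mask the device IRQ here if required; the UIO core
	 * then wakes up any process blocked on /dev/uioX. */
	return IRQ_HANDLED;
}

static int my_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
{
	struct uio_info *info;

	info = devm_kzalloc(&pdev->dev, sizeof(*info), GFP_KERNEL);
	if (!info)
		return -ENOMEM;

	if (pci_enable_device(pdev))
		return -ENODEV;

	info->name = "ivshmem";
	info->version = "0.1";
	info->irq = pdev->irq;		/* IRQ assigned to the device */
	info->irq_flags = IRQF_SHARED;	/* passed to request_irq() */
	info->handler = my_handler;

	/* info->mem[] would be filled in here -- see struct uio_mem. */

	if (uio_register_device(&pdev->dev, info)) {
		pci_disable_device(pdev);
		return -ENODEV;
	}

	pci_set_drvdata(pdev, info);
	return 0;
}
```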
This structure is a description of a UIO memory region and is defined in include/linux/uio_driver.h. A struct uio_mem needs to be initialized for every MMIO region a device makes available for memory mapping to userspace.
The QEMU VM instance used here will feature an ivshmem device that exports two MMIO regions: a "register memory" region for its control and status registers, and a data region that maps to a POSIX SHM region on the VM host.
Below is a description of the members of struct uio_mem most relevant to the device driver writer:
const char *name
Name of the memory region for identification.
phys_addr_t addr
Address of the device's memory. The value specified here could be a logical, virtual or physical address.
unsigned long size
Size of the memory region addr points to. For unused mappings this member must be set to 0.
int memtype
The type of memory addr points to. Legal values are UIO_MEM_PHYS (a physical/bus address), UIO_MEM_LOGICAL (a kernel logical address, e.g. as returned by kmalloc()), and UIO_MEM_VIRTUAL (a kernel virtual address, e.g. as returned by vmalloc()).
void __iomem *internal_addr
This is the
ioremap'd version of
addr. This address is meant for use from within the kernel stub and cannot be memory mapped to userspace.
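For the ivshmem device, the kernel stub might describe its two MMIO regions along the lines of the sketch below (a hedged illustration under the same assumptions as above; in the ivshmem layout, BAR0 holds the register region and BAR2 the shared-memory region backed by POSIX SHM on the host):

```c
#include <linux/pci.h>
#include <linux/uio_driver.h>

/* Fill in the two ivshmem MMIO regions. Assumed to be called from the
 * PCI probe routine once the device is enabled and its regions claimed. */
static int ivshmem_setup_mem(struct pci_dev *pdev, struct uio_info *info)
{
	info->mem[0].name = "ivshmem_regs";
	info->mem[0].addr = pci_resource_start(pdev, 0);   /* BAR0 */
	info->mem[0].size = pci_resource_len(pdev, 0);
	info->mem[0].memtype = UIO_MEM_PHYS;
	/* ioremap'd address for use within the kernel stub only. */
	info->mem[0].internal_addr = pci_ioremap_bar(pdev, 0);
	if (!info->mem[0].internal_addr)
		return -ENODEV;

	info->mem[1].name = "ivshmem_shm";
	info->mem[1].addr = pci_resource_start(pdev, 2);   /* BAR2 */
	info->mem[1].size = pci_resource_len(pdev, 2);
	info->mem[1].memtype = UIO_MEM_PHYS;

	/* Remaining mem[] entries stay zeroed: size == 0 marks unused. */
	return 0;
}
```

These two entries correspond to the map0 and map1 sysfs attribute directories shown later in this entry.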
Start up the ivshmem server1. Then boot up a QEMU VM instance with:
$ qemu-system-x86_64 ... \
    -chardev socket,path=/tmp/ivshmem_socket,id=ivshmemid \
    -device ivshmem,chardev=ivshmemid,size=1,msi=off
Log in to the VM and load the UIO device driver kernel stub:
$ sudo modprobe uio
$ sudo insmod ne_ivshmem_uio_krn.ko
Perform a few preliminary checks:
$ cat /sys/class/uio/uio0/name
ivshmem
$ cat /sys/class/uio/uio0/version
0.1
$ ls -l /dev/uio0
crw------- 1 root root 250, 0 /dev/uio0
or create the device file if it does not exist:
$ cat /proc/devices | grep uio
250 uio
$ sudo mknod -m 600 /dev/uio0 c 250 0
Each UIO device driver mapping has a corresponding sysfs attribute directory. The ivshmem kernel stub exports two mappings:
$ cat /sys/class/uio/uio0/maps/map0/addr
0xfeb22000
$ cat /sys/class/uio/uio0/maps/map0/size
0x100
$ cat /sys/class/uio/uio0/maps/map1/addr
0xfea00000
$ cat /sys/class/uio/uio0/maps/map1/size
0x100000
There also exists a lsuio utility.
Now, compile ne_ivshmem_uio_usr.c, the userspace part implementation of the ivshmem UIO device driver framework, and execute it:
$ gcc -Wall -O2 ne_ivshmem_uio_usr.c -o ne_ivshmem_uio_usr
$ sudo ./ne_ivshmem_uio_usr
Then, from the QEMU VM host environment, run2:
ne_ivshmem_send_qeventfd (pre-v2.5 QEMU)
ivshmem-client (post-v2.5 QEMU)
to send (periodic)
eventfd(2) notifications to the QEMU guest (raising IRQs on its ivshmem device). In the guest, the following output should now get displayed on the
ne_ivshmem_uio_usr launch terminal:
main:209:: IRQ count : 1
main:209:: IRQ count : 2
main:209:: IRQ count : 3
main:209:: IRQ count : 4
main:209:: IRQ count : 5
main:209:: IRQ count : 6
main:209:: IRQ count : 7
At this point, you may also (repeatedly) run ne_ivshmem_shm_host_usr3 from a separate x-term on the host to view the POSIX SHM region updates made by the guest:
$ ./ne_ivshmem_shm_host_usr
main:171:: read "IRQ count : 16"
$ ./ne_ivshmem_shm_host_usr
main:171:: read "IRQ count : 17"
$ ./ne_ivshmem_shm_host_usr
main:171:: read "IRQ count : 18"
$ ./ne_ivshmem_shm_host_usr
main:171:: read "IRQ count : 19"