Author: Mugabi Siro
Category: Device Drivers
This entry presents a few examples that illustrate (dangerous) uses of /dev/mem. The /dev/mem character driver is implemented in drivers/char/mem.c. Its mmap method uses remap_pfn_range to memory map a region of physical memory (e.g. system RAM, device memory, etc.) into the calling process' address space.
mem is a character device file that is an image of the main memory of the computer. It may be used, for example, to examine (and even patch) the system. Byte addresses in mem are interpreted as physical memory addresses. References to nonexistent locations cause errors to be returned. Examining and patching is likely to lead to unexpected results when read-only or write-only bits are present. It is typically created by:

mknod -m 660 /dev/mem c 1 1
chown root:kmem /dev/mem
In other words, once a userspace application (with sufficient privileges) successfully mmap(2)s /dev/mem, the new Virtual Memory Area (VMA) has a one-to-one correspondence with the specified physical memory region. A separate entry discusses mmap(2) and memory-mapped I/O.
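As a rough illustration of the mapping step described above, the following C sketch maps a physical address through /dev/mem. The helper names (page_base, page_offset, map_phys) are hypothetical and do not come from the sources discussed in this entry. The sketch assumes a kernel that permits the access (see CONFIG_STRICT_DEVMEM) and a caller with sufficient privileges. Since mmap(2) requires a page-aligned file offset, the physical address is split into a page base and an offset within that page.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: round a physical address down to its page boundary. */
uint64_t page_base(uint64_t paddr, uint64_t page_size)
{
    /* page_size must be a non-zero power of two */
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    return paddr & ~(page_size - 1);
}

/* Hypothetical helper: offset of a physical address within its page. */
uint64_t page_offset(uint64_t paddr, uint64_t page_size)
{
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    return paddr & (page_size - 1);
}

/*
 * Hypothetical helper: map `len` bytes of physical memory starting at
 * `paddr` via /dev/mem. On success, returns a pointer to the byte that
 * corresponds to `paddr`, and stores the values needed for munmap(2)
 * in *map_base and *map_len. Returns NULL on failure.
 */
void *map_phys(uint64_t paddr, size_t len, void **map_base, size_t *map_len)
{
    long psz = sysconf(_SC_PAGESIZE);
    uint64_t base, off;
    size_t total;
    void *p;
    int fd;

    assert(map_base && map_len && len > 0);
    if (psz <= 0)
        return NULL;

    base = page_base(paddr, (uint64_t)psz);
    off = page_offset(paddr, (uint64_t)psz);
    total = (size_t)off + len;

    fd = open("/dev/mem", O_RDWR | O_SYNC);
    if (fd < 0)
        return NULL;

    /* The file offset passed to mmap(2) is the (page-aligned) physical address. */
    p = mmap(NULL, total, PROT_READ | PROT_WRITE, MAP_SHARED, fd, (off_t)base);
    close(fd); /* the mapping remains valid after the descriptor is closed */
    if (p == MAP_FAILED)
        return NULL;

    *map_base = p;
    *map_len = total;
    return (uint8_t *)p + off;
}
```

With 4 KiB pages, a physical address such as 0x1e102c5 splits into a page base of 0x1e10000 and an in-page offset of 0x2c5, which is why the resulting virtual address shares its low 12 bits with the physical address.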
These examples directly touch device memory or (reserved) physical RAM to modify kernel data. It might be prudent to first perform the tests in a machine emulator environment such as QEMU, or on some development machine/board.
At least for system RAM access, CONFIG_STRICT_DEVMEM may have to be disabled.
Memory mapping via /dev/mem makes it possible to access a PCI device's memory mapped regions even when its device driver is absent, or has not yet been loaded. This can be quite useful for preliminary tests/checks. Nevertheless, functionality with this mode of device access remains very limited.
The file upci.c is a library that contains functions to scan the PCI bus, find a particular PCI card, and read/write its I/O ports and MMIO regions. The ne_upci_rw.c program uses this library to illustrate memory mapping via /dev/mem, and MMIO access on a PCI device.
The general flow in ne_upci_rw.c is:
upci_scan_bus() --> upci_find_device() --> upci_open_region() --> upci_read_N()/upci_write_N()
These functions are defined in upci.c:
upci_scan_bus() scans the PCI bus by reading /proc/bus/pci/devices. For each device listed in this file, it initializes an entry in an internal table of data structures with the device's standard PCI configuration space info: DeviceID, VendorID, SubDeviceID, SubVendorID, address and size of regions associated with each available Base Address Register (BAR), etc.
upci_find_device() accepts user supplied (Sub)DeviceID and (Sub)VendorID info, and returns an integer descriptor, a.k.a device number.
upci_open_region() accepts the device number and a user supplied Base Address Register (BAR) index, a.k.a. region number (starting from zero). It returns a new integer descriptor for the BAR, a.k.a. data region. In upci_open_region(), the user supplied region number is used as an index into the table that was prepared by upci_scan_bus() to retrieve BAR information. Interesting fields in this table entry are the base address, size and type of I/O region (UPCI_REG_MEM for an MMIO region). Along with a file descriptor for /dev/mem (see incr_mem_usage()), the base address and size values are used as the offset and length parameters, respectively, of the mmap(2) call.
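To make the bus scan step more concrete, here is a minimal C sketch of parsing one line of /proc/bus/pci/devices. It is not taken from upci.c: the struct and function names are hypothetical. It also assumes the usual kernel line layout of a bus/devfn word, a vendor/device word, the IRQ, seven region base values (six BARs plus the expansion ROM), seven region sizes, and then the driver name, all in hexadecimal. The low bits of each base value are assumed to carry the BAR flag bits, with bit 0 set for I/O port regions.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define NREGIONS 7 /* assumption: 6 BARs + expansion ROM per line */

/* Hypothetical table entry, loosely modelled on the description above. */
struct pci_entry {
    unsigned bus, dev, fn;
    unsigned vendor, device, irq;
    uint64_t base[NREGIONS]; /* raw values, flag bits included */
    uint64_t size[NREGIONS];
};

/* Bit 0 of the raw base value distinguishes I/O port BARs from MMIO BARs. */
int bar_is_io(uint64_t raw)
{
    return (int)(raw & 0x1);
}

/* Strip the flag bits to obtain the region's base address. */
uint64_t bar_addr(uint64_t raw)
{
    return bar_is_io(raw) ? (raw & ~0x3ULL) : (raw & ~0xfULL);
}

/* Parse one line of /proc/bus/pci/devices. Returns 0 on success, -1 on error. */
int parse_pci_line(const char *line, struct pci_entry *e)
{
    unsigned busdevfn, vendev;
    int consumed = 0;
    const char *p;
    int i;

    assert(line && e);
    if (sscanf(line, "%x %x %x%n", &busdevfn, &vendev, &e->irq, &consumed) != 3)
        return -1;

    e->bus = busdevfn >> 8;
    e->dev = (busdevfn >> 3) & 0x1f;
    e->fn = busdevfn & 0x7;
    e->vendor = vendev >> 16;
    e->device = vendev & 0xffff;

    /* Seven base values followed by seven sizes, whitespace separated. */
    p = line + consumed;
    for (i = 0; i < 2 * NREGIONS; i++) {
        char *end;
        unsigned long long v = strtoull(p, &end, 16);

        if (end == p)
            return -1; /* fewer fields than expected */
        if (i < NREGIONS)
            e->base[i] = v;
        else
            e->size[i - NREGIONS] = v;
        p = end;
    }
    return 0;
}
```

For an MMIO BAR, bar_addr(e.base[n]) and e.size[n] would then be the values to use as the offset and length of the /dev/mem mapping.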
At this point, a new VMA - associated with the MMIO region of the BAR - has been allocated within the virtual address space of the ne_upci_rw instance. The upci_read_N() and upci_write_N() interfaces (where N is the size of a data unit) require the data region (i.e. the BAR's descriptor), and an offset (in number of bytes) into the mapping. The upci_write_N() function additionally accepts a value to write into the specified region, while upci_read_N() returns a copy of the data stored in the specified region.
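The read/write interfaces described above can be approximated by the following C sketch. It is an illustration only, not the upci.c implementation: struct region and the region_* functions are hypothetical names, and the sketch assumes the mapping is already in place. Accesses go through volatile pointers so that the compiler neither caches nor elides them, and each access is bounds checked against the size of the mapping.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical descriptor for a mapped data region (e.g. a BAR's MMIO window). */
struct region {
    volatile uint8_t *base; /* start of the mapping */
    size_t size;            /* length of the mapping in bytes */
};

/* Read one byte at byte offset `off`. Returns 0 on success, -1 if out of range. */
int region_read_u8(const struct region *r, size_t off, uint8_t *out)
{
    assert(r && out);
    if (off >= r->size)
        return -1;
    *out = r->base[off];
    return 0;
}

/* Write one byte at byte offset `off`. Returns 0 on success, -1 if out of range. */
int region_write_u8(const struct region *r, size_t off, uint8_t val)
{
    assert(r);
    if (off >= r->size)
        return -1;
    r->base[off] = val;
    return 0;
}

/* Read a naturally aligned 32-bit word. Returns 0 on success, -1 on a bad offset. */
int region_read_u32(const struct region *r, size_t off, uint32_t *out)
{
    assert(r && out);
    if (off > r->size || r->size - off < sizeof(uint32_t) || off % sizeof(uint32_t))
        return -1;
    *out = *(volatile uint32_t *)(r->base + off);
    return 0;
}

/* Write a naturally aligned 32-bit word. Returns 0 on success, -1 on a bad offset. */
int region_write_u32(const struct region *r, size_t off, uint32_t val)
{
    assert(r);
    if (off > r->size || r->size - off < sizeof(uint32_t) || off % sizeof(uint32_t))
        return -1;
    *(volatile uint32_t *)(r->base + off) = val;
    return 0;
}
```

The helpers work on any byte-addressable buffer, so they can be exercised against ordinary memory before being pointed at a real /dev/mem mapping.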
The following tests were carried out on QEMU and should work with v1.x as well as v2.x releases. The InterVM Shared Memory (ivshmem) implementation is considered.
For purposes of this entry, either the "ivshmem-plain" device (available only as of QEMU versions later than v2.5), or the "ivshmem" (legacy) device can be used:
$ qemu-system-x86_64 ... \
    -device ivshmem,shm=ivshmem,size=1

$ qemu-system-x86_64 ... \
    -object memory-backend-file,id=mb1,size=1M,share,mem-path=/dev/shm/ivshmem-plain \
    -device ivshmem-plain,memdev=mb1,id=ivshmem-plain
Recall that upon QEMU boot, the ivshmem PCI device driver need not be loaded/present for the following tests.
Build ne_upci_rw in the guest environment:
$ make -f ne_upci_makefile
View ne_upci_rw program usage, and obtain the ivshmem device's PCI IDs:

$ ./ne_upci_rw -h ## to view program options and arguments
$ lspci ## to view PCI device listing
00:00.0 Host bridge: Intel Corporation 440FX - 82441FX PMC [Natoma] (rev 02)
...
00:04.0 RAM memory: Red Hat, Inc Virtio Inter-VM shared memory
$ lspci -s 00:04.0 -m -n -v
Device: 00:04.0
Class: 0500
Vendor: 1af4
Device: 1110
SVendor: 1af4
SDevice: 1100
...
Write operation: A string is written into the MMIO region associated with BAR2. Recall that the ivshmem device performs a POSIX SHM mapping on the host, and presents this mapping as its BAR2 MMIO region to the Linux guest.
$ sudo -s
# DARRAY="68 117 110 105 97 44 118 105 112 105 63 13 0"
# c=0; for i in $DARRAY
> do
>     ./ne_upci_rw -D 0x1110 -V 0x1af4 -d 0x1100 -v 0x1af4 -r 2 -f u8 -o $c -a 1 -w $i
>     c=$((c+1))
>     printf "%3d %3d\n" $i $c
> done
Then on the host:

$ hd /dev/shm/ivshmem
00000000  44 75 6e 69 61 2c 76 69 70 69 3f 0d 00 00 00 00  |Dunia,vipi?.....|
00000010  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  |................|
*
00100000
Read operation: Back in the guest environment, a read is performed on the device memory associated with BAR2. In this case, what was just written is returned.
# i=0; d=0; while [ $d == 0 ]
> do
>     val=$(./ne_upci_rw -D 0x1110 -V 0x1af4 -d 0x1100 -v 0x1af4 -r 2 -f u8 -o $i -a 0)
>     if [ $val == "0x00" ]
>     then
>         d=1
>     fi
>     echo -n "$val "
>     i=$((i+1))
> done; echo
0x44 0x75 0x6E 0x69 0x61 0x2C 0x76 0x69 0x70 0x69 0x3F 0x0D 0x00
The entry QEMU MMIO Memory Regions includes an example of accessing the memory mapped registers of a PCI device. The objective of that exercise was to obtain the function call trace taken by the QEMU vCPU thread function for the device's I/O callbacks.
In this example, a location within reserved physical memory belonging to the kernel's data segment is directly accessed and patched by a userspace program. References to Linux header files and sources (e.g. drivers/char/mem.c) are with respect to the root of the kernel source tree. The example is based on Linux 3.x.
Providentially, while scribbling content for a related entry, I stumbled across the following sentence in Sreekrishnan Venkateswaran's seminal Essential Linux Device Drivers, Chapter 5, Section "Pseudo Char Drivers":
As an exercise, change the hostname of your system by accessing /dev/mem.
Now, there exists more than one way of doing this. The approach taken here is simple: A tiny kernel module is used to obtain the physical address of the nodename field of struct new_utsname (declared in the kernel's utsname.h header). Then, /dev/mem is used to mmap this physical address.
The sources are available here. Simply run make to compile the kernel stub. Build the userspace program with:

$ gcc -Wall -O2 ne_devmem_hostname_usr.c
Obtain the physical address of the nodename field of struct new_utsname:

Welcome to Buildroot
buildroot login: root
[root@buildroot ~]# hostname
buildroot
[root@buildroot ~]# echo 8 > /proc/sys/kernel/printk
[root@buildroot ~]# insmod ne_devmem_hostname_krn.ko
[ 22.319639] nodename: buildroot, phys addr: 0x1e102c5
which, indeed, lies within the range of physical memory reserved for the kernel data segment:
[root@buildroot ~]# less /proc/iomem
...
00100000-17ffcfff : System RAM
  01000000-0182852c : Kernel code
  0182852d-01eee43f : Kernel data
  0203c000-02137fff : Kernel bss
...
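The range check above can also be done programmatically. The following C sketch parses a single /proc/iomem line of the form "start-end : name" and tests whether a physical address falls inside it. The struct and function names are hypothetical, and the sketch assumes the hexadecimal line format shown in the listing above.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical representation of one /proc/iomem entry. */
struct iomem_range {
    uint64_t start; /* first physical address of the range */
    uint64_t end;   /* last physical address of the range (inclusive) */
    char name[64];  /* resource name, e.g. "Kernel data" */
};

/* Parse a line such as "  0182852d-01eee43f : Kernel data". Returns 0 or -1. */
int parse_iomem_line(const char *line, struct iomem_range *r)
{
    unsigned long long s, e;
    char name[64];

    assert(line && r);
    /* The leading space in the format skips the indentation of nested entries. */
    if (sscanf(line, " %llx-%llx : %63[^\n]", &s, &e, name) != 3)
        return -1;

    r->start = s;
    r->end = e;
    snprintf(r->name, sizeof r->name, "%s", name);
    return 0;
}

/* Non-zero if paddr lies within [start, end]. */
int range_contains(const struct iomem_range *r, uint64_t paddr)
{
    assert(r);
    return paddr >= r->start && paddr <= r->end;
}
```

Applied to the "Kernel data" line from the listing, range_contains() reports that 0x1e102c5 lies inside 0182852d-01eee43f.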
So, proceeding with /dev/mem to patch the contents of this physical address:
[root@buildroot ~]# ./a.out -l 0x1e102c5
main:126:: Patching paddr. 0x1e102c5, vaddr. 0xf77dc2c5 (nodename "buildroot")
main:128:: with string "simsima"
[root@buildroot ~]# hostname
simsima
[root@buildroot ~]# ./a.out -l 0x1e102c5 -s NinjaNinja
main:126:: Patching paddr. 0x1e102c5, vaddr. 0xf77382c5 (nodename "simsima")
main:128:: with string "NinjaNinja"
[root@buildroot ~]# hostname
NinjaNinja
[root@buildroot ~]# exit
logout

Welcome to Buildroot
NinjaNinja login: root
[root@NinjaNinja ~]#