driver-know-hows

device driver related stuff

Kernel Driver Development - Troubleshooting Guide

Common Problems and Solutions

1. Module Won’t Load

Error: “Invalid module format”

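# Check for a kernel version mismatch: the release at the start of
# vermagic must match the running kernel
uname -r
modinfo module.ko | grep vermagic

# Solution: Rebuild against the headers of the running kernel
make -C /lib/modules/$(uname -r)/build M=$(pwd) clean
make -C /lib/modules/$(uname -r)/build M=$(pwd) modules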
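The check above compares two strings by eye. As a minimal userspace sketch, assuming the vermagic string begins with the kernel release followed by space-separated flags (such as SMP or modversions), the comparison could be automated in C. The function names are hypothetical. A matching release is necessary but not sufficient, because the remaining vermagic flags must match as well.

```c
#include <assert.h>
#include <string.h>
#include <sys/utsname.h>

/* Hypothetical helper: returns 1 if the first whitespace-separated token
 * of `vermagic` equals `release`, 0 otherwise. */
int vermagic_release_matches(const char *vermagic, const char *release)
{
    size_t len;

    assert(vermagic != NULL && release != NULL);
    len = strcspn(vermagic, " \t");  /* length of the release token */
    return len == strlen(release) && strncmp(vermagic, release, len) == 0;
}

/* Compare against the running kernel (what `uname -r` prints).
 * Returns 1 on match, 0 on mismatch, -1 if uname() fails. */
int vermagic_matches_running_kernel(const char *vermagic)
{
    struct utsname u;

    if (uname(&u) != 0)
        return -1;
    return vermagic_release_matches(vermagic, u.release);
}
```

Pass it the text that follows `vermagic:` in the modinfo output.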

Error: “Unknown symbol”

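# Missing dependency or unexported symbol
sudo dmesg | grep "Unknown symbol"

# Solution: Load the module that provides the symbol first
# (modprobe resolves dependencies via modules.dep; insmod does not)
sudo modprobe dependency_module
# Or fix the export: the providing code must use EXPORT_SYMBOL() or
# EXPORT_SYMBOL_GPL(), and GPL-only symbols also require a
# GPL-compatible MODULE_LICENSE() in your module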

Error: “Operation not permitted”

# Need root access
# Solution:
sudo insmod module.ko

Error: “File exists” (module already loaded)

# A module with the same name is already loaded
lsmod | grep module_name

# Solution: Unload it first
sudo rmmod module_name

2. Compilation Errors

Error: “linux/module.h: No such file”

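# Missing kernel headers, or the module is compiled with plain gcc
# instead of through the kernel build system (kbuild)
# Solution (Debian/Ubuntu):
sudo apt-get install linux-headers-$(uname -r)
# Fedora/RHEL:
sudo dnf install kernel-devel-$(uname -r)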

Error: “implicit declaration of function”

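// Missing include, or the function was renamed/removed in your kernel version
// (the kernel builds with -Werror=implicit-function-declaration)
// Solution: Check the kernel documentation or source for the correct header
// Example:
#include <linux/delay.h>    // For msleep()
#include <linux/slab.h>     // For kmalloc()/kfree()
#include <linux/uaccess.h>  // For copy_to_user()/copy_from_user()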

Warning: “ISO C90 forbids mixed declarations”

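// Declaring variables after statements; the kernel builds with
// -Wdeclaration-after-statement
// Solution: Move declarations to the top of the block
static void do_something(void) { }  // placeholder for real work

static void bad_func(void)
{
    do_something();
    int x = 5;  // BAD: declaration after a statement
}

static void good_func(void)
{
    int x;  // GOOD: declared at the top of the block

    do_something();
    x = 5;
}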

3. Runtime Issues

System Freezes/Hangs

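Causes:

  • Deadlocks, for example taking a spinlock that is already held
  • Long or endless loops while holding a spinlock or with interrupts disabled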

Recovery:

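# Try SysRq keys (needs CONFIG_MAGIC_SYSRQ and the function enabled
# in /proc/sys/kernel/sysrq)
Alt + SysRq + b  # Force an immediate reboot (no sync, no unmount)

# Or from SSH:
echo b | sudo tee /proc/sysrq-trigger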

Prevention:

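// Keep spinlock sections SHORT and never sleep inside them
spin_lock(&lock);
// Only fast, non-sleeping operations here: no msleep(), mutex_lock(),
// copy_to_user() or kmalloc(..., GFP_KERNEL)
spin_unlock(&lock);

// Enable lockdep and sleep-in-atomic checks in the kernel .config:
// CONFIG_PROVE_LOCKING=y
// CONFIG_DEBUG_ATOMIC_SLEEP=y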

Kernel Oops/Panic

Analyze crash:

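# View the oops in the kernel log
sudo dmesg

# Decode addresses to file:line (vmlinux needs debug info); for an oops
# in a module, also pass the source base path and the module directory
./scripts/decode_stacktrace.sh vmlinux < oops.txt
./scripts/decode_stacktrace.sh vmlinux /path/to/linux /path/to/module/dir < oops.txt

# Or analyze a kdump vmcore with the crash utility
crash vmlinux vmcore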
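Oops reports and call traces print code locations as function+offset/size, for example `RIP: 0010:my_read+0x1c/0x80 [mymod]`. The sketch below (struct and function names are hypothetical, and it handles only that single-token shape) splits such a location so the offset can be handed to `./scripts/faddr2line mymod.ko my_read+0x1c/0x80`, or to gdb's `list *(my_read+0x1c)` on a module built with debug info.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* A code location as printed in an oops, e.g. "my_read+0x1c/0x80". */
struct oops_loc {
    char func[64];
    unsigned long offset;  /* bytes from the start of func */
    unsigned long size;    /* total size of func in bytes */
};

/* Parse "func+0xOFF/0xSIZE", optionally preceded by an "RIP: 0010:" style
 * prefix and followed by " [module]". Returns 0 on success, -1 otherwise. */
int parse_oops_loc(const char *text, struct oops_loc *loc)
{
    const char *colon;

    assert(text != NULL && loc != NULL);
    colon = strrchr(text, ':');
    if (colon)
        text = colon + 1;  /* skip the "RIP: 0010:" prefix */
    if (sscanf(text, " %63[^+ ]+0x%lx/0x%lx",
               loc->func, &loc->offset, &loc->size) != 3)
        return -1;
    return 0;
}
```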

Common causes:

Memory Leaks

Detection:

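# Enable kmemleak in the kernel .config
CONFIG_DEBUG_KMEMLEAK=y

# Trigger a scan (debugfs must be mounted)
echo scan | sudo tee /sys/kernel/debug/kmemleak

# View leaks; each report starts with "unreferenced object"
sudo cat /sys/kernel/debug/kmemleak
sudo grep -c "unreferenced object" /sys/kernel/debug/kmemleak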

Prevention:

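// Always free what you allocate: memory kept after init must be
// released in the exit function
#include <linux/module.h>
#include <linux/slab.h>

static void *buf;

static int __init my_init(void)
{
    buf = kmalloc(1024, GFP_KERNEL);
    if (!buf)
        return -ENOMEM;
    // ... use buf ...
    return 0;
}

static void __exit my_exit(void)
{
    kfree(buf);  // DON'T FORGET!
}

module_init(my_init);
module_exit(my_exit);
MODULE_LICENSE("GPL");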
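Leaks often creep in on error paths, when a later allocation fails and earlier ones are not released. Kernel code typically handles this with goto-based unwinding. Below is a userspace sketch of that pattern, assuming malloc()/free() as stand-ins for kmalloc()/kfree() and adding a hypothetical fault-injection counter (in the spirit of the kernel's CONFIG_FAILSLAB) so the failure paths can be exercised. All names are illustrative.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

static int live_allocs;       /* allocations not yet freed */
static int allocs_left = -1;  /* fault injection: fail when it hits 0; -1 = never */

static void *test_alloc(size_t len)
{
    void *p;

    if (allocs_left == 0)
        return NULL;          /* injected failure */
    if (allocs_left > 0)
        allocs_left--;
    p = malloc(len);
    if (p)
        live_allocs++;
    return p;
}

static void test_free(void *p)
{
    if (p) {
        live_allocs--;
        free(p);
    }
}

/* Hypothetical driver state. */
struct my_dev {
    char *rx_buf;
    char *tx_buf;
};

/* Allocate in order; on failure, undo earlier steps in reverse order. */
int my_dev_setup(struct my_dev *dev, size_t len)
{
    int ret;

    assert(dev != NULL);
    dev->rx_buf = test_alloc(len);
    if (!dev->rx_buf) {
        ret = -ENOMEM;
        goto err_out;
    }
    dev->tx_buf = test_alloc(len);
    if (!dev->tx_buf) {
        ret = -ENOMEM;
        goto err_free_rx;
    }
    return 0;

err_free_rx:
    test_free(dev->rx_buf);
    dev->rx_buf = NULL;
err_out:
    return ret;
}

/* Exit path: release everything setup allocated, in reverse order. */
void my_dev_teardown(struct my_dev *dev)
{
    test_free(dev->tx_buf);
    test_free(dev->rx_buf);
    dev->tx_buf = NULL;
    dev->rx_buf = NULL;
}
```

Setting `allocs_left = 1` makes the second allocation fail, which exercises the `err_free_rx` label; after every path `live_allocs` should be back to zero.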

4. Device Issues

Device Node Not Created

Check:

# Is module loaded?
lsmod | grep module

# Check dmesg for errors
dmesg | tail

# Is udev running?
systemctl status systemd-udevd

# Manual creation as a workaround (look up MAJOR for the name your
# driver registered in /proc/devices):
grep mydev /proc/devices
sudo mknod /dev/mydev c MAJOR MINOR

Fix:

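// Use device_create() so udev creates the node automatically
dev_class = class_create("myclass");  // before Linux 6.4: class_create(THIS_MODULE, "myclass")
if (IS_ERR(dev_class))
    return PTR_ERR(dev_class);
device_create(dev_class, NULL, dev_num, NULL, "mydev");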

Permission Denied on Device Access

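# Check permissions
ls -l /dev/mydev

# Quick fix (lost when the node is recreated, e.g. on reload or reboot)
sudo chmod 666 /dev/mydev

# Persistent fix: add a udev rule, reload the rules, and re-apply them
echo 'KERNEL=="mydev", MODE="0666"' | \
    sudo tee /etc/udev/rules.d/99-mydev.rules
sudo udevadm control --reload
sudo udevadm trigger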
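A small userspace test program makes it quicker to tell these cases apart. The sketch below (the function name and hint strings are hypothetical) opens a node and maps the usual errno values to the checks in this section: ENOENT when the node does not exist, EACCES for permissions, and typically ENXIO or ENODEV when a node exists but no driver is registered for its major/minor numbers.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Open `path` read-write and return "ok" or a hint about why it failed.
 * The returned string must not be freed. */
const char *diagnose_device_open(const char *path)
{
    int fd;

    assert(path != NULL);
    fd = open(path, O_RDWR);
    if (fd >= 0) {
        close(fd);
        return "ok";
    }
    switch (errno) {
    case ENOENT:
        return "node missing: is the module loaded, and did device_create() run?";
    case EACCES:
        return "permission denied: check mode/owner or add a udev rule";
    case ENXIO:
    case ENODEV:
        return "node exists, but no driver is registered for its major/minor";
    default:
        return strerror(errno);
    }
}
```

For example, `printf("%s\n", diagnose_device_open("/dev/mydev"));` from a small main() prints the hint for your node.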

5. Race Conditions

Symptoms

Detection:

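# Enable lock debugging in the kernel .config
CONFIG_DEBUG_SPINLOCK=y
CONFIG_DEBUG_MUTEXES=y
CONFIG_PROVE_LOCKING=y
# (lockdep catches lock-ordering and lock-context bugs; plain data races
#  on unlocked variables are the job of KCSAN, CONFIG_KCSAN)

# After exercising the driver, look for lockdep reports
sudo dmesg | grep -iE "lockdep|possible .*lock|inconsistent lock state"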

Fix:

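// Protect shared data
#include <linux/spinlock.h>

static DEFINE_SPINLOCK(my_lock);  // a bare spinlock_t would need spin_lock_init()
static int shared_data;

void update_data(int new_val)
{
    spin_lock(&my_lock);
    shared_data = new_val;  // Protected
    spin_unlock(&my_lock);
}
// If an interrupt handler also takes my_lock, use
// spin_lock_irqsave()/spin_unlock_irqrestore() instead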
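To see why the lock matters, the same pattern can be tried in userspace. This sketch uses POSIX pthread_spin_lock() as a stand-in for the kernel's spin_lock(); the thread and iteration counts and all names are arbitrary. With the lock in place the final count is always NTHREADS * ITERS; removing the lock/unlock pair typically produces a smaller, varying total because concurrent increments overwrite each other.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>

#define NTHREADS 4
#define ITERS 10000

static pthread_spinlock_t counter_lock;
static long shared_counter;

static void *worker(void *arg)
{
    int i;

    (void)arg;
    for (i = 0; i < ITERS; i++) {
        pthread_spin_lock(&counter_lock);
        shared_counter++;  /* read-modify-write, now protected */
        pthread_spin_unlock(&counter_lock);
    }
    return NULL;
}

/* Start the workers, wait for them, and return the final count. */
long run_workers(void)
{
    pthread_t tids[NTHREADS];
    int i, started;

    shared_counter = 0;
    if (pthread_spin_init(&counter_lock, PTHREAD_PROCESS_PRIVATE) != 0)
        return -1;
    for (started = 0; started < NTHREADS; started++)
        if (pthread_create(&tids[started], NULL, worker, NULL) != 0)
            break;
    for (i = 0; i < started; i++)
        pthread_join(tids[i], NULL);
    pthread_spin_destroy(&counter_lock);
    return shared_counter;
}
```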

6. Performance Issues

High CPU Usage

Profile:

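# Profile the whole system for 10 seconds with call graphs
sudo perf record -a -g -- sleep 10
sudo perf report

# Check /proc/interrupts: a count for your device's IRQ that climbs
# very fast points to an interrupt storm
cat /proc/interrupts

# Check context switches (the "cs" column)
vmstat 1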

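Common causes:

  • Busy-waiting or polling loops
  • Interrupt storms (an interrupt that is never cleared keeps firing)
  • Excessive context switching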

Fix:

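static DECLARE_WAIT_QUEUE_HEAD(wq);  // from <linux/wait.h>

// BAD: Busy waiting
while (!ready)
    cpu_relax();  // Spins, wasting CPU

// GOOD: Sleep until ready is true
if (wait_event_interruptible(wq, ready))
    return -ERESTARTSYS;  // interrupted by a signal

ready = true;  // Waker side (e.g. the interrupt handler)
wake_up_interruptible(&wq);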
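wait_event_interruptible() hides a sleep-and-recheck loop. As a rough userspace analogue (not the kernel API; all names are hypothetical), a POSIX condition variable gives the same shape: the waiter sleeps until the flag is set and re-checks it after every wakeup, and the waker sets the flag before waking the sleepers, much as a driver sets its condition before calling wake_up_interruptible().

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

static pthread_mutex_t wq_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t wq = PTHREAD_COND_INITIALIZER;
static bool ready;

/* Like wait_event(wq, ready): sleep instead of spinning. */
void wait_ready(void)
{
    pthread_mutex_lock(&wq_lock);
    while (!ready)  /* re-check after every wakeup */
        pthread_cond_wait(&wq, &wq_lock);
    pthread_mutex_unlock(&wq_lock);
}

/* Like setting the flag and then calling wake_up(&wq). */
void set_ready(void)
{
    pthread_mutex_lock(&wq_lock);
    ready = true;
    pthread_cond_broadcast(&wq);
    pthread_mutex_unlock(&wq_lock);
}

/* Simulated device: becomes ready after about 10 ms. */
static void *device_thread(void *arg)
{
    struct timespec delay = { 0, 10 * 1000 * 1000 };

    (void)arg;
    nanosleep(&delay, NULL);
    set_ready();
    return NULL;
}
```

While the waiter is blocked in pthread_cond_wait() it uses no CPU, which is the point of replacing the cpu_relax() loop.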

Memory Usage

Check:

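# Slab info: your driver appears only if it creates its own kmem_cache
# (plain kmalloc() memory is counted in the generic kmalloc-* caches)
sudo grep my_cache /proc/slabinfo
sudo slabtop -o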

# Overall memory
free -h
cat /proc/meminfo

7. Debugging Techniques

Enable Debug Messages

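# Dynamic debug (needs CONFIG_DYNAMIC_DEBUG; controls pr_debug()/dev_dbg())
echo 'file driver.c +p' | sudo tee /sys/kernel/debug/dynamic_debug/control

# Or at boot, on the kernel command line (my_module is a placeholder)
my_module.dyndbg=+p

# Or when loading the module
sudo modprobe my_module dyndbg=+p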

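Add Trace Output (trace_printk)

#include <linux/kernel.h>  // declares trace_printk()

void my_function(void)
{
    trace_printk("my_function called\n");
}

// View with (as root):
// cat /sys/kernel/tracing/trace
// (older kernels: /sys/kernel/debug/tracing/trace)
// trace_printk() is a debugging aid only: it triggers a warning banner
// in the kernel log and must not stay in production code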

Use KGDB

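# Setup target: kernel built with CONFIG_KGDB and CONFIG_KGDB_SERIAL_CONSOLE;
# add to the kernel command line:
kgdboc=ttyS0,115200 kgdbwait

# On host (/dev/ttyS0 here is the host's serial port to the target)
gdb vmlinux
(gdb) set serial baud 115200
(gdb) target remote /dev/ttyS0
(gdb) break my_driver_init
(gdb) continue
# Module symbols are not in vmlinux: load them with lx-symbols
# (CONFIG_GDB_SCRIPTS) or add-symbol-file before breaking in the module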

Checklist Before Loading Module

Emergency Recovery

System Unresponsive

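  1. Try switching to a text console: Ctrl+Alt+F1 to F6 (which key gives a text console varies by distribution)
  2. Use SysRq keys: Alt+SysRq+…, in this order ("REISUB"), a few seconds apart
    • r: Take the keyboard out of raw mode
    • e: Send SIGTERM to all processes except init
    • i: Send SIGKILL to all processes except init
    • s: Sync filesystems
    • u: Remount all filesystems read-only
    • b: Reboot immediately
  3. From SSH: echo b | sudo tee /proc/sysrq-trigger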
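Keyboard SysRq combinations only work for the functions enabled in /proc/sys/kernel/sysrq (the kernel.sysrq sysctl): 0 disables them, 1 enables all, and any other value is a bitmask. Some distributions ship a restricted mask, so keys such as r, e and i may be silently ignored; writes to /proc/sysrq-trigger by root are not subject to the mask. The sketch below decodes the value using the bit assignments from the kernel's sysrq documentation; the enum and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Bit values of /proc/sys/kernel/sysrq, per the kernel's sysrq documentation. */
enum sysrq_bit {
    SYSRQ_LOGLEVEL = 2,    /* console log level */
    SYSRQ_KEYBOARD = 4,    /* keyboard control: SAK, unraw (r) */
    SYSRQ_DUMPS    = 8,    /* debugging dumps of processes etc. */
    SYSRQ_SYNC     = 16,   /* sync (s) */
    SYSRQ_REMOUNT  = 32,   /* remount read-only (u) */
    SYSRQ_SIGNAL   = 64,   /* signal processes: term (e), kill (i), oom-kill (f) */
    SYSRQ_REBOOT   = 128,  /* reboot (b), poweroff (o) */
    SYSRQ_NICE     = 256,  /* nice all RT tasks */
};

/* Is the keyboard SysRq function guarded by `bit` allowed by `value`?
 * 0 disables everything, 1 enables everything, otherwise a bitmask. */
bool sysrq_key_allowed(unsigned int value, unsigned int bit)
{
    assert(bit != 0);
    if (value == 1)
        return true;
    return (value & bit) != 0;
}
```

For example, a value of 176 (128 + 32 + 16) still allows s, u and b from the keyboard, but not r, e or i. Running `echo 1 | sudo tee /proc/sys/kernel/sysrq` enables all functions until the next reboot.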

Module Won’t Unload

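# Check module usage: rmmod fails while the "Used by" count is non-zero
lsmod | grep module

# Check what's holding it, e.g. a process with the device open
sudo lsof /dev/mydev
sudo fuser -v /dev/mydev

# Force remove (dangerous! requires CONFIG_MODULE_FORCE_UNLOAD)
sudo rmmod -f module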
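lsmod reads /proc/modules, where each line holds the module name, its size, its reference count, the modules using it, its state and its load address. rmmod refuses to unload while the count is non-zero; a process holding the device open raises it when the driver's file_operations set `.owner = THIS_MODULE`. A minimal sketch that extracts the count (the function name is hypothetical, and it assumes a kernel with module unloading enabled so the count field is numeric):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Find `name` in /proc/modules-formatted text and return its reference
 * count, or -1 if absent. Each line: name size refcount deps state address. */
int module_refcount(FILE *modules, const char *name)
{
    char line[1024];
    char mod[64];
    unsigned long size;
    int refs;

    assert(modules != NULL && name != NULL);
    while (fgets(line, sizeof(line), modules)) {
        if (sscanf(line, "%63s %lu %d", mod, &size, &refs) == 3 &&
            strcmp(mod, name) == 0)
            return refs;
    }
    return -1;
}
```

In practice, open `/proc/modules` with fopen() and pass the stream in; a result above zero means something still holds the module.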

Corrupted System

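# Boot into recovery mode
# From GRUB, select "Advanced options" → "Recovery mode"
# (or press e in GRUB and append module_blacklist=module to the kernel line)

# The root filesystem may be mounted read-only; make it writable
mount -o remount,rw /

# Remove problematic module
rm /lib/modules/$(uname -r)/extra/module.ko
depmod -a

# Reboot
reboot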

Getting Help

Gather Information

# Kernel version
uname -a

# Module info
modinfo module.ko

# dmesg output (needs root if kernel.dmesg_restrict is set)
sudo dmesg > dmesg.txt

# System info
lspci -v > lspci.txt
lsusb -v > lsusb.txt

Useful Resources

Prevention Tips

  1. Test in VMs first - Always!
  2. Use version control - Track changes
  3. Enable all debug options - During development
  4. Write tests - Create test programs
  5. Code reviews - Have others review
  6. Read similar drivers - Learn from examples
  7. Start simple - Add complexity gradually

Quick Reference: Error Codes

Code      Meaning           Typical cause
-ENOMEM   Out of memory     kmalloc() failed
-EINVAL   Invalid argument  Bad parameter
-EBUSY    Device busy       Already open/in use
-EFAULT   Bad address       copy_to_user()/copy_from_user() failed
-EIO      I/O error         Hardware problem
-ENODEV   No such device    Device not found
-EAGAIN   Try again         Would block in non-blocking mode
-EINTR    Interrupted       Signal received
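Kernel functions report failure by returning these codes negated. At the system-call boundary a negative return becomes -1 in userspace with the positive code stored in errno, so a userspace test program sees EFAULT rather than -EFAULT. When reading a return value printed in the kernel log (for example "error -12"), a sketch like this one (hypothetical name) turns it into readable text. Kernel-internal codes such as -ERESTARTSYS have no userspace name and come out as an unknown error.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* Describe a kernel-style return value: >= 0 means success (often a
 * byte count), < 0 is a negated errno code such as -ENOMEM. */
const char *kernel_ret_str(int ret)
{
    if (ret >= 0)
        return "success";
    return strerror(-ret);
}
```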

Remember: Kernel bugs can crash the system. Always test carefully!