Kernel Driver Development - Troubleshooting Guide
Common Problems and Solutions
1. Module Won’t Load
Error: “Invalid module format”
# Check kernel version mismatch
uname -r
modinfo module.ko | grep vermagic
# Solution: Rebuild against correct kernel headers
make clean
make
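The comparison above can also be done programmatically. This is a minimal userspace sketch, assuming the vermagic string starts with the kernel release followed by whitespace (for example `6.1.0-18-amd64 SMP preempt mod_unload modversions`). The name `vermagic_matches` is hypothetical and not part of any kernel tool.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper: returns true when the first whitespace-delimited
 * token of a vermagic string equals the running kernel release
 * (the string printed by `uname -r`).
 */
bool vermagic_matches(const char *vermagic, const char *release)
{
    /* Length of the leading token, up to the first space or tab. */
    size_t token_len = strcspn(vermagic, " \t");

    return token_len == strlen(release) &&
           strncmp(vermagic, release, token_len) == 0;
}
```

A matching release is necessary but may not be sufficient: the remaining vermagic flags (such as SMP or preempt) are also compared by the kernel at load time.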
Error: “Unknown symbol”
# Missing dependency
dmesg | grep "Unknown symbol"
# Solution: Load dependency first or check exports
modprobe dependency_module
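# Or rebuild against the dependency's Module.symvers
# (pass it via KBUILD_EXTRA_SYMBOLS); GPL-only exports also
# require MODULE_LICENSE("GPL") in your module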
Error: “Operation not permitted”
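# Need root access
# Solution:
sudo insmod module.ko
# If it still fails as root, check dmesg: with Secure Boot enabled,
# kernel lockdown may reject unsigned modules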
Error: “File exists” (module already loaded)
# Module with same name exists
lsmod | grep module_name
# Solution: Unload first
sudo rmmod module_name
2. Compilation Errors
Error: “linux/module.h: No such file”
# Missing kernel headers
# Solution (Debian/Ubuntu):
sudo apt-get install linux-headers-$(uname -r)
Error: “implicit declaration of function”
// Missing include or wrong kernel version
// Solution: Check kernel documentation for correct header
// Example:
#include <linux/delay.h> // For msleep
#include <linux/slab.h> // For kmalloc
Warning: “ISO C90 forbids mixed declarations”
// Declaring variables after statements
// Solution: Move declarations to top of block
void bad_func(void) {
    do_something();
    int x = 5; // BAD
}
void good_func(void) {
    int x; // GOOD
    do_something();
    x = 5;
}
3. Runtime Issues
System Freezes/Hangs
Causes:
- Holding spinlock too long
- Deadlock
- Infinite loop in critical section
- Sleeping while holding spinlock
Recovery:
# Try SysRq keys (the kernel.sysrq setting must permit them)
Alt + SysRq + b # Immediate reboot, without sync or unmount
# Or from SSH (as root):
echo b > /proc/sysrq-trigger
Prevention:
// Keep spinlock sections SHORT
spin_lock(&lock);
// Only fast operations here
spin_unlock(&lock);
// Use lockdep
CONFIG_PROVE_LOCKING=y
Kernel Oops/Panic
Analyze crash:
# View oops in dmesg
dmesg
# Decode with symbols
./scripts/decode_stacktrace.sh vmlinux < oops.txt
# Or use crash utility
crash vmlinux vmcore
Common causes:
- NULL pointer dereference
- Use after free
- Buffer overflow
- Invalid memory access
Memory Leaks
Detection:
# Enable kmemleak
CONFIG_DEBUG_KMEMLEAK=y
# Trigger scan
echo scan > /sys/kernel/debug/kmemleak
# View leaks
cat /sys/kernel/debug/kmemleak
Prevention:
// Always free what you allocate
static int __init my_init(void)
{
    void *buf = kmalloc(1024, GFP_KERNEL);
    if (!buf)
        return -ENOMEM;
    // ... use buf ...
    kfree(buf); // DON'T FORGET!
    return 0;
}
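When an init path acquires more than one resource, a common way to avoid leaks is to unwind with goto labels in reverse order of acquisition. The following is a userspace sketch of that idiom, not driver code: `malloc`/`free` stand in for `kmalloc`/`kfree`, and `struct demo_ctx`, `demo_ctx_init`, and `demo_ctx_exit` are hypothetical names.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical context holding two buffers. */
struct demo_ctx {
    void *rx_buf;
    void *tx_buf;
};

/*
 * Allocates both buffers or neither. On failure, the labels release what
 * was already acquired, in reverse order, and a negative errno value is
 * returned (the kernel convention).
 */
int demo_ctx_init(struct demo_ctx *ctx, size_t rx_size, size_t tx_size)
{
    int ret;

    ctx->rx_buf = malloc(rx_size);   /* stands in for kmalloc() */
    if (!ctx->rx_buf) {
        ret = -ENOMEM;
        goto err_out;
    }

    ctx->tx_buf = malloc(tx_size);
    if (!ctx->tx_buf) {
        ret = -ENOMEM;
        goto err_free_rx;
    }

    return 0;

err_free_rx:
    free(ctx->rx_buf);               /* stands in for kfree() */
    ctx->rx_buf = NULL;
err_out:
    return ret;
}

/* Mirror of init: releases everything in reverse order. */
void demo_ctx_exit(struct demo_ctx *ctx)
{
    free(ctx->tx_buf);
    free(ctx->rx_buf);
    ctx->tx_buf = NULL;
    ctx->rx_buf = NULL;
}
```

Keeping the exit function a mirror image of the init function makes it easier to review the cleanup path for missed frees.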
4. Device Issues
Device Node Not Created
Check:
# Is module loaded?
lsmod | grep module
# Check dmesg for errors
dmesg | tail
# Is udev running?
systemctl status systemd-udevd
# Manual creation as workaround:
sudo mknod /dev/mydev c MAJOR MINOR
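The MAJOR value for the manual `mknod` workaround is normally listed in `/proc/devices`. Below is a minimal userspace sketch that looks it up, assuming the usual layout of that file (a "Character devices:" section with one `<major> <name>` pair per line, followed by a "Block devices:" section). The function name `find_char_major` is hypothetical; the minor number is whatever the driver registered and is not listed in this file.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: scans text in /proc/devices format and returns the
 * major number registered under `name` in the "Character devices:"
 * section, or -1 if the name is not listed there.
 */
int find_char_major(FILE *fp, const char *name)
{
    char line[128];
    char entry[64];
    int major;
    int in_char_section = 0;

    while (fgets(line, sizeof(line), fp)) {
        if (strncmp(line, "Character devices:", 18) == 0) {
            in_char_section = 1;
            continue;
        }
        /* Stop before block devices so equal names there are ignored. */
        if (strncmp(line, "Block devices:", 14) == 0)
            break;
        if (in_char_section &&
            sscanf(line, "%d %63s", &major, entry) == 2 &&
            strcmp(entry, name) == 0)
            return major;
    }
    return -1;
}
```

In practice the stream would come from `fopen("/proc/devices", "r")`, and `name` is the string the driver passed when registering its character device region.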
Fix:
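// Use device_create for auto-creation
// Kernel 6.4+: class_create("myclass") (the module argument was removed)
dev_class = class_create(THIS_MODULE, "myclass");
if (IS_ERR(dev_class))
    return PTR_ERR(dev_class);
device_create(dev_class, NULL, dev_num, NULL, "mydev");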
Permission Denied on Device Access
# Check permissions
ls -l /dev/mydev
# Fix permissions
sudo chmod 666 /dev/mydev
# Or add a udev rule so the mode persists when the node is recreated
echo 'KERNEL=="mydev", MODE="0666"' | \
sudo tee /etc/udev/rules.d/99-mydev.rules
sudo udevadm control --reload
sudo udevadm trigger
5. Race Conditions
Symptoms
- Intermittent crashes
- Corrupted data
- Inconsistent behavior
- Hard to reproduce
Detection:
# Enable lock debugging
CONFIG_DEBUG_SPINLOCK=y
CONFIG_DEBUG_MUTEXES=y
CONFIG_PROVE_LOCKING=y
# Exercise the driver, then check the kernel log for lockdep reports
dmesg | grep -iE "lockdep|locking dependency|lock state"
Fix:
// Protect shared data
static DEFINE_SPINLOCK(my_lock); // Statically initialized lock
static int shared_data;
void update_data(int new_val)
{
    spin_lock(&my_lock);
    shared_data = new_val; // Protected
    spin_unlock(&my_lock);
}
6. Performance Issues
High CPU Usage
Profile:
# Use perf (system-wide sample with call graphs, 10 seconds)
sudo perf record -a -g -- sleep 10
sudo perf report
# Check /proc/interrupts
cat /proc/interrupts
# Check context switches
vmstat 1
Common causes:
- Busy-wait loops
- Too frequent interrupts
- Inefficient algorithms
Fix:
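// BAD: Busy waiting
while (!ready)
    cpu_relax(); // Wastes CPU
// GOOD: Sleep until the condition is true (the code that sets
// ready must also call wake_up_interruptible(&wq))
if (wait_event_interruptible(wq, ready))
    return -ERESTARTSYS; // Interrupted by a signal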
Memory Usage
Check:
# Slab info (reading /proc/slabinfo requires root)
sudo grep module /proc/slabinfo
# Overall memory
free -h
cat /proc/meminfo
7. Debugging Techniques
Enable Debug Messages
# Dynamic debug
echo 'file driver.c +p' > /sys/kernel/debug/dynamic_debug/control
# Or at boot, on the kernel command line (replace module_name)
module_name.dyndbg=+p
Add Trace Points
#include <linux/kernel.h> // For trace_printk
void my_function(void)
{
    trace_printk("my_function called\n");
}
// View with:
// cat /sys/kernel/debug/tracing/trace
Use KGDB
# On target: add to the kernel command line
kgdboc=ttyS0,115200 kgdbwait
# On host
gdb vmlinux
(gdb) target remote /dev/ttyS0
(gdb) break my_driver_init
(gdb) continue
Checklist Before Loading Module
- Compiled without warnings (-Wall -Werror)
- Tested in VM first
- Checked with sparse: make C=1
- Ran checkpatch: checkpatch.pl --file driver.c
- Added proper error handling
- Tested cleanup path
- Reviewed for memory leaks
- Protected shared data with locks
- Documented module parameters
Emergency Recovery
System Unresponsive
- Try switching to console: Ctrl+Alt+F1
- Use SysRq keys: Alt+SysRq+…, pressed in this order:
  - r: Take the keyboard out of raw mode
  - e: Send SIGTERM to all processes
  - i: Send SIGKILL to all processes
  - s: Sync filesystems
  - u: Remount filesystems read-only
  - b: Force reboot
- From SSH:
echo b > /proc/sysrq-trigger
Module Won’t Unload
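# Check module usage (the "Used by" count must be 0 to unload)
lsmod | grep module
# Check which processes hold the device node open
sudo lsof /dev/mydev
# Force remove (dangerous! requires CONFIG_MODULE_FORCE_UNLOAD)
sudo rmmod -f module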
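The usage count that `lsmod` prints comes from `/proc/modules`, where each line has the form `name size refcount dependencies state address`. A minimal userspace sketch for reading the count from one such line follows; `module_refcount` is a hypothetical helper name.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: parses one line in /proc/modules format
 *   name size refcount dependencies state address
 * and returns the reference count if the line describes `name`.
 * Returns -1 for other modules or when the count is not numeric.
 */
int module_refcount(const char *line, const char *name)
{
    char mod[64];
    unsigned long size;
    int refcount;

    if (sscanf(line, "%63s %lu %d", mod, &size, &refcount) != 3)
        return -1;
    if (strcmp(mod, name) != 0)
        return -1;
    return refcount;
}
```

A non-zero result means something still holds a reference (an open file descriptor, or another module listed in the dependencies field), so a plain `rmmod` will fail until it is released.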
Corrupted System
# Boot into recovery mode
# From GRUB, select "Advanced options" → "Recovery mode"
# Remove problematic module
rm /lib/modules/$(uname -r)/extra/module.ko
depmod -a
# Reboot
reboot
Getting Help
Gather Information
# Kernel version
uname -a
# Module info
modinfo module.ko
# dmesg output
dmesg > dmesg.txt
# System info
lspci -v > lspci.txt
lsusb -v > lsusb.txt
Useful Resources
- Kernel mailing list: linux-kernel@vger.kernel.org
- Stack Overflow: [linux-kernel] tag
- Kernel documentation: https://www.kernel.org/doc/
- LWN.net articles: https://lwn.net/Kernel/
Prevention Tips
- Test in VMs first - Always!
- Use version control - Track changes
- Enable all debug options - During development
- Write tests - Create test programs
- Code reviews - Have others review
- Read similar drivers - Learn from examples
- Start simple - Add complexity gradually
Quick Reference: Error Codes
| Code | Meaning | Typical Cause |
|---|---|---|
| -ENOMEM | Out of memory | kmalloc failed |
| -EINVAL | Invalid argument | Bad parameter |
| -EBUSY | Device busy | Already open/in use |
| -EFAULT | Bad address | copy_to_user/copy_from_user failed |
| -EIO | I/O error | Hardware problem |
| -ENODEV | No such device | Device not found |
| -EAGAIN | Try again | Would block in non-blocking mode |
| -EINTR | Interrupted | Signal received |
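Drivers return these codes as negative values; a userspace caller then typically sees the call fail with -1 and finds the positive code in `errno`. As a small illustration, the sketch below maps a kernel-style return value to the symbolic names in the table. It covers only the codes listed above, and `kernel_err_name` is a hypothetical helper name.

```c
#include <errno.h>

/*
 * Hypothetical helper: maps a kernel-style return value (0 on success,
 * negative errno on failure) to its symbolic name. Only the codes from
 * the table above are covered.
 */
const char *kernel_err_name(int ret)
{
    switch (-ret) {
    case 0:      return "OK";
    case ENOMEM: return "ENOMEM";
    case EINVAL: return "EINVAL";
    case EBUSY:  return "EBUSY";
    case EFAULT: return "EFAULT";
    case EIO:    return "EIO";
    case ENODEV: return "ENODEV";
    case EAGAIN: return "EAGAIN";
    case EINTR:  return "EINTR";
    default:     return "UNKNOWN";
    }
}
```

A helper like this can make test-program output easier to read than raw numbers, for example when logging the result of an `ioctl` wrapper.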
Remember: Kernel bugs can crash the system. Always test carefully!