Only the most important drivers of the basic devices (like hard drives) are compiled into the kernel, because putting all drivers in the kernel would waste memory, and time would be wasted on enabling and disabling drivers for non-existent devices. This is why drivers for optional devices were put into kernel modules, loaded on demand.
A kernel module is a plain compiled file in the standard ELF format. It needs to export two functions: init_module, which initializes the module (and runs while the module is being loaded), and cleanup_module, which does whatever is needed to finish work correctly (and runs while the module is being removed from the kernel).
The init_module function should return zero on success and, on error, the negative errno value that best describes the problem.
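As a rough illustration of this return convention, here is a minimal userland C sketch. The names demo_init and demo_init_failed and the resources_available parameter are hypothetical; only the errno constant comes from the standard errno.h header.

```c
#include <errno.h>

/* Hypothetical stand-in for a module's init function: returns 0 on
   success, or a negative errno value describing what went wrong. */
int demo_init(int resources_available)
{
    if (!resources_available)
        return -ENOMEM;   /* e.g. an allocation failed */
    return 0;             /* zero means no errors */
}

/* The check a caller would do: any negative value means failure. */
int demo_init_failed(int rc)
{
    return rc < 0;
}
```

In the assembly modules on this page the same value simply ends up in EAX before the final ret.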
Much of the information about 2.4 kernel modules also applies to 2.6 kernels; the section of this page dedicated to 2.6 kernels only adds what has changed since 2.4.
According to what I've said above, the simplest module looks like this:
format ELF
section ".text" executable      ; code section start

; exporting two required functions
public init_module
public cleanup_module

; declaration of an external function used for displaying messages
extrn printk

init_module:
        push    dword str1      ; string to display
        call    printk
        pop     eax             ; remove arguments from the stack
        xor     eax, eax        ; zero means no errors
        ret

cleanup_module:
        push    dword str2
        call    printk
        pop     eax
        ret

section ".data" writeable

str1    db      "<1> Inside init_module." , 10, 0
str2    db      "<1> Inside cleanup_module.", 10, 0

section ".modinfo"

__module_kernel_version db      "kernel_version=2.4.26", 0
__module_license        db      "license=GPL", 0
__module_author         db      "author=Bogdan D.", 0
__module_description    db      "description=first kernel module.", 0

Notice a few things:
How printk is called. In short: the address of the string is put on the stack last, after any extra values (pushed in reverse order) that the string should display, such as an integer for %d. This will be shown in the example module.
The string should start with <N>, N being a number. This tells the kernel about the severity of the message. For us, setting N to 1 is enough.
If the printed strings don't show up on the screen, they should show up in the output of the dmesg command (usually at the end) and in the kernel log file /var/log/messages.
For reasons unknown to me, modules assembled with NASM could not be inserted into the kernel.
Each function is called in the C calling convention, which means that we, the caller, clean up the stack.
The .modinfo section. It contains the following information: which kernel version the module is for, who the module's author is, the module license and the module parameters. The variable names must remain the same; change only the text after the equal signs.
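To make the "<N>" prefix convention mentioned in the notes above a bit more concrete, here is a small userland C sketch that extracts the severity digit from a message. The function demo_log_level is hypothetical and is not how the kernel itself parses messages.

```c
#include <ctype.h>

/* Returns the severity digit N from a "<N>" prefix, or -1 if the
   string does not start with such a prefix. */
int demo_log_level(const char *msg)
{
    if (msg[0] == '<' && isdigit((unsigned char)msg[1]) && msg[2] == '>')
        return msg[1] - '0';
    return -1;
}
```

For the strings used in the module above, the function would report level 1.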
After assembling (fasm modul_hello.asm), you can install this module as root with the command insmod ./modul_hello.o and remove it with the command rmmod modul_hello (notice the missing .o extension).
The list of all currently loaded modules can be obtained with the command lsmod.
Now, I'll show how to register a character device and reserve resources for it: IRQ and memory + port ranges.
To register a character device (a device which permits reading single bytes, as opposed to, say, a hard disk) use the kernel-exported register_chrdev function.
It accepts 3 arguments. Starting from the left (last put on the stack), these are:
The first argument is the major number of the device. You can specify zero here. In that case, the kernel will assign us an unused number. This major number is the first of the two numbers (the second one is called the minor number) you can see in the detailed listing of the /dev directory, for example
crw-rw-rw- 1 root root 1, 5 aug 16 15:28 /dev/zero
The zero device has major number 1 and minor number 5. The letter c at the beginning tells that this is a character device. Other marks are: d (directory), s (socket), b (block device), p (pipe, FIFO), l (symbolic link).
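The second argument is the address of the device name, which is a sequence of characters ending with a zero byte.
The third argument is the address of a file_operations structure, which holds the addresses of the functions implementing operations on the device. The most important are: opening, closing, writing and reading from the device. The structure itself for a 2.4 kernel looks like this: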
struct file_operations {
struct module *owner;
loff_t (*llseek) (struct file *, loff_t, int);
ssize_t (*read) (struct file*, char*, size_t, loff_t *);
ssize_t (*write) (struct file *, const char *, size_t,
loff_t *);
int (*readdir) (struct file *, void *, filldir_t);
unsigned int (*poll) (struct file *,
struct poll_table_struct *);
int (*ioctl) (struct inode*, struct file*, unsigned int,
unsigned long);
int (*mmap) (struct file *, struct vm_area_struct *);
int (*open) (struct inode *, struct file *);
int (*flush) (struct file *);
int (*release) (struct inode *, struct file *);
int (*fsync) (struct file*,struct dentry*, int datasync);
int (*fasync) (int, struct file *, int);
int (*lock) (struct file *, int, struct file_lock *);
ssize_t (*readv) (struct file *, const struct iovec *,
unsigned long, loff_t *);
ssize_t (*writev) (struct file *, const struct iovec *,
unsigned long, loff_t *);
};
Each field of this structure is a DWORD (4 bytes on 32-bit x86). For basic operations, we only need the third (read), fourth (write), ninth (open) and eleventh (release, that is file closing) fields. If you aren't planning to implement some function, put a zero in the corresponding field of this structure.
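The field positions can be double-checked with a small C sketch. It assumes, as stated above, that every field is a 4-byte DWORD; the enum and the fop24_offset function are hypothetical helpers written for this page, not kernel definitions.

```c
#include <stddef.h>

/* Field order of the 2.4 file_operations structure shown above. */
enum fop24_field {
    FOP24_OWNER, FOP24_LLSEEK, FOP24_READ, FOP24_WRITE,
    FOP24_READDIR, FOP24_POLL, FOP24_IOCTL, FOP24_MMAP,
    FOP24_OPEN, FOP24_FLUSH, FOP24_RELEASE, FOP24_FSYNC,
    FOP24_FASYNC, FOP24_LOCK, FOP24_READV, FOP24_WRITEV,
    FOP24_COUNT
};

/* Byte offset of a field, assuming every field is a 4-byte DWORD. */
size_t fop24_offset(enum fop24_field f)
{
    return (size_t)f * 4;
}
```

With these assumptions, read sits at offset 8, write at 12, open at 32 and release at 40, and the whole structure takes 16 DWORDs.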
If we call this function with our own major number and the function succeeds, it will return zero. If we asked the kernel for a major number and the function succeeds, it will return a positive integer, which is the kernel-assigned major number for our device.
NOTE: The register_chrdev function does not create a device file in the /dev directory. We have to do it ourselves, after loading the module.
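Once a device file exists, its major and minor numbers can be read back from user space. The sketch below assumes a Linux system with glibc, where the major() and minor() macros live in sys/sysmacros.h; the function demo_chrdev_numbers is a hypothetical helper.

```c
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/sysmacros.h>

/* Reads the major and minor numbers of a device file.
   Returns 0 on success, -1 if the path cannot be examined
   or is not a character device. */
int demo_chrdev_numbers(const char *path, unsigned *maj, unsigned *min)
{
    struct stat st;

    if (stat(path, &st) != 0 || !S_ISCHR(st.st_mode))
        return -1;
    *maj = major(st.st_rdev);
    *min = minor(st.st_rdev);
    return 0;
}
```

Called on the /dev/zero entry from the listing above, it should report the same pair of numbers that ls -l displays.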
To unregister a character device, call the unregister_chrdev function. Its first argument (last put on the stack) is the assigned major number, the second argument is the address of the device name.
Reserving port and memory ranges is easy. All you need to do is call the __request_region function. It accepts 4 arguments. Starting from the left (last put on the stack), they are:
The resource type: if we want to reserve ports, it's the address of the ioport_resource variable, if memory - iomem_resource. Both variables are exported by the kernel, so you can declare them as external to your module.
The start of the range.
The length of the range.
The address of the device name.
In case of failure, this function returns zero (in EAX).
Both types of resources can be released using the __release_region function. It has 3 arguments, which are the same as the first 3 above (type, start and range length).
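Interrupt request (IRQ) resources are registered with the request_irq function. It takes 5 arguments of type DWORD. Starting from the left (last put on the stack), they are:
The IRQ number.
The address of the interrupt handler, a function with the prototype
void handler (int irq, void *dev_id, struct pt_regs *regs);
As you can see, we can tell which interrupt was generated and by which device. The last argument is said to be used rarely.
The flags (the example below uses SA_INTERRUPT).
The address of the device name.
The device identifier (dev_id) that will be passed to the handler; the example below uses the address of its file_operations structure.
If this function fails, it will return a negative value.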
Releasing an IRQ is done using the free_irq function. Its first argument (last on the stack) is our IRQ number, the second argument is the same device identifier that was passed to request_irq (in our case, the address of our file_operations structure).
The module shown below will register a software character device (a device which has no hardware, like /dev/null) with IRQ 4, port range 600h-6FFh, memory range 80000000h - 8000FFFFh and basic operations: opening, closing, reading, writing, seeking. For simplicity, this code doesn't check if the resources are occupied. If they are, the kernel will return an error and the module won't load.
; Example 2.4 kernel module
;
; Author: Bogdan D., bogdandr (na) op . pl
;
; assemble:
;   fasm modul_dev_fasm.asm

format ELF
section ".text" executable

; exporting required functions
public init_module
public cleanup_module

; importing used functions and symbols
extrn printk
extrn register_chrdev
extrn unregister_chrdev
extrn request_irq
extrn free_irq

extrn __check_region
extrn __request_region
extrn __release_region
extrn ioport_resource
extrn iomem_resource

; resource ranges we're going to ask for
PORTY_START     = 0x600
PORTY_ILE       = 0x100
RAM_START       = 0x80000000
RAM_ILE         = 0x00010000

; constants needed for IRQ reservation
SA_INTERRUPT    = 0x20000000
NUMER_IRQ       = 4

; module initialization function
init_module:
        pushfd

        ; registering a character device
        push    dword file_oper
        push    dword nazwa
        push    dword 0         ; assign the number dynamically
        call    register_chrdev
        add     esp, 3*4        ; remove arguments from the stack

        cmp     eax, 0          ; check for error
        jg      .dev_ok

        ; if we are here, there was an error. Show it.
        push    eax             ; argument for the error string
        push    dword dev_err   ; address of the error string
        call    printk          ; print the error
        add     esp, 1*4        ; removing only 1*4 bytes, because:
        pop     eax             ; will exit with the error code in EAX
        jmp     .koniec

.dev_ok:
        mov     [major], eax

        ; reserve I/O ports
        push    dword nazwa
        push    dword PORTY_ILE
        push    dword PORTY_START
        push    dword ioport_resource
        call    __request_region
        add     esp, 4*4

        test    eax, eax        ; check for error
        jnz     .iop_ok

        push    eax             ; argument for the error string
        push    dword porty_err ; address of the error string
        call    printk          ; print the error
        add     esp, 1*4        ; will 'pop eax' later

        ; unregister the device
        push    dword nazwa
        push    dword [major]
        call    unregister_chrdev
        add     esp, 2*4

        pop     eax             ; will exit with the error code in EAX
        jmp     .koniec

.iop_ok:
        ; reserve memory
        push    dword nazwa
        push    dword RAM_ILE
        push    dword RAM_START
        push    dword iomem_resource
        call    __request_region
        add     esp, 4*4

        test    eax, eax        ; check for error
        jnz     .iomem_ok

        push    eax
        push    dword ram_err
        call    printk          ; print the error
        add     esp, 1*4        ; will 'pop eax' later

        ; unregister the device
        push    dword nazwa
        push    dword [major]
        call    unregister_chrdev
        add     esp, 2*4

        ; free reserved ports
        push    dword PORTY_ILE
        push    dword PORTY_START
        push    dword ioport_resource
        call    __release_region
        add     esp, 3*4

        pop     eax             ; will exit with the error code in EAX
        jmp     .koniec

.iomem_ok:
        ; assigning IRQ:
        push    dword file_oper
        push    dword nazwa
        push    dword SA_INTERRUPT
        push    dword obsluga_irq
        push    dword NUMER_IRQ
        call    request_irq
        add     esp, 5*4

        cmp     eax, 0
        jge     .irq_ok

        push    eax
        push    dword irq_err
        call    printk          ; print the error
        add     esp, 1*4        ; will 'pop eax' later

        ; unregister the device
        push    dword nazwa
        push    dword [major]
        call    unregister_chrdev
        add     esp, 2*4

        ; free reserved ports
        push    dword PORTY_ILE
        push    dword PORTY_START
        push    dword ioport_resource
        call    __release_region
        add     esp, 3*4

        ; free reserved memory
        push    dword RAM_ILE
        push    dword RAM_START
        push    dword iomem_resource
        call    __release_region
        add     esp, 3*4

        pop     eax             ; will exit with the error code in EAX
        jmp     .koniec

.irq_ok:
        ; print info about successful module load
        push    dword NUMER_IRQ
        push    dword [major]
        push    dword uruch
        call    printk
        add     esp, 3*4

        xor     eax, eax        ; zero - no errors

.koniec:
        popfd
        ret

; called when the module is unloaded
cleanup_module:
        pushfd
        push    eax

        ; free the IRQ
        push    dword file_oper
        push    dword NUMER_IRQ
        call    free_irq
        add     esp, 2*4

        ; unregister the device:
        push    dword nazwa
        push    dword [major]
        call    unregister_chrdev
        add     esp, 2*4

        ; free reserved ports
        push    dword PORTY_ILE
        push    dword PORTY_START
        push    dword ioport_resource
        call    __release_region
        add     esp, 3*4

        ; free reserved memory
        push    dword RAM_ILE
        push    dword RAM_START
        push    dword iomem_resource
        call    __release_region
        add     esp, 3*4

        ; print info about successful module unload
        push    dword usun
        call    printk
        add     esp, 1*4

        pop     eax
        popfd
        ret

; Our interrupt service function. This one does nothing, but argument placement
; on the stack is shown
obsluga_irq:
        push    ebp
        mov     ebp, esp

        ; [ebp]   = old EBP
        ; [ebp+4] = return address
        ; [ebp+8] = arg1
        ; ...
irq     equ     ebp+8
dev_id  equ     ebp+12
regs    equ     ebp+16

        leave
        ret

; Define device operations

; Reading from device - return a sequence of 1Eh bytes of the specified length.
; This device is an infinite source, just like /dev/zero
czytanie:
        push    ebp
        mov     ebp, esp

        ; argument placement on the stack:
s_file  equ     ebp+8           ; pointer to a file structure
bufor   equ     ebp+12          ; data buffer address
l_jedn  equ     ebp+16          ; requested number of bytes
loff    equ     ebp+20          ; requested start position of reading

        pushfd
        push    edi
        push    ecx

        mov     ecx, [l_jedn]
        mov     al, 0x1e
        cld
        mov     edi, [bufor]
        rep     stosb           ; fill buffer with 1Eh bytes

        pop     ecx
        pop     edi
        popfd

        mov     eax, [l_jedn]   ; return the number of requested bytes

        leave
        ret

; writing to the device - infinite well (will consume anything)
zapis:
        push    ebp
        mov     ebp, esp

        ; don't write physically anything, just return the number
        ; of bytes we were supposed to write.
        mov     eax, [l_jedn]

        leave
        ret

; seek
przejscie:
; close:
zamykanie:
; open:
otwieranie:
        xor     eax, eax        ; all 3 functions always return success
        ret

section ".data" writeable

major           dd 0            ; kernel-assigned device major number

; addresses of the functions for device operations
file_oper:      dd 0, przejscie, czytanie, zapis, 0, 0, 0, 0, otwieranie, 0
                dd zamykanie, 0, 0, 0, 0, 0

dev_err         db "<1>Device register error: %d.", 10, 0
irq_err         db "<1>IRQ assignment error: %d.", 10, 0
porty_err       db "<1>Port assignment error: EAX=%d", 10, 0
ram_err         db "<1>Memory assignment error: EAX=%d", 10, 0

uruch           db "<1>Module loaded. Maj=%d, IRQ=%d", 10, 0
usun            db "<1>Module removed.", 10, 0

nazwa           db "test00", 0
sciezka         db "/dev/test00", 0

section ".modinfo"

__module_kernel_version db "kernel_version=2.4.26", 0
__module_license        db "license=GPL", 0
__module_author         db "author=Bogdan D.", 0
__module_description    db "description=Example kernel module", 0
__module_device         db "device=test00", 0
After assembling, the above module is most easily installed using the following script:
#!/bin/bash

PLIK="modul_dev_fasm.o"        # Put your module's name here
NAZWA="test00"                 # Device's name

# Inserting the module.
/sbin/insmod $PLIK $* || { echo "insmod problem!" ; exit -1; }

# finding and printing our module name
/sbin/lsmod | grep `echo $PLIK | sed 's/[^a-z]/ /g' | awk '{print $1}' `

# print resource information
grep $NAZWA /proc/devices
grep $NAZWA /proc/ioports
grep $NAZWA /proc/iomem
grep $NAZWA /proc/interrupts

# find and print device major number
NR=`grep $NAZWA /proc/devices | awk '{print $1}'`
echo "Major = $NR"

# remove old device file
rm -f /dev/$NAZWA

# creating the device file in /dev
# sys_mknod from inside the module does NOT work
mknod /dev/$NAZWA c $NR 0

ls -l /dev/$NAZWA

# short test: read 512 bytes and check their contents
dd count=1 if=/dev/$NAZWA of=/x && hexdump /x && rm -f /x
All you have to do is save this script under some name, like instal.sh, allow it to be executed using the command chmod u+x instal.sh and run it using ./instal.sh, as root, of course.
If the module is successfully loaded, the script will display the resources assigned to
the module - I/O ports, IRQ and memory - by reading the necessary files in the /proc
directory. The script will also create the device file in the /dev directory, with the
correct major number. After that, a short test will be performed.
You can easily uninstall the module using the script:
#!/bin/bash

PLIK="modul_dev_fasm"          # Enter your module name here, without the .o
NAZWA="test00"                 # Device's name

/sbin/rmmod $PLIK && rm -f /dev/$NAZWA
The simplest 2.6 kernel module looks like this:
format ELF

section ".init.text" executable align 1
section ".text" executable align 4

public init_module
public cleanup_module

extrn printk

init_module:
        push    dword str1
        call    printk
        pop     eax
        xor     eax, eax
        ret

cleanup_module:
        push    dword str2
        call    printk
        pop     eax
        ret

section ".modinfo" align 32

__kernel_version        db "kernel_version=2.6.16", 0
__mod_vermagic          db "vermagic=2.6.16 686 REGPARM 4KSTACKS gcc-4.0", 0
__module_license        db "license=GPL", 0
__module_author         db "author=Bogdan D.", 0
__module_description    db "description=First 2.6 kernel module", 0

section "__versions" align 32

        dd      0xfa02c634
n1:     db      "struct_module"
        times 64-4-($-n1) db 0

        dd      0x1b7d4074
n2:     db      "printk"
        times 64-4-($-n2) db 0

section ".data" writeable align 4

str1    db "<1> Inside init_module(). ", 10, 0
str2    db "<1> Inside cleanup_module(). ", 10, 0

section ".gnu.linkonce.this_module" writeable align 128
align 128

__this_module:                  ; total length: 512 bytes
        dd      0, 0, 0
.nazwa: db      "modul", 0
        times 64-4-($-.nazwa) db 0
        times 100 db 0
        dd      init_module
        times 220 db 0
        dd      cleanup_module
        times 112 db 0
You can surely see many differences, right? We'll discuss them section by section now:
.init.text
In general, there should be at least two code sections: .init.text, containing the initialization procedure, and .exit.text, containing the exit procedure. Additionally, you can of course have a data section .data and a code section .text.
If you get Accessing a corrupted shared library messages while installing the module, you should do some shuffling with the sections: add a .text, remove .init.text, change the order etc.
.gnu.linkonce.this_module
This is the most important one. Without this section, each attempt to install the module will result in a No module found in object message. The content of this section is a structure named __this_module of type module. The best you can do right now is to copy the one from the above example to your modules, changing the module name (between the quotation marks) and the names of the entry and exit points.
You can also use the following macro:
macro gen_this_module name*, entry, exit
{
        section '.gnu.linkonce.this_module' writeable align 128
        align 128

        __this_module:
                dd      0, 0, 0
        .mod_nazwa:
                db      name, 0
                times 64-4-($-.mod_nazwa) db 0
                times 100 db 0
                if entry eq
                        dd      init_module
                else
                        dd      entry
                end if
                times 220 db 0
                if exit eq
                        dd      cleanup_module
                else
                        dd      exit
                end if
                times 112 db 0
}
Using this macro is very easy: just pass it the name of the module, which should be displayed with the lsmod command, and the names (addresses) of the entry and exit procedures, for example
gen_this_module "your_module", init_module, cleanup_module
This macro call should be placed where the section should be, for example - after the last declaration in the data section. In any case NOT inside any section.
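The byte counts used in the __this_module definitions above can be checked with a small C sketch. The structure below only mirrors the layout emitted by the assembly for that one 32-bit 2.6.16 kernel; it is not the real struct module, and all names in it are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Mirrors the bytes emitted by the assembly above; NOT the kernel's
   own definition, whose layout depends on version and configuration. */
struct demo_this_module {
    uint32_t      head[3];      /* dd 0, 0, 0               */
    char          name[64 - 4]; /* module name, zero padded */
    unsigned char pad1[100];    /* times 100 db 0           */
    uint32_t      init_fn;      /* dd init_module           */
    unsigned char pad2[220];    /* times 220 db 0           */
    uint32_t      exit_fn;      /* dd cleanup_module        */
    unsigned char pad3[112];    /* times 112 db 0           */
};

/* Offsets of the two pointers that the kernel reads. */
size_t demo_init_offset(void) { return offsetof(struct demo_this_module, init_fn); }
size_t demo_exit_offset(void) { return offsetof(struct demo_this_module, exit_fn); }
```

Under these assumptions the entry point lands at offset 172, the exit point at offset 396, and the total size is the 512 bytes mentioned in the comment of the example module.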
.modinfo
This section, compared to the one in the 2.4 kernel module, has only one, but very important, new entry - vermagic. In your kernel this string will probably differ from mine only in the kernel version. The original string looks like this:
#define VERMAGIC_STRING \
UTS_RELEASE " " \
MODULE_VERMAGIC_SMP MODULE_VERMAGIC_PREEMPT \
MODULE_ARCH_VERMAGIC \
"gcc-" __stringify(__GNUC__) "." __stringify(__GNUC_MINOR__)
#define MODULE_ARCH_VERMAGIC MODULE_PROC_FAMILY \
MODULE_REGPARM MODULE_STACKSIZE
and you can find it in the asm* subdirectories of the include directory in the kernel source tree and in the vermagic.h file.
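To see how these macros combine into one string, here is a userland C sketch with the same structure. All DEMO_ values are assumptions chosen to reproduce the vermagic string of the example module (2.6.16, 686, no SMP, no preemption, gcc 4.0); a real build takes them from the kernel configuration and from the compiler.

```c
/* Assumed values; in a real kernel build these come from the
   configuration and from the compiler's predefined macros. */
#define DEMO_UTS_RELEASE        "2.6.16"
#define DEMO_VERMAGIC_SMP       ""
#define DEMO_VERMAGIC_PREEMPT   ""
#define DEMO_PROC_FAMILY        "686 "
#define DEMO_REGPARM            "REGPARM "
#define DEMO_STACKSIZE          "4KSTACKS "
#define DEMO_GNUC               4
#define DEMO_GNUC_MINOR         0

/* Two-step stringification, so that macro arguments get expanded. */
#define DEMO_STRINGIFY_1(x)     #x
#define DEMO_STRINGIFY(x)       DEMO_STRINGIFY_1(x)

#define DEMO_ARCH_VERMAGIC DEMO_PROC_FAMILY DEMO_REGPARM DEMO_STACKSIZE

#define DEMO_VERMAGIC_STRING \
    DEMO_UTS_RELEASE " " \
    DEMO_VERMAGIC_SMP DEMO_VERMAGIC_PREEMPT \
    DEMO_ARCH_VERMAGIC \
    "gcc-" DEMO_STRINGIFY(DEMO_GNUC) "." DEMO_STRINGIFY(DEMO_GNUC_MINOR)

/* Returns the assembled string. */
const char *demo_vermagic(void)
{
    return DEMO_VERMAGIC_STRING;
}
```

Adjacent string literals are joined by the compiler, which is why the pieces carry their own trailing spaces.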
__versions
This section contains information about the versions of the procedures which this module uses. The structure of this section is fairly simple: first put a DWORD with the number matching the given kernel function, which can be found in the Module.symvers file in the main directory of the kernel source. Right after the number goes the name of the function, padded with zeros so that the whole entry takes 64 bytes.
This section is not required for the module to operate correctly, but it should be in every module, or kernel tainted messages will appear.
You can generate this whole section using my script symvers-fasm.txt. All you have to do is run perl symvers-fasm.pl your_module.asm.
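The layout of a single __versions entry, as used in the 32-bit examples on this page, can be sketched in C like this. The structure and function names are hypothetical; the sizes follow the times 64-4-(...) padding in the assembly.

```c
#include <stdint.h>
#include <string.h>

/* One 64-byte entry: a 4-byte number followed by the symbol name,
   zero padded (32-bit layout, as in the assembly on this page). */
struct demo_version_entry {
    uint32_t crc;
    char     name[64 - 4];
};

/* Fills an entry; returns 0 on success, -1 if the name is too long
   to leave room for the terminating zero. */
int demo_fill_version(struct demo_version_entry *e, uint32_t crc,
                      const char *sym)
{
    if (strlen(sym) >= sizeof e->name)
        return -1;
    memset(e, 0, sizeof *e);
    e->crc = crc;
    strcpy(e->name, sym);
    return 0;
}
```

Filling an entry with 0x1b7d4074 and "printk" would give the same 64 bytes as the second entry of the simple module above.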
Reserving resources in 2.6 kernels, seen from the outside (from the perspective of the C language), isn't much different from 2.4 kernels. But inside, two major changes have been made.
The first is the file_operations structure. In 2.6 kernels it looks like this:
struct file_operations {
struct module *owner;
loff_t (*llseek) (struct file *, loff_t, int);
ssize_t (*read) (struct file*,char __user*,size_t,
loff_t*);
ssize_t (*aio_read) (struct kiocb *, char __user *,
size_t, loff_t);
ssize_t (*write) (struct file *, const char __user *,
size_t, loff_t *);
ssize_t (*aio_write) (struct kiocb *, const char __user*,
size_t, loff_t);
int (*readdir) (struct file *, void *, filldir_t);
unsigned int (*poll) (struct file *,
struct poll_table_struct *);
int (*ioctl) (struct inode *, struct file *,
unsigned int, unsigned long);
long (*unlocked_ioctl) (struct file *, unsigned int,
unsigned long);
long (*compat_ioctl) (struct file *, unsigned int,
unsigned long);
int (*mmap) (struct file *, struct vm_area_struct *);
int (*open) (struct inode *, struct file *);
int (*flush) (struct file *);
int (*release) (struct inode *, struct file *);
int (*fsync) (struct file *, struct dentry *,
int datasync);
int (*aio_fsync) (struct kiocb *, int datasync);
int (*fasync) (int, struct file *, int);
int (*lock) (struct file *, int, struct file_lock *);
ssize_t (*readv) (struct file *, const struct iovec *,
unsigned long, loff_t *);
ssize_t (*writev) (struct file *, const struct iovec *,
unsigned long, loff_t *);
ssize_t (*sendfile) (struct file *, loff_t *, size_t,
read_actor_t, void *);
ssize_t (*sendpage) (struct file *, struct page *, int,
size_t, loff_t *, int);
unsigned long (*get_unmapped_area)(struct file *,
unsigned long, unsigned long, unsigned long,
unsigned long);
int (*check_flags)(int);
int (*dir_notify)(struct file *filp, unsigned long arg);
int (*flock) (struct file *, int, struct file_lock *);
};
The second change is the way parameters are passed. My distribution kernel was compiled in such a way that the first 3 parameters of each procedure except printk are passed in registers: EAX, EDX, ECX, and the rest on the stack. To check if your kernel does and expects the same, use the commands
grep -R regpar /lib/modules/`uname -r`/build/|grep Makefile
grep -R REGPAR /lib/modules/`uname -r`/build/|grep config
If the results contain something similar to:
CONFIG_REGPARM=y
#define CONFIG_REGPARM 1
then your kernel is probably compiled like mine. If so, you can use the uruchom macro from the module below to call kernel functions. If not, you will need to modify the macro; a system that hangs when you try to load the module is a likely sign of this.
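The register assignment described above can be summed up in a tiny C sketch. The function demo_regparm3_location is a hypothetical helper for illustration only; it just restates the rule that the first three parameters go to EAX, EDX and ECX and the rest to the stack.

```c
/* Where the n-th parameter (counting from 0) goes when the kernel is
   built with register parameter passing, as described above. */
const char *demo_regparm3_location(int n)
{
    static const char *const regs[] = { "EAX", "EDX", "ECX" };

    return (n >= 0 && n < 3) ? regs[n] : "stack";
}
```

For a 5-parameter call such as request_irq this means three values in registers and two pushed on the stack, which is what the macro has to arrange.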
Just like the 2.4 example, the module below will register a software character device (a device which has no hardware, like /dev/null) with IRQ 4, port range 600h-6FFh, memory range 80000000h - 8000FFFFh and basic operations: opening, closing, reading, writing, seeking. For simplicity, this code doesn't check if the resources are occupied. If they are, the kernel will return an error and the module won't load.
format ELF
section ".text" executable align 4

public init_module
public cleanup_module

extrn printk
extrn register_chrdev
extrn unregister_chrdev
extrn request_irq
extrn free_irq

extrn __request_region
extrn __release_region
extrn ioport_resource
extrn iomem_resource

PORTY_START     = 0x600
PORTY_ILE       = 0x100
RAM_START       = 0x80000000
RAM_ILE         = 0x00010000

SA_INTERRUPT    = 0x20000000
NUMER_IRQ       = 4

macro uruchom funkcja, par1, par2, par3, par4, par5
{
        if ~ par5 eq
                push    dword par5
        end if
        if ~ par4 eq
                push    dword par4
        end if
        if ~ par3 eq
                mov     ecx, par3
        end if
        if ~ par2 eq
                mov     edx, par2
        end if
        if ~ par1 eq
                mov     eax, par1
        end if
        call    funkcja
        if ~ par5 eq
                add     esp, 4
        end if
        if ~ par4 eq
                add     esp, 4
        end if
}

init_module:
        pushfd

        ; registering character device:
        uruchom register_chrdev, 0, nazwa, file_oper

        cmp     eax, 0
        jg      .dev_ok

        ; print error
        push    eax
        push    dword dev_err
        call    printk
        add     esp, 1*4        ; removing only 1*4 bytes, because:
        pop     eax             ; will exit with the error code in EAX
        jmp     .koniec

.dev_ok:
        mov     [major], eax

        ; reserve I/O ports
        uruchom __request_region, ioport_resource, PORTY_START, PORTY_ILE, nazwa

        test    eax, eax
        jnz     .iop_ok

        push    eax
        push    dword porty_err
        call    printk
        add     esp, 1*4        ; will 'pop eax' later

        ; unregister the device
        uruchom unregister_chrdev, [major], nazwa

        pop     eax             ; will exit with the error code in EAX
        jmp     .koniec

.iop_ok:
        ; reserve memory
        uruchom __request_region, iomem_resource, RAM_START, RAM_ILE, nazwa

        test    eax, eax
        jnz     .iomem_ok

        push    eax
        push    dword ram_err
        call    printk
        add     esp, 1*4        ; will 'pop eax' later

        ; unregister the device
        uruchom unregister_chrdev, [major], nazwa

        ; free reserved ports
        uruchom __release_region, ioport_resource, PORTY_START, PORTY_ILE

        pop     eax             ; will exit with the error code in EAX
        jmp     .koniec

.iomem_ok:
        ; assigning the IRQ:
        uruchom request_irq, NUMER_IRQ, obsluga_irq, SA_INTERRUPT, nazwa, file_oper

        cmp     eax, 0
        jge     .irq_ok

        push    eax
        push    dword irq_err
        call    printk
        add     esp, 1*4        ; will 'pop eax' later

        ; unregister the device
        uruchom unregister_chrdev, [major], nazwa

        ; free reserved ports
        uruchom __release_region, ioport_resource, PORTY_START, PORTY_ILE

        ; free reserved memory
        uruchom __release_region, iomem_resource, RAM_START, RAM_ILE

        pop     eax             ; will exit with the error code in EAX
        jmp     .koniec

.irq_ok:
        ; print info about successful module load
        push    dword NUMER_IRQ
        push    dword [major]
        push    dword uruch
        call    printk
        add     esp, 3*4

        xor     eax, eax

.koniec:
        popfd
        ret

; called when the module is unloaded
cleanup_module:
        pushfd
        push    eax

        ; free the IRQ:
        uruchom free_irq, NUMER_IRQ, file_oper

        ; unregister the device:
        uruchom unregister_chrdev, [major], nazwa

        ; free reserved ports
        uruchom __release_region, ioport_resource, PORTY_START, PORTY_ILE

        ; free reserved memory
        uruchom __release_region, iomem_resource, RAM_START, RAM_ILE

        push    dword usun
        call    printk
        add     esp, 1*4

        pop     eax
        popfd
        ret

; Our interrupt service function. This one does nothing, but argument placement
; on the stack is shown
; void handler (int irq, void *dev_id, struct pt_regs *regs);
section ".text" executable align 4

obsluga_irq:
        push    ebp
        mov     ebp, esp

        ; [ebp]   = old EBP
        ; [ebp+4] = return address
        ; [ebp+8] = arg1
        ; ...
irq     equ     ebp+8
dev_id  equ     ebp+12
regs    equ     ebp+16

        ; your code here

        leave
        ret

; Define device operations

; Reading from device - return a sequence of 1Eh bytes of the specified length.
; This device is an infinite source, just like /dev/zero
czytanie:
; ssize_t (*read) (struct file *, char *, size_t, loff_t *);
        push    ebp
        mov     ebp, esp

        ; argument placement on the stack (3 params in registers):
loff    equ     ebp+8

        pushfd
        push    edi
        push    ecx

        mov     al, 0x1e
        cld
        mov     edi, edx
        rep     stosb

        pop     ecx
        pop     edi
        popfd

        ; as many as requested was read
        mov     eax, ecx

        leave
        ret

zapis:
; ssize_t (*write) (struct file *, const char *, size_t, loff_t *);
        push    ebp
        mov     ebp, esp

        ; don't write physically anything, just return the number
        ; of bytes we were supposed to write (third parameter).
        mov     eax, ecx

        leave
        ret

; seek
przejscie:
; close
zamykanie:
; open
otwieranie:
        xor     eax, eax        ; all 3 functions always return success
        ret

section ".data" writeable align 4

major           dd 0            ; kernel-assigned device major number

; addresses of the functions for device operations
file_oper:      dd 0, przejscie, czytanie, 0, zapis, 0, 0, 0, 0, 0, 0, 0
                dd otwieranie, 0, zamykanie, 0, 0, 0, 0, 0, 0, 0, 0, 0
                dd 0, 0, 0
                dd 0, 0, 0

dev_err         db "<1>Device register error: %d.", 10, 0
irq_err         db "<1>IRQ assignment error: %d.", 10, 0
porty_err       db "<1>Port assignment error: EAX=%d", 10, 0
ram_err         db "<1>Memory assignment error: EAX=%d", 10, 0

uruch           db "<1>Module loaded. Maj=%d, IRQ=%d", 10, 0
usun            db "<1>Module removed.", 10, 0

nazwa           db "test00", 0, 0
sciezka         db "/dev/test00", 0

section ".modinfo" align 32

__kernel_version        db "kernel_version=2.6.16", 0
__mod_vermagic          db "vermagic=2.6.16 686 REGPARM 4KSTACKS gcc-4.0",0
__module_license        db "license=GPL", 0
__module_author         db "author=Bogdan D.", 0
__module_description    db "description=Example 2.6 kernel module", 0
__module_device         db "device=test00", 0
__module_depends        db "depends=", 0

; irrelevant, taken from a compiled C module:
__mod_srcversion        db "srcversion=F5CE0CFFE0191EDB2F816D4", 0

section "__versions" align 32

____versions:
        dd      0xfa02c634      ; from Module.symvers
n1:     db      "struct_module", 0
        times 64-4-($-n1) db 0

        dd      0x1b7d4074
n2:     db      "printk", 0
        times 64-4-($-n2) db 0

        dd      0xb5145e00
n3:     db      "register_chrdev", 0
        times 64-4-($-n3) db 0

        dd      0xc192d491
n4:     db      "unregister_chrdev", 0
        times 64-4-($-n4) db 0

        dd      0x26e96637
n5:     db      "request_irq", 0
        times 64-4-($-n5) db 0

        dd      0xf20dabd8
n6:     db      "free_irq", 0
        times 64-4-($-n6) db 0

        dd      0x1a1a4f09
n7:     db      "__request_region", 0
        times 64-4-($-n7) db 0

        dd      0xd49501d4
n8:     db      "__release_region", 0
        times 64-4-($-n8) db 0

        dd      0x865ebccd
n9:     db      "ioport_resource", 0
        times 64-4-($-n9) db 0

        dd      0x9efed5af
n10:    db      "iomem_resource", 0
        times 64-4-($-n10) db 0

section ".gnu.linkonce.this_module" writeable align 128
align 128

__this_module:                  ; total length: 512 bytes
        dd      0, 0, 0
.mod_nazwa:
        db      "modul_dev_fasm", 0
        times 64-4-($-.mod_nazwa) db 0
        times 100 db 0
        dd      init_module
        times 220 db 0
        dd      cleanup_module
        times 112 db 0
To install and remove the module from the kernel you can use the same scripts as for the 2.4 kernel.
In later versions of the kernel the general way of writing modules has stayed the same. The Linux kernel, like all big programs, is still being developed and changes over time. Such changes affect, among other things:
- the contents of the __versions section,
- the module structure, placed in __this_module (what's worse, some of its elements exist only conditionally, depending on the configuration of the given kernel, which shifts the offsets of other elements),
- the file_operations structure,
- the .modinfo section, especially new parameters appearing there, for example "retpoline=Y", meaning a safe compilation, without indirect jumps (e.g. jmp [eax]), and "intree=Y", meaning a module from within the kernel code tree, which prevents tainting,
- the linker: with gold, you can't even configure the kernel,
- additional sections that may be needed, such as
section '.note.GNU-stack'
in FASM or
section .note.GNU-stack noalloc noexec nowrite progbits
in NASM.
You can see how the modules are compiled by running make (with the right parameters) with the flag V=1, e.g. make O=build/ V=1 modules (to build all configured modules).
The location of the file with function versions (or checksums) can vary depending on the kernel version and on the given Linux distribution. It can be a file called Module.symvers somewhere in the /lib/modules/kernel_version/ directory, or it can be the file symvers-kernel_version (perhaps compressed) in the /boot/ directory.
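Once the file is found, picking out a checksum is straightforward. The C sketch below assumes that each line starts with the hexadecimal checksum followed by the symbol name, separated by whitespace, with further fields after them; check this against your own file, as the format may differ between kernel versions. The function demo_parse_symvers is a hypothetical helper.

```c
#include <stdio.h>

/* Parses one line of the assumed form
   "0x1b7d4074<TAB>printk<TAB>vmlinux<TAB>...".
   Only the first two fields are used. Returns 0 on success, -1 if
   the line does not start with a hex number and a name. */
int demo_parse_symvers(const char *line, unsigned *crc, char name[60])
{
    return sscanf(line, "%x %59s", crc, name) == 2 ? 0 : -1;
}
```

The name buffer is 60 bytes long to match the space left for the name in a 64-byte __versions entry on 32-bit systems.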
The contents of the module structure should be in the module.h file in the directory containing the uncompressed kernel source in the right version: linux-X.Y.Z/include/linux/. The file /usr/include/linux/module.h may not be suitable for this purpose.
Other interesting files, depending on the kernel version:
linux-X.Y.Z/include/asm-i386/module.h,
linux-X.Y.Z/include/linux/version.h,
linux-X.Y.Z/scripts/mod/modpost,
linux-X.Y.Z/include/linux/init.h,
linux-X.Y.Z/include/linux/vermagic.h,
linux-X.Y.Z/include/linux/module.h,
linux-X.Y.Z/include/linux/moduleparam.h,
linux-X.Y.Z/Makefile,
linux-X.Y.Z/scripts/Makefile.modpost,
linux-X.Y.Z/scripts/Kbuild.include,
linux-X.Y.Z/scripts/Makefile.build,
linux-X.Y.Z/arch/x86/Makefile,
and servicing the modules is taken care of by e.g. linux-X.Y.Z/kernel/module.c and linux-X.Y.Z/arch/x86/kernel/module.c.
The file_operations structure can be found in linux-X.Y.Z/include/linux/fs.h, and in the 5.5.12 kernel it looks like this:
struct file_operations {
	struct module *owner;
	loff_t (*llseek) (struct file *, loff_t, int);
	ssize_t (*read) (struct file *, char __user *, size_t, loff_t *);
	ssize_t (*write) (struct file *, const char __user *, size_t,
		loff_t *);
	ssize_t (*read_iter) (struct kiocb *, struct iov_iter *);
	ssize_t (*write_iter) (struct kiocb *, struct iov_iter *);
	int (*iopoll)(struct kiocb *kiocb, bool spin);
	int (*iterate) (struct file *, struct dir_context *);
	int (*iterate_shared) (struct file *, struct dir_context *);
	__poll_t (*poll) (struct file *, struct poll_table_struct *);
	long (*unlocked_ioctl) (struct file *, unsigned int, unsigned long);
	long (*compat_ioctl) (struct file *, unsigned int, unsigned long);
	int (*mmap) (struct file *, struct vm_area_struct *);
	unsigned long mmap_supported_flags;
	int (*open) (struct inode *, struct file *);
	int (*flush) (struct file *, fl_owner_t id);
	int (*release) (struct inode *, struct file *);
	int (*fsync) (struct file *, loff_t, loff_t, int datasync);
	int (*fasync) (int, struct file *, int);
	int (*lock) (struct file *, int, struct file_lock *);
	ssize_t (*sendpage) (struct file *, struct page *, int, size_t,
		loff_t *, int);
	unsigned long (*get_unmapped_area)(struct file *, unsigned long,
		unsigned long, unsigned long, unsigned long);
	int (*check_flags)(int);
	int (*flock) (struct file *, int, struct file_lock *);
	ssize_t (*splice_write)(struct pipe_inode_info *, struct file *,
		loff_t *, size_t, unsigned int);
	ssize_t (*splice_read)(struct file *, loff_t *,
		struct pipe_inode_info *, size_t, unsigned int);
	int (*setlease)(struct file *, long, struct file_lock **, void **);
	long (*fallocate)(struct file *file, int mode, loff_t offset,
		loff_t len);
	void (*show_fdinfo)(struct seq_file *m, struct file *f);
#ifndef CONFIG_MMU
	unsigned (*mmap_capabilities)(struct file *);
#endif
	ssize_t (*copy_file_range)(struct file *, loff_t, struct file *,
		loff_t, size_t, unsigned int);
	loff_t (*remap_file_range)(struct file *file_in, loff_t pos_in,
		struct file *file_out, loff_t pos_out,
		loff_t len, unsigned int remap_flags);
	int (*fadvise)(struct file *, loff_t, loff_t, int);
};
Instead of the function register_chrdev we should probably use register_chrdev_region, and instead of request_irq, probably pci_request_irq (a symbol with this name is on the list of symbols marked as exported by vmlinux, meaning the kernel itself).
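Unlike register_chrdev, which takes a bare major number, register_chrdev_region takes the first device number as a single dev_t value, so assembly code has to build that value by hand. A minimal user-space sketch of the packing, assuming the 20-bit minor field used by the MKDEV macro in the kernel's include/linux/kdev_t.h; the names demo_mkdev, demo_major and demo_minor are made up for this illustration:

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: mirrors MINORBITS from the kernel's include/linux/kdev_t.h. */
#define DEMO_MINORBITS 20u
#define DEMO_MINORMASK ((1u << DEMO_MINORBITS) - 1u)

/* Pack a major and a minor number into one 32-bit device number. */
static uint32_t demo_mkdev(uint32_t major, uint32_t minor)
{
    assert(minor <= DEMO_MINORMASK);    /* the minor must fit in 20 bits */
    return (major << DEMO_MINORBITS) | minor;
}

/* Extract the two parts again. */
static uint32_t demo_major(uint32_t dev) { return dev >> DEMO_MINORBITS; }
static uint32_t demo_minor(uint32_t dev) { return dev & DEMO_MINORMASK; }
```

In assembly the same value is simply the major number shifted left by 20 bits and OR-ed with the minor number; for example, major 240 and minor 0 give 0x0F000000.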
Once you have the kernel sources configured for the target kernel by running make XXXconfig, you can use the generator below, which prints the contents of the .gnu.linkonce.this_module section (if you're compiling for your current system, you just need to copy the configuration file, e.g. /boot/config-kernel_version, to the file .config in the kernel source directory and run make oldconfig).
#include <stdio.h>
#include <stddef.h>
#include <linux/module.h>

#define MODULE_NAME "module1"

/* Print the body of __this_module: zero padding plus the module name
   and the pointers to the initialization and cleanup functions. */
static void disp_common (const char mname[], const char pointer_type[])
{
	struct module m;	/* used only as an operand of sizeof */

	puts ("align 128");
	puts ("__this_module:");
	/* offsetof and sizeof yield size_t, so they are printed with %zu */
	printf ("\t\t\ttimes %zu db 0\n", offsetof (struct module, name));
	printf ("\t.mod_name:\tdb '%s', 0\n", mname);
	printf ("\t\t\ttimes %zu - ($ - .mod_name) db 0\n", sizeof (m.name));
	printf ("\t\t\ttimes %zu db 0\n",
		offsetof (struct module, init)
		- offsetof (struct module, name) - sizeof (m.name));
	printf ("\t.mod_init:\t%s init_module\n", pointer_type);
	printf ("\t\t\ttimes %zu db 0\n",
		offsetof (struct module, exit)
		- offsetof (struct module, init) - sizeof (m.init));
	printf ("\t.mod_exit:\t%s cleanup_module\n", pointer_type);
	printf ("\t\t\ttimes %zu db 0\n",
		sizeof (struct module)
		- offsetof (struct module, exit) - sizeof (m.exit));
	puts ("--------------------------------");
}

/* NASM syntax for the section header */
static void disp_nasm (const char mname[], const char pointer_type[])
{
	puts ("--------------------------------\nsection .gnu.linkonce.this_module");
	disp_common (mname, pointer_type);
}

/* FASM syntax for the section header */
static void disp_fasm (const char mname[], const char pointer_type[])
{
	puts ("--------------------------------\nsection '.gnu.linkonce.this_module' writeable align 128");
	disp_common (mname, pointer_type);
}

int main (void)
{
	puts ("NASM, 32-bit:");
	disp_nasm (MODULE_NAME, "dd");
	puts ("NASM, 64-bit:");
	disp_nasm (MODULE_NAME, "dq");
	puts ("FASM, 32-bit:");
	disp_fasm (MODULE_NAME, "dd");
	puts ("FASM, 64-bit:");
	disp_fasm (MODULE_NAME, "dq");
	return 0;
}
You can compile it using the following script:
#!/bin/bash

lpath=/path/to/linux-X.Y.Z

gcc \
	-I /usr/include \
	-I $lpath/arch/x86/include \
	-I $lpath/arch/x86/include/generated \
	-I $lpath/arch/x86/include/uapi \
	-I $lpath/arch/x86/include/generated/uapi \
	-I $lpath/include \
	-I $lpath/include/uapi \
	-I $lpath/include/generated \
	-I $lpath/include/generated/uapi \
	-I $lpath/build/include \
	-I $lpath/build/arch/x86/include \
	-I $lpath/build/arch/x86/include/generated \
	-I $lpath/build/arch/x86/include/uapi \
	-I $lpath/build/arch/x86/include/generated/uapi \
	-include $lpath/include/linux/kconfig.h \
	-D__KERNEL__ \
	-DMODULE \
	-o gen-modul-info \
	gen-modul-info.c
Replace /path/to/linux-X.Y.Z with the path to your unpacked kernel sources.
After running the program, it will display the content which you should place in the .gnu.linkonce.this_module section of your module, perhaps after extending it (e.g. with other fields of the structure, because the program only sets the name of the module and the addresses of the initialization and cleanup functions):
section .gnu.linkonce.this_module
align 128
__this_module:
			times 24 db 0
	.mod_name:	db 'module1', 0
			times 64 - ($ - .mod_name) db 0
			times 296 db 0
	.mod_init:	dq init_module
			times 432 db 0
	.mod_exit:	dq cleanup_module
			times 72 db 0
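The "times N db 0" counts in such output are nothing more than the gaps between the fields that get filled in. A small user-space sketch of that arithmetic, assuming a made-up structure (struct demo_module is a hypothetical stand-in, not the real struct module, whose layout depends on the kernel version and configuration):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct module; the real layout must come
   from the kernel headers. */
struct demo_module {
    long   state;
    void  *list[2];
    char   name[56];
    char   other[100];        /* fields this sketch does not fill in */
    int  (*init)(void);
    char   more[40];
    void (*exit)(void);
    char   tail[16];
};

/* Size of a single field without needing an object of the type. */
#define FIELD_SIZE(type, field) (sizeof(((type *)0)->field))

/* Zero bytes between the end of field `a` and the start of field `b`. */
#define GAP(type, a, b) \
    (offsetof(type, b) - offsetof(type, a) - FIELD_SIZE(type, a))

/* The four padding counts around name, init and exit. */
static size_t pad_before_name(void)  { return offsetof(struct demo_module, name); }
static size_t pad_name_to_init(void) { return GAP(struct demo_module, name, init); }
static size_t pad_init_to_exit(void) { return GAP(struct demo_module, init, exit); }
static size_t pad_after_exit(void)
{
    return sizeof(struct demo_module)
         - offsetof(struct demo_module, exit)
         - FIELD_SIZE(struct demo_module, exit);
}

/* Padding plus the three filled-in fields must cover the whole structure. */
static size_t total_emitted(void)
{
    size_t total = pad_before_name()  + FIELD_SIZE(struct demo_module, name)
                 + pad_name_to_init() + FIELD_SIZE(struct demo_module, init)
                 + pad_init_to_exit() + FIELD_SIZE(struct demo_module, exit)
                 + pad_after_exit();
    assert(total == sizeof(struct demo_module));
    return total;
}
```

The final check is a useful sanity test for any hand-written layout: if the pieces do not add up to the size of the structure, at least one offset is wrong.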
The kernel is written in C, so programmers using that language do not have to copy the structures and adjust them to their code, because the header files already contain them. Similarly, to initialize a structure they just need to initialize the specific fields and the compiler will insert the right values in the right places, with no need to count the byte offset at which each field belongs. For programmers using other languages, such as assembly, keeping up with these structures becomes more and more difficult.
Because of this, you can consider writing the facade part of your module (the part with the declarations of the initialization and cleanup functions, the .modinfo sections and all the structures) in C and the module functionality in assembly, and then linking the parts together according to the rules of the C calling convention, which can be found in many places on the Internet, e.g. look for x64-abi-0.96.pdf.
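The convenience that a C facade buys can be seen in a small user-space sketch. It assumes a made-up, much-reduced structure of callbacks (struct demo_ops and the demo_ functions are hypothetical and only imitate the shape of structures such as file_operations); only the fields of interest are named, and the compiler takes care of offsets and zeroing:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, much-reduced stand-in for a kernel structure of callbacks. */
struct demo_ops {
    void *owner;
    long (*read)(char *buf, size_t len);
    long (*write)(const char *buf, size_t len);
    int  (*open)(void);
    int  (*release)(void);
};

/* In a mixed module these two could be implemented in assembly and
   only declared here; they are written in C to keep the sketch complete. */
static int demo_open(void)
{
    return 0;
}

static long demo_read(char *buf, size_t len)
{
    assert(buf != NULL);
    memset(buf, 0, len);      /* pretend the device returns zero bytes */
    return (long)len;
}

/* Designated initializers: unnamed fields are set to zero and every
   pointer lands at the right offset without any manual counting. */
static const struct demo_ops demo_fops = {
    .read = demo_read,
    .open = demo_open,
};
```

Doing the same in assembly means emitting zero bytes up to the offset of each used field by hand, and repeating the work whenever the structure changes between kernel versions.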
Another problem may also be the architecture: 32- or 64-bit, because the register names and the way parameters are passed are different.
In the case of FASM, where the output file type is put in the source file, you need to write separate versions for 32- and 64-bit systems.
In the case of NASM, things can be a bit easier, because you can check the output file type (passed on the command line) in the code and modify the register or instruction names accordingly. You can use macros like the following:
%ifidn __OUTPUT_FORMAT__, elf64
	bits 64
	%define ARCH		'x64'
	%define RET_REG		rax
	%define ptr_type	dq
	%define ptr_size	8
	%define pushflags	pushfq
	%define popflags	popfq
%else
	bits 32
	%define ARCH		'x86'
	%define RET_REG		eax
	%define ptr_type	dd
	%define ptr_size	4
	%define pushflags	pushfd
	%define popflags	popfd
%endif

%macro call_fnc 1-7	; function, par1, par2, par3, par4, par5, par6
	%if ARCH = 'x64'
		; 64-bit C calling convention: rdi, rsi, rdx, rcx, r8, r9
		; (r10 replaces rcx only for system calls, not for function calls)
		%ifnempty %7
			mov r9, %7
		%endif
		%ifnempty %6
			mov r8, %6
		%endif
		%ifnempty %5
			mov rcx, %5
		%endif
		%ifnempty %4
			mov rdx, %4
		%endif
		%ifnempty %3
			mov rsi, %3
		%endif
		%ifnempty %2
			mov rdi, %2
		%endif
		call %1
	%else
		; 32-bit: first three parameters in eax, edx, ecx, the rest on the stack.
		; Caution: variadic functions such as printk take all their
		; arguments on the stack, so this branch does not suit them.
		%ifnempty %7
			push dword %7
		%endif
		%ifnempty %6
			push dword %6
		%endif
		%ifnempty %5
			push dword %5
		%endif
		%ifnempty %4
			mov ecx, %4
		%endif
		%ifnempty %3
			mov edx, %3
		%endif
		%ifnempty %2
			mov eax, %2
		%endif
		call %1
		; remove the pushed parameters from the stack
		%ifnempty %7
			add esp, 4
		%endif
		%ifnempty %6
			add esp, 4
		%endif
		%ifnempty %5
			add esp, 4
		%endif
	%endif
%endmacro
and then use them in the code:
init_module:
	call_fnc printk, running_msg
	xor RET_REG, RET_REG
	ret
This should make your job easier and reduce code duplication among many files.
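When writing or checking a macro like call_fnc, it helps to have the argument order in one place. The following sketch is a lookup table only, under two assumptions: the System V AMD64 ABI for 64-bit code, and the regparm(3) convention (eax, edx, ecx) for non-variadic functions in a 32-bit kernel. The names demo_arch, demo_arg_location and demo_passed_on_stack are made up for this illustration:

```c
#include <assert.h>
#include <string.h>

enum demo_arch { DEMO_X86_32, DEMO_X86_64 };

/* Where the n-th (1-based) integer or pointer argument of a
   non-variadic function is expected.
   64-bit: System V AMD64 ABI register order.
   32-bit: assumes regparm(3), i.e. eax, edx, ecx, then the stack. */
static const char *demo_arg_location(enum demo_arch arch, unsigned n)
{
    static const char *const regs64[] = { "rdi", "rsi", "rdx", "rcx", "r8", "r9" };
    static const char *const regs32[] = { "eax", "edx", "ecx" };

    assert(n >= 1);           /* arguments are counted from 1 */
    if (arch == DEMO_X86_64)
        return n <= 6 ? regs64[n - 1] : "stack";
    return n <= 3 ? regs32[n - 1] : "stack";
}

/* True when the n-th argument no longer fits in a register. */
static int demo_passed_on_stack(enum demo_arch arch, unsigned n)
{
    return strcmp(demo_arg_location(arch, n), "stack") == 0;
}
```

With this order, the fourth argument of a 64-bit function call belongs in rcx; r10 takes its place only in the system call convention.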
If you want to seriously start writing modules, you can start by reading the documentation about how to do everything properly and what functionalities and mechanisms are offered by the kernel: