Writing Linux kernel modules

Only the most important drivers for basic devices (such as hard drives) are compiled into the kernel, because building every driver into the kernel would waste memory, as well as time spent enabling and disabling drivers for non-existent devices. This is why drivers for optional devices are placed in kernel modules, which are loaded on demand.

Table of contents:

  1. The basics
  2. The simplest 2.4-series kernel module
  3. Registering a character device
  4. Registering input-output ports and memory areas
  5. Registering IRQ resources
  6. Example of a 2.4-series kernel module
  7. The simplest 2.6-series kernel module
  8. Reserving resources in 2.6-series kernels
  9. Example of a 2.6-series kernel module
  10. Other kernels and other tricks

The basics

A kernel module is an ordinary compiled object file in the standard ELF format. It needs to export two functions: init_module, which initializes the module (and is run when the module is loaded), and cleanup_module, which does whatever is needed to finish work cleanly (and is run when the module is removed from the kernel).

The init_module function should return zero on success and, on error, the negative errno value that best describes the problem.

Much of the information about 2.4 kernel modules also applies to 2.6 kernels, so the sections of this page dedicated to 2.6 kernels only cover what has changed since 2.4.


The simplest 2.4-series kernel module



According to what I've said above, the simplest module looks like this:


	format ELF

	section ".text" executable	; code section start

	; exporting two required functions
	public	init_module
	public	cleanup_module

	; declaration of an external function used for displaying messages
	extrn	printk

	init_module:
		push	dword str1	; string to display
		call	printk
		pop	eax		; remove arguments from the stack

		xor	eax, eax	; zero means no errors
		ret

	cleanup_module:
		push	dword str2
		call	printk
		pop	eax

		ret

	section ".data" writeable
	str1		db	"<1> Inside init_module."   , 10, 0
	str2		db	"<1> Inside cleanup_module.", 10, 0

	section ".modinfo"
	__module_kernel_version db	"kernel_version=2.4.26", 0
	__module_license	db	"license=GPL", 0
	__module_author		db	"author=Bogdan D.", 0
	__module_description	db "description=first kernel module.", 0

Notice a few things:
  1. Strings are printed using the internal kernel function printk. It works similarly to the C language printf function, which is of course not available inside the kernel.

    In short: any extra values to be displayed in the string (for format specifiers such as %d, an integer) are pushed on the stack first, in reverse order, and the address of the string is pushed last. This will be shown in the example module.

    The string should start with <N>, N being a digit. This tells the kernel the severity of the message. For us, setting N to 1 is enough.

    If the printed strings don't show up on the screen, they will show up in the output of the dmesg command (usually at the end) and in the kernel log file, typically /var/log/messages.

  2. FASM syntax.

    For reasons unknown, modules assembled with NASM could not be inserted into the kernel.

  3. Each function is called using the C calling convention, which means that we (the caller) clean up the stack.

  4. New section - modinfo.

    It contains the following information: the kernel version the module is built for, the module's author, the module license and the module parameters. The variable names must remain the same; change only the text after the equal signs.

After assembling (fasm modul_hello.asm), you can install this module as root with the command

	insmod ./modul_hello.o

and remove it with the command

	rmmod modul_hello

(notice the missing .o extension).

The list of all currently loaded modules can be obtained with the command lsmod.

Now, I'll show how to register a character device and reserve resources for it: IRQ and memory + port ranges.


Registering a character device



To register a character device (a device which permits reading single bytes, as opposed to, say, a hard disk) use the kernel-exported register_chrdev function. It accepts 3 arguments. Starting from the left (last put on the stack), these are:

  1. Device major number, chosen by us.

    You can specify zero here. In that case, the kernel will assign us an unused number. The major number is the first of the two numbers (the second one is called the minor number) you can see in a detailed listing of the /dev directory, for example

    	crw-rw-rw-  1 root root 1, 5 aug 16 15:28 /dev/zero

    The zero device has major number 1 and minor number 5. The letter c at the beginning says that this is a character device. Other marks are: d (directory), s (socket), b (block device), p (pipe, FIFO), l (symbolic link).

  2. Address of the device name, which is a sequence of characters ending with byte zero.

  3. Address of a file_operations structure, inside which we will put addresses of functions used for operations on this device.

    The most important are: opening, closing, writing and reading from the device. The structure itself for a 2.4 kernel looks like this:


    	struct file_operations {
    		struct module *owner;
    		loff_t (*llseek) (struct file *, loff_t, int);
    		ssize_t (*read) (struct file*, char*, size_t, loff_t *);
    		ssize_t (*write) (struct file *, const char *, size_t,
    			loff_t *);
    		int (*readdir) (struct file *, void *, filldir_t);
    		unsigned int (*poll) (struct file *,
    			struct poll_table_struct *);
    		int (*ioctl) (struct inode*, struct file*, unsigned int,
    			unsigned long);
    		int (*mmap) (struct file *, struct vm_area_struct *);
    		int (*open) (struct inode *, struct file *);
    		int (*flush) (struct file *);
    		int (*release) (struct inode *, struct file *);
    		int (*fsync) (struct file*,struct dentry*, int datasync);
    		int (*fasync) (int, struct file *, int);
    		int (*lock) (struct file *, int, struct file_lock *);
    		ssize_t (*readv) (struct file *, const struct iovec *,
    			unsigned long, loff_t *);
    		ssize_t (*writev) (struct file *, const struct iovec *,
    			unsigned long, loff_t *);
    	}; 

    Each field of this structure is a DWORD. For basic operations, we only need the third (read), fourth (write), ninth (open) and eleventh (release, i.e. file closing) fields. If you aren't planning to implement some function, put a zero in the corresponding field of this structure.

If we call this function with our own major number and the function succeeds, it will return zero. If we asked the kernel for a major number and the function succeeds, it will return a positive integer, which is the kernel-assigned major number for our device.

NOTE: The register_chrdev function does not create a device file in the /dev directory. We have to do it ourselves, after loading the module.

To unregister a character device, call the unregister_chrdev function. Its first argument (last put on the stack) is the assigned major number, the second argument is the address of the device name.


Registering input-output ports and memory areas



Reserving these resources is easy. All you need to do is call the __request_region function. It accepts 4 arguments. Starting from the left (last put on the stack), they are:

  1. Type of the resource. If you want to reserve ports, give the address of the ioport_resource variable; if memory, iomem_resource. Both variables are exported by the kernel, so you can declare them as external to your module.
  2. Starting port number or starting memory address.
  3. Length of the port or memory range.
  4. Address of the device name.

In case of failure, this function returns zero (in EAX). On success it returns a non-zero value, the address of the kernel's resource structure describing the reserved range.

Both types of resources can be released using the __release_region function. It has 3 arguments, which are the same as the first 3 above (type, start and range length).


Registering IRQ resources



Interrupt request (IRQ) resources are registered with the request_irq function. It takes 5 arguments of type DWORD. Starting from the left (last put on the stack), they are:

  1. IRQ number we wish to reserve.
  2. Address of our interrupt service function. This function has this prototype:
    	void handler (int irq, void *dev_id, struct pt_regs *regs);
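    As you can see, the handler can tell which interrupt was generated and by which device. The last argument is said to be rarely used.
  3. Flags; here the integer SA_INTERRUPT = 0x20000000.
  4. Address of the device name.
  5. Device identifier (dev_id), which the kernel passes back to the handler as its second argument. Here we use the address of our file_operations structure, filled with function addresses.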

If this function fails, it will return a negative value.

Releasing an IRQ is done using the free_irq function. Its first argument (last on the stack) is our IRQ number, and the second argument is the same identifier that was passed to request_irq, here the address of our file_operations structure.


Example of a 2.4-series kernel module



The module shown below will register a software character device (a device which has no hardware, like /dev/null) with IRQ 4, port range 600h-6FFh, memory range 80000000h - 8000FFFFh and basic operations: opening, closing, reading, writing, seeking. For simplicity, this code doesn't check if the resources are occupied. If they are, the kernel will return an error and the module won't load.


	; Example 2.4 kernel module
	;
	; Author: Bogdan D., bogdandr (na) op . pl
	;
	; assemble:
	;   fasm modul_dev_fasm.asm

	format ELF
	section	".text" executable

	; exporting required functions
	public	init_module
	public	cleanup_module

	; importing used functions and symbols
	extrn	printk
	extrn	register_chrdev
	extrn	unregister_chrdev
	extrn	request_irq
	extrn	free_irq

	extrn	__check_region
	extrn	__request_region
	extrn	__release_region
	extrn	ioport_resource
	extrn	iomem_resource

	; resource ranges we're going to ask for
	PORTY_START	= 0x600
	PORTY_ILE	= 0x100

	RAM_START	= 0x80000000
	RAM_ILE		= 0x00010000

	; constants needed for IRQ reservation
	SA_INTERRUPT	= 0x20000000
	NUMER_IRQ	= 4

	; module initialization function
	init_module:
		pushfd

		; registering a character device
		push	dword file_oper
		push	dword nazwa
		push	dword 0			; assign the number dynamically
		call	register_chrdev
		add	esp, 3*4		; remove arguments from the stack

		cmp	eax, 0			; check for error
		jg	.dev_ok

		; if we are here, there was an error. Show it.
		push	eax			; argument for the error string
		push	dword dev_err		; address of the error string
		call	printk			; print the error
		add	esp, 1*4		; removing only 1*4 bytes, because:

		pop	eax			; will exit with the error code in EAX
		jmp	.koniec

	.dev_ok:

		mov	[major], eax

		; reserve I/O ports
		push	dword nazwa
		push	dword PORTY_ILE
		push	dword PORTY_START
		push	dword ioport_resource
		call	__request_region
		add	esp, 4*4

		test	eax, eax		; check for error
		jnz	.iop_ok

		push	eax			; argument for the error string
		push	dword porty_err	; address of the error string
		call	printk			; print the error
		add	esp, 1*4		; will 'pop eax' later

		; unregister the device
		push	dword nazwa
		push	dword [major]
		call	unregister_chrdev
		add	esp, 2*4

		pop	eax			; discard the saved return value (zero on failure)
		mov	eax, -16		; return -EBUSY (errno 16) so the module load fails
		jmp	.koniec

	.iop_ok:

		; reserve memory
		push	dword nazwa
		push	dword RAM_ILE
		push	dword RAM_START
		push	dword iomem_resource
		call	__request_region
		add	esp, 4*4

		test	eax, eax		; check for error
		jnz	.iomem_ok

		push	eax
		push	dword ram_err
		call	printk			; print the error
		add	esp, 1*4		; will 'pop eax' later

		; unregister the device
		push	dword nazwa
		push	dword [major]
		call	unregister_chrdev
		add	esp, 2*4

		; free reserved ports
		push	dword PORTY_ILE
		push	dword PORTY_START
		push	dword ioport_resource
		call	__release_region
		add	esp, 3*4

		pop	eax			; discard the saved return value (zero on failure)
		mov	eax, -16		; return -EBUSY (errno 16) so the module load fails
		jmp	.koniec

	.iomem_ok:
		; assigning IRQ:
		push	dword file_oper
		push	dword nazwa
		push	dword SA_INTERRUPT
		push	dword obsluga_irq
		push	dword NUMER_IRQ
		call	request_irq
		add	esp, 5*4

		cmp	eax, 0
		jge	.irq_ok

		push	eax
		push	dword irq_err
		call	printk			; print the error
		add	esp, 1*4		; will 'pop eax' later

		; unregister the device
		push	dword nazwa
		push	dword [major]
		call	unregister_chrdev
		add	esp, 2*4

		; free reserved ports
		push	dword PORTY_ILE
		push	dword PORTY_START
		push	dword ioport_resource
		call	__release_region
		add	esp, 3*4

		; free reserved memory
		push	dword RAM_ILE
		push	dword RAM_START
		push	dword iomem_resource
		call	__release_region
		add	esp, 3*4

		pop	eax			; will exit with the error code in EAX
		jmp	.koniec

	.irq_ok:

		; print info about successful module load
		push	dword NUMER_IRQ
		push	dword [major]
		push	dword uruch
		call	printk
		add	esp, 3*4

		xor	eax, eax		; zero - no errors

	.koniec:

		popfd
		ret

	; called when the module is unloaded
	cleanup_module:
		pushfd
		push	eax

		; free the IRQ
		push	dword file_oper
		push	dword NUMER_IRQ
		call	free_irq
		add	esp, 2*4

		; unregister the device:
		push	dword nazwa
		push	dword [major]
		call	unregister_chrdev
		add	esp, 2*4

		; free reserved ports
		push	dword PORTY_ILE
		push	dword PORTY_START
		push	dword ioport_resource
		call	__release_region
		add	esp, 3*4

		; free reserved memory
		push	dword RAM_ILE
		push	dword RAM_START
		push	dword iomem_resource
		call	__release_region
		add	esp, 3*4

		; print info about successful module unload
		push	dword usun
		call	printk
		add	esp, 1*4

		pop	eax
		popfd
		ret

	; Our interrupt service function. This one does nothing, but argument placement
	;	on the stack is shown
	obsluga_irq:
		push	ebp
		mov	ebp, esp

	; [ebp] = old EBP
	; [ebp+4] = return address
	; [ebp+8] = arg1
	; ...

			irq	equ	ebp+8
			dev_id	equ	ebp+12
			regs	equ	ebp+16

		leave
		ret


	; Define device operations

	; Reading from device - return a sequence of 1Eh bytes of the specified length.
	; This device is an infinite source, just like /dev/zero
	czytanie:
		push	ebp
		mov	ebp, esp

		; argument placement on the stack:
		s_file	equ	ebp+8	; pointer to a file structure
		bufor	equ	ebp+12	; data buffer address
		l_jedn	equ	ebp+16	; requested number of bytes
		loff	equ	ebp+20	; requested start position of reading

		pushfd
		push	edi
		push	ecx

		mov	ecx, [l_jedn]
		mov	al, 0x1e
		cld
		mov	edi, [bufor]
		rep	stosb		; fill buffer with 1Eh bytes

		pop	ecx
		pop	edi
		popfd

		mov	eax, [l_jedn]	; return the number of requested bytes

		leave
		ret

	; writing to the device - infinite well (will consume anything)
	zapis:
		push	ebp
		mov	ebp, esp

		; don't write physically anything, just return the number
		;	of bytes we were supposed to write.
		mov	eax, [l_jedn]

		leave
		ret

	; seek
	przejscie:
	; close:
	zamykanie:
	; open:
	otwieranie:
		xor	eax, eax	; all 3 functions always return success
		ret



	section ".data" writeable

	major	dd	0	; kernel-assigned device major number

	; addresses of the functions for device operations
	file_oper:	dd 0, przejscie, czytanie, zapis, 0, 0, 0, 0, otwieranie, 0
			dd zamykanie, 0, 0, 0, 0, 0

	dev_err	db	"<1>Device register error: %d.", 10, 0
	irq_err	db	"<1>IRQ assignment error: %d.", 10, 0
	porty_err	db	"<1>Port assignment error:  EAX=%d", 10, 0
	ram_err	db	"<1>Memory assignment error: EAX=%d", 10, 0


	uruch		db	"<1>Module loaded. Maj=%d, IRQ=%d", 10, 0
	usun		db	"<1>Module removed.", 10, 0

	nazwa		db	"test00", 0
	sciezka		db	"/dev/test00", 0

	section ".modinfo"
	__module_kernel_version	db	"kernel_version=2.4.26", 0
	__module_license	db	"license=GPL", 0
	__module_author		db	"author=Bogdan D.", 0
	__module_description	db	"description=Example kernel module", 0
	__module_device		db	"device=test00", 0

Once assembled, the above module is most easily installed using the following script:


	#!/bin/bash

	PLIK="modul_dev_fasm.o"		# Put your module's name here
	NAZWA="test00"			# Device's name

	# Inserting the module.
	/sbin/insmod "$PLIK" "$@" || { echo "insmod problem!" ; exit 1; }

	# finding and printing our module name
	/sbin/lsmod | grep `echo $PLIK | sed 's/[^a-z]/ /g' | awk '{print $1}' `
	# print resource information
	grep $NAZWA /proc/devices
	grep $NAZWA /proc/ioports
	grep $NAZWA /proc/iomem
	grep $NAZWA /proc/interrupts

	# find and print device major number
	NR=`grep $NAZWA /proc/devices | awk '{print $1}'`
	echo "Major = $NR"

	# remove old device file
	rm -f /dev/$NAZWA

	# creating the device file in /dev
	# sys_mknod from inside the module does NOT work
	mknod /dev/$NAZWA c $NR 0
	ls -l /dev/$NAZWA

	# short test: read 512 bytes and check their contents
	dd count=1 if=/dev/$NAZWA of=/x && hexdump /x && rm -f /x

All you have to do is save this script under some name, like instal.sh, allow it to be executed using the command chmod u+x instal.sh and run it using ./instal.sh, as root, of course. If the module is successfully loaded, the script will display the resources assigned to the module - I/O ports, IRQ and memory - by reading the necessary files in the /proc directory. The script will also create the device file in the /dev directory, with the correct major number. After that, a short test will be performed.

You can easily uninstall the module using the script:

	#!/bin/bash

	PLIK="modul_dev_fasm"	# Enter your module name here, without the .o
	NAZWA="test00"		# Device's name

	/sbin/rmmod $PLIK && rm -f /dev/$NAZWA


The simplest 2.6-series kernel module



The simplest 2.6 kernel module looks like this:


	format ELF
	section ".init.text" executable	align 1
	section ".text" executable align 4

	public init_module
	public cleanup_module

	extrn printk

	init_module:
		push	dword str1
		call	printk
		pop	eax
		xor	eax, eax
		ret

	cleanup_module:
		push	dword str2
		call	printk
		pop	eax
		ret

	section ".modinfo" align 32
	__kernel_version	db	"kernel_version=2.6.16", 0
	__mod_vermagic db "vermagic=2.6.16 686 REGPARM 4KSTACKS gcc-4.0", 0
	__module_license	db	"license=GPL", 0
	__module_author		db	"author=Bogdan D.", 0
	__module_description	db	"description=First 2.6 kernel module", 0

	section "__versions" align 32
		dd	0xfa02c634
	n1:	db	"struct_module"
		times	64-4-($-n1) db 0

		dd	0x1b7d4074
	n2:	db	"printk"
		times	64-4-($-n2) db 0

	section ".data" writeable align 4

	str1		db	"<1> Inside init_module(). ", 10, 0
	str2		db	"<1> Inside cleanup_module(). ", 10, 0

	section ".gnu.linkonce.this_module" writeable align 128

	align 128
	__this_module:		; total length: 512 bytes
				dd 0, 0, 0

			.nazwa:	db "modul", 0
				times 64-4-($-.nazwa) db 0

				times 100 db 0
				dd init_module
				times 220 db 0
				dd cleanup_module
				times 112 db 0

You can surely see many differences, right? We'll discuss them section by section now:

  1. .init.text

    In general, there should be at least two code sections: .init.text, containing the initialization procedure, and .exit.text, containing the exit procedure.

    Additionally, you can of course have a data section .data and a code section .text.

    If you get Accessing a corrupted shared library messages while installing the module, try shuffling the sections: add a .text, remove .init.text, change the order etc.

  2. .gnu.linkonce.this_module

    This is the most important one. Without this section, each attempt to install the module will result in a No module found in object message. The content of this section is a structure named __this_module, of type module. The best you can do right now is to copy the above example into your modules, changing the module name (between the quotation marks) and the names of the entry and exit points.

    You can also use the following macro:

    	macro	gen_this_module		name*, entry, exit
    	{
    		section '.gnu.linkonce.this_module' writeable align 128
    
    		align 128
    		__this_module:
    				dd 0, 0, 0
    	   	.mod_nazwa:	db name, 0
    				times 64-4-($-.mod_nazwa) db 0
    				times 100 db 0
    				if entry eq
    					dd init_module
    				else
    					dd entry
    				end if
    				times 220 db 0
    				if exit eq
    					dd cleanup_module
    				else
    					dd exit
    				end if
    				times 112 db 0
    
    	}

    Using this macro is very easy: just pass it the name of the module, which should be displayed with the lsmod command and the names (addresses) of the entry and exit procedures, for example

    	gen_this_module	"your_module", init_module, cleanup_module

    Place the macro call where the section should appear, for example after the last declaration in the data section. In any case, do NOT place it in the middle of another section.

  3. modinfo

    Compared to the one in the 2.4 kernel module, this section has only one new entry, but a very important one: vermagic. In your kernel this string will probably differ from mine only in the kernel version. The original definition looks like this:


    	#define VERMAGIC_STRING 				\
    	  UTS_RELEASE " "					\
    	  MODULE_VERMAGIC_SMP MODULE_VERMAGIC_PREEMPT 		\
    	  MODULE_ARCH_VERMAGIC 					\
    	  "gcc-" __stringify(__GNUC__) "." __stringify(__GNUC_MINOR__)
    	#define MODULE_ARCH_VERMAGIC MODULE_PROC_FAMILY \
     		 MODULE_REGPARM MODULE_STACKSIZE

    You can find these definitions in the vermagic.h file and in the asm* subdirectories of the include directory in the kernel source tree.

  4. __versions

    This section contains information about the versions of the procedures which this module uses. The structure of this section is fairly simple: first put a DWORD with the number matching the given kernel function, which can be found in the Module.symvers file in the main directory of the kernel source. Right after the number goes the name of the used function, padded with zeros so that the whole entry takes up 64 bytes.

    This section is not required for the module to operate correctly, but it should be in every module, or kernel tainted messages will appear.

    You can generate this whole section using my script symvers-fasm.txt. All you have to do is run perl symvers-fasm.pl your_module.asm.


Reserving resources in 2.6-series kernels



From the outside (the perspective of the C language), reserving resources in 2.6 kernels isn't much different from 2.4 kernels. Internally, however, two major changes have been made:

  1. The file_operations structure

    In 2.6 kernels it looks like this:


    	struct file_operations {
    		struct module *owner;
    		loff_t (*llseek) (struct file *, loff_t, int);
    		ssize_t (*read) (struct file*,char __user*,size_t,
    			loff_t*);
    		ssize_t (*aio_read) (struct kiocb *, char __user *,
    			size_t, loff_t);
    		ssize_t (*write) (struct file *, const char __user *,
    			size_t, loff_t *);
    		ssize_t (*aio_write) (struct kiocb *, const char __user*,
    			size_t, loff_t);
    		int (*readdir) (struct file *, void *, filldir_t);
    		unsigned int (*poll) (struct file *,
    			struct poll_table_struct *);
    		int (*ioctl) (struct inode *, struct file *,
    			unsigned int, unsigned long);
    		long (*unlocked_ioctl) (struct file *, unsigned int,
    			unsigned long);
    		long (*compat_ioctl) (struct file *, unsigned int,
    			unsigned long);
    		int (*mmap) (struct file *, struct vm_area_struct *);
    		int (*open) (struct inode *, struct file *);
    		int (*flush) (struct file *);
    		int (*release) (struct inode *, struct file *);
    		int (*fsync) (struct file *, struct dentry *,
    			int datasync);
    		int (*aio_fsync) (struct kiocb *, int datasync);
    		int (*fasync) (int, struct file *, int);
    		int (*lock) (struct file *, int, struct file_lock *);
    		ssize_t (*readv) (struct file *, const struct iovec *,
    			unsigned long, loff_t *);
    		ssize_t (*writev) (struct file *, const struct iovec *,
    			unsigned long, loff_t *);
    		ssize_t (*sendfile) (struct file *, loff_t *, size_t,
    			read_actor_t, void *);
    		ssize_t (*sendpage) (struct file *, struct page *, int,
    			size_t, loff_t *, int);
    		unsigned long (*get_unmapped_area)(struct file *,
    			unsigned long, unsigned long, unsigned long,
    			unsigned long);
    		int (*check_flags)(int);
    		int (*dir_notify)(struct file *filp, unsigned long arg);
    		int (*flock) (struct file *, int, struct file_lock *);
    	};

  2. Parameter passing

    My distribution kernel was compiled in such a way that 3 first parameters to each procedure except printk are passed in registers: EAX, EDX, ECX, and the rest on the stack. To check if your kernel does and expects the same, use the commands

    	grep -R regpar /lib/modules/`uname -r`/build/|grep Makefile
     	grep -R REGPAR /lib/modules/`uname -r`/build/|grep config

    If the results contain something similar to:

     	CONFIG_REGPARM=y
     	#define CONFIG_REGPARM 1

    then your kernel is probably compiled like mine. If so, you can use the uruchom macro shown in the example below to call kernel functions. If not, you will need to modify the macro. A system that hangs when you try to load the module is a likely sign that the macro needs modifying.


Example of a 2.6-series kernel module



Just like the 2.4 example, the module below will register a software character device (a device which has no hardware, like /dev/null) with IRQ 4, port range 600h-6FFh, memory range 80000000h - 8000FFFFh and basic operations: opening, closing, reading, writing, seeking. For simplicity, this code doesn't check if the resources are occupied. If they are, the kernel will return an error and the module won't load.

	format ELF
	section ".text" executable align 4

	public	init_module
	public	cleanup_module

	extrn	printk
	extrn	register_chrdev
	extrn	unregister_chrdev
	extrn	request_irq
	extrn	free_irq

	extrn	__request_region
	extrn	__release_region
	extrn	ioport_resource
	extrn	iomem_resource

	PORTY_START	= 0x600
	PORTY_ILE	= 0x100

	RAM_START	= 0x80000000
	RAM_ILE		= 0x00010000

	SA_INTERRUPT	= 0x20000000
	NUMER_IRQ	= 4

	macro	uruchom		funkcja, par1, par2, par3, par4, par5
	{
		if ~ par5 eq
			push	dword par5
		end if
		if ~ par4 eq
			push	dword par4
		end if
		if ~ par3 eq
			mov	ecx, par3
		end if
		if ~ par2 eq
			mov	edx, par2
		end if
		if ~ par1 eq
			mov	eax, par1
		end if
		call	funkcja
		if ~ par5 eq
			add	esp, 4
		end if
		if ~ par4 eq
			add	esp, 4
		end if
	}

	init_module:
		pushfd

		; registering character device:
		uruchom	register_chrdev, 0, nazwa, file_oper

		cmp	eax, 0
		jg	.dev_ok

		; print error
		push	eax
		push	dword dev_err
		call	printk
		add	esp, 1*4		; removing only 1*4 bytes, because:

		pop	eax			; will exit with the error code in EAX
		jmp	.koniec

	.dev_ok:

		mov	[major], eax

		; reserve I/O ports
	uruchom __request_region, ioport_resource, PORTY_START, PORTY_ILE, nazwa

		test	eax, eax
		jnz	.iop_ok

		push	eax
		push	dword porty_err
		call	printk
		add	esp, 1*4		; will 'pop eax' later

		; unregister the device
		uruchom	unregister_chrdev, [major], nazwa

		pop	eax			; discard the saved return value (zero on failure)
		mov	eax, -16		; return -EBUSY (errno 16) so the module load fails
		jmp	.koniec

	.iop_ok:

		; reserve memory
		uruchom	__request_region, iomem_resource, RAM_START, RAM_ILE, nazwa

		test	eax, eax
		jnz	.iomem_ok

		push	eax
		push	dword ram_err
		call	printk
		add	esp, 1*4		; will 'pop eax' later

		; unregister the device
		uruchom	unregister_chrdev, [major], nazwa

		; free reserved ports
		uruchom	__release_region, ioport_resource, PORTY_START, PORTY_ILE

		pop	eax			; discard the saved return value (zero on failure)
		mov	eax, -16		; return -EBUSY (errno 16) so the module load fails
		jmp	.koniec

	.iomem_ok:

		; assigning the IRQ:
	uruchom request_irq, NUMER_IRQ, obsluga_irq, SA_INTERRUPT, nazwa, file_oper

		cmp	eax, 0
		jge	.irq_ok

		push	eax
		push	dword irq_err
		call	printk
		add	esp, 1*4		; will 'pop eax' later

		; unregister the device
		uruchom	unregister_chrdev, [major], nazwa

		; free reserved ports
		uruchom	__release_region, ioport_resource, PORTY_START, PORTY_ILE

		; free reserved memory
		uruchom	__release_region, iomem_resource, RAM_START, RAM_ILE

		pop	eax			; will exit with the error code in EAX
		jmp	.koniec

	.irq_ok:

		; print info about successful module load
		push	dword NUMER_IRQ
		push	dword [major]
		push	dword uruch
		call	printk
		add	esp, 3*4

		xor	eax, eax

	.koniec:

		popfd
		ret

	; called when the module is unloaded
	cleanup_module:
		pushfd
		push	eax

		; free the IRQ:
		uruchom	free_irq, NUMER_IRQ, file_oper

		; unregister the device:
		uruchom	unregister_chrdev, [major], nazwa

		; free reserved ports
		uruchom	__release_region, ioport_resource, PORTY_START, PORTY_ILE

		; free reserved memory
		uruchom	__release_region, iomem_resource, RAM_START, RAM_ILE

		push	dword usun
		call	printk
		add	esp, 1*4

		pop	eax
		popfd
		ret

	; Our interrupt service function. This one does nothing, but argument placement
	;	on the stack is shown
	; void handler (int irq, void *dev_id, struct pt_regs *regs);

	section ".text" executable align 4

	obsluga_irq:
		push	ebp
		mov	ebp, esp

	; [ebp] = old EBP
	; [ebp+4] = return address
	; [ebp+8] = arg1
	; ...

			irq	equ	ebp+8
			dev_id	equ	ebp+12
			regs	equ	ebp+16

		; your code here

		leave
		ret

	; Define device operations

	; Reading from device - return a sequence of 1Eh bytes of the specified length.
	; This device is an infinite source, just like /dev/zero
	czytanie:
	;	ssize_t (*read) (struct file *, char *, size_t, loff_t *);
		push	ebp
		mov	ebp, esp

		; argument placement on the stack (3 params in registers):
		loff	equ	ebp+8

		pushfd
		push	edi
		push	ecx

		mov	al, 0x1e
		cld
		mov	edi, edx
		rep	stosb

		pop	ecx
		pop	edi
		popfd

		; as many as requested was read
		mov	eax, ecx

		leave
		ret

	zapis:
	;	ssize_t (*write) (struct file *, const char *, size_t, loff_t *);
		push	ebp
		mov	ebp, esp

		; don't write physically anything, just return the number
		;	of bytes we were supposed to write (third parameter).
		mov	eax, ecx

		leave
		ret

	; seek
	przejscie:
	; close
	zamykanie:
	; open
	otwieranie:
		xor	eax, eax	; all 3 functions always return success
		ret



	section ".data" writeable align 4

	major	dd	0	; kernel-assigned device major number

	; addresses of the functions for device operations
	file_oper:	dd 0, przejscie, czytanie, 0, zapis, 0, 0, 0, 0, 0, 0, 0
			dd otwieranie, 0, zamykanie, 0, 0, 0, 0, 0, 0, 0, 0, 0
			dd 0, 0, 0
			dd 0, 0, 0

	dev_err	db	"<1>Device register error: %d.", 10, 0
	irq_err	db	"<1>IRQ assignment error: %d.", 10, 0
	porty_err	db	"<1>Port assignment error:  EAX=%d", 10, 0
	ram_err	db	"<1>Memory assignment error: EAX=%d", 10, 0


	uruch		db	"<1>Module loaded. Maj=%d, IRQ=%d", 10, 0
	usun		db	"<1>Module removed.", 10, 0

	nazwa		db	"test00", 0, 0
	sciezka		db	"/dev/test00", 0

	section ".modinfo" align 32
	__kernel_version	db	"kernel_version=2.6.16", 0
	__mod_vermagic db "vermagic=2.6.16 686 REGPARM 4KSTACKS gcc-4.0",0
	__module_license	db	"license=GPL", 0
	__module_author		db	"author=Bogdan D.", 0
	__module_description	db	"description=Example 2.6 kernel module", 0
	__module_device		db	"device=test00", 0
	__module_depends	db	"depends=", 0

	; irrelevant, taken from a compiled C module:
	__mod_srcversion	db	"srcversion=F5CE0CFFE0191EDB2F816D4", 0

	section "__versions" align 32

	____versions:
		dd	0xfa02c634		; from Module.symvers
	n1:	db	"struct_module", 0
		times	64-4-($-n1) db 0

		dd	0x1b7d4074
	n2:	db	"printk", 0
		times	64-4-($-n2) db 0

		dd	0xb5145e00
	n3:	db	"register_chrdev", 0
		times	64-4-($-n3) db 0

		dd	0xc192d491
	n4:	db	"unregister_chrdev", 0
		times	64-4-($-n4) db 0

		dd	0x26e96637
	n5:	db	"request_irq", 0
		times	64-4-($-n5) db 0

		dd	0xf20dabd8
	n6:	db	"free_irq", 0
		times	64-4-($-n6) db 0

		dd	0x1a1a4f09
	n7:	db	"__request_region", 0
		times	64-4-($-n7) db 0

		dd	0xd49501d4
	n8:	db	"__release_region", 0
		times	64-4-($-n8) db 0

		dd	0x865ebccd
	n9:	db	"ioport_resource", 0
		times	64-4-($-n9) db 0

		dd	0x9efed5af
	n10:	db	"iomem_resource", 0
		times	64-4-($-n10) db 0


	section ".gnu.linkonce.this_module" writeable align 128

	align 128
	__this_module:		; total length: 512 bytes
				dd 0, 0, 0
		.mod_nazwa:	db "modul_dev_fasm", 0
				times 64-4-($-.mod_nazwa) db 0
				times 100 db 0
				dd init_module
				times 220 db 0
				dd cleanup_module
				times 112 db 0

To install the module in the kernel and to remove it, you can use the same scripts as for the 2.4 kernel.
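
The records of the __versions section in the listing above all have the same shape: a 4-byte checksum followed by a zero-padded symbol name, 64 bytes in total. The C sketch below only illustrates that layout for the 32-bit case. The type name version_entry and the helper version_entry_fill are made up for this example and are not taken from the kernel headers (the kernel appears to use a similar structure called modversion_info).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Size of one "__versions" record, as in the assembly listing:
 * "dd checksum" + name + "times 64-4-($-name) db 0". */
#define VERSION_ENTRY_SIZE 64

/* Hypothetical C view of one record (32-bit layout). */
struct version_entry {
	uint32_t crc;                                      /* dd 0x........ */
	char name[VERSION_ENTRY_SIZE - sizeof(uint32_t)];  /* name, zero padded */
};

/* The whole point of the padding: every record is exactly 64 bytes. */
_Static_assert(sizeof(struct version_entry) == VERSION_ENTRY_SIZE,
	"a version record must be 64 bytes long");

/* Fills one record. Returns 0 on success, -1 if the name
 * (with its terminating zero) does not fit in the name field. */
static int version_entry_fill(struct version_entry *e, uint32_t crc,
	const char *symbol)
{
	size_t len = strlen(symbol);

	if (len >= sizeof(e->name))
		return -1;
	memset(e, 0, sizeof(*e));	/* zero padding, like "times ... db 0" */
	e->crc = crc;
	memcpy(e->name, symbol, len);
	return 0;
}
```

Calling version_entry_fill (&e, 0x1b7d4074, "printk") would produce the same 64 bytes as the n2 record in the listing.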


Other kernels and other tricks

In later kernel versions the general way of writing modules stayed the same. Like all big programs, the Linux kernel is still being developed and it changes over time. Such changes affect, among other things:

You can see how the modules are compiled by running make (with the right parameters) with the V=1 flag, e.g. make O=build/ V=1 modules (to build all configured modules).

The location of the file with function versions (check-sums) can vary depending on the kernel version and on the given Linux distribution: it can be a file called Module.symvers somewhere in the /lib/modules/kernel_version/ directory, or it can be the file symvers-kernel_version (perhaps compressed) in the /boot/ directory.
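
The check-sums placed in the __versions section have to be copied from this file. As a rough sketch, the C function below extracts the check-sum and the symbol name from a single line of the file. It assumes that a line starts with the hexadecimal check-sum followed by whitespace and the symbol name (e.g. "0x1b7d4074<TAB>printk<TAB>vmlinux<TAB>EXPORT_SYMBOL"); the exact set of columns may differ between kernel versions, so check your own file first. The name parse_symvers_line is made up for this example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Room for a symbol name with its terminating zero (60 bytes,
 * the same as the name field of a 32-bit "__versions" record). */
#define SYMBOL_NAME_MAX 60

/* Parses one line assumed to look like:
 *   0x1b7d4074<TAB>printk<TAB>vmlinux<TAB>EXPORT_SYMBOL
 * Only the first two columns are used.
 * Returns 0 on success, -1 if the line does not match. */
static int parse_symvers_line(const char *line, unsigned int *crc,
	char name[SYMBOL_NAME_MAX])
{
	assert(line != NULL && crc != NULL && name != NULL);

	/* %x accepts an optional "0x" prefix, %59s stops at whitespace
	 * and never writes more than 59 characters plus the zero. */
	if (sscanf(line, "%x %59s", crc, name) != 2)
		return -1;
	return 0;
}
```

The values returned for the printk line would then go into the "dd 0x1b7d4074" and "db "printk", 0" lines of the __versions section.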

The contents of the module structure should be in the module.h file in the directory containing the uncompressed kernel source in the right version: linux-X.Y.Z/include/linux/. The file /usr/include/linux/module.h may not be suitable for this purpose.

Other interesting files, depending on the kernel version, include e.g. linux-X.Y.Z/kernel/module.c and linux-X.Y.Z/arch/x86/kernel/module.c, which take care of servicing the modules.

The file_operations structure can be found in linux-X.Y.Z/include/linux/fs.h, and in the 5.5.12 kernel it looks like this:

	struct file_operations {
		struct module *owner;
		loff_t (*llseek) (struct file *, loff_t, int);
		ssize_t (*read) (struct file *, char __user *, size_t, loff_t *);
		ssize_t (*write) (struct file *, const char __user *, size_t, loff_t *);
		ssize_t (*read_iter) (struct kiocb *, struct iov_iter *);
		ssize_t (*write_iter) (struct kiocb *, struct iov_iter *);
		int (*iopoll)(struct kiocb *kiocb, bool spin);
		int (*iterate) (struct file *, struct dir_context *);
		int (*iterate_shared) (struct file *, struct dir_context *);
		__poll_t (*poll) (struct file *, struct poll_table_struct *);
		long (*unlocked_ioctl) (struct file *, unsigned int, unsigned long);
		long (*compat_ioctl) (struct file *, unsigned int, unsigned long);
		int (*mmap) (struct file *, struct vm_area_struct *);
		unsigned long mmap_supported_flags;
		int (*open) (struct inode *, struct file *);
		int (*flush) (struct file *, fl_owner_t id);
		int (*release) (struct inode *, struct file *);
		int (*fsync) (struct file *, loff_t, loff_t, int datasync);
		int (*fasync) (int, struct file *, int);
		int (*lock) (struct file *, int, struct file_lock *);
		ssize_t (*sendpage) (struct file *, struct page *, int, size_t, loff_t *, int);
		unsigned long (*get_unmapped_area)(struct file *, unsigned long, unsigned long, unsigned long, unsigned long);
		int (*check_flags)(int);
		int (*flock) (struct file *, int, struct file_lock *);
		ssize_t (*splice_write)(struct pipe_inode_info *, struct file *, loff_t *, size_t, unsigned int);
		ssize_t (*splice_read)(struct file *, loff_t *, struct pipe_inode_info *, size_t, unsigned int);
		int (*setlease)(struct file *, long, struct file_lock **, void **);
		long (*fallocate)(struct file *file, int mode, loff_t offset,
				loff_t len);
		void (*show_fdinfo)(struct seq_file *m, struct file *f);
	#ifndef CONFIG_MMU
		unsigned (*mmap_capabilities)(struct file *);
	#endif
		ssize_t (*copy_file_range)(struct file *, loff_t, struct file *,
				loff_t, size_t, unsigned int);
		loff_t (*remap_file_range)(struct file *file_in, loff_t pos_in,
					struct file *file_out, loff_t pos_out,
					loff_t len, unsigned int remap_flags);
		int (*fadvise)(struct file *, loff_t, loff_t, int);
	} __randomize_layout;

Instead of the register_chrdev function you should probably use register_chrdev_region, and instead of request_irq, probably pci_request_irq (a symbol with this name is on the list of symbols marked as exported by vmlinux, meaning the kernel itself).

When you have the kernel source configured for the target kernel by running make XXXconfig, you can use the generator below to produce the .gnu.linkonce.this_module section with the __this_module structure (if you're compiling for your current system, you just need to copy the configuration file, e.g. /boot/config-kernel_version, to the file .config in the kernel source directory and run make oldconfig).

	#include <stdio.h>
	#include <stddef.h>
	#include <linux/module.h>

	#define MODULE_NAME "module1"

	static void disp_common (const char mname[], const char pointer_type[])
	{
		struct module m;

		puts ("align 128");
		puts ("__this_module:");
		/* offsetof and sizeof yield size_t, hence %zu instead of %d */
		printf ("\t\t\ttimes %zu db 0\n", offsetof (struct module, name));
		printf ("\t.mod_name:\tdb '%s', 0\n", mname);
		printf ("\t\t\ttimes %zu - ($ - .mod_name) db 0\n", sizeof (m.name));
		printf ("\t\t\ttimes %zu db 0\n", offsetof (struct module, init) - offsetof (struct module, name) - sizeof (m.name));
		printf ("\t.mod_init:\t%s init_module\n", pointer_type);
		printf ("\t\t\ttimes %zu db 0\n", offsetof (struct module, exit) - offsetof (struct module, init) - sizeof (m.init));
		printf ("\t.mod_exit:\t%s cleanup_module\n", pointer_type);
		printf ("\t\t\ttimes %zu db 0\n", sizeof (struct module) - offsetof (struct module, exit) - sizeof (m.exit));
		puts ("--------------------------------");
	}

	static void disp_nasm (const char mname[], const char pointer_type[])
	{
		puts ("--------------------------------\nsection .gnu.linkonce.this_module");
		disp_common (mname, pointer_type);
	}

	static void disp_fasm (const char mname[], const char pointer_type[])
	{
		puts ("--------------------------------\nsection '.gnu.linkonce.this_module' writeable align 128");
		disp_common (mname, pointer_type);
	}

	int main (void)
	{
		puts ("NASM, 32-bit:");
		disp_nasm (MODULE_NAME, "dd");
		puts ("NASM, 64-bit:");
		disp_nasm (MODULE_NAME, "dq");

		puts ("FASM, 32-bit:");
		disp_fasm (MODULE_NAME, "dd");
		puts ("FASM, 64-bit:");
		disp_fasm (MODULE_NAME, "dq");

		return 0;
	}

You can compile it using the following script:

	#!/bin/bash

	lpath=/path/to/linux-X.Y.Z
	gcc 	\
		-I /usr/include \
		-I $lpath/arch/x86/include \
		-I $lpath/arch/x86/include/generated \
		-I $lpath/arch/x86/include/uapi \
		-I $lpath/arch/x86/include/generated/uapi \
		-I $lpath/include \
		-I $lpath/include/uapi \
		-I $lpath/include/generated \
		-I $lpath/include/generated/uapi \
		-I $lpath/build/include \
		-I $lpath/build/arch/x86/include \
		-I $lpath/build/arch/x86/include/generated \
		-I $lpath/build/arch/x86/include/uapi \
		-I $lpath/build/arch/x86/include/generated/uapi \
		-include $lpath/include/linux/kconfig.h	\
		-D__KERNEL__ \
		-DMODULE	\
		-o gen-modul-info	\
		gen-modul-info.c

replacing the value of lpath with the path to your unpacked kernel sources.

When run, the program displays content which you should place in your module as the .gnu.linkonce.this_module section, perhaps after enhancing it (e.g. with other fields of the structure, because the program only sets the name of the module and the addresses of the initialization and cleanup functions):

	section .gnu.linkonce.this_module
	align 128
	__this_module:
				times 24 db 0
		.mod_name:	db 'module1', 0
				times 64 - ($ - .mod_name) db 0
				times 296 db 0
		.mod_init:	dq init_module
				times 432 db 0
		.mod_exit:	dq cleanup_module
				times 72 db 0
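
The padding sizes in such a result are plain differences between field offsets. The C sketch below repeats the generator's arithmetic without needing the kernel headers. The names module_layout and compute_layout are made up for this example, and the offsets used in the note after the code (384, 824 and the total size 904) are not authoritative: they were simply back-computed from the example result above, so they only hold for the kernel that result came from.

```c
#include <assert.h>
#include <stddef.h>

/* Numbers of zero bytes to emit around the three fields that
 * are filled in (name, init pointer, exit pointer). */
struct module_layout {
	size_t before_name;	/* from the start of the structure to the name */
	size_t name_to_init;	/* from the end of the name field to init */
	size_t init_to_exit;	/* from the end of init to exit */
	size_t after_exit;	/* from the end of exit to the end of the structure */
};

/* Computes the paddings from the field offsets, the same way the
 * generator does it with offsetof() and sizeof(). */
static struct module_layout compute_layout(size_t name_off, size_t name_size,
	size_t init_off, size_t exit_off, size_t ptr_size, size_t total_size)
{
	struct module_layout l;

	/* the offsets must be increasing and must not overlap */
	assert(name_off + name_size <= init_off);
	assert(init_off + ptr_size <= exit_off);
	assert(exit_off + ptr_size <= total_size);

	l.before_name = name_off;
	l.name_to_init = init_off - name_off - name_size;
	l.init_to_exit = exit_off - init_off - ptr_size;
	l.after_exit = total_size - exit_off - ptr_size;
	return l;
}
```

With compute_layout (24, 64, 384, 824, 8, 904) the four paddings come out as 24, 296, 432 and 72, matching the "times" lines of the example result.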

The kernel is written in the C language, so programmers using this language have the comfort of not being forced to copy the structures and adjust them to their code, because they already have them in the header files. Similarly, to initialize a structure they just need to initialize the specific fields and the compiler will insert the right values in the right places, with no need to count after how many bytes the next field should be put. For programmers using other languages, keeping up with the changing structures becomes more and more difficult.

Because of this, you can consider writing the facade part of your module (the part with the declarations of the initialization and cleanup functions, with the .modinfo section and all the structures) in the C language and the module's functionality in assembly, and then linking the parts together using the rules of the C calling convention, which can be found in many places on the Internet, e.g. look for x64-abi-0.96.pdf.

The architecture may be another problem: on 32- and 64-bit systems the register names and the way parameters are passed are different.

In the case of FASM, where the output file type is put in the source file, you need to write separate versions for 32- and 64-bit systems.

In the case of NASM, things can be a bit easier, because you can check the output file type (passed on the command line) in the code and modify the register or instruction names accordingly. You can use macros like the following:

	%ifidn __OUTPUT_FORMAT__, elf64
		bits 64
		%define	ARCH		'x64'
		%define	RET_REG		rax
		%define ptr_type	dq
		%define ptr_size	8
		%define	pushflags	pushfq
		%define	popflags	popfq
	%else
		bits 32
		%define ARCH		'x86'
		%define	RET_REG		eax
		%define ptr_type	dd
		%define ptr_size	4
		%define	pushflags	pushfd
		%define	popflags	popfd
	%endif
	%macro	call_fnc		1-7 ; function, par1, par2, par3, par4, par5, par6

		%if ARCH = 'x64'
			%ifnempty %7
				mov	r9, %7
			%endif
			%ifnempty %6
				mov	r8, %6
			%endif
			%ifnempty %5
				mov	rcx, %5	; 4th parameter of a function call goes in rcx (r10 is for system calls)
			%endif
			%ifnempty %4
				mov	rdx, %4
			%endif
			%ifnempty %3
				mov	rsi, %3
			%endif
			%ifnempty %2
				mov	rdi, %2
			%endif
			call	%1
		%else
			%ifnempty %7
				push	dword %7
			%endif
			%ifnempty %6
				push	dword %6
			%endif
			%ifnempty %5
				push	dword %5
			%endif
			%ifnempty %4
				mov	ecx, %4
			%endif
			%ifnempty %3
				mov	edx, %3
			%endif
			%ifnempty %2
				mov	eax, %2
			%endif
			call	%1
			%ifnempty %7
				add	esp, 4
			%endif
			%ifnempty %6
				add	esp, 4
			%endif
			%ifnempty %5
				add	esp, 4
			%endif
		%endif

	%endmacro

and then use them in the code:

	init_module:
		call_fnc	printk, running_msg

		xor		RET_REG, RET_REG
		ret

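This should make your job easier and reduce code duplication among many files. Keep in mind that on 32-bit systems functions with a variable number of arguments, like printk, take all their parameters on the stack, so the register-based part of the macro does not apply to them.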

If you want to seriously start writing modules, you can start by reading the documentation about how to do everything properly and what functionalities and mechanisms are offered by the kernel:

