Notes on kexec
(Photo of Snow, taken in St Gallen)
Introduction
Last summer, as part of my OSPP project, I worked on improving kexec support for RISC-V. Through this, I gained insights into how kexec operates at a system level. In this post, I’ll share some notes on its functionality and implementation.
What is kexec and How is it Used?
According to the Linux man page, “kexec is a system call that enables you to load and boot into another kernel from the currently running kernel.” While this definition is straightforward, it provides little insight into its practical use cases. Additionally, the documentation can be confusing because it does not clearly differentiate between the kexec system call and kexec-tools, the user-space utility that facilitates interaction with kexec.
In my OSPP project, my work focused on kexec-tools. However, before discussing its details, let’s first explore what kexec can do at the kernel level without user-space tools.
The manual for the kexec system call describes two variants: kexec_load and kexec_file_load. As their names suggest, both load a new kernel into memory, but they differ in how they accept input: kexec_load takes a list of memory segments prepared by user space, while kexec_file_load takes file descriptors for the kernel and initrd and lets the kernel parse the image itself. The new kernel actually starts when the reboot system call is invoked with LINUX_REBOOT_CMD_KEXEC or, for a preloaded crash kernel, when the running kernel panics.
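As a rough illustration of the second step, the sketch below checks whether an image has been staged and then asks the kernel to jump into it. It assumes Linux, root privileges, a libc that exports reboot(), and a kernel that exposes /sys/kernel/kexec_loaded; the helper names are my own and not part of any kexec API. Calling reboot directly also skips the orderly shutdown that an init system would normally perform, so treat this as a demonstration only.

```python
import ctypes
import os

# Value of LINUX_REBOOT_CMD_KEXEC from <linux/reboot.h>; the bytes spell "EXEC".
LINUX_REBOOT_CMD_KEXEC = 0x45584543


def kexec_is_loaded(sysfs_path="/sys/kernel/kexec_loaded"):
    """Return True if the running kernel reports a staged kexec image."""
    try:
        with open(sysfs_path) as f:
            return f.read().strip() == "1"
    except OSError:
        # File missing or unreadable: no kexec support, or not Linux.
        return False


def reboot_into_loaded_kernel():
    """Jump into the staged kernel. Needs root; does not return on success."""
    if not kexec_is_loaded():
        raise RuntimeError("no kernel has been staged with kexec")
    os.sync()  # flush dirty pages, since no orderly shutdown happens here
    libc = ctypes.CDLL(None, use_errno=True)
    if libc.reboot(LINUX_REBOOT_CMD_KEXEC) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
```

Nothing in this block runs the reboot by itself; reboot_into_loaded_kernel() has to be called explicitly.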
Use Cases for kexec
- Fast Reboots: When rebooting into a new kernel via kexec, the system does not perform a full hardware reset. Instead, the currently running kernel simply jumps to a memory address, similar to a function call, except the new kernel never returns to the old one. This skips early-stage hardware initialization and significantly speeds up the reboot process. It also allows users to test a new kernel without modifying the bootloader or EFI settings.
- Crash Recovery (kexec-based Crash Dumps): In the event of a kernel crash, a secondary “crash kernel” can be preloaded into memory. When the main kernel fails, the system transitions into this backup kernel instead of performing a hard reset. The crash kernel is then responsible for capturing system memory and generating crash dumps for post-mortem debugging.
How is kexec Implemented?
The final step of kexec is a jump to the new kernel, but several technical details must be addressed beforehand:
- Where should the new kernel image be placed in memory?
- What preparatory steps should the kernel take before jumping to the new kernel?
- What parameters should be passed to the new kernel?
- What image formats are supported?
1. Memory Placement of the New Kernel
The memory location of the new kernel depends on which kexec system call is used:
- kexec_file_load: The kernel handles memory allocation automatically, ensuring the new kernel is placed at an appropriate location.
- kexec_load: The user must specify physical memory addresses manually. This requires finding a suitable region of memory for the new kernel’s code, data, and entry point. User space cannot see which pages are free at any given moment, but this is not a problem: for a normal (non-crash) load, the running kernel stages the segments in whatever pages are available and copies them to their requested destinations only at kexec time, after user-space processes have stopped and the old kernel’s memory is no longer needed.
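To make the kexec_load case more concrete, here is a small sketch of the address arithmetic a loader has to do. The kernel rejects segments whose destination address or size is not page aligned, so each blob is rounded up to a whole number of pages and placed back to back. The 4 KiB page size, the base address, and the function names are assumptions for illustration; a real loader also has to respect architecture-specific alignment rules and avoid reserved regions.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages


def page_align_up(value, page_size=PAGE_SIZE):
    """Round value up to the next multiple of page_size (a power of two)."""
    return (value + page_size - 1) & ~(page_size - 1)


def layout_segments(base, blobs, page_size=PAGE_SIZE):
    """Place blobs back to back starting at physical address base.

    blobs is a list of (name, size_in_bytes) pairs. Each returned dict
    mirrors the bufsz, mem and memsz fields of struct kexec_segment.
    """
    if base % page_size:
        raise ValueError("base must be page aligned")
    segments = []
    cursor = base
    for name, size in blobs:
        memsz = page_align_up(size, page_size)
        segments.append(
            {"name": name, "bufsz": size, "mem": cursor, "memsz": memsz}
        )
        cursor += memsz  # next segment starts on the following page boundary
    return segments
```

For example, layout_segments(0x80200000, [("kernel", 5000), ("initrd", 4096), ("dtb", 1)]) places the kernel in two pages and the other two blobs in one page each.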
A special case arises when using kexec for crash recovery. The crash kernel must be loaded into a pre-reserved memory region; otherwise, it risks overwriting important memory contents, rendering debugging information inaccessible. If this reserved region is not set up correctly, the crash kernel functionality will not work.
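One way to check whether such a region exists is to look for a "Crash kernel" entry in /proc/iomem, which the kernel adds when memory has been reserved (typically via the crashkernel= boot parameter). The parser below is a minimal sketch; the function name and the sample layout in the usage note are made up for illustration.

```python
import re

# Matches lines such as "  a0000000-a7ffffff : Crash kernel".
_IOMEM_LINE = re.compile(r"^\s*([0-9a-fA-F]+)-([0-9a-fA-F]+)\s*:\s*(.+?)\s*$")


def find_regions(iomem_text, label):
    """Return (start, size) tuples for every /proc/iomem entry named label."""
    regions = []
    for line in iomem_text.splitlines():
        match = _IOMEM_LINE.match(line)
        if match and match.group(3) == label:
            start = int(match.group(1), 16)
            end = int(match.group(2), 16)  # end address is inclusive
            regions.append((start, end - start + 1))
    return regions
```

On a live system this would be called as find_regions(open("/proc/iomem").read(), "Crash kernel"). Without root privileges the addresses in /proc/iomem read as zeros, so the result is only meaningful when run as root. An empty list means no crash kernel region has been reserved.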
2. Kernel Preparations Before Jumping
Most modern kernels enable the Memory Management Unit (MMU) shortly after boot, so memory accesses go through virtual addresses. However, a freshly booted kernel expects to start from physical addresses and set up its own page tables. Consequently, the currently running kernel must disable the MMU (or, on architectures where paging cannot be turned off, switch to an identity mapping) before jumping to the new kernel. This also explains why kexec_load requires physical addresses rather than virtual addresses.
3. Parameters Passed to the New Kernel
The information passed to the new kernel depends on the system’s booting conventions. On RISC-V, the following parameters are commonly required:
- Device Tree Blob (DTB): Provides hardware descriptions such as available memory, CPU cores, and peripherals.
- Kernel Command Line: Supplies boot-time configuration options.
- Initrd Image (if needed): A temporary root filesystem, often used for decrypting storage or loading necessary drivers before the main filesystem is available.
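On RISC-V the kernel receives the DTB address in a register, and the command line and initrd location are conventionally carried inside the DTB itself, as properties of the /chosen node. The sketch below only shows how those property values would be encoded (device tree cells are big-endian); it does not produce a real DTB, for which a loader would normally use a library such as libfdt. The function name and the choice of 64-bit initrd addresses are assumptions.

```python
import struct


def chosen_properties(cmdline, initrd_start=None, initrd_size=0):
    """Build raw values for the /chosen properties a loader would set."""
    # bootargs is a NUL-terminated string.
    props = {"bootargs": cmdline.encode("ascii") + b"\x00"}
    if initrd_start is not None:
        # Assumption: addresses stored as one big-endian 64-bit value each.
        props["linux,initrd-start"] = struct.pack(">Q", initrd_start)
        # initrd-end is the first address past the image.
        props["linux,initrd-end"] = struct.pack(
            ">Q", initrd_start + initrd_size
        )
    return props
```

A call such as chosen_properties("console=ttyS0 root=/dev/vda", 0x84000000, 0x1000) returns the three properties; omitting the initrd arguments returns bootargs only.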
4. Supported Image Formats
Supported kernel image formats vary based on implementation:
- A raw binary image (where execution starts at the beginning of the file) is almost always supported.
- More complex formats like compressed images, ELF executables, or even PE files (for UEFI booting) may also be supported, but typically not at the kernel level. Instead, kexec-tools handles format conversion, extraction, and decompression, ensuring that the kernel receives an appropriate payload.
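A loader usually tells these formats apart by looking at magic bytes near the start of the file. The sketch below is my own simplified illustration of that idea, not how kexec-tools is actually structured. It relies on the documented RISC-V Image header layout (text_offset and image_size as little-endian 64-bit fields at offsets 8 and 16, and the second magic "RSC\x05" at offset 56). The RISC-V check comes before the PE check because a RISC-V Image built with an EFI stub also starts with "MZ".

```python
import struct


def detect_image_format(data):
    """Guess the kernel image format from magic bytes."""
    if data[:4] == b"\x7fELF":
        return "elf"
    if data[:2] == b"\x1f\x8b":
        return "gzip"
    if len(data) >= 64 and data[56:60] == b"RSC\x05":
        return "riscv-image"
    if data[:2] == b"MZ":
        return "pe"
    return "raw"  # no known magic: treat as a flat binary


def riscv_image_info(data):
    """Read text_offset and image_size from a RISC-V Image header."""
    if detect_image_format(data) != "riscv-image":
        raise ValueError("not a RISC-V Image")
    text_offset, image_size = struct.unpack_from("<QQ", data, 8)
    return {"text_offset": text_offset, "image_size": image_size}
```

A compressed image would be decompressed first and the result passed through detect_image_format again.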
Conclusion
kexec is a powerful mechanism that allows fast reboots and crash recovery without a full system reset. While it requires careful memory management and preparatory steps, tools like kexec-tools simplify the process for users. Understanding how kexec operates at the kernel level is crucial for debugging and extending its functionality, especially for new architectures like RISC-V.