Add executables to Linux on FPGA

April 10, 2025

(Photo of Zurich Lake)

Background

After successfully porting XiangShan to the U55C FPGA board, a simple way to run a single test program is to bundle it into an initrd. This initrd is packed into the Linux kernel image and used as the root filesystem when no other storage devices are available. To avoid dealing with Linux’s init configuration, you can even name your executable /init, and the kernel will run it automatically on boot. When the program exits, the kernel panics because the init process has terminated. The only requirement is that your executable must be statically linked; otherwise, you’d need to include the shared libraries it depends on in the initrd.

However, this setup only supports running one program at a time. To keep the Linux system running and support multiple test programs, you’ll need an interactive shell and a way to import executables into the running system.

Some FPGA boards offer external storage ports where you can attach a flash drive and run a full Linux distribution. Unfortunately, the U55C board lacks any readily usable external interface. The only available communication paths are the main memory and the UART port. In our experience, UART is both slow and unreliable, so main memory becomes the only practical option for transferring data.

Concept and Implementation

Using main memory for communication is conceptually simple: choose a fixed physical address, write the file size and contents to that location from the host, and use a RISC-V Linux program to read the data and copy it into the filesystem (i.e., initrd).
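As a rough sketch of the host side, the block to upload can be assembled in a buffer before it is sent over DMA. The exact header format isn't spelled out here, so the 8-byte little-endian size field and the name pack_payload below are assumptions made for illustration only.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed layout (hypothetical): an 8-byte little-endian size field,
 * followed immediately by the file contents. */
#define HEADER_BYTES 8u

/* Write size + contents into buf.
 * Returns the total number of bytes written, or 0 if buf is too small. */
size_t pack_payload(uint8_t *buf, size_t buf_len,
                    const uint8_t *file, uint64_t file_len)
{
    if (buf_len < HEADER_BYTES || file_len > buf_len - HEADER_BYTES)
        return 0;

    /* Encode the size byte by byte so the result does not depend on
     * the host's endianness. */
    for (unsigned i = 0; i < HEADER_BYTES; i++)
        buf[i] = (uint8_t)(file_len >> (8 * i));

    memcpy(buf + HEADER_BYTES, file, (size_t)file_len);
    return HEADER_BYTES + (size_t)file_len;
}
```

The resulting buffer is what would be written to the fixed physical address; a fixed-width size field lets the reader on the RISC-V side know how many bytes to copy without scanning the data.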

Challenges

Several practical issues make this approach more complex than it appears:

Access to Physical Memory: User-space programs typically can’t access physical memory directly.
Data Coherency: Files are written from the host to the FPGA’s HBM via Xilinx’s DMA IP. XiangShan reads from HBM but has its own independent cache. Without proper handling, cached data may be stale.
Safe Memory Allocation: You can’t arbitrarily choose a memory address for communication—some regions are used by the SBI firmware, Linux kernel, or user-space programs. Overwriting them could lead to unpredictable crashes.

Solutions

Fortunately, all of these problems can be addressed:

Accessing Physical Memory: Since we’re running as root, we can use /dev/mem to access physical memory. Using mmap, we map the target memory region into our program’s address space and read from it.
Ensuring Data Coherency: XiangShan has a DMA AXI port that can safely receive external write requests and forward them to memory while keeping the caches coherent. However, this port is only available when the XiangShan core is running. Using it during the boot process complicates the system design, as we’d need to coordinate core execution and image loading. Instead, we handle coherency in software by flushing the cache before reading from memory. On Kunminghu v1 (used here), cache management instructions aren’t implemented, so we force eviction by accessing a large array, large enough to displace any stale lines. It’s a crude but functional workaround.
Avoiding Memory Conflicts: We use a memory region outside of what’s described in the device tree. There was initial concern that Linux might not map this region into its virtual address space, but /dev/mem lets us access it without issues.
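The first two points can be sketched in C roughly as follows. The helper names map_phys and evict_caches are made up for this example, and the physical address, array size, and cache line size are left as parameters because their real values depend on your setup. Whether touching a large array actually displaces every stale line depends on the cache's replacement policy, so treat this as a best-effort illustration.

```c
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Map len bytes of physical memory starting at phys through /dev/mem.
 * phys must be page-aligned. Returns NULL on failure (e.g. not root). */
void *map_phys(off_t phys, size_t len)
{
    int fd = open("/dev/mem", O_RDONLY);
    if (fd < 0)
        return NULL;
    void *p = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, phys);
    close(fd); /* the mapping remains valid after the descriptor is closed */
    return p == MAP_FAILED ? NULL : p;
}

/* Crude eviction: touch one byte per cache line of a scratch array.
 * bytes should exceed the total cache capacity; line is the line size.
 * Returns the number of lines touched (0 on bad arguments or OOM). */
size_t evict_caches(size_t bytes, size_t line)
{
    if (line == 0)
        return 0;
    volatile uint8_t *junk = malloc(bytes);
    if (!junk)
        return 0;

    size_t touched = 0;
    for (size_t i = 0; i < bytes; i += line) {
        /* Write rather than only read: untouched anonymous pages that are
         * merely read can all be backed by the same zero page, which
         * would not fill the cache with distinct lines. */
        junk[i] = (uint8_t)i;
        touched++;
    }
    free((void *)junk);
    return touched;
}
```

A loader would call evict_caches first and only then read through the pointer returned by map_phys, so that the reads are served from HBM rather than from lines cached before the host's upload.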

Implementation

The flow is as follows:

From the host, upload your ELF file to a fixed physical memory address. Store its size at the start of the memory block, followed by the file contents.
Include a small loader program (memloader.elf) in your initrd. This program reads from the specified memory region and writes the file to /test.elf.
Use BusyBox, a lightweight tool that can serve as both the init process and an interactive shell. We install it as /init and configure it to start a shell after boot.
From the shell, run memloader.elf to extract the uploaded file and then execute /test.elf.
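The extraction step inside the loader could look something like the function below. This is not the actual memloader source: extract_payload is a hypothetical name, and the header is assumed to be an 8-byte little-endian size followed by the file contents. The region pointer would come from an mmap of /dev/mem, done after the caches have been flushed.

```c
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

/* Copy the payload found in region to path and make it executable.
 * Assumed layout (hypothetical): 8-byte little-endian size, then contents.
 * Returns 0 on success, -1 on error. */
int extract_payload(const uint8_t *region, size_t region_len,
                    const char *path)
{
    if (region_len < 8)
        return -1;

    uint64_t size = 0;
    for (int i = 0; i < 8; i++)
        size |= (uint64_t)region[i] << (8 * i);

    /* A size larger than the mapped region suggests a stale or
     * corrupt header, so refuse to copy. */
    if (size > region_len - 8)
        return -1;

    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0755);
    if (fd < 0)
        return -1;

    const uint8_t *src = region + 8;
    size_t left = (size_t)size;
    while (left > 0) {
        ssize_t n = write(fd, src, left); /* may write fewer bytes than asked */
        if (n <= 0) {
            close(fd);
            return -1;
        }
        src += n;
        left -= (size_t)n;
    }
    return close(fd);
}
```

Creating the file with mode 0755 means /test.elf can be run from the shell right away, without a separate chmod.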

Better Approaches

The current cache-flushing method—accessing a large dummy array—is a bit hacky. There are more elegant alternatives:

Using PMA (Physical Memory Attributes): XiangShan supports PMA settings that can mark memory regions as uncacheable. However, this requires modifying the hardware design and recompiling the FPGA bitstream, which we wanted to avoid.
Using Cache Instructions: Newer versions of Kunminghu support cache management instructions. These allow you to explicitly flush or invalidate cached regions—providing a much cleaner solution.