Port XiangShan to FPGA
(Photo of Snow, taken in St Gallen)
I recently ported XiangShan to the Xilinx U55C FPGA. This article highlights the key steps that are missing from the currently available XiangShan documentation.
FPGA Introduction
For those familiar with ASIC or simulation designs but new to FPGAs (as I was a week ago), setting up an FPGA project can be confusing.
FPGA Basics
When designing Verilog/Chisel CPU modules, you typically assume the presence of external clock and reset/resetn signals. However, in an FPGA project, you must generate these signals yourself. In Xilinx FPGAs, these signals are controlled through XDC constraints, which map top-level module signals to FPGA pins. A sample XDC file is provided by Xilinx and can be found here. Additionally, board files offer essential clock configuration details.
Block Design
Manually connecting module wires can be tedious. Vivado simplifies this with Block Design, which allows graphical connections between modules. Xilinx provides various IP cores (pre-packaged modules) for use in Block Design. For this project, we use DMA, HBM, and AXI-related IPs. The complete layout and connections are shown below.
Reference Project
For beginners unfamiliar with these IPs, reference projects can be invaluable, particularly for complex PCIe and HBM configurations. I used a GitHub reference project that sets up DMA and HBM as the foundation for our design. The included host-side code was difficult to understand, so I used this alternative tool to transfer data into HBM. You may also need to update the Xilinx DMA driver if your FPGA host machine runs a recent kernel.
Project Design
Since we are building an FPGA-based XiangShan system, we need to understand its requirements and how to fulfill them within our project.
Choosing the Right Branch
XiangShan’s documentation is written for Nanhu (second-generation). However, we used the kunminghu branch (third-generation), which has undergone area optimization, making it more FPGA-friendly. The master branch also offers Kunminghu but includes vector extensions, which increase resource usage unnecessarily. The kunminghu branch is likely the last stable version without vector extensions.
UltraRAM Annotation
The official documentation advises modifying array_16_ext.v and array_22_ext.v to include UltraRAM directives. However, in kunminghu, file names may differ. Users should search for array_XX_ext.v files containing large memory blocks and modify them accordingly.
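To make that search less manual, here is a minimal Python sketch that ranks the array_XX_ext.v files by the number of memory bits they declare. It assumes the memories are declared in the form `reg [MSB:0] name [0:LAST];`, which may not match every generated file; the function names and the directory argument are hypothetical.

```python
import re
from pathlib import Path

# Assumed declaration shape: reg [MSB:0] name [0:LAST];
# Generated files that declare memories differently will not be matched.
MEM_DECL = re.compile(r"reg\s*\[(\d+):0\]\s*(\w+)\s*\[0:(\d+)\]\s*;")


def memory_bits(verilog_text):
    """Return the total number of bits in all memory arrays found in the text."""
    return sum((int(msb) + 1) * (int(last) + 1)
               for msb, _name, last in MEM_DECL.findall(verilog_text))


def rank_memories(rtl_dir, pattern="array_*_ext.v"):
    """Return (bits, filename) pairs for matching files, largest memory first."""
    sizes = [(memory_bits(p.read_text()), p.name)
             for p in Path(rtl_dir).rglob(pattern)]
    return sorted((s for s in sizes if s[0] > 0), reverse=True)
```

The files at the top of the returned list are the likeliest candidates for the UltraRAM directive; the final choice still depends on how much UltraRAM the target device offers.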
Handling Simulation Code
The generated source code includes simulation-specific sections wrapped in ifndef SYNTHESIS macros. To prevent Vivado errors (due to unsupported DPI-C function calls), define the SYNTHESIS macro for every affected file.
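One way to apply the define to many files at once is a small script. The sketch below is only an illustration, not the exact procedure used for this port: it assumes the generated sources are .v or .sv files under a single directory, and the function name is hypothetical. It prepends a define line to every file that tests the macro and skips files that already start with it.

```python
from pathlib import Path

DEFINE_LINE = "`define SYNTHESIS\n"


def add_synthesis_define(rtl_dir):
    """Prepend `define SYNTHESIS to files that test the macro.

    Returns the names of the files that were modified.
    """
    changed = []
    for path in sorted(Path(rtl_dir).rglob("*")):
        if not path.is_file() or path.suffix not in {".v", ".sv"}:
            continue
        text = path.read_text()
        # Skip files that never check the macro or were already patched.
        if "ifndef SYNTHESIS" not in text or text.startswith(DEFINE_LINE):
            continue
        path.write_text(DEFINE_LINE + text)
        changed.append(path.name)
    return changed
```

Running the function twice is harmless, since the second pass finds the define already in place. Vivado filesets also have a verilog_define property that may achieve the same effect project-wide without editing the sources.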
Connecting AXI Interfaces
XiangShan has three AXI ports:
- mem: Connects to a memory block (HBM in our case).
- peripheral: Connects to peripherals (e.g., UART).
- dma: Connects to a DMA device for coherence (unused in our design).
Reset Vector
XiangShan’s io_riscv_rst_vec_0 signal controls the reset vector address. In simulation, this is set to 0x10000000 (flash memory). However, since the flash memory only jumps to main memory, we simplify our design by setting the reset vector to 0x80000000 (HBM start address), eliminating the need for flash memory.
Interrupts and RTC
- io_rtc_clock: Drives the RTC, connected to the main clock (inaccurate but sufficient for our needs).
- io_extIntrs: Handles external interrupts (unused in our minimal system, so set to zero).
Miscellaneous Signals
Additional ports like io_sram_config, io_systemJtag_*, and io_cacheable_check_* are not well documented. Following XiangShan’s simulation environment, we tie all of these input signals to zero and leave the output signals unconnected. This causes JTAG-related issues when running OpenSBI, which we later fix on the software side.
System Design
Memory and AXI Connections
We need to connect clock, reset, and three AXI ports to our system:
- Memory AXI port → HBM
- Peripheral AXI port → UART (UART Lite in this case)
- DMA port → Not used
To load programs into memory, we use the PCIe interface. Xilinx’s XDMA IP connects to the PCIe port and exposes an AXI Master port, which communicates with HBM. Since both XiangShan and DMA need HBM access, we use an AXI interconnect to route requests from multiple sources to HBM.
Reset System
Proper reset sequencing is crucial. The CPU core should remain in reset until a valid program is loaded into HBM. Xilinx provides a Processor System Reset IP that generates structured reset signals. We use two instances:
- For the CPU core (controlled manually via Virtual IO from the host machine).
- For the remaining system components (driven by the XDMA AXI reset port).
The system clock is sourced from the XDMA IP, but since its default frequency (250 MHz) is too high for our design, we reduce it to 125 MHz via reconfiguration.
Usage
Once the system is set up, we generate the bitstream, flash it to the FPGA via micro-USB, and load the program into HBM from the host machine:
sudo ./xdma_rw -w -d /dev/xdma0_h2c_0 -f ./linux.bin -s 7552420 -c 1 -a 0x80000000
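The command above hard-codes the transfer size. A small Python helper can rebuild it for any binary; this is a sketch under the assumption that -s is the transfer size in bytes and should equal the file size, which the example value suggests but which I have not confirmed against the tool's documentation. The helper name and its defaults are hypothetical; the flags are exactly those from the command above.

```python
import os
import shlex


def xdma_load_command(binary, addr=0x80000000,
                      device="/dev/xdma0_h2c_0", tool="./xdma_rw"):
    """Build the xdma_rw argument list for loading a binary at addr.

    Assumption: -s is the number of bytes to transfer, taken here
    from the size of the file on disk.
    """
    size = os.path.getsize(binary)
    return ["sudo", tool, "-w", "-d", device, "-f", binary,
            "-s", str(size), "-c", "1", "-a", hex(addr)]


if __name__ == "__main__":
    import sys
    # Print the command so it can be inspected before running it.
    print(shlex.join(xdma_load_command(sys.argv[1])))
```

Printing the command instead of executing it keeps the sudo call explicit; the list can be passed to subprocess.run once it looks right.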
Next, using Vivado’s Hardware Manager, we release the CPU reset via the Virtual IO control panel. The UART output can then be monitored on /dev/ttyUSBX (where X is the device number).
Software Debugging
Running Bare-Metal vs. Linux
- Bare-metal programs (e.g., those generated by nexus-am) run without issues.
- Linux kernels (with BBL or OpenSBI bootloader) get stuck due to JTAG module probing.
Fixing OpenSBI JTAG Issues
OpenSBI attempts to detect a JTAG module for semi-hosting console support. Since our system lacks JTAG, this request remains unhandled, causing a hang. We resolve this by removing the JTAG probe from OpenSBI.
With this modification, we successfully run Linux on XiangShan.