RISC-V Startup Code

Motivation

I work with different microcontroller boards day to day. My job is to write startup code in Rust for board bring-up, especially for RISC-V cores, and to write, generate, and maintain the Peripheral Access Crate (PAC) for each RISC-V SoC.

Working this low in the stack, I wanted to write down what actually makes a RISC-V core come alive and run our code. Concretely:

What is already true when the board powers on or resets?
What does the hardware hand us for free, and what must we set up ourselves?
How do we make the hardware do what we want, purely from software?
What has to be in place before we can call our first Rust function?

I’ll answer these using xv6, a small teaching OS that targets a 64-bit RISC-V core and runs on QEMU. Its boot path is the same bare-metal bring-up problem I deal with every day - set up a stack, place code and data, pick a privilege level, jump to our code - just unusually well documented, which makes it a great thing to learn from.

This post follows only the path from power-on to the main Rust function (main); everything the kernel does after that is for a later post.

I’m porting xv6 from C to Rust to learn real OS concepts by building one, reading the C source alongside the xv6 book. You can find my in-progress port on GitHub: xv6rs. Every code snippet in this post is taken from that repo.

RISC-V EEI

EEI stands for Execution Environment Interface, defined by the RISC-V Unprivileged spec.

A RISC-V core is a component with its own independent instruction fetch unit, and each core may support multiple RISC-V-compatible hardware threads, or harts, through multithreading.
RISC-V defines up to three privilege levels: Machine (M), Supervisor (S), and User (U), most to least privileged. Only M is mandatory; S and U are optional. The xv6 target implements all three.
An EEI is the contract a layer offers to the software above it: it pins down the initial state, the harts, the memory and I/O regions, instruction behavior, and how traps and ecall are handled, so that the software running on top has well-defined behavior.
Crucially, the EEI is relative: it depends on which level you run at. For M-mode software the EEI is the hardware platform itself (which begins at power-on reset); for S-mode software, M-mode sets up the EEI; and so on up the stack.

See the RISC-V Unprivileged spec for the full definition.

Program’s Entry Point

The xv6 operating system runs on a multiprocessor RISC-V machine emulated by QEMU, so QEMU sets up the EEI for us. In particular:

The kernel is loaded at address 0x80000000, where DRAM is present.
The initial contents of DRAM are 0.
The starting privilege level is M.
Hart numbers are incremental and start from 0.

With a combination of linker script and assembly, we make our program start at address 0x80000000. In theory we could do this in Rust instead of assembly, but compiled Rust code, like C, assumes that some things are already set up whenever a function is called: a valid stack pointer, a zeroed .bss, and an initialized .data section. At reset QEMU guarantees none of these for our program, so we use assembly to set them up before calling any Rust function.

Because DRAM is filled with 0’s, we need not clear the .bss section, and we need not copy the .data section from Flash to RAM because it is already loaded in DRAM. So the only thing missing is a valid stack pointer.

entry.rs:

use core::arch::global_asm;

global_asm!(
    "
    .section .text.entry
    .global _entry
    _entry:
            la sp, stack0
            csrr a1, mhartid
            addi a1, a1, 1
            slli a0, a1, 12       # `li a0, 4096`; `mul a0, a0, a1`;
            add sp, sp, a0
            call start
    ",
);

kernel.x:

OUTPUT_ARCH( "riscv" )
ENTRY( _entry )

SECTIONS
{
  . = 0x80000000;

  .text : {
    KEEP(*(.text.entry))
    *(.text .text.*)
  }

  .bss : {
    . = ALIGN(16);
    *(.bss.stack0)
  }
}

start.rs:

#[unsafe(no_mangle)]
#[unsafe(link_section = ".bss.stack0")]
static mut stack0: [u8; 4096 * NCPU] = [0; 4096 * NCPU]; // NCPU: No of CPU's

In kernel.x, we set the start address to 0x80000000 and place everything from the .text.entry section at that address. Since _entry is the only symbol in this section (see entry.rs), the first instruction of our program is la sp, stack0.

stack0 is declared in start.rs and placed in the .bss.stack0 section, which the linker script puts into .bss. The stack must be 16-byte aligned according to the RISC-V calling convention (psABI); the linker script enforces this with ALIGN(16).
Each hart gets 4 KiB of stack. We read the mhartid register (remember, QEMU sets this as part of the EEI contract) to get the hart id.
Once the stack is set up, we call start. The C version follows this with a spin: j spin loop as a safety net; I leave it out because our start is marked -> ! and never returns.
The C version uses the mul instruction, whereas we use the shift instruction slli, to calculate sp = stack0 + ((hartid + 1) * 4096). The reason for this change on the Rust side is a known limitation in how target features reach the assembler for hand-written asm. If we use mul a0, a0, a1, the assembler reports the following error:

error: mul instruction requires the following: ‘Zmmul’ (Integer Multiplication)

Even though we are targeting riscv64gc-unknown-none-elf, which has multiplication support, that feature isn’t automatically applied to the assembler for hand-written global_asm!, so this does not compile. We could solve this with the .option arch, +m directive, but because we multiply by a power of two, a shift from the base instruction set avoids mul altogether. You can find more details about this here and here.
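To make the stack arithmetic concrete, here is a small host-side sketch (plain Rust, not part of the kernel) that checks the shift and the multiplication give the same stack top for every hart. The function names and the base address are hypothetical and chosen only for illustration; the real address of stack0 is decided by the linker.

```rust
/// Bytes of stack reserved per hart (4 KiB, as in the entry code).
const STACK_SIZE: u64 = 4096;

/// Stack top for `hartid`, computed with a shift as `_entry` does.
/// `stack0_base` stands in for the address of the `stack0` symbol.
fn stack_top_shift(stack0_base: u64, hartid: u64) -> u64 {
    // 4096 == 1 << 12, so shifting left by 12 multiplies by 4096.
    stack0_base + ((hartid + 1) << 12)
}

/// The same calculation written with a multiplication, as in the C version.
fn stack_top_mul(stack0_base: u64, hartid: u64) -> u64 {
    stack0_base + (hartid + 1) * STACK_SIZE
}

fn main() {
    // Hypothetical address for stack0, used only for this example.
    let base = 0x8000_1000;
    for hartid in 0..3 {
        let top = stack_top_shift(base, hartid);
        assert_eq!(top, stack_top_mul(base, hartid));
        println!("hart {hartid}: sp = {top:#x}");
    }
}
```

The + 1 is there because the stack grows downward: each hart’s sp has to start at the top of its own 4 KiB slice, not at the bottom.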

S-Mode Kernel

The xv6 kernel runs in S-mode, whereas user programs run in U-mode. Since we start in M-mode, the start function that we call at the end of the assembly code sets up the relevant M-mode state and then transitions to S-mode.

The following things have to be set up in xv6 for the M-to-S mode transition:

Set the previous privilege mode (MPP) to S, so that mret transitions to S-mode.
Set the exception program counter (mepc) to our main function.
Disable paging (satp = 0, Bare mode) so that S-mode starts out using physical addresses. Paging applies to S-mode too once enabled - the kernel will run on virtual addresses through a page table we set up in a later post. (Unrestricted access to physical memory comes separately, from PMP.)
Delegate all interrupts and exceptions to S-mode, so that traps are handled by S-mode handlers instead of M-mode.
Configure PMP (Physical Memory Protection) to give S-mode access to all of physical memory, so that the kernel has unrestricted access to all I/O and peripherals.
Enable and configure timer interrupts for scheduling purposes.
Keep each hart’s id in the tp register, so that we can tell which hart we’re running on (to index per-CPU data).

start.rs:

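#[unsafe(no_mangle)]
pub extern "C" fn start() -> ! {
    // set M Previous Privilege mode to Supervisor, for mret.
    let mut x: u64 = r_mstatus();
    x &= !MSTATUS_MPP_MASK;
    x |= MSTATUS_MPP_S;
    w_mstatus(x);

    // set M Exception Program Counter to main, for mret.
    w_mepc(main as *const () as u64);

    // disable paging for now.
    w_satp(0);

    // delegate all interrupts and exceptions to supervisor mode.
    w_medeleg(0xffff);
    w_mideleg(0xffff);
    w_sie(r_sie() | SIE_SEIE | SIE_STIE);

    // configure PMP to give supervisor mode access to all of physical memory.
    w_pmpaddr0(0x3fffffffffffff);
    w_pmpcfg0(0xf);

    // ask for timer interrupts.
    timerinit();

    // keep each hart's id in its tp register.
    let id: u64 = r_mhartid();
    w_tp(id);

    // switch to supervisor mode and jump to main.
    unsafe {
        asm!("mret", options(noreturn));
    };
}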

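// ask each hart to generate timer interrupts.
fn timerinit() {
    // enable supervisor-mode timer interrupts.
    w_mie(r_mie() | MIE_STIE);
    // enable the sstc extension (i.e. stimecmp).
    w_menvcfg(r_menvcfg() | (1 << 63));
    // allow supervisor to use stimecmp and time.
    w_mcounteren(r_mcounteren() | 2);
    // ask for the very first timer interrupt.
    w_stimecmp(r_time() + 1000000);
}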
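The two PMP constants in start are easier to read once they are decoded. The host-side sketch below is my own illustration, not code from the repo: the PMP_* names and the tor_upper_bound helper are hypothetical, and the bit positions are the ones I understand the RISC-V Privileged spec to assign to a pmpcfg entry (R, W, X in bits 0 to 2, and address-matching mode TOR encoded as 1 in the A field at bits 3 to 4).

```rust
// Hypothetical names for the fields of one 8-bit pmpcfg entry.
const PMP_R: u8 = 1 << 0; // read permission
const PMP_W: u8 = 1 << 1; // write permission
const PMP_X: u8 = 1 << 2; // execute permission
const PMP_A_TOR: u8 = 1 << 3; // A field = 1: "top of range" matching

/// In TOR mode, entry 0 matches addresses from 0 up to (but excluding)
/// `pmpaddr0 << 2`, because pmpaddr registers hold the address shifted
/// right by two bits.
fn tor_upper_bound(pmpaddr: u64) -> u64 {
    pmpaddr << 2
}

fn main() {
    // 0xf is R | W | X with TOR matching.
    let cfg = PMP_R | PMP_W | PMP_X | PMP_A_TOR;
    assert_eq!(cfg, 0xf);

    // The value written to pmpaddr0 in start().
    let top = tor_upper_bound(0x3f_ffff_ffff_ffff);
    println!("pmpcfg0 = {cfg:#x}, region = [0x0, {top:#x})");
}
```

Under these assumptions, entry 0 grants read, write, and execute access to every physical address below the computed bound, which is what lets the S-mode kernel reach all of DRAM and the memory-mapped devices.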

main function

Once the mret instruction at the end of the start function executes, the hart drops into S-mode and jumps to the main function, because MPP is set to S and mepc is set to main.
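The MPP update that makes this work is plain bit manipulation, so it can be checked on the host. In the sketch below the constant names mirror the ones used in start.rs, but their values are my assumption based on the Privileged spec (MPP occupies bits 12:11 of mstatus, with 0 = U, 1 = S, 3 = M); the helper functions are hypothetical.

```rust
// Assumed values: MPP is the two-bit field at bits 12:11 of mstatus.
const MSTATUS_MPP_MASK: u64 = 3 << 11;
const MSTATUS_MPP_S: u64 = 1 << 11;

/// Return `mstatus` with MPP set to Supervisor and every other bit unchanged.
/// This is the same clear-then-set sequence that start() performs.
fn with_mpp_supervisor(mstatus: u64) -> u64 {
    (mstatus & !MSTATUS_MPP_MASK) | MSTATUS_MPP_S
}

/// Extract the MPP field as a small number (0 = U, 1 = S, 3 = M).
fn mpp(mstatus: u64) -> u64 {
    (mstatus & MSTATUS_MPP_MASK) >> 11
}

fn main() {
    // Pretend mstatus currently has MPP = M (3) and nothing else set.
    let before: u64 = 3 << 11;
    let after = with_mpp_supervisor(before);
    assert_eq!(mpp(before), 3);
    assert_eq!(mpp(after), 1);
    println!("MPP before = {}, after = {}", mpp(before), mpp(after));
}
```

Clearing the field before OR-ing in the new value matters: MPP may already hold M (binary 11), and OR-ing S (binary 01) on top of that would leave it at M.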

main.rs:

#[unsafe(no_mangle)]
pub extern "C" fn main() -> ! {
    loop {
        unsafe { core::arch::asm!("wfi") }
    }
}

For now main just parks each hart in a low-power wfi loop; we don’t handle traps yet, so it just loops. In later posts I will explain how to add UART support and print some helpful messages from each core.

Running on QEMU

Here comes the juicy part: actually running our kernel and verifying that it boots and reaches the main function on all 3 cores.

Follow the steps below:

Check out my repo at the specified commit hash

❯ git clone https://github.com/Karthik-d-k/xv6rs.git

❯ cd xv6rs/

❯ git checkout 99b67fe8db4a7d50033a612a0902c9444caa32c3

In terminal 1, run QEMU

❯ just qemu-gdb

In terminal 2, run GDB

❯ just gdb

Verify our kernel in GDB terminal

(gdb) continue
Continuing.
^C               # Press Ctrl+C to interrupt the program and then inspect
Thread 3 received signal SIGINT, Interrupt.
[Switching to Thread 1.3]
0x0000000080000020 in kernel::main () at kernel/src/main.rs:23
23              unsafe { core::arch::asm!("wfi") }
(gdb) info threads
  Id   Target Id                    Frame
  1    Thread 1.1 (CPU#0 [running]) 0x0000000080000020 in kernel::main ()
    at kernel/src/main.rs:23
  2    Thread 1.2 (CPU#1 [running]) 0x0000000080000020 in kernel::main ()
    at kernel/src/main.rs:23
* 3    Thread 1.3 (CPU#2 [running]) 0x0000000080000020 in kernel::main ()
    at kernel/src/main.rs:23
(gdb) thread 1
[Switching to thread 1 (Thread 1.1)]
#0  0x0000000080000020 in kernel::main () at kernel/src/main.rs:23
23              unsafe { core::arch::asm!("wfi") }
(gdb) info reg pc tp
pc             0x80000020       0x80000020 <kernel::main+4>
tp             0x0      0x0
(gdb) thread 2
[Switching to thread 2 (Thread 1.2)]
#0  0x0000000080000020 in kernel::main () at kernel/src/main.rs:23
23              unsafe { core::arch::asm!("wfi") }
(gdb) info reg pc tp
pc             0x80000020       0x80000020 <kernel::main+4>
tp             0x1      0x1
(gdb) thread 3
[Switching to thread 3 (Thread 1.3)]
#0  0x0000000080000020 in kernel::main () at kernel/src/main.rs:23
23              unsafe { core::arch::asm!("wfi") }
(gdb) info reg pc tp
pc             0x80000020       0x80000020 <kernel::main+4>
tp             0x2      0x2

Each core/thread/hart is looping in main, and their tp registers are set to 0, 1, and 2 respectively.
