Hardware Floating Point

This is an article in the larger tutorial The Adventures of OS: Making a RISC-V Operating System using Rust.

Up until now, we’ve only been using the integer instructions. However, now that we want to support user processes, we must be able to support the hardware floating-point unit (FPU). In RISC-V, this is fairly simple, but it can lead to some trouble if we’re not careful.

First, the floating-point unit is controlled through the mstatus register, specifically the FS field (bits 14 and 13).

Recall that we used this register to change processor modes through the MPP field. Now we also use it to control the floating-point unit. The two FS bits encode four different states of the floating-point unit:

FS[1:0]   Description
00        FPU off.
01        FPU on, initial state.
10        FPU on, no register changes since FPU was reset.
11        FPU on, some registers have changed since FPU was reset.
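To make the table concrete, here is a minimal Rust sketch that decodes the FS field from a raw mstatus value. The names FsState and fs_state are hypothetical and not part of the tutorial's code. The variant names Off, Initial, Clean, and Dirty are the ones the RISC-V privileged specification uses for these four encodings.

```rust
/// The four states encoded by mstatus.FS (bits 14:13).
/// Hypothetical helper type; not part of the tutorial's kernel.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FsState {
    Off = 0b00,     // FPU off
    Initial = 0b01, // FPU on, initial state
    Clean = 0b10,   // FPU on, no register changes
    Dirty = 0b11,   // FPU on, some registers have changed
}

const FS_SHIFT: u64 = 13; // FS starts at bit 13
const FS_MASK: u64 = 0b11; // FS is two bits wide

/// Extract the FS field from a raw mstatus value.
pub fn fs_state(mstatus: u64) -> FsState {
    match (mstatus >> FS_SHIFT) & FS_MASK {
        0b00 => FsState::Off,
        0b01 => FsState::Initial,
        0b10 => FsState::Clean,
        _ => FsState::Dirty,
    }
}
```

For example, fs_state(1 << 13) yields FsState::Initial, which is the state we want a process to start in.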

If we try to execute a floating-point instruction, such as a load or store (fld or fsd), while the FPU is off, the hart raises an illegal-instruction exception and traps to the operating system. So, we need to make sure the floating-point unit is turned on and ready to go before we do any work with it. We can add a bit to the switch_to_user function we wrote so that the FPU is put into the initial state whenever we context switch to another process.

.global switch_to_user
switch_to_user:
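   csrw mscratch, a0    # mscratch = TrapFrame pointer (a0)
   ld a1, 520(a0)       # a1 = pc
   ld a2, 512(a0)       # a2 = satp
   ld a3, 552(a0)       # a3 = mode

   # MPIE (bit 7), SPIE (bit 5), and FS = 01 (bit 13, FPU initial state)
   li t0, 1 << 7 | 1 << 5 | 1 << 13

   slli a3, a3, 11      # move mode into MPP (bits 12:11)
   or t0, t0, a3
   csrw mstatus, t0     # the FPU is on from here
   csrw mepc, a1        # mret will jump to the process pc
   csrw satp, a2
   li t1, 0xaaa         # software, timer, and external interrupt enables
   csrw mie, t1
   la t2, m_trap_vector
   csrw mtvec, t2
   mv t6, a0            # t6 = base register for load_fp and load_gp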

   .set i, 0
   .rept 32
   load_fp %i
   .set i, i+1
   .endr

   .set i, 1
   .rept 31
      load_gp %i, t6
      .set i, i+1
   .endr
   mret
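As a cross-check of the mstatus value that switch_to_user assembles, the same bit arithmetic can be sketched in Rust. The constant names and the user_mstatus function are hypothetical; only the bit positions come from the assembly above.

```rust
// Bit positions taken from the li/slli/or sequence in switch_to_user.
const MSTATUS_SPIE: u64 = 1 << 5;
const MSTATUS_MPIE: u64 = 1 << 7;
const MSTATUS_MPP_SHIFT: u64 = 11; // MPP occupies bits 12:11
const MSTATUS_FS_INITIAL: u64 = 1 << 13; // FS = 01

/// Build the value written to mstatus before mret.
/// `mode` is the TrapFrame's mode field (hypothetical helper, host-side only).
pub fn user_mstatus(mode: u64) -> u64 {
    let t0 = MSTATUS_MPIE | MSTATUS_SPIE | MSTATUS_FS_INITIAL; // li t0, ...
    t0 | (mode << MSTATUS_MPP_SHIFT) // slli a3, a3, 11 ; or t0, t0, a3
}
```

With mode 0 this gives 0x20a0, and with mode 3 it gives 0x38a0. In both cases bit 13 is set and bit 14 is clear, so FS reads as 01.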

The first thing we do is tack on 1 << 13, which sets the FS bits to 01, the initial state. Unlike MPP (machine previous privilege), which only applies once mret executes, this takes effect immediately. Then, we restore the floating-point registers from the TrapFrame structure. Conveniently, they sit right after the 32 general-purpose integer registers. Our load_fp macro is defined as follows:

.altmacro
.set NUM_GP_REGS, 32  # Number of registers per context
.set REG_SIZE, 8   # Register size (in bytes)
.macro load_fp i, basereg=t6
	fld	f\i, ((NUM_GP_REGS+(\i))*REG_SIZE)(\basereg)
.endm

We skip \(8\times 32=256\) bytes to get past all 32 of the GP registers, then index by the floating-point register number. Now we’re at the register we want. Just like ld loads a memory location into a general-purpose register, fld loads a memory location into a floating-point register. Remember: the FPU must be turned on (\(\text{FS}\neq 0\)) at this point.
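The offset arithmetic in load_fp can be written out as a small Rust function. The constants mirror NUM_GP_REGS and REG_SIZE from the macro; the function name fp_reg_offset is hypothetical.

```rust
const NUM_GP_REGS: usize = 32; // general-purpose registers stored first
const REG_SIZE: usize = 8; // bytes per register slot

/// Byte offset of floating-point register f<i> from the TrapFrame base,
/// i.e. the expression ((NUM_GP_REGS + i) * REG_SIZE) used by load_fp.
pub const fn fp_reg_offset(i: usize) -> usize {
    (NUM_GP_REGS + i) * REG_SIZE
}
```

So f0 lives at offset 256, f1 at 264, and f31 at 504, which is the last slot before satp at 512.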

Restore or Save?

We use the FS bits not only to control the FPU, but also to see what happened. The RISC-V privileged specification recommends the following for the FPU during traps:

Handling Traps

The floating-point unit has more than an on/off state so that we only need to save the registers if they’ve changed. If the user process never even used the FPU, why save those registers? However, the tracking is coarse: if even 1 of the 32 registers changes, we have to save all 32 of them.

So, we need to add code that checks the state of the FPU to see if we need to save the registers. We can do this by reading the FS bits:

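   csrr t1, mstatus
   srli t0, t1, 13      # move FS (bits 14:13) down to bits 1:0
   andi t0, t0, 3       # keep only the two FS bits
   li t3, 3             # 11 = some registers have changed
   bne t0, t3, 1f       # FS is not 11: skip the save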
   .set i, 0
   .rept 32
      save_fp %i, t5
      .set i, i+1
   .endr
1:

Above, we read the mstatus register, shift it right 13 places, and mask it with 3, which is binary 11. This isolates the two FS bits so we can read their value. If these two bits are NOT 11 (recall that 11 means the registers were written to), we skip saving the floating-point registers by jumping forward to the numeric label 1 (hence 1f).
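The same decision reads naturally as a one-line predicate. This is only a host-side sketch of the logic, and the name fp_needs_save is hypothetical; the real check stays in the assembly trap vector.

```rust
/// Mirror of the srli/andi/bne sequence: the floating-point registers are
/// saved only when FS (mstatus bits 14:13) reads 11.
pub fn fp_needs_save(mstatus: u64) -> bool {
    let fs = (mstatus >> 13) & 3; // srli t0, t1, 13 ; andi t0, t0, 3
    fs == 3 // bne t0, t3, 1f skips the save when this is false
}
```

A process that was started with FS = 01 and never touched the FPU still reads 01 on the next trap, so fp_needs_save returns false and the 32 stores are skipped.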

Looking at the TrapFrame structure in Rust (cpu.rs), we see the following:

#[repr(C)]
#[derive(Clone, Copy)]
pub struct TrapFrame {
   pub regs:   [usize; 32], // 0 - 255
   pub fregs:  [usize; 32], // 256 - 511
   pub satp:   usize,       // 512 - 519
   pub pc:     usize,       // 520
   pub hartid: usize,       // 528
   pub qm:     usize,       // 536
   pub pid:    usize,       // 544
   pub mode:   usize,       // 552
}

Our fregs (floating-point registers) follow the 32 general purpose registers. So, the only math we need to do is \(8\times 32=256\).
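Since the assembly hard-codes offsets such as 256, 512, 520, and 552, it can be worth checking them against the struct layout. Below is a rough host-side sketch that does so. It assumes a copy of TrapFrame with u64 fields in place of usize (usize is 8 bytes on a 64-bit RISC-V target, so the layout is the same), and the helper trap_frame_offsets is hypothetical.

```rust
// Host-side copy of the TrapFrame layout, with u64 standing in for usize.
#[repr(C)]
#[derive(Clone, Copy)]
pub struct TrapFrame {
    pub regs: [u64; 32],
    pub fregs: [u64; 32],
    pub satp: u64,
    pub pc: u64,
    pub hartid: u64,
    pub qm: u64,
    pub pid: u64,
    pub mode: u64,
}

/// Returns the byte offsets of (fregs, satp, pc, mode) within TrapFrame,
/// measured by subtracting the struct's base address from each field address.
pub fn trap_frame_offsets() -> (usize, usize, usize, usize) {
    let tf = TrapFrame {
        regs: [0; 32],
        fregs: [0; 32],
        satp: 0,
        pc: 0,
        hartid: 0,
        qm: 0,
        pid: 0,
        mode: 0,
    };
    let base = (&tf as *const TrapFrame) as usize;
    let fregs = (&tf.fregs as *const [u64; 32]) as usize - base;
    let satp = (&tf.satp as *const u64) as usize - base;
    let pc = (&tf.pc as *const u64) as usize - base;
    let mode = (&tf.mode as *const u64) as usize - base;
    (fregs, satp, pc, mode)
}
```

Under these assumptions the function returns (256, 512, 520, 552), matching the offsets used by load_fp and by the ld instructions in switch_to_user.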

This is really it! We can use the floating point unit in the kernel, provided we turn it on before we use any of the registers. Again, if we try to use the FPU while the FS bits are 00, we will get a trap.
