Month: February 2021

Writing Pong in Rust for my OS Written in Rust

This post is part of a larger effort you can view over here: https://osblog.stephenmarz.com.

Pong being played on my custom, RISC-V OS in Rust!

Video

Contents

  1. Overview
  2. Application Programmer’s Interface (API)
  3. Starting Routines
  4. System Calls
  5. Drawing Primitives
  6. Event Handling
  7. Start Our Game
  8. Game Loop
  9. PLAY

Overview

We last left off writing a graphics driver and an event driver for our operating system. We also added several system calls to handle drawing primitives as well as handling keyboard and mouse inputs. We are now going to use those to animate the simple game of pong. Just like hello world is the test of every programming language, pong is a test of all of our graphics and event systems.


Application Programmer’s Interface (API)

Since we’re writing in user space, we need to use the system calls we developed to (1) enumerate the framebuffer, (2) invalidate sections of the framebuffer, (3) read keyboard/click events, and (4) read motion events–mouse or tablet.

For Rust, we are still using the riscv64gc-unknown-none-elf target, which doesn’t contain a runtime. Now, riscv64gc-unknown-linux-gnu does exist, however it uses Linux system calls, which we haven’t implemented, yet. So, we are required to create our own system call interface as well as runtime!


Starting Routines

Many programmers know that ELF allows our entry point to start anywhere. However, most linkers will look for a symbol called _start. So, in Rust, we have to create this symbol. Unfortunately, after many attempts, I could not get it to work fully in Rust. So, instead, I used the global_asm! macro to import an assembly file. All it does is call main and invokes the exit system call when main returns.

.section .text.init
.global _start
_start:
.option push
.option norelax
	la	gp, __global_pointer$
.option pop
	li		a0, 0
	li		a1, 0
	call	main
	# Exit system call after main
	li	a7, 93
	ecall
.type _start, function
.size _start, .-_start

I did fail to mention about the global pointer. Since global variables can be stored far away, we use a register to store the top of this location. Luckily, most linkers have a symbol called __global_pointer$ (yes, even with the $) that we put into the gp (global pointer) register. I’m not going to cover relaxation, but this is necessary.

This is all that we need to get into Rust. However, since we’re still in baremetal Rust, we have to define the same symbols we did for our OS, including the panic and abort handlers.

#![no_std]
#![feature(asm, panic_info_message, lang_items, start, global_asm)]

#[lang = "eh_personality"]
extern "C" fn eh_personality() {}

#[panic_handler]
fn panic(info: &core::panic::PanicInfo) -> ! {
	print!("Aborting: ");
	if let Some(p) = info.location() {
		println!("line {}, file {}: {}", p.line(), p.file(), info.message().unwrap());
	} else {
		println!("no information available.");
	}
	abort();
}
#[no_mangle]
extern "C" fn abort() -> ! {
	loop {
		unsafe {
			asm!("wfi");
		}
	}
}

This allows our code to at least compile, but as you can see we need to import the assembly file as well as define a main.

global_asm!(include_str!("start.S"));
#[start]
fn main(_argc: isize, _argv: *const *const u8) -> isize {
    0
}

Now we have an entry point for Rust. Now, we can create the println and print macros to make Rust be more Rusty for us.

#[macro_export]
macro_rules! print
{
	($($args:tt)+) => ({
			use core::fmt::Write;
			let _ = write!(crate::syscall::Writer, $($args)+);
			});
}
#[macro_export]
macro_rules! println
{
	() => ({
		   print!("\r\n")
		   });
	($fmt:expr) => ({
			print!(concat!($fmt, "\r\n"))
			});
	($fmt:expr, $($args:tt)+) => ({
			print!(concat!($fmt, "\r\n"), $($args)+)
			});
}

System Calls

We need to create an API for system calls, since that will be how we get and put certain data to our operating system. Generally, the runtime will establish this for us, but again, we’re in baremetal Rust.

use core::fmt::{Write, Error};
use crate::event::Event;

pub struct Writer;

impl Write for Writer {
	fn write_str(&mut self, out: &str) -> Result<(), Error> {
		for c in out.bytes() {
			putchar(c);
		}
		Ok(())
	}
}

pub fn putchar(c: u8) -> usize {
    syscall(2, c as usize, 0, 0, 0, 0, 0, 0)
}

pub fn sleep(tm: usize) {
    let _ = syscall(10, tm, 0, 0, 0, 0, 0, 0);
}

pub fn get_fb(which_fb: usize) -> usize {
    syscall(1000, which_fb, 0, 0, 0, 0, 0, 0)
}

pub fn inv_rect(d: usize, x: usize, y: usize, w: usize, h: usize) {
    let _ = syscall(1001, d, x, y, w, h, 0, 0);
} 

pub fn get_keys(x: *mut Event, y: usize) -> usize {	
    syscall(1002, x as usize, y, 0, 0, 0, 0, 0)
}

pub fn get_abs(x: *mut Event, y: usize) -> usize {	
    syscall(1004, x as usize, y, 0, 0, 0, 0, 0)
}

pub fn get_time() -> usize {
     syscall(1062, 0, 0, 0, 0, 0, 0, 0)
}

pub fn syscall(sysno: usize, a0: usize, a1: usize, a2: usize, a3: usize, a4: usize, a5: usize, a6: usize) -> usize {
    let ret;
    unsafe {
    asm!("ecall",
        in ("a7") sysno,
        in ("a0") a0,
        in ("a1") a1,
        in ("a2") a2,
        in ("a3") a3,
        in ("a4") a4,
        in ("a5") a5,
        in ("a6") a6,
        lateout("a0") ret);
    }
    ret
}

MAN, I love the new Rust asm! It is very clear what’s happening when you look at it. Usually when I use inline assembly, register selection is extremely important. This makes it rather foolproof–dare I actually say that?

Just like we did for our operating system, we use the Write trait to hook into the format macro that is given to us by Rust.


Drawing Primitives

Ok, with the system calls and the startup code in start.S, we now have everything we need to start programming the game. So, let’s make some drawing primitives. Since this is pong, we only really care about drawing rectangles.

#[repr(C)]
#[derive(Clone,Copy)]
pub struct Pixel {
    pub r: u8,
    pub g: u8,
    pub b: u8,
    pub a: u8,
}

pub type Color = Pixel;

impl Pixel {
    pub fn new(r: u8, g: u8, b: u8) -> Self {
        Self {
            r, g, b, a: 255
        }
    }
}

pub struct Vector {
    pub x: i32,
    pub y: i32
}

impl Vector {
    pub fn new(x: i32, y: i32) -> Self {
        Self {
            x,  y
        }
    }
}

pub struct Rectangle {
    pub x: i32,
    pub y: i32,
    pub width: i32,
    pub height: i32,
}

impl Rectangle {
    pub fn new(x: i32, y: i32, width: i32, height: i32) -> Self {
        Self {
            x, y, width, height
        }
    }
}
pub struct Framebuffer {
    pixels: *mut Pixel
}

impl Framebuffer {
    pub fn new(pixels: *mut Pixel) -> Self {
        Self { pixels }
    }
    pub fn set(&mut self, x: i32, y: i32, pixel: &Pixel) {
        unsafe {
            if x < 640 && y < 480 {
                let v = (y * 640 + x) as isize;
                self.pixels.offset(v).write(*pixel);
            }
        }
    }
    pub fn fill_rect(&mut self, rect: &Rectangle, color: &Pixel) {
        let row_start = rect.y;
        let row_finish = row_start + rect.height;
        let col_start = rect.x;
        let col_finish = col_start + rect.width;
        for row in row_start..row_finish {
            for col in col_start..col_finish {
                self.set(col, row, color);
            }
        }
    }
}

pub fn lerp(value: i32, mx1: i32, mx2: i32) -> i32 {
    let r = (value as f64) / (mx1 as f64);
	return r as i32 * mx2;
}

Now we have a pixel structure, a vector, a rectangle, and a framebuffer. The pixel comes from the operating system, so it is important that we control that structure, which necessitates #[repr(C)].


Event Handling

We can draw, so now we need to be able to handle input. We can create an event structure to handle this.

#[repr(C)]
#[derive(Copy, Clone)]
pub struct Event {
    pub event_type: u16,
	pub code: u16,
	pub value: u32
} 

impl Event {
	pub fn empty() -> Self {
		Self {
			event_type: 0,
			code: 0,
			value: 0
		}
	}
	pub fn new(event_type: u16, code: u16, value: u32) -> Self {
		Self {
			event_type,
			code,
			value
		}
	}
}

// Key codes
pub const KEY_RESERVED: u16 = 0;
pub const KEY_ESC: u16 = 1;
pub const KEY_1: u16 = 2;
pub const KEY_2: u16 = 3;
// ... CLIP ...
pub const KEY_END: u16 = 107;
pub const KEY_DOWN: u16 = 108;

// mouse buttons

pub const BTN_MOUSE: u16 = 0x110;
pub const BTN_LEFT: u16 = 0x110;
pub const BTN_RIGHT: u16 = 0x111;
pub const BTN_MIDDLE: u16 = 0x112;


// mouse movement

pub const ABS_X: u16 = 0x00;
pub const ABS_Y: u16 = 0x01;
pub const ABS_Z: u16 = 0x02;

Many of the constants came from libevdev’s input-event-codes.h. Unfortunately, that’s in C, so a little Python script could make it Rust.

Just like the Pixel structure, the Event structure is defined by our operating system, so we are required to control it ourselves (hence #[repr(C)]).


Start Our Game

const MAX_EVENTS: usize = 25;
const GAME_FRAME_TIMER: usize = 1000;

#[start]
fn main(_argc: isize, _argv: *const *const u8) -> isize {
	use drawing::Framebuffer;
	use drawing::Pixel;

	let ufb = syscall::get_fb(6) as *mut Pixel;
	let mut fb = Framebuffer::new(ufb);
	let background_color = drawing::Pixel::new(25, 36, 100);
	let mut event_list = [event::Event::empty(); MAX_EVENTS];
	let event_list_ptr = event_list.as_mut_ptr();

	let player_color = drawing::Pixel::new(255, 0, 0);
	let npc_color = drawing::Pixel::new(0, 255, 0);
	let ball_color = drawing::Pixel::new(255, 255, 255);

	let mut game = pong::Pong::new(player_color, npc_color, ball_color, background_color);
	// GAME LOOP HERE
	println!("Goodbye :)");
	0
}

You can see the first thing we do is grab a framebuffer. If you recall from the operating system tutorial, our operating system will map the pixels into our application’s memory space. We can update the pixels as we see fit, but to actually realize it on the screen, we must invalidate the given pixels, which is a separate system call in our OS.

You can see we have a structure called Pong. This structure contains the routines we need to make this somewhat a game. We have a timing function called advance_frame, and we have a way to get the game onto the screen using the draw function.

This is pretty gross, but here’s the Pong structure’s advance frame and draw routines (the rest can be found on GitHub):

impl Pong {
	pub fn advance_frame(&mut self) {
		if !self.paused {
            self.move_ball(self.ball_direction.x, self.ball_direction.y);
			let miss = 
			if self.ball.location.x < 40 {
				// This means we're in the player's paddle location. Let's
				// see if this is a hit or a miss!
				let paddle = (self.player.location.y, self.player.location.y + self.player.location.height);
				let ball = (self.ball.location.y, self.ball.location.y + self.ball.location.height);

				if paddle.0 <= ball.0 && paddle.1 >= ball.0 {
					false
				}
				else if paddle.0 <= ball.1 && paddle.1 >= ball.1 {
					false
				}
				else {
					true
				}
			}
			else {
				false
			};
			if miss {
				self.reset();
				self.paused = true;
			}
			else {
				if self.ball.location.x < 40 || self.ball.location.x > 580 {
					self.ball_direction.x = -self.ball_direction.x;
				}
				if self.ball.location.y < 20 || self.ball.location.y > 430 {
					self.ball_direction.y = -self.ball_direction.y;
				}
				let new_loc = self.ball.location.y - self.npc.location.height / 2;
				self.npc.location.y = if new_loc > 0 { new_loc } else { 0 };
			}
		}
	}
	pub fn draw(&self, fb: &mut Framebuffer) {
		fb.fill_rect(&Rectangle::new(0, 0, 640, 480), &self.bgcolor);
		fb.fill_rect(&self.player.location, &self.player.color);
		fb.fill_rect(&self.npc.location, &self.npc.color);
		fb.fill_rect(&self.ball.location, &self.ball.color);
	}
}

So our game will move the ball given the direction vector every frame due to advance_frame being called from our game loop. We also have the ability to pause the game (or unpause). Finally, we have very basic collision detection for the player’s paddle.


Game Loop

Now we need to put all of this together using a game loop, which will handle input events as well as advance the animation.

  // GAME LOOP
  gameloop: loop {
		// handle mouse buttons and keyboard inputs
		// println!("Try get keys");
		let num_events = syscall::get_keys(event_list_ptr, MAX_EVENTS);
		for e in 0..num_events {
			let ref ev = event_list[e];
			// println!("Key {}  Value {}", ev.code, ev.value);
			// Value = 1 if key is PRESSED or 0 if RELEASED
			match ev.code {
				event::KEY_Q => break 'gameloop,
				event::KEY_R => game.reset(),
				event::KEY_W | event::KEY_UP => game.move_player(-20),
				event::KEY_S | event::KEY_DOWN => game.move_player(20),
				event::KEY_SPACE => if ev.value == 1 { 
					game.toggle_pause();
					if game.is_paused() {
						println!("GAME PAUSED");
					}
					else {
						println!("GAME UNPAUSED")
					}
				},
				_ => {}
			}
		}
		game.advance_frame();
		game.draw(&mut fb);
		syscall::inv_rect(6, 0, 0, 640, 480);
		syscall::sleep(GAME_FRAME_TIMER);
	}

PLAY

Our event loop uses W to move the paddle up or D to move the paddle down. Space toggles the pause and R resets the game.

Hmmm. When I started writing this, it seemed more impressive in my mind. Oh well, have fun!

Why did I catch s*** for my use of the RISC-V SFENCE.VMA instruction?

This post is part of a longer OS tutorial which can be found here: https://osblog.stephenmarz.com

Contents

  1. Introduction
  2. What is SATP?
  3. What is SFENCE.VMA?
  4. What is happening?
  5. The Translation Lookaside Buffer
  6. Conclusion
  7. References

Introduction

My last post garnered some attention by those telling me that I “forgot” to execute an SFENCE.VMA after I wrote to the SATP register–some with more tact than others. This post is here to clarify why I did what I did, and to clarify what the specification actually tells us needs to be done.


What is SATP?

The supervisor address translation and protection (SATP) register is a register that tells the MMU what mode it’s in, what address space it is working in, and where to find the first level page table in RAM (this would be level 2 for Sv39).

The SATP Register Fields

The SATP register stores three pieces of information, the MODE, the address space identifier (ASID), and the physical page number (PPN).

The MODE

If the MODE=0, then the MMU is turned off and any address is not translated.

If MODE=8, we’re in Sv39 (Supervisor 39-bit) mode which means that our virtual addresses are 39-bits long.

There are other modes, but I won’t cover them here.

The ASID

The address space identifier tags translations with a unique identifier.

The PPN

The physical page number is the upper 44-bits of a 56-bit memory address where we can find the first level page table.


What is SFENCE.VMA?

In absolute terms, this means supervisor fence.virtual memory address. In real terms, it will flush the cache’s so that the MMU will “see” the new changes in memory. This means that the MMU will be forced to look in RAM where the page tables are stored.

The SFENCE.VMA instruction has several different ways to execute it as shown in the specification. This should clue us in to the fact that executing SFENCE.VMA every time we write to SATP might not be so cut and dry.


What is Happening?

So, why is this not straightforward? The issue is that “walking” a page table–meaning going into RAM and translating a virtual address into a physical address–is not a fast process. There are multiple levels of page tables, and several 8-byte entries that need to be dereferenced by the memory controller.

We can speed these walks up by “caching” frequently translated addresses into a table. The table has the virtual address number and the physical address number, so translation is just a table lookup instead of dereferencing several levels of page tables.

This caching can be speculative. If the MMU doesn’t speculate, then the first time we translate an address, that address will not be in the TLB (the cache table) and we will get what is known as a compulsory miss. If we speculate, we can predict the addresses that will be used and when the memory controller isn’t doing anything else, we can load the virtual address and physical address into cache.

This speculation is one of the reasons for SFENCE.VMA. Another reason is due to the fact that when we translate a page using the MMU, it stores the most recent translations in the TLB as well.


The Translation Lookaside Buffer

The translation lookaside buffer or TLB is a fancy term for the MMU cache. It stores the most recent translations to exploit temporal locality–that is, the chances we’re going to translate the same address near in the future is likely. So, instead of having to walk the page tables all over again, we just look in the TLB.

The TLB has several entries, and with RISC-V, it stores an address space identifier (ASID). The address space identifier allows the TLB to store more entries than just the most recent page table. This has always been a problem with TLBs, including with the Intel/AMD processor. Writing to its MMU register (called CR3 for control register #3) will cause a TLB flush. This is NOT the case with RISC-V writing to the SATP register (the MMU register in RISC-V).

The specification just gives the general rules for a manufacturer to use. Therefore, the manufacturer can choose how they want to implement their MMU and TLB as long as it complies with RISC-V’s privileged specification. Here’s a simple implementation of a TLB that complies with the privileged specification.


Conclusion

The RISC-V specification doesn’t make it very clear, but you can see clarification on the spec’s github repository. If the MODE is not 0, then the MMU is allowed to speculate, meaning it can pre-populate the MMU based on the addresses it thinks will need to be translated in the near future. The specification allows this, but the MMU cannot throw a page fault if a speculatory translation is invalid.

So, bottom line — SFENCE.VMA should NOT be called every time SATP is changed. This will cause TLB thrashing since every time you context switch, you will need to change the SATP register to the kernel page table, schedule, then change the SATP register to the new scheduled process’ page table.

Instead, the SFENCE.VMA instruction should be invoked when one or more of the following occur:

  1. When software recycles an ASID (i.e., reassociates it with a different page table), it should first change satp to point to the new page table using the recycled ASID, then execute SFENCE.VMA with rs1=x0 and rs2 set to the recycled ASID. Alternatively, software can execute the same SFENCE.VMA instruction while a different ASID is loaded into satp, provided the next time satp is loaded with the recycled ASID, it is simultaneously loaded with the new page table.
  2. If the implementation does not provide ASIDs, or software chooses to always use ASID 0, then after every satp write, software should execute SFENCE.VMA with rs1=x0. In the common case that no global translations have been modified, rs2 should be set to a register other than x0 but which contains the value zero, so that global translations are not flushed.
  3. If software modifies a non-leaf PTE, it should execute SFENCE.VMA with rs1=x0. If any PTE along the traversal path had its G bit set, rs2 must be x0; otherwise, rs2 should be set to the ASID for which the translation is being modified.
  4. If software modifies a leaf PTE, it should execute SFENCE.VMA with rs1 set to a virtual address within the page. If any PTE along the traversal path had its G bit set, rs2 must be x0; otherwise, rs2 should be set to the ASID for which the translation is being modified.
  5. For the special cases of increasing the permissions on a leaf PTE and changing an invalid PTE to a valid leaf, software may choose to execute the SFENCE.VMA lazily. After modifying the PTE but before executing SFENCE.VMA, either the new or old permissions will be used. In the latter case, a page fault exception might occur, at which point software should execute SFENCE.VMA in accordance with the previous bullet point.

Unfortunately, you have to dig through the issues and updates to the specification on GitHub to find out some of this information. I have provided links in references below.


References