Getting Graphical Output from our Custom RISC-V Operating System in Rust

An operating system exists to make our job easier, and that includes working with graphics. In this post, we will write a GPU (graphics processing unit) driver using the VirtIO specification. With it, we will allow user applications to treat a portion of RAM as the contents of the screen, through what is commonly known as a framebuffer.


Contents

  1. Overview
  2. Pixels and Resolution
  3. The GPU VirtIO Device
  4. Initialization
  5. Invalidation and Transfer
  6. Device Responses
  7. User Space
  8. Simple Graphics API
  9. Testing
  10. Conclusions and Further Reading

Overview

We command the virtual GPU (virtio-gpu) by sending certain commands to the host (the device). The guest (the OS driver) has an allocation of RAM that becomes the framebuffer. The driver then tells the device, “hey, here’s the RAM that we’re going to use to store pixel information.”

The framebuffer is contiguous RAM in our OS, although the specification doesn’t strictly require this. Whenever we update part of it, we give the device a rectangle. Everything that falls within that rectangle will be copied to the host, so we don’t have to keep copying the entire buffer over and over again.

We will be using the virtio protocol that we used for the block driver here, so I won’t rehash the general virtio protocol. However, the device-specific structures are a bit different, so we’ll cover that part more in depth.


Pixels and Resolution

A framebuffer must be large enough to store \(\text{width}\times\text{height}\times\text{pixel size}\) bytes. There are \(\text{width}\times\text{height}\) pixels, and each pixel has 1-byte red, green, blue, and alpha channels. So, each pixel is exactly 4 bytes with the configuration we’re going to specify.

The framebuffer for our junior GPU driver is going to support a fixed resolution of \(640\times 480\). If you’re a child of the 90s, you saw this resolution a lot. In fact, my first computer, a Laser Pal 386, had a 16-color monitor with a resolution of 640 pixels wide by 480 pixels tall.

Each pixel contains red, green, and blue elements so close together that by varying the intensity of these three channels, we change the perceived color. The closer we get to our monitors, the easier an individual pixel is to see.

Pixels on a Viewsonic VX2770SMH-LED monitor.

You can see these little squares. If you squint enough, you can see that they aren’t pure white. Instead, you can see bits of red, blue, and green. That’s because each one of these little squares is subdivided into three colors: yep, red, green, and blue! To make white, these pixels are turned up to 11 (get the joke?). To make black, we turn off all three channels of that pixel.

The resolution refers to how many of these squares are on our monitor. This is a 1920×1080 monitor. That means that there are 1920 of these squares going left to right, and there are 1080 of these squares from top to bottom. All in all, we have \(1920\times 1080=2,073,600\) number of pixels. Each one of these pixels is expressed using 4 bytes in the framebuffer, meaning we need \(2,073,600\times 4=8,294,400\) bytes in RAM to store the pixel information.

You can see why I limited our resolution to 640×480, which only requires \(640\times 480\times 4=1,228,800\) bytes–a bit over a megabyte.
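To make the arithmetic concrete, here is a quick sketch in Rust. The helper name is purely illustrative and not part of our OS code.

```rust
/// Bytes needed for a framebuffer with 4-byte RGBA pixels.
/// (Illustrative helper; not part of the OS code.)
fn framebuffer_bytes(width: usize, height: usize) -> usize {
    width * height * 4
}
```

Plugging in our two resolutions reproduces the numbers above: 1,228,800 bytes for 640×480 and 8,294,400 bytes for 1920×1080.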


The GPU VirtIO Device

The GPU device requires us to read a more up-to-date VirtIO specification. I’ll be reading from version 1.1, a copy of which you can get here: https://docs.oasis-open.org/virtio/virtio/v1.1/virtio-v1.1.html, specifically chapter 5.7, “GPU Device”. This is an unaccelerated 2D device, meaning that we must use the CPU to actually form the framebuffer; we then transfer our CPU-formulated memory to the host GPU, which is responsible for drawing it to the screen.

The device uses a request/response system, where we, the driver, send a command to request something from the host (the GPU). We add a bit of extra memory into our request so that the host can formulate its response. When the GPU interrupts us, we can take a look at this response memory location to see what the GPU told us. This is much like the status field on the block driver, where the block device tells us the status of our last request.

Each request starts with a Command Header, which in Rust looks as follows:

#[repr(C)]
struct CtrlHeader {
	ctrl_type: CtrlType,
	flags: u32,
	fence_id: u64,
	ctx_id: u32,
	padding: u32
}

The header is common for all requests and all responses. We can differentiate by the CtrlType enumeration, which is:

#[repr(u32)]
enum CtrlType {
	/* 2d commands */
	CmdGetDisplayInfo = 0x0100,
	CmdResourceCreate2d,
	CmdResourceUnref,
	CmdSetScanout,
	CmdResourceFlush,
	CmdTransferToHost2d,
	CmdResourceAttachBacking,
	CmdResourceDetachBacking,
	CmdGetCapsetInfo,
	CmdGetCapset,
	CmdGetEdid,
	/* cursor commands */
	CmdUpdateCursor = 0x0300,
	CmdMoveCursor,
	/* success responses */
	RespOkNoData = 0x1100,
	RespOkDisplayInfo,
	RespOkCapsetInfo,
	RespOkCapset,
	RespOkEdid,
	/* error responses */
	RespErrUnspec = 0x1200,
	RespErrOutOfMemory,
	RespErrInvalidScanoutId,
	RespErrInvalidResourceId,
	RespErrInvalidContextId,
	RespErrInvalidParameter,
}

I took this directly from the specification, but Rust-ified the names to avoid getting yelled at by the linter.

Pixel Formats

Recall that the framebuffer is just a bunch of bytes in memory. We need to put a structure behind the framebuffer so the host (the GPU) knows how to interpret our sequence of bytes. There are several formats, but all-in-all, they just re-arrange the red, green, blue, and alpha channels. All are exactly 4 bytes, which makes the stride the same. The stride here is the spacing from one pixel to the next, which is 4 bytes.

#[repr(u32)]
enum Formats {
	B8G8R8A8Unorm = 1,
	B8G8R8X8Unorm = 2,
	A8R8G8B8Unorm = 3,
	X8R8G8B8Unorm = 4,
	R8G8B8A8Unorm = 67,
	X8B8G8R8Unorm = 68,
	A8B8G8R8Unorm = 121,
	R8G8B8X8Unorm = 134,
}

The type, unorm, is an 8-bit (1-byte) unsigned value from 0 through 255, where 0 represents no intensity, 255 represents full intensity, and a number in between is a linear interpolation between the two. Since there are three color channels (and one alpha channel), that gives us \(256\times 256\times 256=16,777,216\) different colors or levels of colors.

For this tutorial, I selected R8G8B8A8Unorm = 67, which has red first, green second, blue third, and alpha fourth. This is a common ordering, so I’ll select it to make it easy to follow along.

Our selected format makes the pixel structure look as follows:

Recall that the individual components R, G, B, and A are each one byte apiece, so each Pixel referred to by (x, y) is 4 bytes. This is why our memory pointer is a Pixel structure instead of a byte.
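Expressed in Rust, the Pixel structure implied here would look like the following sketch (field names assumed; the driver’s actual definition may differ slightly):

```rust
/// One R8G8B8A8Unorm pixel: byte 0 is red, byte 1 is green,
/// byte 2 is blue, byte 3 is alpha. #[repr(C)] keeps that byte order.
#[repr(C)]
#[derive(Clone, Copy)]
pub struct Pixel {
    pub r: u8,
    pub g: u8,
    pub b: u8,
    pub a: u8,
}
```

Since all four fields are one byte, the whole structure is exactly 4 bytes, matching the stride we discussed.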


Initialization

Just like all other virtio devices, we set up the virtqueues first and then we work on device-specific initialization. In my code, I just directly copied-and-pasted from the block driver into the gpu driver. The only thing I added to the Device structure was the framebuffer and dimensions of the framebuffer.

pub struct Device {
	queue:        *mut Queue,
	dev:          *mut u32,
	idx:          u16,
	ack_used_idx: u16,
	framebuffer:  *mut Pixel,
	width:        u32,
	height:       u32,
}

The specification tells us to do the following in order to initialize the device and get things ready to draw. I Rust-ified some of the content to match our enumerations.

Create a framebuffer and configure scanout

  1. Create a host resource using CmdResourceCreate2d.
  2. Allocate a framebuffer from guest RAM, and attach it as backing storage to the resource just created, using CmdResourceAttachBacking.
  3. Use CmdSetScanout to link the framebuffer to a display scanout.
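These steps use several device-specific request structures that we never define explicitly in this post. Here is how they can look in Rust, transcribed from chapter 5.7 of the virtio 1.1 specification (names Rust-ified; the CtrlHeader and Rect stand-ins are repeated so the snippet stands alone):

```rust
// Minimal stand-ins for types shown elsewhere in this post,
// repeated here so the snippet compiles on its own.
#[repr(C)]
struct CtrlHeader {
    ctrl_type: u32,
    flags: u32,
    fence_id: u64,
    ctx_id: u32,
    padding: u32,
}
#[repr(C)]
struct Rect {
    x: u32,
    y: u32,
    width: u32,
    height: u32,
}

// Step 1: virtio_gpu_resource_create_2d
#[repr(C)]
struct ResourceCreate2d {
    hdr: CtrlHeader,
    resource_id: u32,
    format: u32, // one of the Formats values, e.g. R8G8B8A8Unorm = 67
    width: u32,
    height: u32,
}

// Step 2: virtio_gpu_resource_attach_backing (followed by MemEntry items)
#[repr(C)]
struct AttachBacking {
    hdr: CtrlHeader,
    resource_id: u32,
    nr_entries: u32,
}

// Step 3: virtio_gpu_set_scanout (scanout_id precedes resource_id in the spec)
#[repr(C)]
struct SetScanout {
    hdr: CtrlHeader,
    r: Rect,
    scanout_id: u32,
    resource_id: u32,
}
```

The field layouts match the spec’s little-endian C structs, so #[repr(C)] gives us the exact sizes the device expects.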

A Request Structure

Recall that our request and response come packaged together. We will put them in separate descriptors, but whenever we get a response back from the device, it is easier if a single free releases both the request and the response. So, in Rust, I created the Request structure to support doing this.

struct Request<RqT, RpT> {
	request: RqT,
	response: RpT,
}
impl<RqT, RpT> Request<RqT, RpT> {
	pub fn new(request: RqT) -> *mut Self {
		let sz = size_of::<RqT>() + size_of::<RpT>();
		let ptr = kmalloc(sz) as *mut Self;
		unsafe {
			(*ptr).request = request;
		}
		ptr
	}
}

Step 1: Create host resource

let rq = Request::new(ResourceCreate2d {
	hdr: CtrlHeader {
		ctrl_type: CtrlType::CmdResourceCreate2d,
		flags: 0,
		fence_id: 0,
		ctx_id: 0,
		padding: 0,
	},
	resource_id: 1,
	format: Formats::R8G8B8A8Unorm,
	width: dev.width,
	height: dev.height,
});
let desc_c2d = Descriptor {
	addr: unsafe { &(*rq).request as *const ResourceCreate2d as u64 },
	len: size_of::<ResourceCreate2d>() as u32,
	flags: VIRTIO_DESC_F_NEXT,
	next: (dev.idx + 1) % VIRTIO_RING_SIZE as u16,
};
let desc_c2d_resp = Descriptor {
	addr: unsafe { &(*rq).response as *const CtrlHeader as u64 },
	len: size_of::<CtrlHeader>() as u32,
	flags: VIRTIO_DESC_F_WRITE,
	next: 0,
};
unsafe {
	let head = dev.idx;
	(*dev.queue).desc[dev.idx as usize] = desc_c2d;
	dev.idx = (dev.idx + 1) % VIRTIO_RING_SIZE as u16;
	(*dev.queue).desc[dev.idx as usize] = desc_c2d_resp;
	dev.idx = (dev.idx + 1) % VIRTIO_RING_SIZE as u16;
	(*dev.queue).avail.ring[(*dev.queue).avail.idx as usize % VIRTIO_RING_SIZE] = head;
	(*dev.queue).avail.idx = (*dev.queue).avail.idx.wrapping_add(1);
}

All we’re really telling the GPU here is our resolution and the format of the framebuffer. When we create this, the host gets to configure itself, such as allocating an identical buffer to make transfers from our OS.

Step 2: Attach framebuffer backing

let rq = Request3::new(AttachBacking {
	hdr: CtrlHeader {
		ctrl_type: CtrlType::CmdResourceAttachBacking,
		flags: 0,
		fence_id: 0,
		ctx_id: 0,
		padding: 0,
	},
	resource_id: 1,
	nr_entries: 1,
},
MemEntry {
	addr: dev.framebuffer as u64,
	length: dev.width * dev.height * size_of::<Pixel>() as u32,
	padding: 0, 
}
);
let desc_ab = Descriptor {
	addr: unsafe { &(*rq).request as *const AttachBacking as u64 },
	len: size_of::<AttachBacking>() as u32,
	flags: VIRTIO_DESC_F_NEXT,
	next: (dev.idx + 1) % VIRTIO_RING_SIZE as u16,
};
let desc_ab_mementry = Descriptor {
	addr: unsafe { &(*rq).mementries as *const MemEntry as u64 },
	len: size_of::<MemEntry>() as u32,
	flags: VIRTIO_DESC_F_NEXT,
	next: (dev.idx + 2) % VIRTIO_RING_SIZE as u16,
};
let desc_ab_resp = Descriptor {
	addr: unsafe { &(*rq).response as *const CtrlHeader as u64 },
	len: size_of::<CtrlHeader>() as u32,
	flags: VIRTIO_DESC_F_WRITE,
	next: 0,
};
unsafe {
	let head = dev.idx;
	(*dev.queue).desc[dev.idx as usize] = desc_ab;
	dev.idx = (dev.idx + 1) % VIRTIO_RING_SIZE as u16;
	(*dev.queue).desc[dev.idx as usize] = desc_ab_mementry;
	dev.idx = (dev.idx + 1) % VIRTIO_RING_SIZE as u16;
	(*dev.queue).desc[dev.idx as usize] = desc_ab_resp;
	dev.idx = (dev.idx + 1) % VIRTIO_RING_SIZE as u16;
	(*dev.queue).avail.ring[(*dev.queue).avail.idx as usize % VIRTIO_RING_SIZE] = head;
	(*dev.queue).avail.idx = (*dev.queue).avail.idx.wrapping_add(1);
}

The backing is exposed to the GPU through the MemEntry structure. This is essentially a physical address in guest RAM: aside from padding, the MemEntry is just a pointer and a length.
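Per the spec’s virtio_gpu_mem_entry, the MemEntry structure can be written in Rust as:

```rust
// Backing-store entry, per the virtio 1.1 spec (virtio_gpu_mem_entry).
#[repr(C)]
struct MemEntry {
    addr: u64,   // physical address of the backing RAM in the guest
    length: u32, // length of this entry, in bytes
    padding: u32,
}
```

With one entry describing our whole contiguous framebuffer, nr_entries is simply 1, as in the AttachBacking request above.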

Notice that I created a new structure called Request3. This is because this step requires three separate descriptors: (1) the header, (2) the mementry, (3) the response, whereas usually we only need two descriptors. Our structure is much like a normal Request, but it includes the mementries.

struct Request3<RqT, RmT, RpT> {
   request: RqT,
   mementries: RmT,
   response: RpT,
}
impl<RqT, RmT, RpT> Request3<RqT, RmT, RpT> {
   pub fn new(request: RqT, meminfo: RmT) -> *mut Self {
      let sz = size_of::<RqT>() + size_of::<RmT>() + size_of::<RpT>();
      let ptr = kmalloc(sz) as *mut Self;
      unsafe {
         (*ptr).request = request;
         (*ptr).mementries = meminfo;
      }
      ptr
   }
}

Step 3: Set Scanout

let rq = Request::new(SetScanout {
	hdr: CtrlHeader {
		ctrl_type: CtrlType::CmdSetScanout,
		flags: 0,
		fence_id: 0,
		ctx_id: 0,
		padding: 0,
	},
	r: Rect::new(0, 0, dev.width, dev.height),
	resource_id: 1,
	scanout_id: 0,
});
let desc_sso = Descriptor {
	addr: unsafe { &(*rq).request as *const SetScanout as u64 },
	len: size_of::<SetScanout>() as u32,
	flags: VIRTIO_DESC_F_NEXT,
	next: (dev.idx + 1) % VIRTIO_RING_SIZE as u16,
};
let desc_sso_resp = Descriptor {
	addr: unsafe { &(*rq).response as *const CtrlHeader as u64 },
	len: size_of::<CtrlHeader>() as u32,
	flags: VIRTIO_DESC_F_WRITE,
	next: 0,
};
unsafe {
	let head = dev.idx;
	(*dev.queue).desc[dev.idx as usize] = desc_sso;
	dev.idx = (dev.idx + 1) % VIRTIO_RING_SIZE as u16;
	(*dev.queue).desc[dev.idx as usize] = desc_sso_resp;
	dev.idx = (dev.idx + 1) % VIRTIO_RING_SIZE as u16;
	(*dev.queue).avail.ring[(*dev.queue).avail.idx as usize % VIRTIO_RING_SIZE] = head;
	(*dev.queue).avail.idx = (*dev.queue).avail.idx.wrapping_add(1);
}

When we want to write to a buffer, we will refer to it by its scanout number. If we had two scanouts, we could draw on one while the other is displayed to the screen. This is called double-buffering, but for our purposes, we don’t do this. Instead, we draw on the same framebuffer, then transfer certain portions for the GPU to update the display.

After we signal QueueNotify (the virtio register that serves as the “GO” button), the GPU will create a new buffer internally, set the backing store, and link the scanout number to this buffer. We now have an initialized framebuffer!


Invalidation and Transfer

We now have memory that contains pixels. However, we have our own memory, and the GPU has its own memory. So, to get ours to the GPU, it needs to be transferred. We set the backing store during initialization, so we now only have to refer to what we want updated by its scanout number.

Invalidation is important, since updating the entire screen every time we make a change is very expensive. In fact, if we transfer our entire screen, we need to transfer \(640\times 480\times 4=1,228,800\) bytes. At framerates of 20 or 30 frames per second, we need to transfer that many bytes 20 or 30 times a second!
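A quick, illustrative calculation (the helper name is hypothetical, not OS code) shows just how much partial updates save:

```rust
/// Bytes per second needed to transfer a w-by-h region of
/// 4-byte pixels at a given framerate. (Illustrative only.)
fn transfer_rate(w: u64, h: u64, fps: u64) -> u64 {
    w * h * 4 * fps
}
```

At 30 frames per second, a full 640×480 transfer moves about 36.9 MB every second, while a 64×64 dirty region moves under half a megabyte per second.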

Instead of transferring everything, we invalidate certain portions of the framebuffer, and the GPU will only copy over those Pixels that fall within the invalidated region, whose coordinates are defined by a Rect structure.

#[repr(C)]
#[derive(Clone, Copy)]
pub struct Rect {
	pub x: u32,
	pub y: u32,
	pub width: u32,
	pub height: u32,
}
impl Rect {
	pub const fn new(x: u32, y: u32, width: u32, height: u32) -> Self {
		Self {
			x, y, width, height
		}
	}
}

Notice that this Rect is defined by an upper-left coordinate (x, y) and then a width and height. Rectangles can be defined by their coordinates (x1, y1), (x2, y2) or an initial coordinate and width and height. I don’t see anything in the spec about the former, but when I try to invalidate and transfer, it appears that it’s treating the rectangle as the latter. Oh well, more testing I guess…
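If you prefer thinking in corner coordinates, a small helper converts between the two conventions. This is purely illustrative and not part of the driver:

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Rect {
    pub x: u32,
    pub y: u32,
    pub width: u32,
    pub height: u32,
}

impl Rect {
    /// Build a Rect from an upper-left corner (x1, y1) and a
    /// lower-right corner (x2, y2), exclusive of (x2, y2).
    pub fn from_corners(x1: u32, y1: u32, x2: u32, y2: u32) -> Self {
        Rect { x: x1, y: y1, width: x2 - x1, height: y2 - y1 }
    }
}
```

Either way, the device only ever sees x, y, width, and height.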

Invalidating

Invalidating is just transferring the data from the guest (the driver) to the host (the GPU). This only copies the memory; to actually update the screen, we then execute a flush command.

pub fn transfer(gdev: usize, x: u32, y: u32, width: u32, height: u32) {
	if let Some(mut dev) = unsafe { GPU_DEVICES[gdev - 1].take() } {
		let rq = Request::new(TransferToHost2d {
			hdr: CtrlHeader {
				ctrl_type: CtrlType::CmdTransferToHost2d,
				flags: 0,
				fence_id: 0,
				ctx_id: 0,
				padding: 0,
			},
			r: Rect::new(x, y, width, height),
			offset: 0,
			resource_id: 1,
			padding: 0,
		});
		let desc_t2h = Descriptor {
			addr: unsafe { &(*rq).request as *const TransferToHost2d as u64 },
			len: size_of::<TransferToHost2d>() as u32,
			flags: VIRTIO_DESC_F_NEXT,
			next: (dev.idx + 1) % VIRTIO_RING_SIZE as u16,
		};
		let desc_t2h_resp = Descriptor {
			addr: unsafe { &(*rq).response as *const CtrlHeader as u64 },
			len: size_of::<CtrlHeader>() as u32,
			flags: VIRTIO_DESC_F_WRITE,
			next: 0,
		};
		unsafe {
			let head = dev.idx;
			(*dev.queue).desc[dev.idx as usize] = desc_t2h;
			dev.idx = (dev.idx + 1) % VIRTIO_RING_SIZE as u16;
			(*dev.queue).desc[dev.idx as usize] = desc_t2h_resp;
			dev.idx = (dev.idx + 1) % VIRTIO_RING_SIZE as u16;
			(*dev.queue).avail.ring[(*dev.queue).avail.idx as usize % VIRTIO_RING_SIZE] = head;
			(*dev.queue).avail.idx = (*dev.queue).avail.idx.wrapping_add(1);
		}
		// Flush the resource so the transferred region becomes visible.
		let rq = Request::new(ResourceFlush {
			hdr: CtrlHeader {
				ctrl_type: CtrlType::CmdResourceFlush,
				flags: 0,
				fence_id: 0,
				ctx_id: 0,
				padding: 0,
			},
			r: Rect::new(x, y, width, height),
			resource_id: 1,
			padding: 0,
		});
		let desc_rf = Descriptor {
			addr: unsafe { &(*rq).request as *const ResourceFlush as u64 },
			len: size_of::<ResourceFlush>() as u32,
			flags: VIRTIO_DESC_F_NEXT,
			next: (dev.idx + 1) % VIRTIO_RING_SIZE as u16,
		};
		let desc_rf_resp = Descriptor {
			addr: unsafe { &(*rq).response as *const CtrlHeader as u64 },
			len: size_of::<CtrlHeader>() as u32,
			flags: VIRTIO_DESC_F_WRITE,
			next: 0,
		};
		unsafe {
			let head = dev.idx;
			(*dev.queue).desc[dev.idx as usize] = desc_rf;
			dev.idx = (dev.idx + 1) % VIRTIO_RING_SIZE as u16;
			(*dev.queue).desc[dev.idx as usize] = desc_rf_resp;
			dev.idx = (dev.idx + 1) % VIRTIO_RING_SIZE as u16;
			(*dev.queue).avail.ring[(*dev.queue).avail.idx as usize % VIRTIO_RING_SIZE] = head;
			(*dev.queue).avail.idx = (*dev.queue).avail.idx.wrapping_add(1);
		}
		// Run the queue by writing to the QueueNotify register.
		unsafe {
			dev.dev
			   .add(MmioOffsets::QueueNotify.scale32())
			   .write_volatile(0);
			GPU_DEVICES[gdev - 1].replace(dev);
		}
	}
}

So, our transfer first tells the host that we’ve updated a certain portion of the framebuffer, which is specified as x, y, width, and height. Then we do what is called a resource flush to get the GPU to commit all transfers to the screen.


Device Responses

This is a fairly easy section. Most of the device responses come in the form of RespOkNoData, which is just an acknowledgment that the device handled our request. Also, notice that unlike the block driver, we don’t have watchers here. This allows us to update the screen asynchronously.


User Space

The whole point of this is to get a user space application drawing stuff to the screen. Generally, we wouldn’t give the full framebuffer to any user space application that wants it. Instead, we would have a window manager delegate certain rectangles of the framebuffer to different applications, and that window manager would also be responsible for handling events and sending the appropriate events to each GUI application. For our purposes, though, we can live with handing over the whole framebuffer for now.

System Calls

To allow our userspace applications to use the GPU, we need two system calls: one to get a pointer to the framebuffer, and one to invalidate (transfer and flush) a rectangle of it. Recall that we first must map the framebuffer into the userspace application’s MMU table. This is why we allocated pages instead of using kmalloc.

let dev = (*frame).regs[Registers::A0 as usize];
(*frame).regs[Registers::A0 as usize] = 0;
if dev > 0 && dev <= 8 {
	if let Some(p) = gpu::GPU_DEVICES[dev - 1].take() {
		let ptr = p.get_framebuffer() as usize;
		// Grab the dimensions before we give the device back.
		let num_pages = (p.get_width() * p.get_height() * 4) as usize / PAGE_SIZE;
		gpu::GPU_DEVICES[dev - 1].replace(p);
		if (*frame).satp >> 60 != 0 {
			let process = get_by_pid((*frame).pid as u16);
			let table = ((*process).get_table_address() as *mut Table)
			            .as_mut()
			            .unwrap();
			for i in 0..num_pages {
				let vaddr = 0x3000_0000 + (i << 12);
				let paddr = ptr + (i << 12);
				map(table, vaddr, paddr, EntryBits::UserReadWrite as i64, 0);
			}
		}
		(*frame).regs[Registers::A0 as usize] = 0x3000_0000;
	}
}

As you can see above, we grab the framebuffer from the GPU device and map it to 0x3000_0000. Currently, I calculate the number of pages for the framebuffer, which is \(\frac{640\times 480\times 4}{4,096}=300\). So, we need exactly 300 pages for this resolution.
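The truncating division works here only because our framebuffer size happens to be an exact multiple of the page size. A safer version (a hypothetical helper, not in the OS) rounds up with ceiling division:

```rust
const PAGE_SIZE: usize = 4096;

/// Number of 4 KiB pages needed to hold `bytes` bytes, rounding up
/// so a buffer that isn't page-aligned still gets enough pages.
/// (Illustrative; the OS currently relies on exact division.)
fn pages_needed(bytes: usize) -> usize {
    (bytes + PAGE_SIZE - 1) / PAGE_SIZE
}
```

For 640×480×4 bytes, both approaches agree on exactly 300 pages; for any size that isn’t page-aligned, the ceiling version adds one more page rather than silently mapping too few.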

Now that we have a framebuffer, the userspace application can write whatever it wants into this memory location. However, a write doesn’t immediately update the screen. Recall that we must transfer and then flush to get the results drawn to the screen. This is where our second system call comes into play.

let dev = (*frame).regs[Registers::A0 as usize];
let x = (*frame).regs[Registers::A1 as usize] as u32;
let y = (*frame).regs[Registers::A2 as usize] as u32;
let width = (*frame).regs[Registers::A3 as usize] as u32;
let height = (*frame).regs[Registers::A4 as usize] as u32;

gpu::transfer(dev, x, y, width, height);

I showed the transfer function above, which just makes two requests: (1) CmdTransferToHost2d and (2) CmdResourceFlush. When the userspace application makes this system call, the results will be flushed to the screen, and hence they’ll be visible to the user. I don’t error check in the system call itself. The transfer function will error check the device, and the device will error check the x, y, width, and height. So, if any of these are incorrect, the transfer will silently fail, and nothing will be updated on the screen.


Simple Graphics API

To see something displayed on the screen, we need to be able to draw the simplest of shapes: rectangles. If we make a rectangle’s width or height small enough, we can even draw straight lines, horizontally or vertically!

Drawing Rectangles

We are given a contiguous piece of memory in row-major format. That means that we exhaust each column in a row before we move to the next row. So, framebuffer[0] and framebuffer[1] are columns 0 and 1 of row 0. The calculation is fairly straightforward: to get to the next row, we must go one past the last column. Since we index by whole Pixels, the formula becomes:

$$\text{index}=\text{row}\times \text{width}+\text{column}$$

// These userspace examples are C++ (note the reference parameters).
// u32 and f64 are our own typedefs for unsigned int and double.
struct Pixel {
	unsigned char r;
	unsigned char g;
	unsigned char b;
	unsigned char a;
};
void set_pixel(Pixel *fb, u32 x, u32 y, Pixel &color) {
   // x is the column, y is the row
   if (x < 640 && y < 480) {
      fb[y * 640 + x] = color;
   }
}

So, the function above writes to a single Pixel. This structure is a 4-byte structure containing red, green, blue, and alpha bytes. However, we want two different types of rectangle drawing: fill and stroke. Fill will fill the area of the rectangle with the given Pixel structure (color) whereas stroke is just the outline of a rectangle.

void fill_rect(Pixel *fb, u32 x, u32 y, u32 width, u32 height, Pixel &color) {
   for (u32 row = y; row < (y+height);row++) {
      for (u32 col = x; col < (x+width);col++) {
         set_pixel(fb, col, row, color);
      }
   }
}
void stroke_rect(Pixel *fb, u32 x, u32 y, u32 width, u32 height, Pixel &color, u32 size) {
   // Essentially fill the four sides.
   // Top
   fill_rect(fb, x, y, width, size, color);
   // Bottom
   fill_rect(fb, x, y + height, width, size, color);
   // Left
   fill_rect(fb, x, y, size, height, color);
   // Right
   fill_rect(fb, x + width, y, size, height + size, color);
}

Trigonometry

Of course, when I tried to brag about drawing rectangles to a friend of mine, he countered with a challenge: could I plot something like a cosine wave?

Oh no…I don’t have cos/sin/tan or anything like that in my OS. I couldn’t say no, and I couldn’t be beaten by a simple cosine, right? Challenge accepted.

I ended up writing a cosine function based on an infinite series, but he took it several steps further, implementing it several ways and benchmarking them to see which was better in terms of memory footprint, accuracy, and speed (see the link below in Conclusions and Further Reading). Here’s mine:

f64 cos(f64 angle_degrees) {
	f64 x = 3.14159265359 * angle_degrees / 180.0;
	f64 result = 1.0;
	f64 inter = 1.0;
	f64 num = x * x;
	for (int i = 1;i <= 6;i++) {
		u64 comp = 2 * i;
		u64 den = comp * (comp - 1);
		inter *= num / den;
		if ((i & 1) == 0) {
			result += inter;
		}
		else {
			result -= inter;
		}
	}
	return result;
}

This is a truncation of an infinite series, so we can get more accuracy with more terms. As a compromise, the for loop’s bound, i <= 6, sets the number of terms, and 6 terms gives us alright accuracy for graphics, at least from what I can visually tell on a \(640\times 480\) screen.
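For the Rust-curious, the same series ports over directly. This is just a sketch of the C function above in Rust; the math is identical, only the syntax differs:

```rust
/// Taylor-series cosine of an angle given in degrees, using six terms:
/// cos x = 1 - x^2/2! + x^4/4! - ... through the x^12/12! term.
fn cos_deg(angle_degrees: f64) -> f64 {
    let x = core::f64::consts::PI * angle_degrees / 180.0;
    let mut result = 1.0;
    let mut inter = 1.0;
    let num = x * x;
    for i in 1u64..=6 {
        let comp = 2 * i;
        let den = (comp * (comp - 1)) as f64;
        inter *= num / den; // inter is now x^(2i) / (2i)!
        if i & 1 == 0 {
            result += inter;
        } else {
            result -= inter;
        }
    }
    result
}
```

For angles reduced to a single period, six terms lands within graphics-pixel accuracy of the true cosine.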


Testing

Now, the fun part. Let’s see if this works! Here’s our userspace code.

int main() {   
   Pixel *fb = (Pixel *)syscall_get_fb(6);
   Pixel blue_color = {0, 0, 255, 255};
   Pixel red_color = {255, 0, 0, 255};
   Pixel green_color = {0, 255, 0, 255};
   Pixel white_color = {255, 255, 255, 255};
   Pixel orange_color = {255, 150, 0, 255};

   fill_rect(fb, 0, 0, 640, 480, white_color);
   stroke_rect(fb, 10, 10, 20, 20, blue_color, 5);
   stroke_rect(fb, 50, 50, 40, 40, green_color, 10);
   stroke_rect(fb, 150, 150, 140, 140, red_color, 15);
   fill_rect(fb, 10, 300, 500, 100, orange_color);
   syscall_inv_rect(6, 0, 0, 640, 480);
   return 0;
}

And here’s the result!

Let’s add in our cosine function and see what happens!

void draw_cosine(Pixel *fb, u32 x, u32 y, u32 width, u32 height, Pixel &color) {
   for (u32 i = 0; i < width;i++) {
      f64 fy = -cos(i % 360);
      f64 yy = fy / 2.0 * height;
      u32 nx = x + i;
      u32 ny = yy + y;
      fill_rect(fb, nx, ny, 2, 2, color);
   }
}

That’s looking good.


Conclusions and Further Reading

Our operating system is starting to look more and more like a normal operating system. We still need an input system so that we can interact with our operating system, but that’ll be the next thing we tackle.

Sometime in the future, we will compile newlib so that we have a standard library in userspace. Right now, we’re forced to write our own functions.

For a great read regarding cosine and the challenges with it, head on over to Dr. Austin Henley’s blog on cosine.