Fourth - Time, Debug Text, Uniforms
It has been a bit of a week. I’ve only added about 6651 lines of code this week. Most of these lines are not code, but I’ll explain later.
First, we have to talk about uniforms. In the graphics world, the concept of a uniform is a very small bit of data that is used by a vertex or fragment shader to do a lot of work. For example, if we wanted to have a triangle rotating around the screen we could do all the math on the CPU and update the buffers that describe the triangle every frame. This is both a lot of math for a CPU to do, and also a lot of memory that has to be copied between the CPU memory subsystem and the GPU memory subsystem. These are different blocks of memory and are only connected by a slightly wet piece of string, typically called the PCIe bus by most people.
Nope, the best way to do this is to have the GPU do all the math, and just tell it what matrix you want to use. This means that we need to create that matrix, copy that matrix to the GPU, and then use that matrix in the vertex shader.
In the triangle shader, I added a uniform mat4 mvp line and then multiplied that matrix by the position being passed in:
gl_Position = mvp * vec4(pos, 0, 1);
All very exciting so far. In the .zig file that describes the shader and includes the source, I also defined a struct called Uniforms.
pub const Uniforms = extern struct {
    mvp: ng.Mat4,
};
Oh, Mat4, in Zig? What is that? It’s just:
pub const Mat4 = [16]f32;
I also wrote a bunch of functions that allow me to create a rotation matrix, translation matrix, scale matrix, and do matrix multiplication. Nothing fancy going on here, such as using Zig vectors. Optimization can wait till later, and then only if we really need it - I doubt we will, so the simple methods will be just fine for the moment.
pub fn mat4_rotate_z(a: f32) Mat4 {
    const cos_angle = @cos(a);
    const sin_angle = @sin(a);
    return .{
        cos_angle, -sin_angle, 0, 0,
        sin_angle, cos_angle,  0, 0,
        0,         0,          1, 0,
        0,         0,          0, 1,
    };
}
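The multiplication helper isn't shown above. A minimal sketch of what a mat4_mul could look like, assuming the sixteen floats are stored row by row; the real ng.mat4_mul may well differ, and mat4_identity is a name made up here for illustration:

```zig
pub const Mat4 = [16]f32;

/// Identity matrix (hypothetical helper), handy as a starting point when
/// combining transforms.
pub const mat4_identity: Mat4 = .{
    1, 0, 0, 0,
    0, 1, 0, 0,
    0, 0, 1, 0,
    0, 0, 0, 1,
};

/// Sketch of a plain 4x4 multiply with no SIMD.
/// Assumption: element (row, col) lives at index row * 4 + col.
pub fn mat4_mul(a: Mat4, b: Mat4) Mat4 {
    var out: Mat4 = undefined;
    for (0..4) |row| {
        for (0..4) |col| {
            // Dot product of row `row` of a with column `col` of b.
            var sum: f32 = 0;
            for (0..4) |k| {
                sum += a[row * 4 + k] * b[k * 4 + col];
            }
            out[row * 4 + col] = sum;
        }
    }
    return out;
}
```

Three nested loops over four elements is 64 multiply-adds per call, which is unlikely to matter when it runs a handful of times per frame.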
To use the Uniforms struct, I created a function called make_uniform_slots.
const TriangleUniforms = ng.make_uniform_slots(triangle_shader.Uniforms);
It takes a type and returns another type. This is the kind of thing that Zig excels at.
pub fn make_uniform_slots(comptime Uniforms: type) type {
    var fields: [16]std.builtin.Type.EnumField = undefined;
    var num_fields: usize = 0;
    inline for (std.meta.fields(Uniforms)) |field| {
        fields[num_fields].value = num_fields;
        fields[num_fields].name = field.name;
        num_fields += 1;
    }
    const info = std.builtin.Type{ .@"enum" = .{
        .tag_type = u32,
        .fields = fields[0..num_fields],
        .decls = &.{},
        .is_exhaustive = true,
    } };
    return @Type(info);
}
This creates an enum type that maps the symbolic name of each uniform that we humans can understand, like mvp, onto an index value that the graphics libraries would like to use. It does this by iterating over all the fields in the Uniforms type that was passed in, using an inline for over the std.meta.fields of that type, extracting the name of each field, and building up the fields of the resultant enum type. It finally creates the type with @Type and returns that to the user.
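For comparison, the standard library has a helper, std.meta.FieldEnum, that builds a similar enum from a struct's fields. The sketch below uses a stand-in Uniforms struct (the tint field and the slot_index function are made up for illustration) to show the name-to-index mapping. Note that FieldEnum picks the smallest integer tag type that fits rather than u32, so it is not a drop-in replacement for make_uniform_slots.

```zig
const std = @import("std");

// Stand-in for triangle_shader.Uniforms; the `tint` field is hypothetical.
const Uniforms = extern struct {
    mvp: [16]f32,
    tint: [4]f32,
};

// One enum tag per struct field, numbered in declaration order.
const Slots = std.meta.FieldEnum(Uniforms);

/// Hypothetical helper: the index a graphics API would be given for a slot.
pub fn slot_index(slot: Slots) u32 {
    return @intFromEnum(slot);
}
```

With this, slot_index(.mvp) is 0 and slot_index(.tint) is 1, and a typo such as .mpv is a compile error rather than a silent wrong index.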
This means that we can then just:
render_pass.apply_uniform_mat4(TriangleUniforms.mvp, mvp);
Obviously, that mvp has to come from somewhere. To do that, I created a Camera structure that can be filled in:
const now: f32 = @floatCast(ng.elapsed());
camera.zoom = @sin(now / 5) + 2;
camera.rotate = @cos(now / 3);
camera.origin = .{ window_size.width / 2, window_size.height / 2 };
camera.target = .{ @sin(now * 3) * 500, @cos(now * 3) * 500 };
const view = camera.get_matrix();
const mvp = ng.mat4_mul(view, projection);
The only thing left is the projection matrix. We’ll just stick with 2D graphics for the moment, so we only really need an orthographic projection.
const window_size = window.get_size();
const projection = ng.ortho(window_size.width, window_size.height);
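The body of ng.ortho isn't shown. A minimal sketch of one way to write a 2D orthographic projection follows. It assumes pixel coordinates with (0, 0) at the top-left and y growing downwards, and it puts the translation terms in elements 12 and 13, which is where OpenGL expects them when the 16 floats are uploaded without transposing. The real function may use a different origin or layout, and transform_point is a made-up helper for checking the result.

```zig
pub const Mat4 = [16]f32;

/// Sketch: map x in [0, width] to [-1, 1] and y in [0, height] to [1, -1].
pub fn ortho(width: f32, height: f32) Mat4 {
    return .{
        2.0 / width, 0,             0, 0,
        0,           -2.0 / height, 0, 0,
        0,           0,             1, 0,
        -1,          1,             0, 1,
    };
}

/// Hypothetical helper: apply the matrix to a 2D point (z = 0, w = 1) the way
/// the shader's mvp * vec4(pos, 0, 1) would after such an upload.
pub fn transform_point(m: Mat4, x: f32, y: f32) [2]f32 {
    return .{
        x * m[0] + y * m[4] + m[12],
        x * m[1] + y * m[5] + m[13],
    };
}
```

Under these assumptions the top-left pixel lands on (-1, 1), the bottom-right on (1, -1), and the centre of the window on (0, 0).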
Another interesting thing this week was the frame rate calculation stuff. As shown above, the rotation and zoom-ness of the triangle is based on some elapsed time. I could have used std.time to get this, but I wanted to abstract that out.
For example, the std.time functions mostly return signed integers. I don’t care about the past and therefore would prefer to use unsigned.
pub fn wallclock_us() u64 {
    return @bitCast(std.time.microTimestamp());
}
Elapsed time is also interesting, so at initialisation time, I save off the current wallclock in microseconds to a global variable.
pub fn init() void {
    start_time = wallclock_us();
}
And now we can define elapsed functions, one for integer microseconds and one for floating point seconds.
pub fn elapsed_us() u64 {
    const now = wallclock_us() -% start_time;
    return now;
}
pub fn elapsed() f64 {
    return @as(f64, @floatFromInt(elapsed_us())) / 1e6;
}
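One caveat with a wall clock is that it can jump if the system time is adjusted while the program runs. A possible alternative, which is not what the code above does, is the monotonic std.time.Timer from the standard library. The sketch below reuses the function names from above purely for illustration:

```zig
const std = @import("std");

// Monotonic timer: it keeps counting forwards even if the wall clock is changed.
var timer: std.time.Timer = undefined;

pub fn init() !void {
    // start() returns an error if the platform has no usable monotonic clock.
    timer = try std.time.Timer.start();
}

pub fn elapsed_us() u64 {
    // read() reports nanoseconds since start(), already unsigned.
    return timer.read() / std.time.ns_per_us;
}

pub fn elapsed() f64 {
    return @as(f64, @floatFromInt(elapsed_us())) / 1e6;
}
```

The trade-off is that init can now fail, so the caller has to handle or propagate the error.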
This does show how much builtin juggling is necessary to go from an unsigned 64-bit value to a floating point value. However, the conversion is not free, either for the person writing the code or for the CPU executing it, so having to spell it out with those @as and @floatFromInt calls is reasonable.
Oh yeah, frame rate calculation.
At the start of each frame, I get the current delta time, dt, from the ng.start_frame function. This does the obvious: it calls elapsed_us, subtracts the value from the last time it was called, and returns the difference as a floating point value in seconds. But, whilst this gives us the time elapsed since the last call to start_frame, it doesn’t give us a frames-per-second value that we humans could understand. To do that, I then pass that dt into an update_fps function.
var all_delta_times: [65536]f32 = undefined;
var next_dt_index: usize = 0;
fn update_fps(dt: f32) void {
    all_delta_times[next_dt_index] = dt;
    next_dt_index = next_dt_index + 1;
    var total_ft: f32 = 0;
    for (all_delta_times[0..next_dt_index]) |ft| {
        total_ft += ft;
    }
    if (total_ft >= 1.0 or next_dt_index == all_delta_times.len) {
        average_frame_rate = @intCast(next_dt_index);
        next_dt_index = 0;
    }
}
We have a very large array, all_delta_times, and an index into that array. Each time the update_fps function is called, we store the passed-in dt into that array at the current index, and then increment that index. We then add up the total amount of time stashed away in that array. If that total reaches one second or we run out of slots in the array, then we take the number of values in that array as the average frame rate, and start again.
Essentially, if we have only 20 values in the array before they all add up to one second, then we are running at 20 frames per second. Some frames may be shorter, some may be longer, but on average we are at roughly 20. We store this away in another global variable, and can use that to display the fps on the screen.
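Since only the count and the sum of the stored values are ever used, the same number can be had without the array or the per-frame loop. A sketch of that variation follows; frames_in_window and time_in_window are made-up names, and the array version remains the better fit if the individual frame times are wanted for something else, such as a frame-time graph.

```zig
pub var average_frame_rate: u32 = 0;

var frames_in_window: u32 = 0; // frames counted since the last reset
var time_in_window: f32 = 0; // seconds accumulated since the last reset

pub fn update_fps(dt: f32) void {
    frames_in_window += 1;
    time_in_window += dt;
    // Once a full second has been covered, the frame count is the rate.
    if (time_in_window >= 1.0) {
        average_frame_rate = frames_in_window;
        frames_in_window = 0;
        time_in_window = 0;
    }
}
```

For example, sixteen calls with dt = 0.0625 add up to exactly one second, after which average_frame_rate reads 16.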
Oh, yeah, that probably means we need to display text somehow.
The other major implementation this week was debug text. There are two parts: the font and the ability to draw text. I do admit to loving the odd old 8x8 font, but the resolution we have on displays these days means that these fonts are either really blocky or really small. Neither is good. Therefore, I decided to go on and create a 12x20 font. Not exactly 8-bit territory, but a reasonable size on today’s screens. Of course, Zig being Zig, I can just create an array of u12 values to store this off in binary.
pub const debug_font = [256 * 20]u12{
    ...
    // 0x30 '0'
    0b000000000000,
    0b000111111000,
    0b001100001100,
    0b011000000110,
    0b011000000110,
    0b011100000110,
    0b011010000110,
    0b011001000110,
    0b011000100110,
    0b011000010110,
    0b011000001110,
    0b011000000110,
    0b011000000110,
    0b001100001100,
    0b000111111000,
    0b000000000000,
    0b000000000000,
    0b000000000000,
    0b000000000000,
    0b000000000000,
Is that slash the wrong way around? It’s a debug font, you don’t have to like it. The font is therefore described in a single Zig file that is over 5000 lines long. Oops.
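To make the layout concrete, here is a small sketch of how a single pixel could be looked up in such a table. It assumes each glyph occupies 20 consecutive rows starting at ch * 20, and that bit 11 of a row is its leftmost pixel, so the binary literals read the same way the glyph appears on screen. The glyph_pixel function and the two constants are made up for illustration.

```zig
pub const glyph_width = 12;
pub const glyph_height = 20;

/// Returns true if pixel (x, y) of character `ch` is set.
/// Assumption: bit 11 is the leftmost pixel of a row.
pub fn glyph_pixel(font: []const u12, ch: u8, x: u4, y: u5) bool {
    // Rows of one glyph are stored back to back, 20 per character.
    const row = font[@as(usize, ch) * glyph_height + y];
    // x must be in 0..11; x = 0 selects the most significant bit.
    const shift: u4 = (glyph_width - 1) - x;
    return ((row >> shift) & 1) == 1;
}
```

With the '0' glyph above, row 1 is 0b000111111000, so pixels 3 through 8 of that row are set and the three on either side are clear.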
To display this on screen, I created a very simple debug text buffer. This is a 4 KB buffer that can hold any ASCII character. It is cleared to zeroes at the start of each frame, and then any code can print debug text as it sees fit. Only at the end of the frame, just before the commit, is a buffer of vertices generated, copied to the GPU, and then drawn to the framebuffer. The only magic character is the newline character '\n', which moves to the next line. Given that the window can be resized, the current size of the window is used each frame to determine the number of characters per line and the number of rows. There is no scrolling, although text will wrap from one line to another. It is simple, but it works. And the first thing I implemented was:
ng.debug_print("{} Hz\n", .{average_frame_rate});
Yes, we hook into the std.fmt functions from the standard library to get all the fancy formatting we’d want.
pub fn print(comptime fmt: []const u8, args: anytype) void {
    const debug_writer: void = {};
    const writer = std.io.AnyWriter{ .context = &debug_writer, .writeFn = debug_text_writer };
    std.fmt.format(writer, fmt, args) catch {};
}
We do this by creating a std.io.AnyWriter that just takes in a writeFn that is called to write the formatted data. There is a context that we could use, but we don’t care, so we pass in a pointer to a void value.
fn debug_text_writer(self: *const anyopaque, bytes: []const u8) error{}!usize {
    _ = self;
    puts(bytes);
    return bytes.len;
}
The debug_text_writer function just calls the debug text’s puts function and then returns the length of the slice written. We don’t actually care if it was written fully or not, because it could have overflowed the screen size, as this is debug text.
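The puts function itself isn't shown. A minimal sketch of how it might behave, based only on the description of the buffer (cleared each frame, '\n' moves to the next line, long lines wrap, nothing scrolls). All the names here apart from puts are made up, and the real implementation may be organised quite differently.

```zig
var text_buffer = [_]u8{0} ** 4096;
var cols: usize = 0; // characters per line for the current window size
var rows: usize = 0; // visible lines for the current window size
var cursor_x: usize = 0;
var cursor_y: usize = 0;

/// Hypothetical per-frame reset: clear the text and record the grid size.
pub fn begin_frame(new_cols: usize, new_rows: usize) void {
    @memset(&text_buffer, 0);
    cols = new_cols;
    rows = new_rows;
    cursor_x = 0;
    cursor_y = 0;
}

pub fn puts(bytes: []const u8) void {
    for (bytes) |ch| {
        if (ch == '\n') {
            cursor_x = 0;
            cursor_y += 1;
            continue;
        }
        // Wrap onto the next line when the current one is full.
        if (cursor_x >= cols) {
            cursor_x = 0;
            cursor_y += 1;
        }
        // No scrolling: anything past the last row or the buffer is dropped.
        const index = cursor_y * cols + cursor_x;
        if (cursor_y >= rows or index >= text_buffer.len) return;
        text_buffer[index] = ch;
        cursor_x += 1;
    }
}
```

For instance, after begin_frame(4, 2), calling puts("abcdef\nzz") stores "abcd" on the first line and "ef" on the second, and the "zz" is silently dropped because there is no third row.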
The other major change this week was how events were being processed. This came about because of how keyboards are handled in Xlib or, more accurately, xcb. I wanted to support the compose key on Linux, so that I can type characters like « by pressing compose and then < twice. To do that, I had to add support for XIC, Xutf8LookupString and XFilterEvent. XFilterEvent is the annoying one. It can return saying ‘yeah, you know that event that you were processing, don’t, I’ve handled it and you don’t need to do anything else at this point.’
This is all well and good, except that a single poll can now produce zero, one, two, or even more events. That doesn’t fit into the simple ‘one poll, one event’ model that we had previously. Therefore, I’ve created an event queue. This just holds events generated by various systems, like the video and x11 platform. We have a read and write index into a circular buffer, and a count of how many events are in the array.
pub fn send_event(ev: Event) void {
    if (event_queue_count < event_queue.len) {
        event_queue[event_queue_write_index] = ev;
        event_queue_write_index = (event_queue_write_index + 1) % event_queue.len;
        event_queue_count += 1;
    }
}
The systems and platforms can call send_event, which will add an event to this event queue.
pub fn poll_event() ?Event {
    video.generate_events();
    if (event_queue_count > 0) {
        const ev = event_queue[event_queue_read_index];
        event_queue_read_index = (event_queue_read_index + 1) % event_queue.len;
        event_queue_count -= 1;
        return ev;
    }
    return null;
}
A poll will now call the systems and platforms to generate events and then return the first event in this event queue back to the caller. I’m sure I could have done something fancier with an iterator, but this works.
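To illustrate why the queue helps, here is a self-contained sketch in which one raw key press produces either no events or two. The Event fields, the queue size, and the on_key_press hook are all made up for illustration; in the real platform code the filtered flag and the composed character would come from XFilterEvent and Xutf8LookupString rather than from parameters, and poll_event would first ask the platform to generate events.

```zig
pub const Event = union(enum) {
    key_down: u32, // hypothetical: raw keycode
    text: u21, // hypothetical: composed Unicode codepoint
};

var event_queue: [64]Event = undefined;
var event_queue_read_index: usize = 0;
var event_queue_write_index: usize = 0;
var event_queue_count: usize = 0;

pub fn send_event(ev: Event) void {
    // A full queue silently drops the event.
    if (event_queue_count < event_queue.len) {
        event_queue[event_queue_write_index] = ev;
        event_queue_write_index = (event_queue_write_index + 1) % event_queue.len;
        event_queue_count += 1;
    }
}

pub fn poll_event() ?Event {
    if (event_queue_count == 0) return null;
    const ev = event_queue[event_queue_read_index];
    event_queue_read_index = (event_queue_read_index + 1) % event_queue.len;
    event_queue_count -= 1;
    return ev;
}

/// Hypothetical platform hook for one raw key press.
pub fn on_key_press(keycode: u32, filtered: bool, composed: ?u21) void {
    // The input method swallowed the key (mid compose sequence): zero events.
    if (filtered) return;
    send_event(.{ .key_down = keycode });
    // A finished compose sequence adds a second event carrying the character.
    if (composed) |codepoint| send_event(.{ .text = codepoint });
}
```

A caller would then drain the queue with while (poll_event()) |ev| { ... }, regardless of how many events each raw key press produced.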
I talked last week about having optional functions in the API struct. Well, this week, I added one of those in. Let’s have a look.
const API = struct {
    ...
    glXSwapIntervalEXT: ?*const fn (*Display, Window, i32) callconv(.c) void,
};
The glXSwapIntervalEXT is an optional function. We don’t know if a platform will support this or not, and if it doesn’t, the world is not going to end.
To use this, we just check that it is not null before calling it.
fn set_swap_interval(a_window: video.Window, interval: video.SwapInterval) void {
    if (api.glXSwapIntervalEXT) |swap_interval| {
        const ext_interval: i32 = switch (interval) {
            .fast => 0,
            .vsync, .lowpower => 1,
            .double => 2,
            .adaptive => -1,
        };
        swap_interval(display, a_window.handle, ext_interval);
    }
}
Here, we take in a window and an interval and map that onto the value that glXSwapIntervalEXT expects: 0 means go as fast as possible, 1 means sync to the vertical blank, 2 means sync to every other vertical blank, and -1 means use adaptive syncing. Note that we capture the function in the if (api.glXSwapIntervalEXT) statement, and then use that captured function pointer to call it. If it was null, we’d never map the values and never call the function. This also means that we could add additional methods to control swap intervals, if necessary, in the future.
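If a fallback is ever added, one way to structure it is to unwrap the optional with orelse and report whether the call happened. This is only a sketch with a simplified, made-up function signature (no display or window), and fake_swap_interval stands in for whatever the loader would have found:

```zig
const API = struct {
    // Stand-in for glXSwapIntervalEXT; the signature is simplified for the sketch.
    swap_interval: ?*const fn (i32) void = null,
};

var api: API = .{};
var last_interval: ?i32 = null;

// Hypothetical implementation that just records what it was asked to do.
fn fake_swap_interval(interval: i32) void {
    last_interval = interval;
}

/// Returns false when the optional function is missing, so the caller could
/// try another mechanism instead.
pub fn set_swap_interval(interval: i32) bool {
    const swap_interval = api.swap_interval orelse return false;
    swap_interval(interval);
    return true;
}
```

The if capture used above and orelse return are equivalent here; the orelse form simply keeps the happy path unindented when there is more than one optional to check.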
That about wraps up the major changes this week. The line count has increased dramatically. Two weeks in and we are at 9103 lines of code.
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
Zig                             16            704            793           9103
-------------------------------------------------------------------------------
But, 5000 of that was a single debug font file. Excluding that, we’ve added 1529 lines of code. As predicted, things have slowed down.