Fourth - Time, Debug Text, Uniforms


It has been a bit of a week. I’ve only added about 6651 lines of code this week. Most of these lines are not code, but I’ll explain later.

First, we have to talk about uniforms. In the graphics world, the concept of a uniform is a very small bit of data that is used by a vertex or fragment shader to do a lot of work. For example, if we wanted to have a triangle rotating around the screen we could do all the math on the CPU and update the buffers that describe the triangle every frame. This is both a lot of math for a CPU to do, and also a lot of memory that has to be copied between the CPU memory subsystem and the GPU memory subsystem. These are different blocks of memory and are only connected by a slightly wet piece of string, typically called the PCIe bus by most people.

Nope, the best way to do this is to have the GPU do all the math, and just tell it which matrix to use. This means that we need to create that matrix, copy it to the GPU, and then use it in the vertex shader.

In the triangle shader, I added a uniform mat4 mvp line and then multiplied that matrix by the position being passed in:

gl_Position = mvp * vec4(pos, 0, 1);

All very exciting so far. In the .zig file that describes the shader and includes the source, I also defined a struct called Uniforms.

pub const Uniforms = extern struct {
    mvp: ng.Mat4,
};

Oh, Mat4, in zig? What is that? It’s just:

pub const Mat4 = [16]f32;

I also wrote a bunch of functions that allow me to create a rotation matrix, translation matrix, scale matrix, and do matrix multiplication. Nothing fancy going on here with using zig vectors. Optimization can wait till later and then only if we really need it - I doubt we will, so the simple methods will be just fine for the moment.

pub fn mat4_rotate_z(a: f32) Mat4 {
    const cos_angle = @cos(a);
    const sin_angle = @sin(a);
    return .{
        cos_angle, -sin_angle, 0, 0,
        sin_angle, cos_angle,  0, 0,
        0,         0,          1, 0,
        0,         0,          0, 1,
    };
}
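For the matrix multiplication mentioned above, a minimal sketch might look like the following. It assumes the 16 floats are stored row by row (row-major), which is a guess at the layout rather than something stated here, and the names mat4_identity and mat4_mul_sketch are purely illustrative.

```zig
const std = @import("std");

pub const Mat4 = [16]f32;

// Assumption: element (row, col) lives at index row * 4 + col.
pub const mat4_identity: Mat4 = .{
    1, 0, 0, 0,
    0, 1, 0, 0,
    0, 0, 1, 0,
    0, 0, 0, 1,
};

// Same body as mat4_rotate_z above, repeated so the sketch compiles alone.
pub fn mat4_rotate_z(a: f32) Mat4 {
    const cos_angle = @cos(a);
    const sin_angle = @sin(a);
    return .{
        cos_angle, -sin_angle, 0, 0,
        sin_angle, cos_angle,  0, 0,
        0,         0,          1, 0,
        0,         0,          0, 1,
    };
}

// Plain triple loop computing out = a * b, no SIMD vectors involved.
pub fn mat4_mul_sketch(a: Mat4, b: Mat4) Mat4 {
    var out: Mat4 = undefined;
    for (0..4) |row| {
        for (0..4) |col| {
            var sum: f32 = 0;
            for (0..4) |k| {
                sum += a[row * 4 + k] * b[k * 4 + col];
            }
            out[row * 4 + col] = sum;
        }
    }
    return out;
}
```

Multiplying by the identity should leave a matrix unchanged, and rotating by an angle and then by its negative should land back on the identity, which makes for two cheap sanity checks.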

To use this, I created a function called make_uniform_slots.

const TriangleUniforms = ng.make_uniform_slots(triangle_shader.Uniforms);

This takes a type, and returns another type. This is the kind of thing that Zig excels at.

pub fn make_uniform_slots(comptime Uniforms: type) type {
    var fields: [16]std.builtin.Type.EnumField = undefined;

    var num_fields: usize = 0;

    inline for (std.meta.fields(Uniforms)) |field| {
        fields[num_fields].value = num_fields;
        fields[num_fields].name = field.name;
        num_fields += 1;
    }

    const info = std.builtin.Type{ .@"enum" = .{
        .tag_type = u32,
        .fields = fields[0..num_fields],
        .decls = &.{},
        .is_exhaustive = true,
    } };

    return @Type(info);
}

This creates an enum type that explicitly maps the symbolic name of the uniform that us humans can understand, like mvp, onto an index value that the graphics libraries would like to use. It does this by iterating over all the fields in the Uniforms type that was passed in, using an inline for over the std.meta.fields of that type, extracting the name of each field, and building up the fields of the resultant enum type. It finally creates the type with @Type and returns that to the user.

This means that we can then just:

render_pass.apply_uniform_mat4(TriangleUniforms.mvp, mvp);
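To see what such a generated enum buys you, here is a small sketch that uses the standard library's std.meta.FieldEnum, which builds a similar enum from a struct's field names (its tag type is the smallest integer that fits, not u32). The tint field, ExampleUniforms and slot_index are hypothetical names made up for the illustration.

```zig
const std = @import("std");

const Mat4 = [16]f32;

// Hypothetical uniform block; only mvp appears in the shader described above.
const ExampleUniforms = extern struct {
    mvp: Mat4,
    tint: [4]f32,
};

// One enum tag per struct field, numbered in declaration order.
const ExampleSlots = std.meta.FieldEnum(ExampleUniforms);

// Hypothetical helper: the numeric slot a backend could use for a lookup.
fn slot_index(slot: ExampleSlots) u32 {
    return @intFromEnum(slot);
}
```

A call site can then write slot_index(.mvp) and a misspelled uniform name becomes a compile error rather than a silent runtime miss.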

Obviously, that mvp has to come from somewhere. To do that, I created a Camera structure that can be filled in:

const now: f32 = @floatCast(ng.elapsed());
camera.zoom = @sin(now / 5) + 2;
camera.rotate = @cos(now / 3);
camera.origin = .{ window_size.width / 2, window_size.height / 2 };
camera.target = .{ @sin(now * 3) * 500, @cos(now * 3) * 500 };
const view = camera.get_matrix();
const mvp = ng.mat4_mul(view, projection);

The only thing left is the projection matrix. We’ll just stick with 2D graphics for the moment, so we only really need an orthographic projection.

const window_size = window.get_size();
const projection = ng.ortho(window_size.width, window_size.height);
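As an illustration of what a 2D orthographic projection does, the sketch below maps pixel coordinates to clip space. It is not the ng.ortho implementation: it assumes row-major storage, column vectors, and a top-left pixel origin with y growing downwards, and ortho_sketch and transform_point are hypothetical names.

```zig
const std = @import("std");

const Mat4 = [16]f32;

// Assumptions: row-major storage, column vectors, origin at the top-left.
// Maps x in [0, width] to [-1, 1] and y in [0, height] to [1, -1].
fn ortho_sketch(width: f32, height: f32) Mat4 {
    return .{
        2.0 / width, 0,             0, -1,
        0,           -2.0 / height, 0, 1,
        0,           0,             1, 0,
        0,           0,             0, 1,
    };
}

// Transform a 2D point (z = 0, w = 1) and return clip-space x and y.
fn transform_point(m: Mat4, x: f32, y: f32) [2]f32 {
    return .{
        m[0] * x + m[1] * y + m[3],
        m[4] * x + m[5] * y + m[7],
    };
}
```

With an 800 by 600 window, the top-left pixel should land at (-1, 1), the centre at (0, 0), and the bottom-right corner at (1, -1).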

Another interesting thing this week was the frame rate calculation stuff. As shown above, the rotation and zoom-ness of the triangle are based on some elapsed time. I could have used std.time to get this, but I wanted to abstract that out.

For example, the std.time functions mostly return signed integers. I don’t care about the past and therefore would prefer to use unsigned.

pub fn wallclock_us() u64 {
    return @bitCast(std.time.microTimestamp());
}

Elapsed time is also interesting, so at initialisation time, I save off the current wallclock in microseconds to a global variable.

pub fn init() void {
    start_time = wallclock_us();
}

And now we can define elapsed functions, one for integer microseconds and one for floating point seconds.

pub fn elapsed_us() u64 {
    const now = wallclock_us() -% start_time;
    return now;
}

pub fn elapsed() f64 {
    return @as(f64, @floatFromInt(elapsed_us())) / 1e6;
}

This does show how much @builtin manipulation is necessary to go from an unsigned 64-bit value to a floating point value. However, the conversion is not free, either in the code that has to be generated or in the time the CPU spends executing it, so having to spell it out with @as and @floatFromInt is reasonable.
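The two conversions involved can be isolated in a pair of tiny helpers. This is only a sketch with hypothetical names (micros_between, micros_to_seconds); it shows the wrapping subtraction and the alternative of giving @floatFromInt its result type through a typed constant instead of @as.

```zig
const std = @import("std");

// Hypothetical helper: microseconds between two wallclock readings.
// -% wraps around instead of trapping if `now` is smaller than `start`.
fn micros_between(start: u64, now: u64) u64 {
    return now -% start;
}

// Hypothetical helper: integer microseconds to floating point seconds.
fn micros_to_seconds(us: u64) f64 {
    // The declared type of the constant tells @floatFromInt what to produce.
    const as_float: f64 = @floatFromInt(us);
    return as_float / 1e6;
}
```

Either spelling compiles to the same conversion; the typed constant just moves the destination type onto the left-hand side.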

Oh yeah, frame rate calculation.

At the start of each frame, I get the current delta_time, dt, from the ng.start_frame function. This does the obvious thing: it calls elapsed_us, subtracts the value from the previous call, and returns the difference as a floating point number of seconds. But, whilst this gives us the time elapsed since the last time we called start_frame, it doesn’t tell us a reasonable fps value that us mean humans could understand. To do that, I then pass that dt into an update_fps function.

var all_delta_times: [65536]f32 = undefined;
var next_dt_index: usize = 0;

fn update_fps(dt: f32) void {
    all_delta_times[next_dt_index] = dt;
    next_dt_index = next_dt_index + 1;
    var total_ft: f32 = 0;
    for (all_delta_times[0..next_dt_index]) |ft| {
        total_ft += ft;
    }
    if (total_ft >= 1.0 or next_dt_index == all_delta_times.len) {
        average_frame_rate = @intCast(next_dt_index);
        next_dt_index = 0;
    }
}

We have a very large array of all_delta_times, and an index into that array. Each time the update_fps function is called, we store the passed in dt into that array at the current index, and then increment that index. We then add up the total amount of time stashed away in that array. If that total time is greater than one second or we run out of slots in the array, then we determine that the average frame rate is the number of values in that array, and start again.

Essentially, if we have only 20 values in the array before they all add up to one second, then we are running at 20 frames per second. Some frames may be shorter, some may be longer, but on average we are at roughly 20. We store this away in another global variable, and can use that to display the fps on the screen.
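The same counting idea also works without the array. The sketch below is a variation, not the code described above: it keeps a running total instead of re-summing stored values every frame, and it takes the current time as a parameter so it can be exercised without a real clock. FrameTimer and its field names are hypothetical.

```zig
const std = @import("std");

// Hypothetical frame timer; the engine keeps equivalent values in globals.
const FrameTimer = struct {
    last_us: u64 = 0,
    window_seconds: f32 = 0,
    window_frames: u32 = 0,
    average_frame_rate: u32 = 0,

    // Returns the seconds since the previous call and updates the fps estimate.
    fn start_frame(self: *FrameTimer, now_us: u64) f32 {
        const delta_us = now_us -% self.last_us;
        self.last_us = now_us;
        const dt: f32 = @floatCast(@as(f64, @floatFromInt(delta_us)) / 1e6);

        // Running total: once a second's worth of frames has gone by,
        // the number of frames counted is the frame rate.
        self.window_seconds += dt;
        self.window_frames += 1;
        if (self.window_seconds >= 1.0) {
            self.average_frame_rate = self.window_frames;
            self.window_seconds = 0;
            self.window_frames = 0;
        }
        return dt;
    }
};
```

Feeding it sixteen frames of 62.5 ms each adds up to exactly one second, so the reported rate should come out as 16.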

Oh, yeah, that probably means we need to display text somehow.

The other major implementation this week was debug text. There are two parts, the font and the ability to draw text. I do admit to loving the odd old 8x8 font, but the resolution we have on displays these days means that these fonts are either really blocky or really small. Neither is good. Therefore, I decided to go on and create a 12x20 font. Not exactly 8-bit territory, but a reasonable size on today’s screens. Of course, zig being zig, I can just create an array of u12 values to store this off in binary.

pub const debug_font = [256 * 20]u12{
    ...
    // 0x30 '0'
    0b000000000000,
    0b000111111000,
    0b001100001100,
    0b011000000110,
    0b011000000110,
    0b011100000110,
    0b011010000110,
    0b011001000110,
    0b011000100110,
    0b011000010110,
    0b011000001110,
    0b011000000110,
    0b011000000110,
    0b001100001100,
    0b000111111000,
    0b000000000000,
    0b000000000000,
    0b000000000000,
    0b000000000000,
    0b000000000000,
    ...
};

Is that slash the wrong way around? It’s a debug font, you don’t have to like it. The font is therefore described in a single zig file that is over 5000 lines long. Oops.
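Reading a single pixel back out of such a table is a shift and a mask. The sketch below assumes the most significant of the 12 bits is the leftmost pixel, which matches how the '0' glyph reads on the page; glyph_pixel is a hypothetical helper, and the one-glyph table stands in for the full font.

```zig
const std = @import("std");

const glyph_width = 12;
const glyph_height = 20;

// The '0' glyph from above, used here as a font with a single character.
const zero_glyph = [glyph_height]u12{
    0b000000000000,
    0b000111111000,
    0b001100001100,
    0b011000000110,
    0b011000000110,
    0b011100000110,
    0b011010000110,
    0b011001000110,
    0b011000100110,
    0b011000010110,
    0b011000001110,
    0b011000000110,
    0b011000000110,
    0b001100001100,
    0b000111111000,
    0b000000000000,
    0b000000000000,
    0b000000000000,
    0b000000000000,
    0b000000000000,
};

// Assumption: bit 11 is the leftmost pixel (x = 0), bit 0 the rightmost.
fn glyph_pixel(font: []const u12, char: usize, x: u4, y: usize) bool {
    std.debug.assert(x < glyph_width and y < glyph_height);
    const row = font[char * glyph_height + y];
    const shift: u4 = glyph_width - 1 - x;
    return ((row >> shift) & 1) == 1;
}
```

Under that assumption the diagonal stroke steps one pixel to the right on each row going down, so it is set at (3, 5) and (4, 6) but not at (3, 6).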

To display this on screen, I created a very simple debug text buffer. This is a 4 KB buffer that can hold any ASCII character. It is cleared to zeroes at the start of each frame, and then any code can print debug text as it sees fit. Only at the end of the frame, just before the commit, is a buffer of vertexes generated, copied to the GPU, and then drawn to the framebuffer. The only magic character is the newline character '\n', which moves to the next line. Because the window can be resized, the current size of the window is used each frame to determine the number of characters per line and the number of rows. There is no scrolling, although text will wrap from one line to another. It is simple, but it works. And the first thing I implemented was:

ng.debug_print("{} Hz\n", .{average_frame_rate});

Yes, we hook into the std.fmt functions from the standard library to get all the fancy formatting we’d want.

pub fn print(comptime fmt: []const u8, args: anytype) void {
    const debug_writer: void = {};
    const writer = std.io.AnyWriter{ .context = &debug_writer, .writeFn = debug_text_writer };
    std.fmt.format(writer, fmt, args) catch {};
}

We do this by creating an std.io.AnyWriter that just takes in a writeFn that is called to write the formatted data. There is a context that we could use, but we don’t care, so we pass in a void.

fn debug_text_writer(self: *const anyopaque, bytes: []const u8) error{}!usize {
    _ = self;
    puts(bytes);
    return bytes.len;
}

The debug_text_writer function just calls the debug text’s puts function and then returns the length of the slice written. We don’t actually care if it was written fully or not, because it could have overflowed the screen size, as this is debug text.
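To tie the buffer, the drop-on-overflow behaviour and the wrapping rules together, here is a self-contained sketch. It deliberately swaps the writer plumbing for std.fmt.bufPrint into a scratch buffer, so it is a simpler alternative rather than the approach described above, and every name in it (puts_sketch, print_sketch, Cursor, advance, layout_end) is hypothetical.

```zig
const std = @import("std");

const glyph_width = 12;

// Hypothetical stand-ins for the 4 KB debug text buffer.
var text_buffer: [4096]u8 = undefined;
var text_len: usize = 0;

// Append bytes, silently dropping whatever does not fit.
fn puts_sketch(bytes: []const u8) void {
    const n = @min(bytes.len, text_buffer.len - text_len);
    @memcpy(text_buffer[text_len..][0..n], bytes[0..n]);
    text_len += n;
}

// Format into a scratch buffer, then append. A message longer than the
// scratch buffer is dropped entirely, which is acceptable for debug text.
fn print_sketch(comptime fmt: []const u8, args: anytype) void {
    var scratch: [256]u8 = undefined;
    const text = std.fmt.bufPrint(&scratch, fmt, args) catch return;
    puts_sketch(text);
}

const Cursor = struct { col: u32 = 0, row: u32 = 0 };

// Where the cursor ends up after `ch`, given how many columns fit.
fn advance(cursor: Cursor, ch: u8, columns: u32) Cursor {
    if (ch == '\n') return .{ .col = 0, .row = cursor.row + 1 };
    if (cursor.col + 1 >= columns) return .{ .col = 0, .row = cursor.row + 1 };
    return .{ .col = cursor.col + 1, .row = cursor.row };
}

// Walk the buffered text and report the final cursor position.
fn layout_end(window_width: u32) Cursor {
    const columns = window_width / glyph_width;
    var cursor = Cursor{};
    for (text_buffer[0..text_len]) |ch| cursor = advance(cursor, ch, columns);
    return cursor;
}
```

Because the column count is recomputed from the window width on every call, the same buffered text simply wraps differently after a resize.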

The other major change this week was how events are processed. This came about because of how keyboards are handled in Xlib or, more accurately, xcb. I wanted to support the compose key on Linux, so that I can type characters like « by pressing compose and then < twice. To do that, I had to add support for XIC, Xutf8LookupString and XFilterEvent. XFilterEvent is the annoying one. It can return saying ‘yeah, you know that event that you were processing, don’t, I’ve handled it and you don’t need to do anything else at this point.’

This is all well and good, except that a single poll can now produce zero, one, two, or even more events. That doesn’t fit into the simple ‘one poll, one event’ model that we had previously. Therefore, I’ve created an event queue. This just holds events generated by various systems, like the video and x11 platform. We have a read and write index into a circular buffer, and a count of how many events are currently queued.

pub fn send_event(ev: Event) void {
    if (event_queue_count < event_queue.len) {
        event_queue[event_queue_write_index] = ev;
        event_queue_write_index = (event_queue_write_index + 1) % event_queue.len;
        event_queue_count += 1;
    }
}

The systems and platforms can call send_event, which adds an event to this event queue. If the queue is already full, the new event is silently dropped.

pub fn poll_event() ?Event {
    video.generate_events();

    if (event_queue_count > 0) {
        const ev = event_queue[event_queue_read_index];
        event_queue_read_index = (event_queue_read_index + 1) % event_queue.len;
        event_queue_count -= 1;
        return ev;
    }
    return null;
}

A poll will now call the systems and platforms to generate events and then return the first event in this event queue back to the caller. I’m sure I could have done something fancier with an iterator, but this works.
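The read index, write index and count scheme can be packaged as a small generic type, which also makes the edge cases easy to poke at. This is a sketch only; RingQueue, push and pop are hypothetical names, and the engine itself uses plain globals as shown above.

```zig
const std = @import("std");

// Generic fixed-capacity FIFO built on a circular buffer.
fn RingQueue(comptime T: type, comptime capacity: usize) type {
    return struct {
        const Self = @This();

        items: [capacity]T = undefined,
        read_index: usize = 0,
        write_index: usize = 0,
        count: usize = 0,

        // Returns false (and drops the item) when the queue is full.
        pub fn push(self: *Self, item: T) bool {
            if (self.count == capacity) return false;
            self.items[self.write_index] = item;
            self.write_index = (self.write_index + 1) % capacity;
            self.count += 1;
            return true;
        }

        // Returns the oldest item, or null when the queue is empty.
        pub fn pop(self: *Self) ?T {
            if (self.count == 0) return null;
            const item = self.items[self.read_index];
            self.read_index = (self.read_index + 1) % capacity;
            self.count -= 1;
            return item;
        }
    };
}
```

Keeping a separate count is what lets the queue tell full apart from empty, since in both cases the read and write indices are equal.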

I talked last week about having optional functions in the API struct. Well, this week, I added one of those in. Let’s have a look.

const API = struct {
    ...
    glXSwapIntervalEXT: ?*const fn (*Display, Window, i32) callconv(.c) void,
};

The glXSwapIntervalEXT is an optional function. We don’t know if a platform will support this or not, and the world is not going to end if it doesn’t.

To use this, we just check that it is not null before calling it.

fn set_swap_interval(a_window: video.Window, interval: video.SwapInterval) void {
    if (api.glXSwapIntervalEXT) |swap_interval| {
        const ext_interval: i32 = switch (interval) {
            .fast => 0,
            .vsync, .lowpower => 1,
            .double => 2,
            .adaptive => -1,
        };
        swap_interval(display, a_window.handle, ext_interval);
    }
}

Here, we take in a window and an interval and map that onto the value that glXSwapIntervalEXT expects: 0 means go as fast as possible, 1 means sync to the vertical blank, 2 means sync to every other vertical blank, and -1 means use adaptive syncing. Note that we capture the function in the if (api.glXSwapIntervalEXT) statement, and then use that captured function pointer to call it. If that was null, we’d never map the values and never call the function. This also means that we could add additional methods to control swap intervals, if necessary, in the future.
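The capture pattern can be tried without any GLX at all by pointing the optional at a stand-in function. In this sketch FakeApi, record_interval and set_swap_interval_sketch are hypothetical, and the C calling convention is left off so that it runs as ordinary Zig.

```zig
const std = @import("std");

const SwapInterval = enum { fast, vsync, lowpower, double, adaptive };

// Stand-in for the loaded API table; null means the extension is missing.
const FakeApi = struct {
    swap_interval: ?*const fn (i32) void = null,
};

// Records the last value passed to the stand-in function.
var recorded: ?i32 = null;

fn record_interval(value: i32) void {
    recorded = value;
}

fn set_swap_interval_sketch(api: FakeApi, interval: SwapInterval) void {
    // The payload capture only runs when the pointer is non-null.
    if (api.swap_interval) |swap_interval| {
        swap_interval(switch (interval) {
            .fast => 0,
            .vsync, .lowpower => 1,
            .double => 2,
            .adaptive => -1,
        });
    }
}
```

With the default FakeApi nothing is called at all, while setting swap_interval to &record_interval makes the mapped value observable in recorded.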

That about wraps up the major changes this week. The line count has increased dramatically. Two weeks in and we are at 9103 lines of code.

-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
Zig                             16            704            793           9103
-------------------------------------------------------------------------------

But, 5000 of that was a single debug font file. Excluding that, we’ve added 1529 lines of code. As predicted, things have slowed down.
