ECS performance
(2705 words)Yes, premature optimization is the killer of all good programming projects, but it’s my project and therefore I can do what I want!
Firstly, we need some way to measure the performance of the system, because without knowing how things have improved it is mostly impossible to know if any performance improvements have worked. This is just a simple case of creating some benchmark tests.
Some interesting things to note here. The first run of any test will be mostly pointless as we would be more likely to be measuring the memory bandwidth of the code execution path and not the actual speed of any code we are benchmarking. We also want to do things more than once, because then any variation in what the system is doing, for example me typing on the keyboard in neovim could impact on the benchmarking, so we run it a number of times and find the iteration that runs the quickest. No, we don’t care about averages in these sort of tests, we just want to remove as many variables are possible like being task switched.
First test:
We have a Position and Velocity structs that just use x
and y
fields
that are f32
, and a movement system that just gets the position and
velocity and updates then.
const Position = struct { x: f32, y: f32 };
const Velocity = struct { dx: f32, dy: f32 };
pub fn movement(iter: *const ng.SystemIterator) void {
const dt = iter.delta_time;
for (iter.entities) |entity| {
if (entity.getPtr(Position)) |position| {
if (entity.get(Velocity)) |velocity| {
position.x += velocity.dx * dt;
position.y += velocity.dy * dt;
}
}
}
}
This takes 637 ns per entity to update things. That’s, err, rather slow. Of course it is, it is doing a couple of hash table lookups for each entity and also checking that the component is mutable according to the system definition.
However, before we do anything radical, let’s just try compiling the
tests with ReleaseFast
rather than Debug
mode. Ok, a lot better, but
still slow. 19.4 ns per entity.
What if we used a vector Vec2
instead of individual struct fields?
const Position = struct { pos: ng.Vec2 };
const Velocity = struct { vel: ng.Vec2 };
Because we want to multiple the delta time with the velocity in a single
operation, we @splat
the single f32 into a Vec2
.
pub fn movement(iter: *const ng.SystemIterator) void {
const dt: ng.Vec2 = @splat(iter.delta_time);
for (iter.entities) |entity| {
if (entity.getPtr(Position)) |position| {
if (entity.get(Velocity)) |velocity| {
position.pos += velocity.vel * dt;
}
}
}
}
This has made no different to the speed with the same 19.4 ns per entity. Why didn’t this make things any better?
Looking at the code produced might be interesting. When using the explicit x, y fields we have:
6cd53: c5 fb 10 04 c2 vmovsd (%rdx,%rax,8),%xmm0
position.x += velocity.dx * dt;
position.y += velocity.dy * dt;
6cd58: c5 f8 59 85 30 ff ff vmulps -0xd0(%rbp),%xmm0,%xmm0
6cd5f: ff
6cd60: 48 8b 85 70 ff ff ff mov -0x90(%rbp),%rax
6cd67: c4 c1 7b 10 4c c5 00 vmovsd 0x0(%r13,%rax,8),%xmm1
6cd6e: c5 f0 58 c0 vaddps %xmm0,%xmm1,%xmm0
6cd72: c4 c1 78 13 44 c5 00 vmovlps %xmm0,0x0(%r13,%rax,8)
When we use Vec2
, we have:
6ba1b: c5 fb 10 04 c8 vmovsd (%rax,%rcx,8),%xmm0
position.pos += velocity.vel * dt;
6ba20: c5 f8 59 85 30 ff ff vmulps -0xd0(%rbp),%xmm0,%xmm0
6ba27: ff
6ba28: 48 8b 85 70 ff ff ff mov -0x90(%rbp),%rax
6ba2f: c4 c1 7b 10 4c c5 00 vmovsd 0x0(%r13,%rax,8),%xmm1
6ba36: c5 f8 58 c1 vaddps %xmm1,%xmm0,%xmm0
6ba3a: c4 c1 78 13 44 c5 00 vmovlps %xmm0,0x0(%r13,%rax,8)
Basically, its the same. They are both moving the velocity into the
xmm0
register, multiplying it by the stack variable (dt), and then
adding it the position. Optimizing compilers are wonderful sometimes.
How can we improve this? Well, within the inner loop we are calling
getPtr
and get
, and those are calling into an AutoArrayHashMap. The
problem of course is that we know that out EntityIndex
is just a u24
and can therefore just be used as the ‘hash’ directly rather than using
some complex and time consuming hash function.
const ComponentContext = struct {
pub fn hash(_: ComponentContext, key: EntityIndex) u32 {
return key;
}
pub fn eql(
_: ComponentContext,
a: EntityIndex,
b: EntityIndex,
_: usize,
) bool {
return a == b;
}
};
First, we define our own ComponentContext
struct that we can pass into
the ArrayHashMapUnmanaged
function.
store: std.ArrayHashMapUnmanaged(
EntityIndex,
Component,
ComponentContext,
false,
) = .empty,
Then we can just use that ComponentContext
, and the value false
stating that we are not going to be storing the hash as a separate block
of memory and rather just use the eql
function defined above from the
key
table.
Ok, that improved things to about 13.5 ns. Not bad, but I’m sure we could do better. The next obvious step would be to go to use archetype data storage, but I think that is a step too far at this point in the project. Yes, it would be fun to implement and probably faster, but it is not a five-minute job. If an optimization can’t be done in five minutes and can’t provide a 25% improvement then don’t do it, yet. We’ve gone from 19.5 ns to 13.5 ns, and that’s more than a 25% improvement.
The other little improvement that I’ve made is the saving of the ECS
database. I was playing around with things and all of a sudden the
program started crashing. I quickly isolated it to the autosave system,
which was calling the ecs save function. The basic problem was that I
was creating an interned string store using one allocator and then
resizing it using another one. This obviously caused some problems. To
resolve this, I placed all the saving functionality into a single
SaveMemory
struct that used a single allocator for the save data,
called memory
, and the string intern store, called strings
. This
meant that the save function just became:
pub fn save(alloc: std.mem.Allocator) ![]u8 {
var memory = SaveMemory.init(alloc);
const head_offset = try memory.write_header("HEAD");
try memory.write(u32, 0);
inline for (std.meta.fields(ComponentFieldKind)) |field| {
try memory.write_int(field.value);
try memory.write_string(field.name);
}
try memory.write_header_length(head_offset);
try save_component_info(&memory);
try save_entity_info(&memory);
try save_recycled(&memory);
const strings_offset = try memory.save_strings();
try memory.write_u32_at(head_offset + 8, strings_offset);
return memory.toOwnedSlice();
}
This not only cleaned up the implementation a lot, it also kept all
allocations for the save in a single allocator that had the lifetime of
this function, at least up until the toOwnedSlice
call.
Time to move onto something much more important… User Interface functionality. Last time we touched the UI system, we could display windows and do some simple layout of things inside that window, but we could not interact with those things. We couldn’t push buttons.
Now we can. Firstly, we have reorganized the UI objects into their own
separate zig files. This means we have a Window.zig
file, and a
Button.zig
file, etc. The functionality for each UI object can then be
placed into these files. However, before we could get there, we had to
walk the tree of UI objects to find which object to send events to. The
following code was used:
var iter = self.children();
while (iter.next()) |child| {
const obj = get(child) catch continue;
if (@reduce(.And, last_mouse > obj.pos) and
@reduce(.And, last_mouse < obj.pos + obj.size))
{
if (obj.process_event(child, event)) {
return true;
}
}
}
This iterates through each child of an object, checking that the last mouse position is within the object and then processing the event if necessary. This is done recursively until we find the deepest child in the tree that contains the event. If we find no children, then either we are the deepest child in the tree or all our children are not covered by the mouse, and then we call the process event function for the type of this object.
switch (self.data) {
.window => |*win| return win.process_event(handle, event),
.button => |*button| return button.process_event(handle, event),
.vbox => |*vbox| return vbox.process_event(handle, event),
.hbox => |*hbox| return hbox.process_event(handle, event),
.text => |*text| return text.process_event(handle, event),
}
As we add more UI object types, this switch statement will become larger. Now, if these functions return true, then the event processing is stopped. In other words, the event has been consumed by the object. If not, then we return back with false, which causes the above loop to not return and the parent will process the event. Effectively this means that we can process events on the way down the tree and if a child doesn’t process an event the parents can on the way back up again. If nothing processes the event, then we can give that back to the application, today that would be the map movement interaction.
The functionality for a button can then just be added. This was really simple.
if (ui.find_object(parent, ident)) |handle| {
var object = ui.get(handle) catch return null;
object.active = true;
ui.move_child_last(parent, handle);
ui.push_build_stack(handle);
if (object.data.button.clicked > 0) {
const clicked = object.data.button.clicked;
object.data.button.clicked = 0;
return clicked;
} else {
return null;
}
In the begin_button function, we first check if an object with the same
ident
was registered in the last frame. If it was, then we can update
that object. First we move that object to be the last child in the
parent, keeping the order of the children to the calling order. Then we
push this object onto the build stack to allow children of this object
to be created, such as the text object within the button.
We also get the number of times the button has been clicked in the last
frame and return that, or null. Why not return a bool
? Well, if you
are running at a very low frame rate it is possible to miss button
presses if you happen to very quickly click multiple times on a button.
Therefore, accumulating the number of clicks and returning that will
allow these clicks to be processed.
If the button was not found, then we have to create one.
} else {
const handle = ui.new() catch |err| {
log.err("button {}", .{err});
return null;
};
const object = ui.get(handle) catch return null;
object.* = .{
.ident = ident,
.active = true,
.data = .{
.button = .{
.min_size = .{ options.width, options.height },
},
},
};
ui.add_child_last(parent, handle);
ui.push_build_stack(handle);
}
How do we process events? Well, remember above where we walked the tree of objects and found the deepest child that the mouse was within. Well, given that, we don’t really care about the position of the mouse, just that we were called.
pub fn process_event(self: *Button, handle: Handle, event: ng.Event) bool {
switch (event) {
.mouse_down => |ev| return self.process_mouse_down(handle, ev),
.mouse_double_click => |ev| return self.process_mouse_down(handle, ev),
.mouse_up => |ev| return self.process_mouse_up(handle, ev),
else => {},
}
return false;
}
We only really care about mouse down, double clicks, and up events. Everything, at this time, is ignored by returning false.
fn process_mouse_down(self: *Button, handle: Handle, event: ng.MouseEvent) bool {
if (event.button == .left) {
self.clicked += 1;
self.pressed = true;
ui.captured_mouse = handle;
}
return true;
}
The process_mouse_down
function just checks we are pressing the
.left
button, and then increments the clicked
value, remember that
it is pressed and then capture the mouse.
fn process_mouse_up(self: *Button, _: Handle, event: ng.MouseEvent) bool {
if (event.button == .left) {
self.pressed = false;
ui.captured_mouse = null;
}
return true;
}
The process_mouse_up
function reverts this. What is the pressed
value used for?
pub fn draw(self: Button, obj: *const Object) void {
if (self.pressed) {
ui.draw_rectangle(obj.pos, obj.size, self.pressed_color);
} else {
ui.draw_rectangle(obj.pos, obj.size, self.background_color);
}
}
Well, if the button is pressed then we change the background color of
the button. This does leave one question open. How do we not interact
with the Text
object that is inside the button?
pub fn process_event(_: *Text, _: Handle, _: ng.Event) bool {
return false;
}
For the Text
object, we never process events. The text is just a
label. It cannot be clicked, it cannot be moved, it cannot be scrolled.
It always returns false to any event.
To test this, I wrote a little bit of code that changed what was displayed in a debug window based on the number of button clicks and also based on time.
ng.ui_add_text(.{}, "{} frames", .{state.frame_counter});
ng.ui_add_text(.{}, "{} clicks", .{state.button_clicks});
// ng.ui.end_hbox ();
if (ng.ui_add_button(.{
.text = "Button",
.width = 100,
.height = 40,
.padding = .{ .top = 8, .left = 8, .right = 8, .bottom = 8 },
})) |count| {
state.button_clicks += count;
}
if (state.button_clicks > 0 and state.button_clicks < 30) {
ng.ui_add_text(.{}, "abcdefghijklmnopqrstuvwxyz", .{});
}
if (state.button_clicks > 10) ng.ui_add_text(.{}, "ABCDEFGHIJKLMNOPQRSTUVWXYZ", .{});
if (state.button_clicks > 20) ng.ui_add_text(.{}, "0123456789", .{});
if (debug_state) {
ng.ui_add_text(.{}, "State On {d:0.3}", .{debug_timer});
} else {
ng.ui_add_text(.{}, "State Off {d:0.3}", .{debug_timer});
}
debug_timer -= state.dt;
if (debug_timer < 0) {
debug_timer += 1;
debug_state = !debug_state;
}
This displays the number of frames displayed, the number of button
clicks. It then adds a button, with the imaginative label of “Button”.
If that returns a value, then we increment the state.button_clicks
by
that amount. And then based on that value, we display some static text.
We also have one of either a State On
or a State Off
message based
off the delta_time of the frame.
This raised one little issue. Each time we add a new text item, and
then later not add it, the UI object is kept around. This leaks memory,
and we don’t want to leak memory. To resolve this, I changed the UI’s
Handle
to be a packed struct just like in the ECS system:
pub const Handle = packed struct(u32) {
idx: HandleIndex,
gen: HandleGeneration,
And created a generations
and recycled
arrays to manage these. This means
that we can recycle the UI object handles just like we do with an ECS entity
identifier. We can also delete them. Deletion in the UI is implicit. For each
object each frame, we keep a note of whether the object has been added again.
var iter = ui_walk(window);
while (iter.next()) |handle| {
const obj = get(handle) catch return;
if (obj.active) {
obj.shown = obj.active;
obj.active = false;
} else {
obj.shown = false;
}
}
If it was not, then we will find all of these in a new function called
delete_not_shown_in
. This iterates over all the children of an object
and removes any that were not shown. This only works within windows, not
for windows. This means that the position of windows, when reopened,
will stay the same.
object = first_window;
while (object) |handle| {
const obj = get(handle) catch return;
object = obj.succ;
delete_not_shown_in(handle);
}
fn delete_not_shown_in(phandle: Handle) void {
const parent = get(phandle) catch return;
var child: ?Handle = parent.first_child;
while (child) |chandle| {
const obj = get(chandle) catch return;
child = obj.succ;
if (obj.first_child) |_| {
delete_not_shown_in(chandle);
}
if (obj.shown == false) {
remove_child(phandle, chandle);
delete(chandle);
}
}
}
The remove_child
function is a very simple function that unlinks the
child from the parents children linked list. It then calls delete
on
the child handle. This does the obvious thing, especially if you
remember ECS blog.
used_objects.items[handle.idx] = false;
generations.items[handle.idx] +%= 1;
recycled.append(allocator, handle.idx) catch return;
Obviously, the new
function was changed to pop
indexes from the
recycled
array and return that rather than create a new UI object. To
help with this, we have a used_objects
array that denotes if an object
is valid or not.
if (recycled.pop()) |index| {
const gen = generations.items[index];
used_objects.items[index] = true;
return .{ .idx = index, .gen = gen };
}
To test all of this, I hooked up the tab
key to a temporary variable
called slow_frame_rate
and called sleep(0.5)
if it was true. This
allowed me to test the multiple clicks, and the multiple things added or
removed per frame with ease.
We are now up to 13912 lines of code.
-------------------------------------------------------------------------------
Language files blank comment code
-------------------------------------------------------------------------------
Zig 29 1714 1519 13912
-------------------------------------------------------------------------------
Next week is going to be drawing some map stuff. We have nodes and links in the ECS system, and I think it is time to start drawing these and maybe even interact with them.