Fourth - ECS initial implementation
When writing software I like to do a straightforward implementation first and optimize later, if and when we need to. You don’t write software quickly if the first thing you do is optimize. When people hear the letters ‘ECS’, many think of data-oriented design and highly optimized processing of data. Yes, that is one possible reason for using an ECS, but it is not the primary reason I’m using this design pattern.
What is an ECS? Well, the letters stand for Entity, Component, System. An Entity is just a thing, an object, an actor, something that is unique. A Component is a bit of data. A System is a bit of code that processes entities based on the components that they currently have.
This means that an Entity has Components. It is not the object-oriented ‘is-a’ relationship, but a database-style ‘has-a’ relationship. In fact, I like to think of an ECS more like a database than an object-relational mapping.
Let’s go back to some first principles. In a game, we have one or more entities. Each entity is something that is significant in the game. It could be a building, or a road, or a person, or some goods. It could be a process that takes one or more goods, and one or more people with skills, and converts them into one or more other goods. It could be anything. Some people take this to the absolute extreme, modelling user interfaces or anything else as entities. I’m not an absolutist; the user interface stuff is completely separate from the ECS. My simple rule: if it needs to be saved in a game save file then it belongs in the ECS database; otherwise it does not.
What is an Entity? Yes, we’ve asked that before, but here is the engineering answer. It is a 32-bit integer. The actual definition uses a packed struct in zig.
pub const Entity = packed struct(u32) {
    gen: EntityGeneration,
    idx: EntityIndex,

    pub const null_entity: Entity = .{ .idx = 0xFFFFFF, .gen = 0xFF };
    ...
};
The EntityIndex is just a u24 that identifies each individual entity in the system. It is used as the key into various hash maps, or as an index into arrays as necessary. EntityGeneration is a u8. There is also a null_entity declaration that defines a single identifier denoting the null entity.
What is the generation bit for? Well, it solves many of the problems that pointers have. A pointer always looks valid, even after the memory it points to has been freed and perhaps reallocated to something else. This is why ‘safe languages’ don’t let programmers use raw pointers. We will solve this not with a garbage collector, reference counting, or a borrow checker, but with a generation number. To create a new entity, we just allocate the next available entity index and give it a generation of zero.
const index: EntityIndex = @intCast(generations.items.len);
generations.append(allocator, 0);
const self = Entity{ .gen = 0, .idx = index };
Note: error handling has been removed to increase clarity above and also in many of the examples below.
As you can see above, we know the next entity index to allocate by using the length of a generations ArrayList. However, this only allows us to allocate 16777215 possible entities. That’s a lot. But it is not a lot if every bullet that you fire or smoke particle burns through these entities. How do we recycle them? Well, when an entity is deleted, it is added to a recycle list.
generations.items[idx] = gen +% 1;
recycled.append(allocator, idx);
The variable recycled is just another ArrayList that holds the entity indexes of any entities that have been deleted. Before we do that, we also increment the value of the generation number in the generations array. Note that we use the +% operator in zig, which allows wrapping overflow from 255 to 0. Then, when we create a new entity, we first check if the recycled array has anything on it by trying to pop something off. If we get an index, we grab the generation number from the generations array and return the entity with the correct values.
if (recycled.pop()) |idx| {
    const gen = generations.items[idx];
    const self = Entity{ .gen = gen, .idx = idx };
    entity_changes.put(allocator, self, {});
    return self;
}
The basic result of this is that the first time we create an entity, we are given the value 0:0, where the first number is the generation number and the second number is the index. If we delete this, and then create another entity, we’ll get the value 1:0. It is still using the same index, but the generation number has been incremented. This means that it is really easy to check if a given entity is valid or not.
pub fn is_valid(self: Entity) bool {
    if (self.idx >= generations.items.len) {
        return false;
    }
    if (generations.items[self.idx] != self.gen) {
        return false;
    }
    return true;
}
First, we check that the index of the entity is less than the number of entities we’ve ever created, by checking against the number of items in the generations array. Then we check that the generation number in the generations array at the entity’s index is the same as this entity’s generation number, and only then do we return true.
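To make the lifecycle concrete, here is a small standalone sketch of the same check. The names isValid and the one-slot generations array are mine, not the module’s: once the generation at an index is bumped, any handle still carrying the old generation fails the check.

```zig
const std = @import("std");

// Standalone sketch of the stale-handle check: the handle 0:0 stops
// being valid the moment the generation at its index is bumped.
const Entity = packed struct(u32) {
    gen: u8,
    idx: u24,
};

var generations = [_]u8{0}; // one slot, currently at generation 0

fn isValid(e: Entity) bool {
    if (e.idx >= generations.len) return false;
    return generations[e.idx] == e.gen;
}

pub fn main() void {
    const e = Entity{ .gen = 0, .idx = 0 }; // freshly created 0:0
    std.debug.print("{}\n", .{isValid(e)}); // true: live handle

    generations[0] +%= 1; // "delete": bump the generation
    std.debug.print("{}\n", .{isValid(e)}); // false: stale handle
}
```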
I’m sure you’re already ahead of me now. What happens if we create a new entity and then delete it, say, 256 times? What is the identifier of the next entity that is created? Well, without anything fancy, we’d probably give back an entity with the same index and a generation number that has already been used. This may not be great. This may be the whole pointer-reuse problem all over again. We could reduce the incidence of this by increasing the size of the generation number, or we could ‘fix’ it with code.
What we actually do is have yet another array, called old_recycled, that is just like the recycled array, except that when the generation number reaches 255, instead of adding the index to recycled we add it to old_recycled.
generations.items[idx] = gen +% 1;
if (generations.items[idx] == 255) {
    old_recycled.append(allocator, idx);
} else {
    recycled.append(allocator, idx);
}
This means that if you repeatedly create and delete a single entity, you’ll start with the entity identifier 0:0, then get 1:0, 2:0, 3:0, all the way up to 253:0 and then 254:0, and then you’ll get 0:1. The generations value at index 0 is now set to 255, meaning that no entity identifier with index 0 is considered valid, and we start creating entities with index 1. This essentially means that we can create and delete roughly four billion entities (about 16.7 million indexes times 255 generations each) before we have problems. However, problems will still happen.
fn recycle_old_generations() void {
    while (old_recycled.pop()) |idx| {
        const gen = generations.items[idx];
        generations.items[idx] = gen +% 1;
        recycled.append(allocator, idx);
    }
}
To resolve this, there is a function called recycle_old_generations. It iterates through all the old_recycled values, popping them off one at a time, wrapping the generation value from 255 back to 0, and adding the indexes back to the recycled array. This should not be called every frame, but it would probably be fine to call it at a suitable break, say between levels, or just before saving.
Hopefully, you understand what an entity is, but just creating entities is not very useful if they can’t also have data. This is where Components come in.
A Component is just a bit of plain old data. In zig, that is a struct. For example, say you want to have an entity that has a position, well, you can just do:
const Position = struct { x: f32, y: f32 };
const entity_one = ecs.new();
entity_one.set(Position{ .x = 2, .y = 3 });
However, there is a small step required before calling the set function on the entity. We need to register the component. To register the component, we just call the register_component function. This takes two arguments: the name of the component and the type of the component. Yes, it’s that fun time in Zig where we pass types as parameters at compile time.
pub fn register_component(comptime name: []const u8, comptime Component: type) void {
The first thing we do is get a unique identifier for the Component. We call this a TypeId, but essentially it is just a usize.
const type_id = get_type_id(Component);
Zig doesn’t have any runtime type identifiers but we can build it ourselves. This is as simple as doing:
fn get_type_id(comptime Component: type) TypeId {
    return @intFromPtr(&(struct {
        var item: u8 = @intCast(@sizeOf(Component));
    }).item);
}
This is a rather complicated function, so let’s walk through it step by step. We’re going to return an integer (TypeId is just a usize, remember), and that integer is cast from a pointer using @intFromPtr. The pointer in question is the address of an item of a struct that is defined at compile time in this function. The item itself is just a variable of that struct, an 8-bit value holding the size of the Component. We don’t really use that size at this point, and probably never will, but it is free to compute at compile time and we need some code that forces this struct to be instantiated, so that we can take its address and convert it to an integer to pass back as a TypeId. Yes, it is confusing the first couple of times you look at it.
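One way to convince yourself this works is a small standalone check of the same trick; getTypeId here is my copy of the function above, and the component types are illustrative. Each distinct type gets its own instantiation of the inner struct, hence a distinct static variable address, while the same type always yields the same address.

```zig
const std = @import("std");

const TypeId = usize;

// Same trick as in the article: each comptime instantiation of the
// inner struct gets its own static variable, so its address serves as
// a unique, stable per-type identifier.
fn getTypeId(comptime Component: type) TypeId {
    return @intFromPtr(&(struct {
        var item: u8 = @intCast(@sizeOf(Component));
    }).item);
}

const Position = struct { x: f32, y: f32 };
const Velocity = struct { dx: f32, dy: f32 };

pub fn main() void {
    // Distinct types produce distinct ids; the same type is stable.
    std.debug.print("{}\n", .{getTypeId(Position) != getTypeId(Velocity)}); // true
    std.debug.print("{}\n", .{getTypeId(Position) == getTypeId(Position)}); // true
}
```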
Next, we want to build up a ComponentInfo structure that defines everything the ecs will use to understand components. This includes the name and the size of the component, as well as information about the fields in this Component and their types. There is a handy function called std.meta.fields that helps here. We can iterate over all the Component fields using an inline for, which is a for loop that is unrolled at compile time. We can then match the actual type of each field with our own internal type enum and fill in the information for the field in the info structure. That information will become useful when we want to do introspection, either in the user interface or during saving and loading. We can find other information about each field using the @offsetOf and @sizeOf builtins. The final thing we need is someplace to store component data.
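The field walk described above can be sketched like this; this is a minimal standalone version, not the real register_component, which would record these values into its ComponentInfo rather than print them:

```zig
const std = @import("std");

const Position = struct { x: f32, y: f32 };

// Iterate the fields of a Component type at compile time with an
// inline for, pulling out each field's name, offset and size via the
// std.meta.fields helper and the @offsetOf / @sizeOf builtins.
pub fn main() void {
    inline for (std.meta.fields(Position)) |field| {
        std.debug.print("{s}: offset={} size={}\n", .{
            field.name,
            @offsetOf(Position, field.name),
            @sizeOf(field.type),
        });
    }
}
```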
We store component data in an ArrayHashMap. The actual component storage is done using yet another zig comptime function that takes a type and returns another type: ComponentStorage.
pub fn ComponentStorage(Component: type) type {
    return struct {
        store: std.AutoArrayHashMapUnmanaged(EntityIndex, Component) = .empty,

        const Self = @This();
        pub const empty: Self = .{ .store = .empty };

        pub fn deinit(self: *Self) void {
            self.store.deinit(allocator);
        }
    };
}
As you can see, we are just mapping an EntityIndex to the value of a Component. We can then define any number of methods on this struct, like deinit to free up the memory required. For example, the get function is very simple.
pub fn get(self: *Self, key: Entity) ?Component {
    return self.store.get(key.idx);
}
However, there is a problem. A rather big problem. Some might even call it an elephant-in-the-room sized problem. When we set a component on an entity, we need to be able to call the get function on the component storage for that particular component. However, we can’t key hash maps with a type, only with a usize or something similar. Something like the TypeId we built above. Therefore, we have to do some jumping around to make this work.
pub fn get(self: Entity, comptime Component: type) ?Component {
    if (!self.is_valid()) {
        log.err("get invalid entity {}", .{self});
        return null;
    }
    const typeid = get_type_id(Component);
    if (components.getPtr(typeid)) |info| {
        var storage = info.storage.cast(Component);
        return storage.get(self);
    } else {
        log.err("Component {s} not registered", .{@typeName(Component)});
    }
    return null;
}
As shown above, the entity get method first checks if the entity is valid and returns null if it is not. Then we find the typeid using the get_type_id function. This is a comptime value, so that ‘function call’ is really just the setting of a hard-coded value. We then grab the ComponentInfo from the components map, created during the register_component function. From the component info, we grab the storage and call the get function on the storage. Except there is a cast function in there. What is that?
pub fn cast(
    self: ErasedComponentStorage,
    Component: type,
) *ComponentStorage(Component) {
    return @alignCast(@ptrCast(self.ptr));
}
This is the function. It takes something called an ErasedComponentStorage and a component type, and returns a pointer to the ComponentStorage for that component. The function body is really just a @ptrCast. It is really simple. Except you should now be asking what the ErasedComponentStorage is.
pub const ErasedComponentStorage = struct {
    ptr: *anyopaque,
    deinit: *const fn (self: ErasedComponentStorage) void,

    pub fn cast(
        self: ErasedComponentStorage,
        Component: type,
    ) *ComponentStorage(Component) {
        return @alignCast(@ptrCast(self.ptr));
    }
};
Here is the complete structure definition. It is a zig structure that has a ptr to something, and a function pointer called deinit. Confused yet? The key to how this works is how it is initialized in the first place.
fn init_erased_component_storage(Component: type) ErasedComponentStorage {
    const ptr = allocator.create(ComponentStorage(Component)) catch unreachable;
    ptr.* = .empty;
    return ErasedComponentStorage{
        .ptr = ptr,
        .deinit = (struct {
            fn deinit(self: ErasedComponentStorage) void {
                const cast_ptr = self.cast(Component);
                cast_ptr.deinit();
                allocator.destroy(cast_ptr);
            }
        }).deinit,
    };
}
The init_erased_component_storage function takes a type and returns an ErasedComponentStorage. It does this by first creating an instance of the ComponentStorage(Component) and initialising it to .empty. We now have a pointer to the component storage for the given Component. We can then return this back to the caller in the ErasedComponentStorage, along with an implementation of the deinit function.
This is a little more zig trickery. We create an anonymous struct that contains a single method that takes an erased component storage, casts it to the ComponentStorage(Component) using the cast function, and then calls deinit on that component storage, before destroying the instance itself. The tricky bit is that the value Component is a comptime parameter of init_erased_component_storage, and therefore a different instantiation of the deinit function is created for each new type used in this init function. Each of those knows how to cast the opaque pointer to the exact component storage for that specific type.
And by the magic of comptime, we’ve got a way to go from a typeid to the storage used to hold values of a Component, in a fully type-safe manner.
The entity set function, for example, is as simple as:
pub fn set(self: Entity, value: anytype) void {
    const Component = @TypeOf(value);
    if (!self.is_valid()) {
        log.err("set invalid entity {}", .{self});
        return;
    }
    const typeid = get_type_id(Component);
    if (components.getPtr(typeid)) |info| {
        var storage = info.storage.cast(Component);
        storage.set(self, value);
    }
}
Note that we don’t explicitly pass in the type of the value; we can determine it at compile time using the @TypeOf builtin.
The final part of ECS is the system. How does that work?
We first have to register a system using the register_system function. This takes in some options, like the name of the system, a function that will be called when the system is run, and a set of Components that an entity must have to be included in this system. Obviously, the components that are passed in are first converted into TypeIds using the get_type_id function, and all the information about the system is stored in a SystemInfo struct that is kept in a systems array.
One interesting bit about systems is that they have a phase. This is essentially an ordering of similar systems, so that systems that need to run before others have an earlier phase. It also means that systems can be sorted by phase to determine the order of execution. Although it is not implemented now, all systems within a single phase could be executed in parallel. At the moment, we are strictly single threaded, purely because we don’t need to over-optimize yet.
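The ordering idea can be sketched in isolation; the Phase enum, the SystemInfo shape, and the names below are all illustrative, not the module’s real definitions:

```zig
const std = @import("std");

// Sketch of phase ordering: sort registered systems by their phase
// before running them, so earlier phases always execute first.
const Phase = enum(u8) { pre_update, update, post_update };

const SystemInfo = struct {
    name: []const u8,
    phase: Phase,
};

fn phaseLessThan(_: void, a: SystemInfo, b: SystemInfo) bool {
    return @intFromEnum(a.phase) < @intFromEnum(b.phase);
}

pub fn main() void {
    var systems = [_]SystemInfo{
        .{ .name = "render_prep", .phase = .post_update },
        .{ .name = "movement", .phase = .update },
        .{ .name = "input", .phase = .pre_update },
    };
    std.mem.sort(SystemInfo, &systems, {}, phaseLessThan);
    // Runs in phase order: input, movement, render_prep.
    for (systems) |s| std.debug.print("{s}\n", .{s.name});
}
```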
System functions take in a SystemIterator that contains two useful bits of information: the delta_time since the last time this system was called, and a slice of entities that this system should operate on.
The system functions themselves can be very simple:
fn movement_system(iter: *const SystemIterator) void {
    for (iter.entities) |entity| {
        if (entity.getPtr(Position)) |pos| {
            if (entity.get(Velocity)) |vel| {
                pos.x += vel.dx * iter.delta_time;
                pos.y += vel.dy * iter.delta_time;
            }
        }
    }
}
The above code just iterates over the entities slice and, for each entity, gets a mutable Position as pos and a constant Velocity as vel, then updates the position based on the velocity and the delta time.
The set of entities that a system operates on is held in the SystemInfo within an ArrayHashMap(Entity, void). This hash map has a type for the key but nothing for the value, which essentially means it operates as a set, but one that can be iterated like an array. All very simple and efficient. But this set of entities has to be maintained. Every time an entity has a new Component set on it, the set of systems that need to know about this entity might need to be updated. To accommodate this, there is an entity_changes set, another ArrayHashMap(Entity, void). Each time the set of Components changes on an entity, the entity identifier is added to this set. Once all the systems have run, each of the entities within this set is checked against the list of registered systems and the component typeids they care about, and the entity is either added to or removed from each system’s entity set. This is done only after all the systems have run, to reduce the complexity of iterating through all the systems and all the entities within those systems.
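The membership decision at the heart of that flush can be sketched standalone; the matches function and the hard-coded typeids here are illustrative only. An entity belongs in a system’s set exactly when it has every component typeid that system requires:

```zig
const std = @import("std");

// Sketch of the post-run flush decision: an entity matches a system
// only if it has every component typeid the system requires.
fn matches(entity_typeids: []const usize, required: []const usize) bool {
    outer: for (required) |req| {
        for (entity_typeids) |have| {
            if (have == req) continue :outer;
        }
        return false; // a required typeid is missing
    }
    return true;
}

pub fn main() void {
    const position_id: usize = 1; // stand-ins for real TypeId values
    const velocity_id: usize = 2;

    // Position+Velocity matches a movement system requiring both;
    // Position alone does not.
    std.debug.print("{}\n", .{matches(&.{ position_id, velocity_id }, &.{ position_id, velocity_id })});
    std.debug.print("{}\n", .{matches(&.{position_id}, &.{ position_id, velocity_id })});
}
```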
That is a very quick overview of the ECS system. There are other things in the module that might be interesting. For example, there is a concept known as staged that is used when systems are run: any write to the database that does not go through a mutable update of data (like the one in movement_system shown above) is done by recording the data into another array and performing all these writes after all the systems have executed. This essentially means that system execution can consider the database to be read only, except where the system arguments are explicitly marked as mutable. It also means that we can check this mutability at runtime in the entity getPtr function.
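The staged-write idea can be sketched in isolation; the StagedWrite shape and the names below are mine, not the module’s. Systems read a stable snapshot and queue their writes, which are applied only after every system has run:

```zig
const std = @import("std");

// Sketch of staged writes: systems queue writes instead of mutating
// the data they are iterating over; the queue is drained afterwards.
const StagedWrite = struct { index: usize, value: i32 };

fn applyStaged(data: []i32, staged: []const StagedWrite) void {
    for (staged) |w| data[w.index] = w.value;
}

pub fn main() !void {
    const allocator = std.heap.page_allocator;

    var data = [_]i32{ 1, 2, 3 };
    var staged: std.ArrayListUnmanaged(StagedWrite) = .empty;
    defer staged.deinit(allocator);

    // "System execution": read freely, stage writes instead of mutating.
    for (data, 0..) |v, i| {
        if (v == 2) try staged.append(allocator, .{ .index = i, .value = 20 });
    }

    // Only after every system has finished do the writes land.
    applyStaged(&data, staged.items);
    std.debug.print("{any}\n", .{data});
}
```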