Fourth - ECS initial implementation


When writing software, I like to get a working implementation first; if we need to optimize later, we do it later. You don’t write software quickly if the first thing you do is optimize. When you hear the letters ‘ECS’, many people think of data-oriented design and highly optimized processing of data. Yes, that is one possible reason for using an ECS, but it is not the primary reason I’m using this design pattern.

What is an ECS? Well, the letters stand for Entity, Component, System. An Entity is just a thing, an object, an actor, something that is unique. A Component is a bit of data. A System is a bit of code that processes entities based on the components that they currently have.

This means that an Entity has Components. It is not the object-oriented ‘is-a’ relationship, but a database-style ‘has-a’ relationship. In fact, I like to think of an ECS more like a database than an object-relational mapping.

Let’s go back to some first principles. In a game, we have one or more entities. Each entity is something that is significant in the game. It could be a building, or a road, or a person, or some goods. It could be a process that takes one or more goods, and one or more people with skills, and converts them into one or more other goods. It could be anything. Some people take this to the absolute extreme, modelling user interfaces and everything else as entities. I’m not an absolutist; the user interface stuff is completely separate from the ECS. My simple rule is: if it needs to be saved in a game save file, it belongs in the ECS database; otherwise it does not.

What is an Entity? Yes, we’ve asked that before, but here is the engineering answer: it is a 32-bit integer. The actual definition uses a packed struct in Zig.

pub const Entity = packed struct(u32) {
    gen: EntityGeneration,
    idx: EntityIndex,

    pub const null_entity: Entity = .{ .idx = 0xFFFFFF, .gen = 0xFF };
    ...
};

The EntityIndex is just a u24 that identifies each individual entity in the system. It is used as the key into various hash maps, or as an index into arrays as necessary. The EntityGeneration is a u8. There is also a null_entity declaration, a single reserved identifier that denotes a null entity.
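
Spelled out, those two declarations are simply:

pub const EntityIndex = u24;
pub const EntityGeneration = u8;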

What is the generation part for? Well, it solves many of the problems that pointers have. A pointer always looks valid, even after the memory it points to has been freed, and perhaps even reallocated to something else. This is why ‘safe languages’ don’t let programmers use raw pointers. We will solve this not with a garbage collector, reference counting, or a borrow checker, but with a generation number. To create a new entity, we just allocate the next available entity index and give it generation zero.

const index: EntityIndex = @intCast(generations.items.len);
generations.append(allocator, 0);

const self = Entity{ .gen = 0, .idx = index };

Note: error handling has been removed for clarity, both above and in many of the examples below.

As you can see above, we find the next entity index to allocate from the length of a generations ArrayList. However, this only allows us to allocate 16,777,215 possible entities. That’s a lot. But it is not a lot if every bullet you fire or every smoke particle burns through one of these entities. How do we recycle them? Well, when an entity is deleted, it is added to a recycle list.

generations.items[idx] = gen +% 1;
recycled.append(allocator, idx);

The recycled variable is just another ArrayList, holding the indexes of entities that have been deleted. Before an index goes on that list, we also increment its generation number in the generations array. Note the +% operator, Zig’s wrapping addition, which allows the generation to overflow from 255 back to 0. Then, when we create a new entity, we first check whether the recycled array has anything on it by trying to pop something off. If we get an index, we grab the generation number from the generations array and return an entity with the correct values.

if (recycled.pop()) |idx| {
    const gen = generations.items[idx];
    const self = Entity{ .gen = gen, .idx = idx };
    // entity_changes is a set of entities whose components have changed;
    // it is covered in detail towards the end of this post.
    entity_changes.put(allocator, self, {});
    return self;
}

The basic result of this is that the first time we create an entity, we are given the value 0:0, where the first number is the generation number and the second number is the index. If we delete this, and then create another entity, we’ll get the value 1:0. It is still using the same index, but the generation number has been incremented. This means that it is really easy to check if a given entity is valid or not.

pub fn is_valid(self: Entity) bool {
    if (self.idx >= generations.items.len) {
        return false;
    }
    if (generations.items[self.idx] != self.gen) {
        return false;
    }
    return true;
}

First, we check that the index of the entity is within the range of entities we’ve ever created, by comparing it against the number of items in the generations array. Then we check that the generation stored in the generations array at the entity’s index matches this entity’s generation number, and only then do we return true.

I’m sure you’re already ahead of me now. What happens if we create a new entity and then delete it, say, 256 times? What is the identifier of the next entity created? Well, without anything fancy, we’d hand back an entity with the same index and a generation number that has already been used. This may not be great. This may be the whole pointer-reuse problem all over again. We could reduce the incidence of this by increasing the size of the generation number, or we could ‘fix’ it with code.

What we actually do is keep yet another array, old_recycled, which is just like the recycled array, except that when the generation number reaches 255, instead of adding the index to recycled we add it to old_recycled.

generations.items[idx] = gen +% 1;
if (generations.items[idx] == 255) {
    old_recycled.append(allocator, idx);
} else {
    recycled.append(allocator, idx);
}

This means that if you repeatedly create and delete a single entity, you’ll start with entity identifier 0:0, then get 1:0, 2:0, 3:0, all the way up to 253:0, then 254:0, and then you’ll get 0:1. The generations value at index 0 is now 255, meaning that no entity identifier with index 0 is considered valid, and we’ve started creating entities at index 1. This essentially means that we can create and delete billions of entities before we have problems. However, problems will still happen.

fn recycle_old_generations() void {
    while (old_recycled.pop()) |idx| {
        const gen = generations.items[idx];
        generations.items[idx] = gen +% 1;
        recycled.append(allocator, idx);
    }
}

To resolve this, there is a function called recycle_old_generations. This iterates through all the old_recycled values, popping them off one at a time, incrementing the generation value once more (wrapping 255 back to 0), and adding the indexes back to the recycled array. This should not be called every frame, but it would probably be fine to call it at a suitable break, say between levels, or just before saving.
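
For example, it might be hooked into a save routine; save_game here is a hypothetical function, purely to illustrate the call site:

pub fn save_game(path: []const u8) void {
    // Nothing is iterating entities at this point, so it is safe to
    // reshuffle the recycle lists before writing the save file.
    recycle_old_generations();
    _ = path; // the actual serialization is out of scope here
}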

Hopefully, you understand what an entity is, but just creating entities is not very useful if they can’t also have data. This is where Components come in.

A Component is just a bit of plain old data. In Zig, that is a struct. For example, say you want an entity that has a position; you can just do:

const Position = struct { x: f32, y: f32 };

const entity_one = ecs.new();
entity_one.set(Position{ .x = 2, .y = 3 });

However, there is a small step required before calling the set function on the entity: we need to register the component. To register the component, we just call the register_component function. This takes two arguments, the name of the component and the type of the component. Yes, it’s that fun time in Zig where we pass types as parameters at compile time.
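
For the Position example above, the call would be something like this (the exact name string is up to you):

ecs.register_component("position", Position);

The function itself starts like this: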

pub fn register_component(comptime name: []const u8, comptime Component: type) void {

The first thing we do is get a unique identifier for the Component. We call this a TypeId, but essentially it is just a usize.

const type_id = get_type_id(Component);
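
As a declaration, that is just:

pub const TypeId = usize;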

Zig doesn’t have runtime type identifiers, but we can build them ourselves. This is as simple as doing:

fn get_type_id(comptime Component: type) TypeId {
    return @intFromPtr(&(struct {
        var item: u8 = @intCast(@sizeOf(Component));
    }).item);
}

This is a rather complicated function, so let’s walk through it step by step. We’re going to return an integer (TypeId is just a usize, remember), and that integer is cast from a pointer using @intFromPtr. The pointer in question is the address of an item in a struct that is defined at compile time inside this function. The item itself is just a variable of that struct, an 8-bit value holding the size of the Component. We don’t really use that size at this point, and probably never will, but it is free to compute at compile time, and we need some code that forces the struct to be instantiated so that we can take its address and convert it to an integer to pass back as a TypeId. Because a distinct struct, and therefore a distinct item address, is instantiated for each Component type, every type gets its own unique TypeId. Yes, it is confusing the first couple of times you look at it.

Next, we want to build up a ComponentInfo structure that defines everything the ecs needs in order to understand a component. This includes the name and the size of the component, as well as information about the fields in the Component and their types. There is a handy function called std.meta.fields that helps here. We can iterate over all the Component fields using an inline for, a for loop that is executed at compile time. We can then match the actual type of each field against our own internal type enum and fill in the information for that field in the info structure. That information will become useful when we want to do introspection, either in the user interface or during saving and loading. We can find further details about each field using the @offsetOf and @sizeOf builtins. A sketch of that field loop is below.
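
Here is roughly what that loop looks like. The FieldKind and FieldInfo shapes are invented for illustration; the real structures in the module differ:

const std = @import("std");

// Hypothetical shapes, for illustration only.
const FieldKind = enum { float32, int32 };
const FieldInfo = struct {
    name: []const u8,
    offset: usize,
    size: usize,
    kind: FieldKind,
};

fn field_infos(comptime Component: type) [std.meta.fields(Component).len]FieldInfo {
    var out: [std.meta.fields(Component).len]FieldInfo = undefined;
    // inline for runs at compile time, once per field of the Component.
    inline for (std.meta.fields(Component), 0..) |field, i| {
        out[i] = .{
            .name = field.name,
            .offset = @offsetOf(Component, field.name),
            .size = @sizeOf(field.type),
            // Match the field's actual type against our internal enum.
            .kind = switch (field.type) {
                f32 => .float32,
                i32 => .int32,
                else => @compileError("unsupported field type " ++ @typeName(field.type)),
            },
        };
    }
    return out;
}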

The final thing we need is somewhere to store the component data itself. We store component data in an ArrayHashMap. The actual component storage is created using yet another Zig comptime function that takes a type and returns another type: ComponentStorage.

pub fn ComponentStorage(Component: type) type {
    return struct {
        store: std.AutoArrayHashMapUnmanaged(EntityIndex, Component) = .empty,

        const Self = @This();

        pub const empty: Self = .{ .store = .empty };

        pub fn deinit(self: *Self) void {
            self.store.deinit(allocator);
        }
    };
}

As you can see, we are just mapping an EntityIndex to the value of a Component. We can then define any number of methods on this struct, like deinit to free up the memory used. The get function, for example, is very simple.

pub fn get(self: *Self, key: Entity) ?Component {
    return self.store.get(key.idx);
}
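
The matching set is equally small; a sketch, in the same style:

pub fn set(self: *Self, key: Entity, value: Component) void {
    // Insert or overwrite the component value for this entity's index.
    self.store.put(allocator, key.idx, value) catch unreachable;
}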

However, there is a problem. A rather big problem. Some might even call it an elephant-in-the-room-sized problem. When we want to get or set a component on an entity, we need to reach the component storage for that particular component. However, we can’t key hash maps with a type, only with a usize or something similar. Something like the TypeId we built above. Therefore, we have to do some jumping around to make this work.

pub fn get(self: Entity, comptime Component: type) ?Component {
    if (!self.is_valid()) {
        log.err("get invalid entity {}", .{self});
        return null;
    }

    const typeid = get_type_id(Component);

    if (components.getPtr(typeid)) |info| {
        var storage = info.storage.cast(Component);
        return storage.get(self);
    } else {
        log.err("Component {s} not registered", .{@typeName(Component)});
    }
    return null;
}

As shown above, the entity get method first checks that the entity is valid and returns null if it is not. Then we find the typeid using the get_type_id function. This takes a comptime value, so that ‘function call’ is really just the setting of a hard-coded value. We then grab the ComponentInfo from the components hash map, which was populated during register_component. From the component info, we grab the storage and call the get function on it. Except there is a cast function in there. What is that?

pub fn cast(
    self: ErasedComponentStorage,
    Component: type,
) *ComponentStorage(Component) {
    return @alignCast(@ptrCast(self.ptr));
}

This is the function. It takes something called an ErasedComponentStorage and a component type, and returns a pointer to the ComponentStorage for that component. The function body is really just a @ptrCast, plus an @alignCast to restore the alignment. It is really simple. Except now you should be asking: what is an ErasedComponentStorage?

pub const ErasedComponentStorage = struct {
    ptr: *anyopaque,
    deinit: *const fn (self: ErasedComponentStorage) void,

    pub fn cast(
        self: ErasedComponentStorage,
        Component: type,
    ) *ComponentStorage(Component) {
        return @alignCast(@ptrCast(self.ptr));
    }
};

Here is the complete structure definition. It is a Zig struct that has a ptr to something, and a function pointer called deinit. Confused yet? The key to how this works is how it is initialized in the first place.

fn init_erased_component_storage(Component: type) ErasedComponentStorage {
    const ptr = allocator.create(ComponentStorage(Component)) catch unreachable;
    ptr.* = .empty;

    return ErasedComponentStorage{
        .ptr = ptr,
        .deinit = (struct {
            fn deinit(self: ErasedComponentStorage) void {
                const cast_ptr = self.cast(Component);
                cast_ptr.deinit();
                allocator.destroy(cast_ptr);
            }
        }).deinit,
    };
}

The init_erased_component_storage function takes a type and returns an ErasedComponentStorage. It does this by first creating an instance of ComponentStorage(Component) and initialising it to .empty. We now have a pointer to the component storage for the given Component. We can then return this to the caller in the ErasedComponentStorage, along with an implementation of the deinit function.

This is a little more Zig trickery. We create an anonymous struct containing a single method that takes an erased component storage, casts it to the ComponentStorage(Component) using the cast function, calls deinit on that component storage, and then destroys the instance itself. The tricky bit is that Component is a comptime parameter of init_erased_component_storage, and therefore a different instantiation of the deinit function is created for each new type used with this init function. Each of those instantiations knows how to cast the opaque pointer to the exact storage type for that specific Component.

And by the magic of comptime, we’ve got a way to go from a typeid to the storage that is used to hold values of the Component in a fully type-safe manner.

The entity set function, for example, is then as simple as:

pub fn set(self: Entity, value: anytype) void {
    const Component = @TypeOf(value);

    if (!self.is_valid()) {
        log.err("set invalid entity {}", .{self});
        return;
    }

    const typeid = get_type_id(Component);

    if (components.getPtr(typeid)) |info| {
        var storage = info.storage.cast(Component);
        storage.set(self, value);
    }
}

Note that we don’t explicitly pass in the type of the value; we can determine it at compile time using the @TypeOf builtin.

The final part of ECS is the system. How does that work?

We first have to register a system using the register_system function. This takes some options: the name of the system, a function that will be called when the system is run, and the set of Components that an entity must have to be included in this system. The components passed in are first converted into TypeIds using the get_type_id function, and all the information about the system is stored in a SystemInfo struct, which is kept in a systems array. A hypothetical registration call is sketched below.
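
The options struct itself isn’t shown here, so these field names are assumptions, but a registration call would look roughly like this:

ecs.register_system(.{
    .name = "movement",
    .func = movement_system,
    .components = &.{ Position, Velocity },
    .phase = .update,
});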

One interesting bit about systems is that they have a phase. This is essentially an ordering over systems, so that systems which need to run before others are given an earlier phase. It also means that systems can be sorted by their phase to determine the order of execution, as sketched below. Although this is not implemented yet, all systems within a single phase could be executed in parallel. At the moment, we are strictly single-threaded, purely because we don’t need to over-optimize yet.
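
A minimal sketch of that sort, assuming SystemInfo carries an enum phase field (the real struct may differ):

fn sort_systems() void {
    // Order the systems array by phase before running them.
    std.mem.sort(SystemInfo, systems.items, {}, struct {
        fn lessThan(_: void, a: SystemInfo, b: SystemInfo) bool {
            return @intFromEnum(a.phase) < @intFromEnum(b.phase);
        }
    }.lessThan);
}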

System functions take a SystemIterator, which contains two useful bits of information: the delta_time since the last time this system was called, and a slice of the entities that this system should operate on.
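
Based on that description, the iterator is roughly the following shape (the field types here are assumptions):

pub const SystemIterator = struct {
    delta_time: f32,
    entities: []const Entity,
};

The system functions themselves can be very simple: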

fn movement_system(iter: *const SystemIterator) void {
    for (iter.entities) |entity| {
        if (entity.getPtr(Position)) |pos| {
            if (entity.get(Velocity)) |vel| {
                pos.x += vel.dx * iter.delta_time;
                pos.y += vel.dy * iter.delta_time;
            }
        }
    }
}

The above code just iterates over the entities slice and, for each entity, gets a mutable Position as pos and a constant Velocity as vel, then updates the position based on the velocity and the delta time.

The set of entities that a system operates on is held in the SystemInfo within an ArrayHashMap(Entity, void). This hash map has a type for the key but nothing for the value, which essentially means it operates as a set, but one whose members can also be accessed as an array. All very simple and efficient. But this set of entities has to be maintained: every time an entity has a new Component set on it, the set of systems that need to know about this entity might need to be updated.

To accommodate this, there is an entity_changes set, another ArrayHashMap(Entity, void). Each time the set of Components on an entity changes, the entity identifier is added to this set. Once all the systems have run, each entity in this set is checked against the list of registered systems and the component typeids they care about, and the entity is either added to or removed from each system’s entity set. This is done only after all the systems have run, to reduce the complexity of iterating through all the systems and all the entities within them. A sketch of that maintenance pass follows.
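
Something along these lines, where entity_has_components and the typeids field on SystemInfo are hypothetical names for this sketch:

fn update_system_entities() void {
    for (entity_changes.keys()) |entity| {
        for (systems.items) |*system| {
            // Add the entity to every system whose required components it now
            // has, and remove it from every system where it no longer fits.
            if (entity.is_valid() and entity_has_components(entity, system.typeids)) {
                system.entities.put(allocator, entity, {}) catch unreachable;
            } else {
                _ = system.entities.swapRemove(entity);
            }
        }
    }
    entity_changes.clearRetainingCapacity();
}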

That is a very quick overview of the ECS. There are other things in the module that might be interesting. For example, there is a concept known as staging, used when systems are run: any write to the database that does not go through a mutable update of data (like the one in movement_system above) is recorded into a separate array, and all of these writes are applied after every system has executed. This essentially means that system execution can treat the database as read-only, except where a system’s arguments are explicitly marked as mutable. It also means we can check this mutability at runtime in the entity getPtr function.
