Calculating primes

(1381 words)

Creating a list of primes takes a long time. This is especially true if you are using the very boring and simple sieve algorithm. In Zig, this is really easy to do:

pub fn calculate_prime_numbers (comptime size: usize) std.StaticBitSet(size) {
    var sieve = std.StaticBitSet(size).initFull();
    sieve.setValue (0, false);
    sieve.setValue (1, false);

    const last: usize = @intFromFloat(@sqrt(@as(f64, @floatFromInt(size))));

    for (2..size) |n| {
        if (sieve.isSet(n)) {
            results.set(n);
        }
        var i: usize = n * 2;
        while (i <= last) : (i += n) {
            sieve.setValue(i, false);
        }
    }
    return results;
}

The code is really simple. It creates a sieve set, initially full, except for the number 0 and 1. It then looks through from the number 2 up to the square root of the size parameter given, and then looks at the sieve set to see if the bit is set. If it is, then it goes from double the current number up to the size parameter jumping by the current number, and clears the sieve bits, effectively removing all the multiples of the current number as not prime for future iterations.

You may notice the comptime keyword in front of the size parameter in the function. This means that each time we call this function, we are making sure that the size parameter is known at compile time and therefore we can size the StaticBitSet to that size. This means that there is no memory allocations required, and therefore should be relatively fast.

How fast is this? Well, let’s start with setting size to 500,000. For example, the last prime number calculated is 499979. On my machine1, this takes about 1.8 milliseconds. To display them out to stderr using the std.debug.print function takes a lot longer, (~750 ms).

Comptime please

But we can do better, can’t we? Can we layer comptime on top of comptime?

Well, lets look at how we call this function.

pub fn main () void {
    const max_size = 500_000;

    const prime_numbers = calculate_prime_numbers(max_size);

    for (0..max_size) |n| {
        if (prime_numbers.isSet(n)) {
            std.debug.print("{} ", .{n});
        }
    }
}

What happens if we stick a little comptime in front of the call to calculate prime_numbers?

    const prime_numbers = comptime calculate_prime_numbers(max_size);

Oh dear, we get an error…

src/main.zig:40:13: error: evaluation exceeded 1000 backwards branches
            while (i < size) : (i += n) {
            ^~~~~
src/main.zig:40:13: note: use @setEvalBranchQuota() to raise the branch limit from 1000
src/main.zig:10:59: note: called from here
    const prime_numbers = comptime calculate_prime_numbers(max_size);
                                   ~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~

Basically, comptime is limited in terms of how many times you can branch, which is a good approximation of the complexity of code, and we exceeded this limit. But, we can use @setEvalBranchQuota to increase this.

By adding this to the top of calculate_prime_numbers, we can make the Zig compiler do a lot of work. By settings this to a very large number, say a few billion, we can make the compiler calculate the prime numbers at compile time. This number is limited to a 32-bit unsigned integer, so we’ll just set it to the largest such number.

    @setEvalBranchQuota(std.math.maxInt(u32));

Of course, this does cause a little issue. Mainly, the time taken to compile is a little longer. Ok, a lot longer. I’m compiling right now, and the Zig compiler is pegged at 100% on a single core in the Code Generation phase.

I’m still waiting for the compiler to finish code generation. While I’m waiting, this the shell command I’m using.

$ time zig build -Doptimize=ReleaseFast run --summary all

Yes, I’m doing an optimized build using the ReleaseFast mode.

(One minutes later) I’m still waiting… And we can’t even blame LLVM for this slowness. This is all the Zig comptime virtual machine that is effectively an interpreter, as shown by the Code Generation stage being shown in the terminal window. It is just as slow as Python or Lua. It is also using a lot of memory. The process is currently using 60GB of system memory.

Ok, that was about three minutes to compile. Not all of that was taken up generating the code, but given that doing the same but with the maximum prime being 100, it only took 6.5 second to compile, we can probably just say the code generation part took 2 minutes and 54 seconds-ish.

This means that doing this operation at comptime was about 100,000 times slower.

Doing compile time execution of functions (not comptime)

Could there be a better way? Yes, let’s do this at comptime :-)

We’ll move the calculation of the prime numbers into build.zig, do the actual calculation at runtime in the execution of the build.zig file, and then create a source file that contains the answer we want. This sounds complicated, but is it? We don’t even need to move the sieve function into build.zig, as we can just import the main.zig file and use the existing function directly.

It is a little more difficult than that. Firstly, we create a PrimeStep struct that is used to create a build step that allows the executable to use this as a dependency.

const PrimeStep = struct {
    step: std.Build.Step,
    outfile: std.Build.GeneratedFile,

    pub fn init (b: *std.Build) *PrimeStep {
        const self = b.allocator.create(PrimeStep) catch unreachable;
        self.step = std.Build.Step.init (.{
                .id = .custom,
                .name = "calculate primes",
                .owner = b,
                .makeFn = make,
        });
        self.outfile = std.Build.GeneratedFile {.step = &self.step};
        return self;
    }

We obviously need to create an instance of this in the build function.

    const prime_step = PrimeStep.init (b);

And we need to allow this to be imported into the main.zig file so that we can use access the prime numbers.

    exe.root_module.addAnonymousImport("primes", .{
        .root_source_file = prime_step.get_output_file (),
});

This leaves us with two more functions to write. The get_output_file for the generated file that is going to be imported.

    pub fn get_output_file(self: *PrimeStep) std.Build.LazyPath {
        return .{
            .generated = .{ .file = &self.outfile },
        };
    }

And the make function for our build step.

    fn make(build_step: *std.Build.Step, options: std.Build.Step.MakeOptions) !void {
        _ = options;

We need our step, not the std.Build.Step, so we grab that using the @fieldParentPtr builtin function.

        const step = @as(*PrimeStep, @fieldParentPtr("step", build_step));

Now we just grab the calculate_prime_numbers function from main.zig.

        const main_zig = @import("src/main.zig");
        const calculate_prime_numbers = main_zig.calculate_prime_numbers;

If we really wanted, we could move this function in the build file and never generate prime numbers in the runtime executable.

        const max_prime = 10_000_000;

        const primes = calculate_prime_numbers(max_prime);

Next we generate the prime numbers. This wastes about 100 ms during the build.

The rest is just about creating the cached file primes.zig. We do that by creating a file in the cache_root (typically .zig_cache in the build root) and then creating that file.

        const b = step.step.owner;
        const cache_root = b.cache_root;
        const path = cache_root.join(b.allocator, &.{ "o", "primes.zig" }) catch unreachable;

        step.outfile.path = path;

        const file = std.fs.createFileAbsolute(path, .{}) catch unreachable;
        defer file.close();

Finally, we generate the source file that will be imported using a writer on that file.

        var writer = file.writer();

        writer.writeAll(
            \\const std = @import("std");
            \\
            \\pub const prime_numbers : std.bit_set.ArrayBitSet(usize,
        ) catch unreachable;
        writer.print("{}", .{max_prime}) catch unreachable;
        writer.writeAll(
            \\) = .{
            \\    .masks = .{
        ) catch unreachable;

        for (0.., primes.masks) |i, bits| {
            if (i % 4 == 0) writer.writeAll("\n       ") catch unreachable;
            writer.print(" {},", .{bits}) catch unreachable;
        }

        writer.writeAll(
            \\
            \\ } };
        ) catch unreachable;

All done

We can now generate at compile time a resonably complex data structure with the list of prime numbers and make those available to the main program at runtime, without using huge amounts of comptime interpreter overhead.

The results?

Well, to generate the up to ten million prime numbers at build time took about were calculated at build time in about 350 ms. So, yes, the build was a little slower. And then they could be used directly in the main executable at essentially zero cost.

Conclusion

comptime is wonderful. But comptime takes a lot of time. If you are going to be generating a lot of data at compile time then the fastest approach would be to do this in the build.zig file. The build.zig file is compiled and then executed as part of the compilation, meaning that code that is used from the build.zig file is compiled and then executed at compile time.


  1. I have an AMD Ryzen 9 3950X @ 3.5 GHz. ↩︎

Zig, Prime Numbers