Skip to content

Files

Latest commit

9094e97 · Jan 7, 2025

History

History

Zig

Sudoku solver - Zig Implementation

Solving Sudoku Grids on Windows 64

See documentation on how to use this program here.

Optimizations done in this version

I have done several different optimizations in Zig. Some are either not possible, either not portable in C. Many of them may be available on gcc but not in other compilers. On the other hand, every optimization made here in the Zig version is portable, although some may not generate the most optimal performance in platforms different than X64.

Using a linear grid

In Zig the grid matrix is given linearly by an array containing the 81 elements of the grid stored line by line contiguously:

    var grid = [_]u8{0} ** 81;        // Sudoku grid stored linearly here

This configuration increases cache coherency and avoids indirections to access the elements via pointers as it's usually done in matrices and as it was also done in previous versions of this Zig code. The linear storage doesn't come for free, since it implies additional operations in the solve function to be able to cope with this configuration.

The most notable is to maintain not only the line and column of an element (j and i variables), but also its index (index variable) in the linear grid.

Additional operations are needed to recover index whenever backtracking, by calculating it with the previous line and column values popped from the stack. Here one needs to multiply i (the current line) by 9, to jump over the previous lines, and add j, the current column:

    index = @shlExact(i,3) + i + j;

Since backtracking occurs less often than other parts of the loop, these extra operations don't impact the performance in a noticeable way.

The most frequent operation impacting the linear grid configuration is an extra addition to increment index, besides the usual j incrementation at the end of the loop just before testing a line change and end of the loop:

      index += 1;                         // advance to the next position in grid
      j += 1;                             // advance to the next column

Fortunately, the time spent in the extra operations didn't overlap the time gained with the linear grid storage. Less indirections and more coherency when accessing the elements one after the other in sequence as done here highly justified the cost of extra operations. It's clear that the less one needs to use values stored in memory the better the solver performs. Focusing on that unveiled quite a few surprises after calculating values dynamically instead of accessing the calculated values in memory.

Calculating the grid element value from bit representation using @popcount

Each grid element value (0 to 9) is represented in binary as shown in the table below to speed up occupation sets checking.

Element Value Binary Representation Hexadecimal Decimal
0 000000000 0x000 0
1 000000001 0x001 1
2 000000010 0x002 2
3 000000100 0x004 4
4 000001000 0x008 8
5 000010000 0x010 16
6 000100000 0x020 32
7 001000000 0x040 64
8 010000000 0x080 128
9 100000000 0x100 256

In practice, one never uses zero because in Sudoku zero represents an empty element, an element not yet filled with an estimated value by the solver. All estimated values are then between 1 and 9.

It's easy to convert a value n, where:

    n ∈ {1, 2, 3, 4, 5, 6, 7, 8, 9} 

If code is the binary representation of n, one can calculate code in this way:

    code = 1 << (n-1)

But it's not simple to obtain n from code, unless using popcount assembly instruction.

Since popcount instruction counts the numbers of ones in an integer binary value, one can calculate n in this way in Zig:

    n = @​popCount(code - 1) + 1

Substituting this code in the Zig version of the Sudoku solver produced a noticeable optimization. The ​popCount built-in actually generates a single Assembler instruction as shown here:

Actually calculating a division by 3 instead of using tables.

This was one of the most surprising optimizations of them all. In Sudoku one needs to calculate in which 3x3 subgrid (that I called a "cell," but in Sudoku, cells generally refer to any of its 81 grid elements) an element belongs to check if an estimated value for this element is already used somewhere in its subgrid.

This is normally done by first calculating the following two integer truncating divisions:

    @divTrunc(i, 3)
    @divTrunc(j, 3)

Initially, I was doing this using a table, since I estimated that divisions would be too slow.

But I decided to try doing the division explicitly as shown above, and I was quite surprised to see that a significant speed up was obtained. That puzzled me and it triggered me to investigate what was going on under the hood.

What I found was that the Assembler code produced was actually only doing an integer multiplication followed by a shift operators as shown below.

I kind of understood that it was multiplying the value by a fixed point notation for ⅓, but to me that could never result into an exact integer number corresponding to the quotient. Well, it turns out it can.

The math behind is called Modular Arithmetic. I didn't dive in depth, but the demonstration in this site is pretty clear, although I just browsed through. It's indeed basically a fixed point notation in binary (0xAAAB in the code corresponds to ~0.3333 but shifted left in binary), but the arithmetic, however, is not approximate as one would normally assume. It's demonstrable exact.

Use of prefetch

Prefetch is a very interesting resource for increasing memory cache coherency. One can't use in many places in the same context. In this code I used it before entering the loop and at the end of the loop to keep grid[index] in the cache memory. I just tewweked some values and it actually produced faster executions.