See documentation on how to use this program here.
I have done several different optimizations in Zig. Some are either not possible or not portable in C; many of them may be available in GCC but not in other compilers. Every optimization made here in the Zig version, on the other hand, is portable, although some may not yield the best performance on platforms other than x64.
In Zig the grid matrix is given linearly by an array containing the 81 elements of the grid stored line by line contiguously:
var grid = [_]u8{0} ** 81; // Sudoku grid stored linearly here
This configuration increases cache locality and avoids accessing the elements through pointer indirections, as is usually done with matrices and as was done in previous versions of this Zig code. The linear storage doesn't come for free, though: it implies additional operations in the `solve` function to cope with this configuration.
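Concretely, the mapping from line and column to the linear index looks like this (a small sketch; the helper function is mine, while the solver instead keeps the index updated incrementally, as described next):

```zig
var grid = [_]u8{0} ** 81;

// The element at line i, column j of the conceptual 9x9 matrix lives
// at position i * 9 + j in the contiguous linear array.
fn at(i: usize, j: usize) u8 {
    return grid[i * 9 + j];
}
```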
The most notable is to maintain not only the line and column of an element (the `i` and `j` variables), but also its index (the `index` variable) in the linear grid.
Additional operations are needed to recover `index` when backtracking, by recalculating it from the previous line and column values popped from the stack. Here one needs to multiply `i` (the current line) by 9, to jump over the previous lines, and add `j`, the current column:
index = @shlExact(i, 3) + i + j; // (i << 3) + i == i * 9, then add column j
Since backtracking occurs far less often than the rest of the loop, these extra operations don't impact performance in a noticeable way.
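A hedged sketch of that recovery step, assuming the solver keeps the visited positions on a simple stack of line/column pairs (the stack layout here is illustrative, not necessarily the solver's own):

```zig
const Pos = struct { i: u8, j: u8 };

var stack: [81]Pos = undefined; // hypothetical stack of visited positions
var top: usize = 0;

fn backtrack(i: *u8, j: *u8, index: *usize) void {
    top -= 1;
    const prev = stack[top]; // pop the previous line and column
    i.* = prev.i;
    j.* = prev.j;
    // Recompute the linear index: (i << 3) + i == i * 9, then add j.
    index.* = @shlExact(@as(usize, prev.i), 3) + prev.i + prev.j;
}
```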
The most frequent operation impacting the linear grid configuration is an extra addition to increment `index`, besides the usual `j` increment at the end of the loop, just before testing for a line change and for the end of the loop:
index += 1; // advance to the next position in grid
j += 1; // advance to the next column
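In context, the tail of the loop could look like the following sketch; the wrap-around test is reconstructed from the description above, not copied from the solver:

```zig
// Hypothetical end-of-iteration bookkeeping for the linear grid.
fn advance(i: *u8, j: *u8, index: *usize) bool {
    index.* += 1; // advance to the next position in grid
    j.* += 1; // advance to the next column
    if (j.* == 9) { // line change: wrap to the start of the next line
        j.* = 0;
        i.* += 1;
    }
    return index.* < 81; // false once past the last grid element
}
```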
Fortunately, the time spent in the extra operations didn't outweigh the time gained with the linear grid storage. Fewer indirections and better locality when accessing the elements one after the other in sequence, as done here, more than justified the cost of the extra operations. It's clear that the less one needs to fetch values stored in memory, the better the solver performs. Focusing on that unveiled quite a few surprises, once values were calculated dynamically instead of read precomputed from memory.
Each grid element value (0 to 9) is represented in binary as shown in the table below to speed up occupation sets checking.
| Element Value | Binary Representation | Hexadecimal | Decimal |
|---|---|---|---|
| 0 | 000000000 | 0x000 | 0 |
| 1 | 000000001 | 0x001 | 1 |
| 2 | 000000010 | 0x002 | 2 |
| 3 | 000000100 | 0x004 | 4 |
| 4 | 000001000 | 0x008 | 8 |
| 5 | 000010000 | 0x010 | 16 |
| 6 | 000100000 | 0x020 | 32 |
| 7 | 001000000 | 0x040 | 64 |
| 8 | 010000000 | 0x080 | 128 |
| 9 | 100000000 | 0x100 | 256 |
In practice, one never uses zero, because in Sudoku zero represents an empty element: one not yet filled in with an estimated value by the solver. All estimated values are therefore between 1 and 9.
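The payoff of this encoding is that an occupation set for a line, column, or subgrid fits in nine bits, and membership tests become single bitwise operations. A minimal sketch (the variable names are illustrative, not the solver's):

```zig
const std = @import("std");

pub fn main() void {
    var row_used: u9 = 0; // 9-bit mask of values already used in a row
    const five: u9 = 0x010; // one-hot code for the value 5
    row_used |= five; // mark 5 as used: a single OR
    const clash = (row_used & five) != 0; // membership test: a single AND
    std.debug.print("5 already used in row: {}\n", .{clash});
}
```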
It's easy to convert a value `n`, where n ∈ {1, 2, 3, 4, 5, 6, 7, 8, 9}, into its binary representation. If `code` is the binary representation of `n`, one can calculate `code` this way:
code = 1 << (n-1)
But it's not simple to obtain `n` from `code`, unless one uses the popcount assembly instruction. Since popcount counts the number of ones in the binary representation of an integer, one can calculate `n` this way in Zig:
n = @popCount(code - 1) + 1 // code - 1 has exactly n - 1 one bits
Substituting this code into the Zig version of the Sudoku solver produced a noticeable optimization. The `@popCount` built-in actually generates a single assembler instruction (popcnt on x64).
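As an illustration of the full round trip, here's a minimal self-contained sketch (the helper names are mine, not the solver's):

```zig
const std = @import("std");

// One-hot encoding: 1 -> 0x001, 2 -> 0x002, ..., 9 -> 0x100.
fn codeOf(n: u4) u9 {
    return @as(u9, 1) << (n - 1);
}

// code - 1 turns the single set bit into n - 1 trailing ones,
// so popcount recovers n - 1; adding 1 gives n back.
fn valueOf(code: u9) u4 {
    return @popCount(code - 1) + 1;
}

pub fn main() void {
    var n: u4 = 1;
    while (n <= 9) : (n += 1) {
        const code = codeOf(n);
        std.debug.print("n={d} code=0x{X:0>3} back={d}\n", .{ n, code, valueOf(code) });
    }
}
```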
This was one of the most surprising optimizations of them all. In Sudoku one needs to calculate which 3x3 subgrid an element belongs to (I called it a "cell," although in Sudoku "cell" usually refers to any one of the 81 grid elements), in order to check whether an estimated value for that element is already used somewhere in its subgrid.
This is normally done by first calculating the following two integer truncating divisions:
@divTrunc(i, 3)
@divTrunc(j, 3)
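From those two quotients, a subgrid number in the range 0 to 8 follows directly; a sketch of the usual formula (not necessarily the solver's exact code):

```zig
// Subgrids numbered 0..8, left to right and top to bottom.
fn subgridOf(i: u8, j: u8) u8 {
    return @divTrunc(i, 3) * 3 + @divTrunc(j, 3);
}
```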
Initially, I was doing this using a table, since I estimated that divisions would be too slow.
But I decided to try doing the division explicitly as shown above, and I was quite surprised to see that a significant speed-up was obtained. That puzzled me and prompted me to investigate what was going on under the hood. What I found was that the generated assembler code was actually doing only an integer multiplication followed by a shift.
I kind of understood that it was multiplying the value by a fixed-point representation of ⅓, but to me that could never result in an exact integer quotient. Well, it turns out it can. The math behind it is called Modular Arithmetic. I didn't dive in depth, but the demonstration on this site is pretty clear, although I just browsed through it. It is indeed basically a fixed-point notation in binary (0xAAAB in the code corresponds to ~0.3333 shifted left in binary), but the arithmetic is not approximate as one would normally assume: it's demonstrably exact.
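The claim is easy to check exhaustively. Here's my reconstruction of the trick for 16-bit operands, where the magic constant 0xAAAB is ceil(2^17 / 3) (a sketch, not the compiler's exact output):

```zig
const std = @import("std");

// Division by 3 via multiply-and-shift: (n * 0xAAAB) >> 17 equals
// @divTrunc(n, 3) for every unsigned 16-bit n, with no approximation.
fn divBy3(n: u16) u16 {
    return @intCast((@as(u32, n) * 0xAAAB) >> 17);
}

test "multiply-shift division by 3 is exact" {
    var n: u32 = 0;
    while (n <= 65535) : (n += 1) {
        try std.testing.expectEqual(@as(u16, @intCast(n / 3)), divBy3(@intCast(n)));
    }
}
```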
Prefetching is a very interesting resource for increasing cache locality. It can't be used in many places in the same context, though. In this code I used it before entering the loop and at the end of the loop to keep `grid[index]` in the cache. I just tweaked some values and it actually produced faster executions.
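Here's a hedged sketch of that placement using Zig's `@prefetch` built-in; the loop body is a stand-in, and only the prefetch placement mirrors the description above:

```zig
var grid = [_]u8{0} ** 81;

// Hypothetical hot loop with the two prefetch sites: one before entering
// the loop, and one at the end of each iteration for the next position.
fn scan() u32 {
    var sum: u32 = 0;
    var index: usize = 0;
    @prefetch(&grid[0], .{ .rw = .read, .locality = 3, .cache = .data });
    while (index < grid.len) : (index += 1) {
        sum += grid[index]; // the real solver does its work here
        if (index + 1 < grid.len)
            @prefetch(&grid[index + 1], .{ .rw = .read, .locality = 3, .cache = .data });
    }
    return sum;
}
```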