From 1b22462b55289089bba6cab1016ce3e1da460ff5 Mon Sep 17 00:00:00 2001
From: dysleixa <emond.jesse@gmail.com>
Date: Tue, 22 Nov 2022 23:37:50 -0500
Subject: [PATCH] [write-up] Pathfinding optimizations breakdown

---
 README.md | 46 +++++++++++++++++++++++++++-------------------
 1 file changed, 27 insertions(+), 19 deletions(-)

diff --git a/README.md b/README.md
index 77898b9..3374cc3 100644
--- a/README.md
+++ b/README.md
@@ -316,7 +316,7 @@ graph afterwards, we need to speed things up:
     list and reprioritize the priority queue with updated
     heuristic values;
 - I did a couple more optimizations, outlined in the
-  `Speed Optimization Ablation` section.
+  `Speed Optimization Breakdown` section.
 
 But we end up with a graph representation of our problem:
 
@@ -475,7 +475,7 @@ hyperparameters that control behavior:
   option instead of sampling;
 - **not** _alpha_: power for the pheromone -- I removed this from
   computations to speed up processing (see the `Speed Optimization
-  Ablation` section), which is not strictly equivalent to having it
+  Breakdown` section), which is not strictly equivalent to having it
   if we resweep _beta_, but this gave me decent results while being
   faster (allowing more iterations/ants).
 - _beta_: power for the heuristic when computing sampling weights;
@@ -637,7 +637,7 @@ to go _fast_.
 I started writing benchmarks and
 [profiling with perf](https://nnethercote.github.io/perf-book/profiling.html)
 to iterate on optimizations. The following were the most impactful,
-but see the `Speed Optimizations Ablation` section for details:
+but see the `Speed Optimizations Breakdown` section for details:
 - Represent sets of elements as a `u32` mask, with clever
   [bit hacks](https://graphics.stanford.edu/~seander/bithacks.html#NextBitPermutation)
   to go through all permutations with a fixed amount of bits set
@@ -848,7 +848,7 @@ solving TSP problems, thanks Coveo for the great
 challenge, as always! (Psst, I also hear that
 [they're hiring](https://www.coveo.com/en/company/careers).)
 
-## Speed Optimizations Ablation
+## Speed Optimizations Breakdown
 
 This section breaks down various optimizations that I made
 throughout the challenge, and the impact that they had.
@@ -870,8 +870,8 @@ The listed optimizations are not in the same order that I
 added them during the challenge, but to be able to get an
 understanding of how each contributed to the final speed,
 I re-implemented "simple" versions of all operations, ~and
-gradually re-introduced the same optimizations in a branch
-called `optimization-ablation` after-the-fact, if you'd
+gradually re-introduced the same optimizations in branches
+called `optimization-ablation-...` after-the-fact, if you'd
 like to look at the incremental changes.~ TODO(emond): this
 is not done yet, I just have the before/final numbers for
 now.
@@ -880,32 +880,40 @@ now.
 
 Comparing incremental improvements on `simple_graph.rs`
 and `simple_pathfinding.rs` vs. `graph.rs` and
-`pathfinding.rs`.
+`pathfinding.rs`, in branch
+[`optimization-ablation-pathfinding`](https://github.com/JesseEmond/blitz-2023-inscription/tree/optimization-ablation-pathfinding).
 
 Going from the non-optimized "simple" versions to the
 final ones on the graph creation benchmark gives a
 **97%** relative improvement in compute time
-(1.74s -> 58ms), i.e. it is **~30x faster**.
+(1.78s -> 58ms), i.e. it is **~31x faster**.
 
 The optimizations are:
 - [Use `FxHashMap`](https://nnethercote.github.io/perf-book/hashing.html)
   for `came_from`/`cost_so_far` storage;
+- Hardcode tide schedule len/widths/heights as constants, switch to
+  static arrays for topology and tide schedules;
 - Precompute tile neighbors for all tick offsets;
-- Hardcode tick offsets/widths/heights as constants;
-- Store pre-computed neighbors in contiguous memory where
-  possible (`ArrayVec`, arrays);
+- Use a heuristic based on min
+  [Chebyshev distance](https://en.wikipedia.org/wiki/Chebyshev_distance)
+  of all targets, reprioritize queue when a goal is found;
 - Represent `(position, wait)` state as a packed `u32`;
-- Multithread on 3 cores the pathfinding from different
-  starting positions;
 - Do [early exploration](https://takinginitiative.wordpress.com/2011/05/02/optimizing-the-a-algorithm/),
   where neighbor nodes can skip the priority queue if their
   f-score is <= the current one;
-- Use a heuristic based on min
-  [Chebyshev distance](https://en.wikipedia.org/wiki/Chebyshev_distance)
-  of all targets, reprioritize queue when a goal is found.
-
-TODO(emond): Include table of relative optimization
-ablation, adding one at a time incrementally
+- Multithread on 3 cores the pathfinding from different
+  starting positions.
+
+| Optimization | Commit | Benchmark Time | Speedup (relative improvement) |
+| --- | --- | --- | --- |
+| Base (simple implementation) | [8dea4d6](https://github.com/JesseEmond/blitz-2023-inscription/commit/8dea4d6bee03d593b31b2556e7f55af1146b259c) | 1786ms | _N/A_ |
+| + `FxHashMap` | [cb22692](https://github.com/JesseEmond/blitz-2023-inscription/commit/cb226923cd689f26365d30853925a67cc197b65b) | 1359ms | 23.9% |
+| + Use constants & static storage | [d8fbaa5](https://github.com/JesseEmond/blitz-2023-inscription/commit/d8fbaa5d6195dd8b702de707495756be9d53ef3d) | 1240ms | 8.8% |
+| + Precompute neighbors | [afef254](https://github.com/JesseEmond/blitz-2023-inscription/commit/afef25405bb55c63caf90d07bdf265f9c0de6d5f) | 1185ms | 4.4% |
+| + Use heuristic with reprioritization | [8294dbe](https://github.com/JesseEmond/blitz-2023-inscription/commit/8294dbef4cf00bc1f8aeac923bd59ad9679de148) | 198ms | 83.3% |
+| + Pack A-star state as u32 | [56fbd48](https://github.com/JesseEmond/blitz-2023-inscription/commit/56fbd486b0bb6b359c4228ac470a362f5da394c3) | 193ms | 2.4% |
+| + Early exploration | [bf14332](https://github.com/JesseEmond/blitz-2023-inscription/commit/bf14332cd0819b20ad959dfcc7ea23612ddfff6a) | 154ms | 20.4% |
+| + Mutlithreading (3 cores) | [5f275ba](https://github.com/JesseEmond/blitz-2023-inscription/commit/5f275ba13540163a61cb13d2145756f591910510) | 58ms | 62.2% |
 
 ### Ant Colony Optimizations
 Comparing incremental improvements on `simple_ant_colony_optimization.rs`