Optimisations and convert to use crunchy #22
Merged
Basically I just converted a bunch of mutable variables to be constants and made the array zeroing simpler/faster, mostly by using the macros in `crunchy`. Unrolling the `for i in 0..24` outer loop leads to a noticeable increase in compilation time but also massively increases the speed (probably because it allows more optimisations to be done on the `array[i][...]` accesses). I was looking at this repo because I wanted to convert it to use SIMD, but it turns out there was some low-hanging optimisation fruit that doesn't require nightly.

Benchcmp results (I ran the benches 3 times each before and after this PR, so there are 3 copies of each benching function):
One unresolved question is whether the loop at line 75 should be replaced with something like:
with `mem::uninitialized` for the initialisation of `arrays`. This should be more consistently optimised since it says what we actually want to happen, but on my computer it's about 5% slower. If someone could test this against the version that's committed here on a different computer, that'd be really useful.
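For anyone who wants to try that comparison, here is a rough sketch of the two initialisation strategies. It is not the code from this PR: the array shape, the fill pattern and the function names are all made up for illustration. It also uses `MaybeUninit` rather than `mem::uninitialized`, which has since been deprecated in the standard library.

```rust
use std::mem::MaybeUninit;

// Hypothetical dimensions, chosen only for illustration.
const ROWS: usize = 24;
const COLS: usize = 5;

/// Zero the whole array first, then overwrite every element.
fn fill_zeroed() -> [[u64; COLS]; ROWS] {
    let mut arrays = [[0u64; COLS]; ROWS];
    for i in 0..ROWS {
        for j in 0..COLS {
            arrays[i][j] = (i * COLS + j) as u64;
        }
    }
    arrays
}

/// Skip the zeroing pass and write each element exactly once.
fn fill_uninit() -> [[u64; COLS]; ROWS] {
    let mut arrays = MaybeUninit::<[[u64; COLS]; ROWS]>::uninit();
    // A nested array is laid out contiguously, so it can be treated
    // as ROWS * COLS consecutive u64 values.
    let base = arrays.as_mut_ptr() as *mut u64;
    for k in 0..ROWS * COLS {
        // SAFETY: `k` stays within the allocation, and `write` never
        // reads the (uninitialised) previous value.
        unsafe { base.add(k).write(k as u64) };
    }
    // SAFETY: every element was written in the loop above.
    unsafe { arrays.assume_init() }
}
```

Both functions return the same array, so a benchmark can swap one for the other directly. Whether skipping the zeroing pass is actually faster depends on the machine and compiler version, as the 5% result above suggests.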
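To make the unrolling point from the summary concrete, here is a small sketch of why a constant index helps. It is not the code from this PR. `unroll4!` is a hand-written stand-in for the `unroll!` macro from `crunchy`, and `RC`, `mix_rolled` and `mix_unrolled` are hypothetical names. The idea is that each copy of the loop body sees `i` as a `const`, so the compiler can resolve every `array[i]` access at compile time.

```rust
// Hypothetical constants, for illustration only.
const RC: [u64; 4] = [1, 2, 3, 4];

/// Rolled version: `i` is a runtime loop variable.
fn mix_rolled(state: &mut [u64; 4]) {
    for i in 0..4 {
        state[i] = state[i].rotate_left(i as u32) ^ RC[i];
    }
}

/// Stand-in for an unrolling macro: repeats the body four times,
/// each time with the index bound as a compile-time constant.
macro_rules! unroll4 {
    ($i:ident, $body:block) => {{
        { #[allow(non_upper_case_globals)] const $i: usize = 0; $body }
        { #[allow(non_upper_case_globals)] const $i: usize = 1; $body }
        { #[allow(non_upper_case_globals)] const $i: usize = 2; $body }
        { #[allow(non_upper_case_globals)] const $i: usize = 3; $body }
    }};
}

/// Unrolled version: same body, but every index is a constant.
fn mix_unrolled(state: &mut [u64; 4]) {
    unroll4!(i, {
        state[i] = state[i].rotate_left(i as u32) ^ RC[i];
    });
}
```

The cost is the one mentioned in the summary: the body is duplicated once per iteration, so a 24-iteration loop gives the compiler 24 copies to process, which is where the extra compilation time comes from.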