Optimisations and convert to use crunchy #22

Merged Feb 26, 2018 (2 commits)

eira-fransham (Contributor) commented Feb 26, 2018

I converted a number of mutable variables to constants and made the array zeroing simpler and faster, mostly by using the macros in crunchy. Unrolling the outer for i in 0..24 loop noticeably increases compilation time but also speeds things up considerably, probably because it allows more optimisations to be done on the array[i][...] accesses. I was looking at this repo because I wanted to convert it to use SIMD, but it turned out there was some low-hanging optimisation fruit that doesn't require nightly.
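
For readers who have not used an unrolling macro, the following is a minimal sketch of the idea, not crunchy's actual implementation and not the code in this PR. The macro name unroll5 and both function names are hypothetical. The macro pastes the loop body five times and binds the loop variable to a constant each time, so every array index is known at compile time.

```rust
// Hypothetical helper macro, NOT crunchy's implementation: it repeats `$body`
// five times, binding `$i` to the constants 0, 1, 2, 3, 4 in turn.
macro_rules! unroll5 {
    ($i:ident, $body:block) => {{
        { #[allow(non_upper_case_globals)] const $i: usize = 0; $body }
        { #[allow(non_upper_case_globals)] const $i: usize = 1; $body }
        { #[allow(non_upper_case_globals)] const $i: usize = 2; $body }
        { #[allow(non_upper_case_globals)] const $i: usize = 3; $body }
        { #[allow(non_upper_case_globals)] const $i: usize = 4; $body }
    }};
}

/// XOR of the five lanes in column `x` (requires `x < 5`), as an ordinary loop.
fn parity_looped(a: &[u64; 25], x: usize) -> u64 {
    let mut out = 0;
    for y in 0..5 {
        out ^= a[x + 5 * y];
    }
    out
}

/// The same computation with the loop expanded by the macro above.
fn parity_unrolled(a: &[u64; 25], x: usize) -> u64 {
    let mut out = 0;
    unroll5!(y, {
        out ^= a[x + 5 * y];
    });
    out
}
```

Both functions return the same value for every column; whether the unrolled form is actually faster depends on the compiler and the target, so it is worth benchmarking rather than assuming.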

Benchcmp results (I ran the benches 3 times each before and after this PR so there are 3 copies of each benching function):

 name                             pre.bench ns/iter  post.bench ns/iter  diff ns/iter   diff %  speedup 
 bench_sha3_256_input_4096_bytes  26,662 (153 MB/s)  20,100 (203 MB/s)         -6,562  -24.61%   x 1.33 
 bench_sha3_256_input_4096_bytes  26,203 (156 MB/s)  19,995 (204 MB/s)         -6,208  -23.69%   x 1.31 
 bench_sha3_256_input_4096_bytes  26,322 (155 MB/s)  20,157 (203 MB/s)         -6,165  -23.42%   x 1.31 
 keccakf_u64                      663 (37 MB/s)      496 (50 MB/s)               -167  -25.19%   x 1.34 
 keccakf_u64                      650 (38 MB/s)      491 (50 MB/s)               -159  -24.46%   x 1.32 
 keccakf_u64                      699 (35 MB/s)      486 (51 MB/s)               -213  -30.47%   x 1.44 

One unresolved question is whether the loop at line 75 should be replaced with something like:

for x in 0..5 {
    let mut out = 0;

    unroll! {
        for y_count in 0..5 {
            let y = y_count * 5;
            out ^= a[x + y];
        }
    }

    arrays[i][x] = out;
}

with mem::uninitialized for the initialisation of arrays. This should be more consistently optimised since it says what we actually want to happen, but on my computer it's about 5% slower. If someone could test this against the version that's committed here on a different computer, that'd be really useful.
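
A side note for anyone trying this today: mem::uninitialized has since been deprecated in favour of MaybeUninit. The sketch below shows one safe way to express "write each element exactly once, with no zeroing pass", using std::array::from_fn. It is an illustration under stated assumptions, not the code from this PR: the function names are hypothetical, it computes a single [u64; 5] of column parities rather than the PR's arrays[i], and no claim is made about its speed relative to either version discussed above.

```rust
use std::array;

/// Hypothetical helper: XOR of the five lanes in column `x` (requires `x < 5`).
fn column_parity(a: &[u64; 25], x: usize) -> u64 {
    a[x] ^ a[x + 5] ^ a[x + 10] ^ a[x + 15] ^ a[x + 20]
}

/// Zero the output first, then overwrite every element.
fn parities_zeroed(a: &[u64; 25]) -> [u64; 5] {
    let mut c = [0u64; 5];
    for x in 0..5 {
        c[x] = column_parity(a, x);
    }
    c
}

/// Build the output directly: each element is produced once by the closure,
/// so there is no separate zeroing step and no `unsafe`.
fn parities_direct(a: &[u64; 25]) -> [u64; 5] {
    array::from_fn(|x| column_parity(a, x))
}
```

The two functions are interchangeable in terms of results, so they can be swapped in a benchmark to see whether the initial zeroing costs anything measurable on a given machine.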

debris (Owner) commented Feb 26, 2018

nice! I can confirm that on my machine it's also about 20% faster! 🎉

@debris debris merged commit 7a0a70e into debris:master Feb 26, 2018

newpavlov (Contributor) commented

@Vurich Do you mind if I copy your code into the sha3 crate?

eira-fransham (Contributor, Author) commented

@newpavlov It's CC0, you can copy whatever you like.
