Optimisations and convert to use crunchy #22

eira-fransham · 2018-02-26T11:58:37Z

Basically I just converted a bunch of mutable variables to be constants and made the array zeroing simpler/faster, mostly by using the macros in crunchy. Unrolling for i in 0..24 outer loop leads to a noticeable increase in compilation time but also massively increases the speed (probably because it allows more optimisations to be done on the array[i][...] accesses). I was looking at this repo because I wanted to convert it to use simd but it turns out there was some low-hanging optimisation fruit that doesn't require nightly.

Benchcmp results (I ran the benches 3 times each before and after this PR so there are 3 copies of each benching function):

 name                             pre.bench ns/iter  post.bench ns/iter  diff ns/iter   diff %  speedup 
 bench_sha3_256_input_4096_bytes  26,662 (153 MB/s)  20,100 (203 MB/s)         -6,562  -24.61%   x 1.33 
 bench_sha3_256_input_4096_bytes  26,203 (156 MB/s)  19,995 (204 MB/s)         -6,208  -23.69%   x 1.31 
 bench_sha3_256_input_4096_bytes  26,322 (155 MB/s)  20,157 (203 MB/s)         -6,165  -23.42%   x 1.31 
 keccakf_u64                      663 (37 MB/s)      496 (50 MB/s)               -167  -25.19%   x 1.34 
 keccakf_u64                      650 (38 MB/s)      491 (50 MB/s)               -159  -24.46%   x 1.32 
 keccakf_u64                      699 (35 MB/s)      486 (51 MB/s)               -213  -30.47%   x 1.44

One unresolved question is whether the loop at line 75 should be replaced with something like:

for x in 0..5 {
    let mut out = 0;

    unroll! {
        for y_count in 0..5 {
            let y = y_count * 5;
            out ^= a[x + y];
        }
    }

    arrays[i][x] = out;
}

with mem::uninitialized for the initialisation of arrays. This should be more consistently optimised since it says what we actually want to happen, but on my computer it's slower, about 5%. If someone could test this against the version that's committed here on a different computer that'd be really useful.

debris · 2018-02-26T17:21:04Z

nice! I can confirm that on my machine it's also about 20% faster! 🎉

newpavlov · 2018-02-27T19:41:16Z

@Vurich
Do you mind if I'll copy your code to the sha3 crate?

eira-fransham · 2018-03-05T09:39:46Z

@newpavlov It's CC0, you can copy whatever you like.

eira-fransham added 2 commits February 26, 2018 12:54

Optimisations and convert to use crunchy

7792481

Add comment explaining why I kept in the seemingly-useless zeroing

662d2e2

debris merged commit 7a0a70e into debris:master Feb 26, 2018

debris mentioned this pull request Feb 28, 2018

bump tiny-keccak openethereum/parity-ethereum#8019

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Optimisations and convert to use crunchy #22

Optimisations and convert to use crunchy #22

eira-fransham commented Feb 26, 2018 •

edited

Loading

debris commented Feb 26, 2018

newpavlov commented Feb 27, 2018

eira-fransham commented Mar 5, 2018

Optimisations and convert to use crunchy #22

Optimisations and convert to use crunchy #22

Conversation

eira-fransham commented Feb 26, 2018 • edited Loading

debris commented Feb 26, 2018

newpavlov commented Feb 27, 2018

eira-fransham commented Mar 5, 2018

eira-fransham commented Feb 26, 2018 •

edited

Loading