
Implement SHA2 in Rust. #199

Closed

Conversation

@samscott89 (Contributor)

The code is a simple adaptation of the pre-existing SHA-1 code. The algorithm is implemented as specified in [FIPS 180-4].

This addresses #61 and #62.
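For reference, the four SHA-256 logical functions that FIPS 180-4 specifies (and that any implementation like this one must provide) can be transcribed as follows. This is a minimal sketch for illustration, not the PR's exact source:

```rust
// The SHA-256 logical functions from FIPS 180-4, Section 4.1.2.
// Ch selects bits of y where x is set and bits of z where x is clear;
// Maj is the bitwise majority vote of the three inputs.
fn ch(x: u32, y: u32, z: u32) -> u32 { (x & y) ^ (!x & z) }
fn maj(x: u32, y: u32, z: u32) -> u32 { (x & y) ^ (x & z) ^ (y & z) }
fn big_s0(x: u32) -> u32 { x.rotate_right(2) ^ x.rotate_right(13) ^ x.rotate_right(22) }
fn big_s1(x: u32) -> u32 { x.rotate_right(6) ^ x.rotate_right(11) ^ x.rotate_right(25) }

fn main() {
    // When x is all ones, Ch passes y through unchanged.
    assert_eq!(ch(u32::MAX, 0x1234_5678, 0xFFFF_FFFF), 0x1234_5678);
    // Bitwise majority of (110, 101, 011) is 111.
    assert_eq!(maj(0b110, 0b101, 0b011), 0b111);
    println!("ok");
}
```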

@samscott89 (Contributor Author)

I had a go at writing some simple code for the SHA-2 family. I know the original issues specified fast implementations, so these might not cut it. It's roughly 2x slower than the previous version and a bit slower than rust-crypto.

Timings on x86_64:
ring before:

test digest::sha256::_1000               ... bench:       2,245 ns/iter (+/- 43) = 445 MB/s  
test digest::sha256::_16                 ... bench:         187 ns/iter (+/- 6) = 85 MB/s  
test digest::sha256::_2000               ... bench:       4,433 ns/iter (+/- 83) = 451 MB/s  
test digest::sha256::_256                ... bench:         739 ns/iter (+/- 12) = 346 MB/s  
test digest::sha256::_8192               ... bench:      17,679 ns/iter (+/- 11,560) = 463 MB/s  
test digest::sha256::block_len           ... bench:         346 ns/iter (+/- 23) = 184 MB/s  
test digest::sha384::_1000               ... bench:       1,616 ns/iter (+/- 26) = 618 MB/s  
test digest::sha384::_16                 ... bench:         269 ns/iter (+/- 5) = 59 MB/s  
test digest::sha384::_2000               ... bench:       3,123 ns/iter (+/- 58) = 640 MB/s  
test digest::sha384::_256                ... bench:         661 ns/iter (+/- 12) = 387 MB/s  
test digest::sha384::_8192               ... bench:      12,203 ns/iter (+/- 8,610) = 671 MB/s  
test digest::sha384::block_len           ... bench:         498 ns/iter (+/- 7) = 257 MB/s  
test digest::sha512::_1000               ... bench:       1,614 ns/iter (+/- 188) = 619 MB/s  
test digest::sha512::_16                 ... bench:         266 ns/iter (+/- 13) = 60 MB/s  
test digest::sha512::_2000               ... bench:       3,109 ns/iter (+/- 35) = 643 MB/s  
test digest::sha512::_256                ... bench:         658 ns/iter (+/- 16) = 389 MB/s  
test digest::sha512::_8192               ... bench:      12,214 ns/iter (+/- 201) = 670 MB/s  
test digest::sha512::block_len           ... bench:         498 ns/iter (+/- 12) = 257 MB/s  

ring after:

test digest::sha256::_1000               ... bench:       5,221 ns/iter (+/- 184) = 191 MB/s  
test digest::sha256::_16                 ... bench:         359 ns/iter (+/- 16) = 44 MB/s  
test digest::sha256::_2000               ... bench:      10,409 ns/iter (+/- 440) = 192 MB/s  
test digest::sha256::_256                ... bench:       1,668 ns/iter (+/- 36) = 153 MB/s  
test digest::sha256::_8192               ... bench:      41,795 ns/iter (+/- 1,400) = 196 MB/s  
test digest::sha256::block_len           ... bench:         691 ns/iter (+/- 22) = 92 MB/s  
test digest::sha384::_1000               ... bench:       3,365 ns/iter (+/- 96) = 297 MB/s  
test digest::sha384::_16                 ... bench:         455 ns/iter (+/- 27) = 35 MB/s  
test digest::sha384::_2000               ... bench:       7,014 ns/iter (+/- 675) = 285 MB/s  
test digest::sha384::_256                ... bench:       1,357 ns/iter (+/- 105) = 188 MB/s  
test digest::sha384::_8192               ... bench:      27,777 ns/iter (+/- 1,394) = 294 MB/s  
test digest::sha384::block_len           ... bench:         883 ns/iter (+/- 29) = 144 MB/s  
test digest::sha512::_1000               ... bench:       3,403 ns/iter (+/- 205) = 293 MB/s  
test digest::sha512::_16                 ... bench:         455 ns/iter (+/- 50) = 35 MB/s  
test digest::sha512::_2000               ... bench:       6,771 ns/iter (+/- 937) = 295 MB/s  
test digest::sha512::_256                ... bench:       1,305 ns/iter (+/- 1,058) = 196 MB/s  
test digest::sha512::_8192               ... bench:      27,661 ns/iter (+/- 1,363) = 296 MB/s  
test digest::sha512::block_len           ... bench:         880 ns/iter (+/- 45) = 145 MB/s  

rust-crypto:

test digest::sha256::_1000               ... bench:       4,270 ns/iter (+/- 660) = 234 MB/s  
test digest::sha256::_16                 ... bench:         294 ns/iter (+/- 10) = 54 MB/s  
test digest::sha256::_2000               ... bench:       8,632 ns/iter (+/- 898) = 231 MB/s  
test digest::sha256::_256                ... bench:       1,379 ns/iter (+/- 88) = 185 MB/s  
test digest::sha256::_8192               ... bench:      34,263 ns/iter (+/- 2,441) = 239 MB/s  
test digest::sha256::block_len           ... bench:         562 ns/iter (+/- 18) = 113 MB/s  
test digest::sha384::_1000               ... bench:       2,789 ns/iter (+/- 139) = 358 MB/s  
test digest::sha384::_16                 ... bench:         367 ns/iter (+/- 14) = 43 MB/s  
test digest::sha384::_2000               ... bench:       5,547 ns/iter (+/- 243) = 360 MB/s  
test digest::sha384::_256                ... bench:       1,057 ns/iter (+/- 32) = 242 MB/s  
test digest::sha384::_8192               ... bench:      32,851 ns/iter (+/- 10,469) = 249 MB/s  
test digest::sha384::block_len           ... bench:       1,048 ns/iter (+/- 268) = 122 MB/s  
test digest::sha512::_1000               ... bench:       4,090 ns/iter (+/- 99) = 244 MB/s  
test digest::sha512::_16                 ... bench:         380 ns/iter (+/- 175) = 42 MB/s  
test digest::sha512::_2000               ... bench:       5,568 ns/iter (+/- 298) = 359 MB/s  
test digest::sha512::_256                ... bench:       1,067 ns/iter (+/- 23) = 239 MB/s  
test digest::sha512::_8192               ... bench:      22,429 ns/iter (+/- 902) = 365 MB/s  
test digest::sha512::block_len           ... bench:         724 ns/iter (+/- 52) = 176 MB/s  

If you have any suggestions for optimisations, I'm happy to give them a go.

@samscott89 (Contributor Author)

Haha, #187 wasn't there a few days ago when I first looked for something to solve 😄

Well, I was looking for something to get my hands dirty anyway.

@coveralls

Coverage Status

Coverage decreased (-0.5%) to 81.044% when pulling 727aad5cd5cae3e4017ef34c0bc74340f396a175 on samscott89:SHA2_impl into dcdf473 on briansmith:master.


// SHA512 functions
#[inline] fn big_s0_512(x: W64) -> W64 { x.rotr(28) ^ x.rotr(34) ^ x.rotr(39) }
#[inline] fn big_s1_512(x: W64) -> W64 { x.rotr(14) ^ x.rotr(18) ^ x.rotr(41) }
@briansmith (Owner)

NIT: your alignment is messed up here.

@briansmith (Owner)

Thanks for submitting this!

The duplication of effort with #187 is unfortunate. Maybe you and @gsoltis can work together to determine a path forward?

Like I said in #187, since the rust-crypto code is faster, I think we need to choose between (a) adding more commits that optimize the code further (either the code in #187 or the code here), or (b) simply use the rust-crypto implementation of the block function, if we don't think we can implement something that is as efficient as rust-crypto.

Also, unless/until the pure-Rust implementation is as fast as the assembly-language implementations in ring, we'll continue to use the assembly-language implementations on whatever platforms we have faster assembly language implementations for. (Ideally, we'd figure out a way to do a pure Rust implementation that is faster than the ASM code, but at least on x86-64 it looks to be not even close. Maybe on ARM the story is different.) This means, in particular, that the final PR needs to include a commit that uses ![cfg()] to choose between using an asm implementation or the pure-Rust implementation based on what's available for that platform. (If we can't benchmark on every platform yet, then we can just assume that the asm implementation is faster, when an asm implementation is available, as the OpenSSL team has spent a lot of effort optimizing the asm code on many different platforms.)

@gsoltis

gsoltis commented May 29, 2016

I did a little bit of profiling of my implementation, and it looks like the wrapping bits are showing up the most. I'm going to look at reducing usage of Wrapping to see if that makes a difference.

Interestingly, I think that was also the case with the SHA-1 implementation that @SimonSapin contributed. Is there a difference between using Wrapping vs using the wrapping_XXX functions? Either way, this should be 100% free, so it's worth filing a perf bug against rust-lang/rust if they are not free.

But, failing that, I think the conditional include of the ASM implementation or rust-crypto is probably the way to go.

We don't need an implementation that is as fast as the ASM implementations. We just need a Rust implementation to fall back to on platforms that don't have ASM implementations, which is about as fast as rust-crypto (otherwise we can use rust-crypto).

@gsoltis

gsoltis commented May 31, 2016

I did a little more profiling, and it seems like Wrapping is not itself the issue. Or rather, it costs about the same as .wrapping_add. Nearly all the time is spent in .wrapping_add and .rotate_right for core::num. My guess is that any optimizations will have to center around doing fewer of those calls, or doing them more efficiently.
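The two forms being profiled are semantically identical wrapped additions; a minimal side-by-side comparison (an illustration, not the PR's code) looks like this:

```rust
use std::num::Wrapping;

// Modular addition via the Wrapping newtype: overflow wraps silently.
fn add_newtype(a: u32, b: u32) -> u32 {
    (Wrapping(a) + Wrapping(b)).0
}

// The same operation via the inherent wrapping_add method.
fn add_method(a: u32, b: u32) -> u32 {
    a.wrapping_add(b)
}

fn main() {
    // Both wrap around on overflow instead of panicking in debug builds.
    assert_eq!(add_newtype(u32::MAX, 1), 0);
    assert_eq!(add_method(u32::MAX, 1), 0);
    println!("ok");
}
```

In release builds both should compile to a plain machine addition, which is why any measured difference between them suggested a compiler issue rather than an algorithmic one.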

@briansmith (Owner)

I think it is probably worth studying what other implementations do, regarding reducing the state size (I think) and regarding unrolling. Most implementations look much different from this because they are partially or often fully unrolled, which allows for some really important optimizations. Also, reformulating the structure of the implementation might enable some auto-vectorization to happen.

You're not going to reduce the number of additions or rotations by any significant amount, because they are fundamental to the algorithm.

@samscott89 (Contributor Author)

Will have a look at implementing these suggestions to improve performance. And, ultimately, add the cfg flags where necessary.

@samscott89 (Contributor Author)

Played around with some optimisations. The commits above made the following changes:

test digest::sha256::_1000
-----------------------

Before:                    191 MB/s
Alternate Sigma functions: 200 MB/s
Remove Wrapping:           194 MB/s
Loop unrolling:            218 MB/s
Add Wrapping:              213 MB/s

asm no extensions:         307 MB/s
asm sse3:                  386 MB/s
asm avx:                   406 MB/s
asm shaext:                462 MB/s

Interestingly, removing Wrapping initially made performance worse, but after unrolling the loops a little it is better to remove it.

At the bottom I've shown the times for the included openssl/boringssl asm when disabling different extensions. I assume Rust has limited access to those instructions, so is the top line the best we could hope to achieve?
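One plausible reading of the "Alternate Sigma functions" change (assuming the composed-rotation form that appears later in this thread's diff, e.g. `small_s1_512`) is rewriting a sigma function as nested rotations. A sketch of the equivalence for SHA-512's σ1:

```rust
// Direct transcription of SHA-512's small sigma-1 from FIPS 180-4.
fn small_s1_naive(x: u64) -> u64 {
    x.rotate_right(19) ^ x.rotate_right(61) ^ (x >> 6)
}

// Composed form, as used in the PR's small_s1_512:
// rotr(rotr(x, 42) ^ x, 19) expands to rotr(x, 61) ^ rotr(x, 19),
// since 42 + 19 = 61. Whether this form is faster depends on how the
// compiler schedules the instructions.
fn small_s1_composed(x: u64) -> u64 {
    (x.rotate_right(42) ^ x).rotate_right(19) ^ (x >> 6)
}

fn main() {
    for &x in &[0u64, 1, 0xDEAD_BEEF_CAFE_F00D, u64::MAX] {
        assert_eq!(small_s1_naive(x), small_s1_composed(x));
    }
    println!("ok");
}
```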

@coveralls

Coverage Status

Changes Unknown when pulling bc2545ee5c48849f2cb6c3b7920434dd1d39b542 on samscott89:SHA2_impl into * on briansmith:master*.

@samscott89 (Contributor Author)

With this last push, performance for SHA-256 is now equivalent (on my machine) to rust-crypto. Whereas the SHA-512 performance is actually better (by just over 5%).

However, as I mentioned in the other PR comments, if rust-crypto were to use the optimised Sigma variants, they would again be ahead.

@coveralls

Coverage Status

Coverage decreased (-0.3%) to 81.598% when pulling e5d8b07391c0ed9c498ca73e317b3b93e04c56ca on samscott89:SHA2_impl into 7490753 on briansmith:master.

@coveralls

Coverage Status

Coverage decreased (-0.3%) to 81.651% when pulling e5d8b07391c0ed9c498ca73e317b3b93e04c56ca on samscott89:SHA2_impl into 7490753 on briansmith:master.

@coveralls

Coverage Status

Coverage decreased (-0.3%) to 81.642% when pulling aac3226e97a7a2b30fe86abbfd83a1982cd17a42 on samscott89:SHA2_impl into 7490753 on briansmith:master.

@coveralls

Coverage Status

Coverage decreased (-0.3%) to 81.598% when pulling ae3ef1b64ed9350fe400f60da816093c2752f5fc on samscott89:SHA2_impl into 7490753 on briansmith:master.

@coveralls

coveralls commented Jun 6, 2016

Coverage Status

Coverage decreased (-0.1%) to 81.758% when pulling ae3ef1b64ed9350fe400f60da816093c2752f5fc on samscott89:SHA2_impl into 7490753 on briansmith:master.

@briansmith (Owner)

briansmith commented Jul 3, 2016

With this last push, performance for SHA-256 is now equivalent (on my machine) to rust-crypto. Whereas the SHA-512 performance is actually better (by just over 5%).

However, as I mentioned in the other PR comments, if rust-crypto were to use the optimised Sigma variants, they would again be ahead.

I really appreciate the benchmarking done here and the evidence-based optimization work to get this into awesome shape.

I'm a little bit unsure what's going on with this PR and the other PR, #187. Did we decide to do everything in this PR (#199)?

It seems like we'll still need to add the conditional logic to use the assembly-language implementations on the platforms that we have them for, right?

The main goal I had when I filed the issue for adding a Rust SHA-2 implementation was to provide a fallback for platforms where we don't have asm implementations. Later, I got the idea we might just use rust-crypto's implementations for all those platforms we need fallbacks for, so I investigated that path a bit. That path seems attractive because it would eliminate some duplication of effort. But, rust-crypto does a lot of stuff we intentionally avoid doing in ring, and also its license isn't as nice as the ISC-style license we're using in ring. Also, I contacted the rust-crypto author about splitting out the "core" of rust-crypto into a library we might share, but understandably that's kind of inconvenient for everybody too.

Anyway, investigating those things in the background was why I was slow to get back to this. Now it seems like we should go ahead and just add our own SHA-2, Poly1305, ChaCha20, etc. implementations.

The remaining thing I'm uncertain about is this: How are we going to test this? On Travis CI we have four architectures: x86, x86-64, ARM, and Aarch64. And, we have asm implementations for every one. We could pick one platform--probably x86-64, and modify travis.sh so that it rebuilds ring on that platform with the ASM implementations disabled, and then re-tests. Or, we could try adding MIPS to Travis CI via qemu like we did for ARM. Since we don't have any asm code for MIPS, that port would then serve as the platform for testing this. I would prefer the latter (adding MIPS), but I'm also fine with the former (adjusting the travis script for x86-64 to also test this.)

@samscott89 (Contributor Author)

samscott89 commented Jul 12, 2016

The reason I pushed some changes to the other PR branch was that there was an incredibly simple optimisation trick which improved performance significantly, so I thought it would be interesting to see if it had the same effect for the other implementation.

I'll continue pushing things along in this branch and if @gsoltis is okay with it we'll just keep things here.

Personally I'm a big fan of the approach and philosophy you have in this project, so I think it is definitely worth writing new implementations. I would also argue there is value in having pure rust implementations of these algorithms for auditing purposes. For a ~50% performance penalty, some applications may consider it worthwhile to use the simpler code.

That being said, I'll have a play with qemu because I'm interested to see whether the optimisations I made were only effective because they allowed the compiler to leverage some of the more modern instructions, which might not be available on other architectures anyway. In which case the naive code might be preferable?

@samscott89 (Contributor Author)

Ok, that should do it. I have tests passing on my machine and on qemu when I merge #256 followed by this.

@briansmith (Owner)

OK, I had a serious look at this. Before we dive deep into it, do you think any of these transformations are a good idea?: https://github.com/briansmith/ring/tree/sha2

@samscott89 (Contributor Author)

My first thought was that they all looked to be sensible modifications. However, according to the benchmarks, splitting the loop is a good idea, but the refactoring of the message schedule to only use 16 values hurts performance. Perhaps the extra arithmetic needed for the indices isn't worth it.

Splitting the loop (commits 08d0c45 ... 73f282f) gives you a ~5% speedup but shrinking the message schedule (commit 2f18b5b) ends up losing a full 10% (from the base case, i.e. over 15% overall).
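For concreteness, the two message-schedule strategies can be sketched like this (a simplified illustration, not the branch's actual code; `small_s0`/`small_s1` are the FIPS 180-4 σ functions for SHA-512):

```rust
fn small_s0(x: u64) -> u64 { x.rotate_right(1) ^ x.rotate_right(8) ^ (x >> 7) }
fn small_s1(x: u64) -> u64 { x.rotate_right(19) ^ x.rotate_right(61) ^ (x >> 6) }

// Full schedule: expand the 16-word block into all 80 round words up front.
fn schedule_full(block: &[u64; 16]) -> [u64; 80] {
    let mut w = [0u64; 80];
    w[..16].copy_from_slice(block);
    for t in 16..80 {
        w[t] = small_s1(w[t - 2])
            .wrapping_add(w[t - 7])
            .wrapping_add(small_s0(w[t - 15]))
            .wrapping_add(w[t - 16]);
    }
    w
}

// Rolling schedule: keep only a 16-word window and overwrite in place.
// The `& 15` index arithmetic is the extra cost discussed above.
fn schedule_rolling(block: &[u64; 16]) -> Vec<u64> {
    let mut w = *block;
    let mut out = w.to_vec();
    for t in 16..80 {
        w[t & 15] = small_s1(w[(t - 2) & 15])
            .wrapping_add(w[(t - 7) & 15])
            .wrapping_add(small_s0(w[(t - 15) & 15]))
            .wrapping_add(w[(t - 16) & 15]);
        out.push(w[t & 15]);
    }
    out
}

fn main() {
    let block = [0x0123_4567_89AB_CDEFu64; 16];
    assert_eq!(schedule_full(&block).to_vec(), schedule_rolling(&block));
    println!("ok");
}
```

The rolling form trades a larger working array for masked indexing on every access, which is consistent with the measured slowdown if it blocks vectorization.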

One minor thing: I get that Cargo features are supposed to be additive so the syntax of asm makes more sense over no_asm. However, I would expect that if multiple packages used ring and one specified not to use asm code, then the overall preference would be to not use asm?

@briansmith (Owner)

but the refactoring of the message schedule to only use 16 values hurts performance. Perhaps the extra arithmetic needed for the indices isn't worth it.

Most likely, it's inhibiting vectorization. I guess we won't do that then. Thanks for measuring it.

@briansmith (Owner)

One minor thing: I get that Cargo features are supposed to be additive so the syntax of asm makes more sense over no_asm. However, I would expect that if multiple packages used ring and one specified not to use asm code, then the overall preference would be to not use asm?

Let's say there's a program P that uses crate C, and C uses ring. Let's say that C's dependency on ring is just a normal dependency using the defaults. If we have "no_asm" then P can depend on ring with the "no_asm" feature, and the build will work as long as C doesn't use any features that are disabled by "no_asm". If we have "asm" instead, would there be a way for P to get the same effect? I don't think so, but I'm not sure. So, this seems like evidence in favor of "no_asm".

On the other hand, with "asm" crates could, in theory, advertise that they work in the no-asm mode in their Cargo.toml, but with "no_asm" there's no such way. This seems like a minor benefit of "asm" though.

On the other other hand, let's say that we have "no_asm" and then C depends on ring with "no_asm" enabled. Then P and any other crates that P uses won't be able to use any of the features that require the asm optimizations, even if they would work! Thus, if we go with the "no_asm" design we would have to document that it shouldn't be used by libraries, only by the top-level program.

It seems like it will be common for library crates to ignore the asm/no_asm issue and so we should do whatever lets the top-level program override the default choice. If there's a way to do that overriding with "asm" then I think that's best; otherwise, I think "no_asm" is best.

@samscott89 (Contributor Author)

It seems like it will be common for library crates to ignore the asm/no_asm issue and so we should do whatever lets the top-level program override the default choice. If there's a way to do that overriding with "asm" then I think that's best; otherwise, I think "no_asm" is best.

Agreed.

One particularly ugly option would be to use "asm" as a default feature and add a further "no_asm_override" feature or something which, when activated, would take precedence over the "asm" feature. This would require #[cfg(all(feature = "asm", not(feature = "no_asm_override")))].

At which point, the question would be why is there an asm option in the first place.

Another option would be to just change the syntax slightly. So as opposed to "no_asm", it could be "use_rust_code".

@briansmith (Owner)

Please check out https://github.com/briansmith/ring/tree/sha2-3.

I dropped my proposed no_asm -> asm change. I also dropped all of my proposed optimizations, including the splitting. It isn't clear to me why the splitting would make things so much faster, unless it is purely due to saving some load-stores or allowing better reordering of operations. In any case, if the splitting is really faster, we can do it later.

I did some experimenting with trying to reduce the duplicate code without decreasing performance. Could you check which changes in the sha2-3 branch decrease performance? You mentioned earlier that using Wrapping is somehow actually faster than using wrapping_add. Could you double-check that? There should be no reason for it. Using Wrapping makes it harder to create trait-based functions that work over both u32 and u64, so I'd like to switch (back) to wrapping_add if it doesn't hurt performance.

I suggested adding a dependency on the num-traits crate so that we can write generic functions. I prefer to write generic functions rather than copy-paste if we can do so without hurting performance. I'm not that excited about depending on an external crate just for this; maybe in the future we can create a trait similar to (and simpler than) PrimInt in ring::polyfill so we can remove this dependency.

You said that using the precomputed tables is significantly faster, right? If so, we might consider using such a precomputed table for SHA-1 too, as the slow SHA-1 is actually a huge slowdown when it comes to running the test suite.

Anyway, I think this is getting close to landing.

@samscott89 (Contributor Author)

I think the use of traits kills performance. Commit 644d870 slows everything down to barely 1MB/s.

Previously I've tested inline functions vs macros and there didn't seem to be any difference, so I can only assume it's the trait object which is causing the slowdown. Replacing the trait functions with macros in the head of your branch recovers performance again as I've done here: samscott89@cb8752f.

You said that using the precomputed tables is significantly faster, right? If so, we might consider using such a precomputed table for SHA-1 too, as the slow SHA-1 is actually a huge slowdown when it comes to running the test suite.

I don't think that was me. Could we simply remove the SHA-1 tests from the Travis set, since SHA-1 is mostly deprecated in ring anyway?

@djc (Contributor)

djc commented Aug 3, 2016

I think the use of traits kills performance. Commit 644d870 slows everything down to barely 1MB/s.

@samscott89 maybe you can file a rustc bug about this. It seems like the compiler should be able to monomorphize the traits for this case to improve the performance. (I'd do it myself, but you'll probably be able to more accurately describe the problem and answer any questions they might have.)

@samscott89 (Contributor Author)

maybe you can file a rustc bug about this. It seems like the compiler should be able to monomorphize the traits for this case to improve the performance. (I'd do it myself, but you'll probably be able to more accurately describe the problem and answer any questions they might have.)

It's a bit bizarre; I haven't been able to reproduce it with a minimal example. Checking the assembly code, the problem is fairly obvious: the PrimInt::rotate_right method isn't inlined, which adds a few cycles of overhead for calling the function and moving values around, whereas the macro version compiles to a single ror instruction.

However, as I said before I can't reproduce this with a minimal example.

In any case, perhaps we should just leave a comment explaining why macros are used instead of traits over generics?
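The difference being described can be sketched with a toy rotation helper. The `Rotr` trait below is hypothetical, standing in for num-traits' `PrimInt`; the point is that the macro expands at the call site while the trait method's inlining depends on the attribute:

```rust
// Macro version: expands inline at every call site, so the compiler sees
// rotate_right directly and can emit a single rotate instruction.
macro_rules! rotr {
    ($x:expr, $n:expr) => {
        $x.rotate_right($n)
    };
}

// Trait version: without #[inline] on the method, a cross-crate impl may be
// called out-of-line, which matches the overhead observed above.
trait Rotr {
    fn rotr(self, n: u32) -> Self;
}

impl Rotr for u32 {
    #[inline] // the attribute num-traits' rotate_right was missing at the time
    fn rotr(self, n: u32) -> Self {
        self.rotate_right(n)
    }
}

fn main() {
    assert_eq!(rotr!(0x8000_0001u32, 1), 0xC000_0000);
    assert_eq!(0x8000_0001u32.rotr(1), 0xC000_0000);
    println!("ok");
}
```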

@SimonSapin (Contributor)

Does adding #[inline] to the method definition help?

@samscott89 (Contributor Author)

Adding #[inline] to the definition in num_traits here: https://github.com/rust-num/num/blob/master/traits/src/int.rs#L302 indeed works.

@samscott89 (Contributor Author)

Is there any way of forcing #[inline] for an external function?

@Ms2ger (Contributor)

Ms2ger commented Aug 10, 2016

I don't believe so, so I filed rust-num/num#218 about it.

@samscott89 (Contributor Author)

Cool, thanks!

@briansmith (Owner)

@samscott89 There are many things we can do:

1. If we use the block_data_order function I suggested, then I think we can make these functions nested functions of the resultant block_data_order. This would be about the same as your "leave them as macros" approach but it seems cleaner to me. Something like this:

   macro_rules! block_data_order {
   [...]
   {
       // Same as the same-named function in `ring::digest::sha1`.
       #[inline(always)]
       fn ch(x: $Word, y: $Word, z: $Word) -> $Word {
           (x & y) ^ (!x & z)
       }

       let state = &mut $state[..$SHA.chaining_words];

2. I'm not very excited about depending on rust-num just for this anyway. I can spend some time to create something like PrimInt and then we could base this on the new thing.

I think #1 is good enough. I would eventually like to get rid of all the macros but it seems we need language improvements to the type system before it would be possible.

@briansmith (Owner)

OK, I feel pretty strongly now that we should avoid the num-traits crate at least until we have more significant need for it. I don't want to duplicate that effort but also I don't want to add that dependency for such a small thing. Please let me know if you can update the PR to use the nested function approach inside the block_data_order! macro. Otherwise, I can take a stab at it next week.

Regardless, this has dragged on for a long time (mostly my fault, it seems), so LMK what I can do to help finish this off.

@samscott89 (Contributor Author)

Please let me know if you can update the PR to use the nested function approach inside the block_data_order! macro. Otherwise, I can take a stab at it next week.

Will do.

@samscott89 (Contributor Author)

Something like this: samscott89@d3711b4 ?

@briansmith (Owner)

Something like this: samscott89/ring@d3711b4 ?

Yes, that seems fine. Will look at it again after the travis queue clears a bit.

@briansmith (Owner)

See #256 (comment). This is very good work but it is in conflict with some other things we're trying to do with respect to increasing confidence in the code quality, so we're going to park it for now.

@briansmith briansmith changed the title Implement SHA2 in Rust. [On Hold] Implement SHA2 in Rust. Nov 24, 2016
@briansmith (Owner) left a comment

IMO the most important thing for this code (aside from avoiding side channels) is that, when ring is compiled optimized for size, it should be small. Most people interested in this code at the present time don't care as much about the performance as they care about the size.

I think now is a good time to revisit this work and use this as a fallback implementation for platforms where we don't have (and don't want to maintain) assembly language implementations, in particular WebAssembly and MIPS.

Is anybody interested in this for any other reason?

@samscott89 Are you still interested in this? I wouldn't be surprised if you are or aren't. If you are, once it is updated I can approve and merge it right away.

type W32 = Wrapping<u32>;
type W64 = Wrapping<u64>;

macro_rules! ch {
@briansmith (Owner)

I believe all these uses of macros can be replaced by const fn generic functions now.
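For instance, ch could become a const fn written once per word width. This is a sketch under the assumption that stable Rust still lacks const trait bounds; with those, the two could collapse into a single generic definition:

```rust
// ch as const fns, usable in constant contexts; one definition per word
// size until const fns can be generic over integer traits on stable Rust.
const fn ch32(x: u32, y: u32, z: u32) -> u32 { (x & y) ^ (!x & z) }
const fn ch64(x: u64, y: u64, z: u64) -> u64 { (x & y) ^ (!x & z) }

fn main() {
    // Evaluated at compile time.
    const C: u32 = ch32(0xF0F0_F0F0, 0xFFFF_0000, 0x0000_FFFF);
    assert_eq!(C, 0xF0F0_0F0F);
    // When x is all ones, Ch passes y through.
    assert_eq!(ch64(u64::MAX, 7, 0), 7);
    println!("ok");
}
```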

#[inline]
fn small_s1_512(x: W64) -> W64 { rotr!((rotr!(x, 42) ^ x), 19) ^ (x >> 6) }

pub fn block_data_order_256(state: &mut [u64; MAX_CHAINING_LEN / 8],
@briansmith (Owner)

It would be better to have SHA-256 and SHA-512 in separate files.

@@ -37,6 +37,19 @@ pub fn wrapping_rotate_left_u32(x: core::num::Wrapping<u32>, n: u32)
pub mod slice {
use core;

#[cfg(feature="no_asm")]
#[inline(always)]
pub fn u64_from_be_u8(buffer: &[u8; 8]) -> u64 {
@briansmith (Owner)

This function can be removed in favor of using the endian submodule.
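On current stable Rust the same big-endian load is also available directly on the integer types, which makes the hand-written byte arithmetic unnecessary. A sketch (the endian submodule mentioned above is ring's preferred route; this just shows the standard-library equivalent):

```rust
// Big-endian load of a u64, equivalent to the hand-written helper.
fn u64_from_be_u8(buffer: &[u8; 8]) -> u64 {
    u64::from_be_bytes(*buffer)
}

fn main() {
    // The last two bytes are the least significant in big-endian order.
    assert_eq!(u64_from_be_u8(&[0, 0, 0, 0, 0, 0, 0x01, 0x02]), 0x0102);
    assert_eq!(u64_from_be_u8(&[0xFF; 8]), u64::MAX);
    println!("ok");
}
```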

@@ -37,6 +37,19 @@ pub fn wrapping_rotate_left_u32(x: core::num::Wrapping<u32>, n: u32)
pub mod slice {
use core;

#[cfg(feature="no_asm")]
@briansmith (Owner)

Rather than have a feature no_asm, I think it makes sense to do this conditionally based on whether the target is not x86, x86_64, arm or aarch64.
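A sketch of that target-based selection (illustrative shape only; `block_data_order_backend` is a hypothetical name, not ring's actual module layout):

```rust
// Compile the asm-backed path only on architectures that have one...
#[cfg(any(target_arch = "x86", target_arch = "x86_64",
          target_arch = "arm", target_arch = "aarch64"))]
fn block_data_order_backend() -> &'static str {
    "asm"
}

// ...and fall back to the pure-Rust implementation everywhere else
// (e.g. MIPS, WebAssembly).
#[cfg(not(any(target_arch = "x86", target_arch = "x86_64",
              target_arch = "arm", target_arch = "aarch64")))]
fn block_data_order_backend() -> &'static str {
    "rust"
}

fn main() {
    // Exactly one of the two definitions exists in any given build.
    let backend = block_data_order_backend();
    assert!(backend == "asm" || backend == "rust");
    println!("backend: {}", backend);
}
```

Unlike a Cargo feature, this selection cannot be toggled (or mis-toggled) by downstream crates, which sidesteps the additive-features debate above.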

@briansmith briansmith changed the title [On Hold] Implement SHA2 in Rust. Implement SHA2 in Rust. Feb 5, 2019
@briansmith (Owner)

briansmith commented Jul 1, 2019

PR #863 has a new implementation of SHA-2. Please take a look. Thanks again for the effort here.

@briansmith briansmith closed this Jul 1, 2019