[vcpkg] Switch to internal hash algorithms 🐱‍💻 #7757

strega-nil · 2019-08-19T23:50:48Z

On non-Windows platforms, there is no standard way to get the hash of an
item -- before this PR, what we did was check for the existence of a few
common utility names (shasum, sha1, sha256, sha512), and then call that
utility on a file we created containing the contents we wish to hash.
This PR adds internal hashers for sha1, sha256, and sha512, and
standardizes the interface to allow anyone to implement hashers in the
future.
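As a rough illustration of what a standardized hasher interface can look like, here is a minimal sketch. Every name in it (`Hasher`, `add_bytes`, `get_hash`, `clear`, `Fnv1aHasher`, `hash_string`) is hypothetical and chosen for this example; it is not the interface this PR adds. The stand-in algorithm is 64-bit FNV-1a, used only because it is short; it is not cryptographic and is not one of the algorithms in this PR.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical interface; names are illustrative, not vcpkg's actual API.
struct Hasher
{
    virtual ~Hasher() = default;
    // Feed another chunk of input into the running hash state.
    virtual void add_bytes(const void* data, std::size_t size) = 0;
    // Return the digest of everything fed so far as lowercase hex.
    virtual std::string get_hash() = 0;
    // Reset to the initial state so the object can be reused.
    virtual void clear() = 0;
};

// Stand-in algorithm (64-bit FNV-1a). NOT a cryptographic hash.
struct Fnv1aHasher final : Hasher
{
    void add_bytes(const void* data, std::size_t size) override
    {
        const unsigned char* p = static_cast<const unsigned char*>(data);
        for (std::size_t i = 0; i < size; ++i)
        {
            state_ ^= p[i];
            state_ *= 0x100000001b3ULL; // FNV prime
        }
    }

    std::string get_hash() override
    {
        char buf[17];
        std::snprintf(buf, sizeof(buf), "%016llx", static_cast<unsigned long long>(state_));
        return buf;
    }

    void clear() override { state_ = kOffsetBasis; }

private:
    static constexpr std::uint64_t kOffsetBasis = 0xcbf29ce484222325ULL;
    std::uint64_t state_ = kOffsetBasis;
};

// Callers depend only on the interface, so any algorithm can be swapped in.
std::string hash_string(Hasher& hasher, const std::string& input)
{
    hasher.clear();
    hasher.add_bytes(input.data(), input.size());
    return hasher.get_hash();
}
```

With a shape like this, a platform-specific backend and a portable one can sit behind the same three calls, and feeding the input in chunks gives the same digest as feeding it all at once.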

These hashers are not heavily optimized, and a faster implementation is
likely possible in the future. For now we still call out to
BCryptHasher on Windows, since it's standard and easy to use (and about
2x faster for sha1 and sha256, and 1.5x faster for sha512). Still, the
internal hashers are reasonably fast for unoptimized code. I attempted a
few minor optimizations, which actually made the code slower! So as of
right now, they're implemented as just a basic conversion of the code on
Wikipedia to C++. I have tested them against the standard NIST test
vectors (which are located in vcpkg-test/hash.cpp).
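For readers who want to see the kind of straightforward implementation described here, the following is a minimal, unoptimized SHA-1 sketch that follows the standard algorithm. It is not the code from this PR: the function name `sha1_hex` and the choice to hash a whole `std::string` in one call are assumptions made for brevity.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

// Rotate a 32-bit word left by n bits (0 < n < 32).
static std::uint32_t rotl32(std::uint32_t x, int n) { return (x << n) | (x >> (32 - n)); }

// Illustrative, unoptimized SHA-1; returns the digest as lowercase hex.
std::string sha1_hex(const std::string& input)
{
    std::uint32_t h[5] = {0x67452301u, 0xEFCDAB89u, 0x98BADCFEu, 0x10325476u, 0xC3D2E1F0u};

    // Padding: 0x80, zeros until the length is 56 mod 64,
    // then the original length in bits as a 64-bit big-endian integer.
    std::vector<unsigned char> msg(input.begin(), input.end());
    const std::uint64_t bit_len = static_cast<std::uint64_t>(input.size()) * 8;
    msg.push_back(0x80);
    while (msg.size() % 64 != 56) msg.push_back(0x00);
    for (int shift = 56; shift >= 0; shift -= 8)
        msg.push_back(static_cast<unsigned char>(bit_len >> shift));

    // Process the message in 64-byte blocks.
    for (std::size_t off = 0; off < msg.size(); off += 64)
    {
        std::uint32_t w[80];
        for (int i = 0; i < 16; ++i)
        {
            w[i] = (static_cast<std::uint32_t>(msg[off + 4 * i]) << 24) |
                   (static_cast<std::uint32_t>(msg[off + 4 * i + 1]) << 16) |
                   (static_cast<std::uint32_t>(msg[off + 4 * i + 2]) << 8) |
                   static_cast<std::uint32_t>(msg[off + 4 * i + 3]);
        }
        for (int i = 16; i < 80; ++i)
            w[i] = rotl32(w[i - 3] ^ w[i - 8] ^ w[i - 14] ^ w[i - 16], 1);

        std::uint32_t a = h[0], b = h[1], c = h[2], d = h[3], e = h[4];
        for (int i = 0; i < 80; ++i)
        {
            std::uint32_t f, k;
            if (i < 20)      { f = (b & c) | (~b & d);          k = 0x5A827999u; }
            else if (i < 40) { f = b ^ c ^ d;                   k = 0x6ED9EBA1u; }
            else if (i < 60) { f = (b & c) | (b & d) | (c & d); k = 0x8F1BBCDCu; }
            else             { f = b ^ c ^ d;                   k = 0xCA62C1D6u; }

            const std::uint32_t temp = rotl32(a, 5) + f + e + k + w[i];
            e = d;
            d = c;
            c = rotl32(b, 30);
            b = a;
            a = temp;
        }
        h[0] += a; h[1] += b; h[2] += c; h[3] += d; h[4] += e;
    }

    char buf[41];
    std::snprintf(buf, sizeof(buf), "%08x%08x%08x%08x%08x",
                  static_cast<unsigned>(h[0]), static_cast<unsigned>(h[1]),
                  static_cast<unsigned>(h[2]), static_cast<unsigned>(h[3]),
                  static_cast<unsigned>(h[4]));
    return buf;
}
```

The standard short test messages are a convenient check: the empty string, "abc", and the 56-byte "abcdbcde...nopq" message each have published SHA-1 digests, and the 56-byte case exercises the padding path that spills into a second block.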

strega-nil · 2019-08-20T17:26:16Z

As one can see from these benchmarks, BCryptHasher takes about 50% of the time my implementation does, which, to be fair, is the simplest implementation possible. Since BCryptHasher is so much faster, and we're guaranteed to have it on Windows, I chose to keep the BCryptHasher code on Windows. On macOS and Linux, I deemed it more important to be able to hash without calling out to the system.

The code as seen right now is incomplete; I will attempt to finish it before the end of today.

note: I attempted some minor optimizations, but they resulted in slower code, so I reverted them.

toolsrc/include/vcpkg/base/strings.h

vicroms · 2019-08-22T20:50:36Z

LGTM, but I'll admit that I skimmed through the actual algorithms implementation code.

strega-nil · 2019-08-26T19:35:18Z

Merging!

contre · 2019-08-26T23:07:21Z

The code comment I left is still unresolved. It's not technically a bug, since it will fall back to checking every character, but your intent for that check is lost.

strega-nil · 2019-08-27T00:02:40Z

@contre what code comment?

contre · 2019-08-27T00:06:15Z

I’m probably using the wrong term, as I rarely use GitHub, but I left a comment, through the GitHub interface, on the changes for this PR. Specifically, it says I started a review on stringview.cpp.

strega-nil · 2019-08-27T01:45:50Z

@contre if you start a review, but don't submit, I can't see those comments

contre · 2019-08-20T03:15:25Z

toolsrc/src/vcpkg/base/stringview.cpp

    {
-        return other.size() == size() && memcmp(data(), other.data(), size()) == 0;
+        return lhs.size() == lhs.size() && memcmp(lhs.data(), rhs.data(), lhs.size()) == 0;


I assume this should be == rhs.size() instead.
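A minimal sketch of the comparison with the suggested fix applied is shown here. The `StringView` struct is a hypothetical stand-in with only the two accessors the quoted line uses; it is not the class from stringview.cpp.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical minimal view type, standing in for the real class.
struct StringView
{
    const char* ptr;
    std::size_t len;

    const char* data() const { return ptr; }
    std::size_t size() const { return len; }
};

bool operator==(const StringView& lhs, const StringView& rhs)
{
    // Compare the two sizes first; only when they match is it valid
    // to compare lhs.size() bytes of both buffers.
    return lhs.size() == rhs.size() && std::memcmp(lhs.data(), rhs.data(), lhs.size()) == 0;
}
```

With the size check comparing both operands, "abc" and "abcd" compare unequal. If the check compares `lhs.size()` with itself, it is always true, and the `memcmp` over `lhs.size()` bytes alone would treat those two views as equal.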

strega-nil marked this pull request as ready for review August 20, 2019 19:18

strega-nil requested review from ras0219-msft, cbezault and vicroms August 20, 2019 23:11

strega-nil self-assigned this Aug 20, 2019

strega-nil added the info:internal label Aug 20, 2019

strega-nil force-pushed the shasum branch from ba973c9 to ef090f7 Compare August 21, 2019 00:16

strega-nil changed the title ~~[vcpkg] Don't depend on system for hashing~~ [vcpkg] Switch to internal hash algorithms 🐱‍💻 Aug 21, 2019

strega-nil force-pushed the shasum branch from ef090f7 to 32a9181 Compare August 21, 2019 00:33

strega-nil commented Aug 21, 2019

toolsrc/include/vcpkg/base/strings.h (resolved)

strega-nil commented Aug 21, 2019

toolsrc/include/vcpkg/base/strings.h (resolved)

strega-nil force-pushed the shasum branch from 32a9181 to 323e67b Compare August 26, 2019 19:30

strega-nil force-pushed the shasum branch from 323e67b to 60d73ae Compare August 26, 2019 19:30

strega-nil merged commit 7827239 into microsoft:master Aug 26, 2019

strega-nil deleted the shasum branch August 26, 2019 20:19

contre reviewed Aug 27, 2019
