[vcpkg] Switch to internal hash algorithms #7757
As one can see from these benchmarks, BCryptHasher takes about half the time of my implementation, which to be fair is the simplest implementation possible. Since BCryptHasher is so much faster, and we're guaranteed to have it on Windows, I chose to keep the BCryptHasher code on Windows. On macOS and Linux, I deemed it more important to be able to hash without calling out to the system. The code as seen right now is incomplete; I will attempt to finish it before the end of today. Note: I attempted some minor optimizations, but they resulted in slower code, so I reverted them.
LGTM, but I'll admit that I skimmed through the actual algorithm implementation code.
On non-Windows platforms, there is no standard way to get the hash of an item. Before this PR, we checked for the existence of a few common utility names (shasum, sha1, sha256, sha512), and then called that utility on a file we created containing the contents we wished to hash. This PR adds internal hashers for sha1, sha256, and sha512, and standardizes the interface to allow anyone to implement hashers in the future. These hashers are not heavily optimized, and faster implementations are likely possible in the future; for now we still call out to BCryptHasher on Windows, since it's standard and easy to use (and about 2x faster for sha1 and sha256, and 1.5x faster for sha512). Even so, the internal hashers are reasonably fast for unoptimized code. I attempted a few minor optimizations, which actually made the code slower! So as of right now, each hasher is just a basic conversion of the code on Wikipedia to C++. I have tested these against the standard NIST test vectors (the tests are located in vcpkg-test/hash.cpp).
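To make "a basic conversion of the code on Wikipedia to C++" concrete, here is a minimal one-shot SHA-1 sketch in that style. It is not the code from this PR: the names `rotl32` and `sha1_hex_sketch` are hypothetical, it hashes a whole in-memory string rather than using the PR's hasher interface, and it is unoptimized on purpose.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper (not from the PR): rotate a 32-bit word left by n bits, 0 < n < 32.
inline std::uint32_t rotl32(std::uint32_t x, unsigned n) { return (x << n) | (x >> (32 - n)); }

// Hypothetical one-shot SHA-1 that returns a lowercase hex digest.
inline std::string sha1_hex_sketch(const std::string& input)
{
    // Padding: append 0x80, zero-fill until length is 56 mod 64,
    // then append the message length in bits as a 64-bit big-endian integer.
    std::vector<std::uint8_t> msg(input.begin(), input.end());
    const std::uint64_t bit_len = static_cast<std::uint64_t>(input.size()) * 8;
    msg.push_back(0x80);
    while (msg.size() % 64 != 56) msg.push_back(0x00);
    for (int shift = 56; shift >= 0; shift -= 8)
        msg.push_back(static_cast<std::uint8_t>(bit_len >> shift));

    std::array<std::uint32_t, 5> h = {0x67452301u, 0xEFCDAB89u, 0x98BADCFEu, 0x10325476u, 0xC3D2E1F0u};

    for (std::size_t off = 0; off < msg.size(); off += 64)
    {
        // Message schedule: 16 big-endian words from the block, expanded to 80.
        std::uint32_t w[80];
        for (std::size_t i = 0; i < 16; ++i)
            w[i] = (std::uint32_t(msg[off + 4 * i]) << 24) | (std::uint32_t(msg[off + 4 * i + 1]) << 16) |
                   (std::uint32_t(msg[off + 4 * i + 2]) << 8) | std::uint32_t(msg[off + 4 * i + 3]);
        for (std::size_t i = 16; i < 80; ++i)
            w[i] = rotl32(w[i - 3] ^ w[i - 8] ^ w[i - 14] ^ w[i - 16], 1);

        std::uint32_t a = h[0], b = h[1], c = h[2], d = h[3], e = h[4];
        for (std::size_t i = 0; i < 80; ++i)
        {
            std::uint32_t f, k;
            if (i < 20)      { f = (b & c) | (~b & d);          k = 0x5A827999u; }
            else if (i < 40) { f = b ^ c ^ d;                   k = 0x6ED9EBA1u; }
            else if (i < 60) { f = (b & c) | (b & d) | (c & d); k = 0x8F1BBCDCu; }
            else             { f = b ^ c ^ d;                   k = 0xCA62C1D6u; }
            const std::uint32_t temp = rotl32(a, 5) + f + e + k + w[i];
            e = d; d = c; c = rotl32(b, 30); b = a; a = temp;
        }
        h[0] += a; h[1] += b; h[2] += c; h[3] += d; h[4] += e;
    }

    // Emit the five state words as big-endian hex.
    static const char digits[] = "0123456789abcdef";
    std::string out;
    for (std::uint32_t word : h)
        for (int shift = 28; shift >= 0; shift -= 4)
            out.push_back(digits[(word >> shift) & 0xF]);
    return out;
}
```

A sketch like this can be checked against the published SHA-1 vectors for the empty string, "abc", and the 56-byte two-block message, which is the same kind of check the PR describes for vcpkg-test/hash.cpp.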
Merging!
The code comment I left is still unresolved. Not technically a bug, since it will fall back to checking every character, but your intent for that check is lost.
@contre what code comment?
I'm probably using the wrong term, as I rarely use GitHub, but I left a comment, through the GitHub interface, on the changes for this PR. Specifically, it says I started a review on stringview.cpp.
@contre if you start a review, but don't submit, I can't see those comments
{
return other.size() == size() && memcmp(data(), other.data(), size()) == 0;
return lhs.size() == lhs.size() && memcmp(lhs.data(), rhs.data(), lhs.size()) == 0;
== rhs.size() I assume instead
oof
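For reference, a minimal sketch of the comparison with the suggested `rhs.size()` fix applied. In the line as written in the diff, `lhs.size() == lhs.size()` is always true, so the length check never rejects anything. `StringViewSketch` is a hypothetical stand-in used only for illustration; the real class in stringview.cpp has more members.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical minimal stand-in for the string view type touched by the PR.
struct StringViewSketch
{
    const char* ptr;
    std::size_t len;

    const char* data() const { return ptr; }
    std::size_t size() const { return len; }
};

// Compare the sizes of both operands first, so memcmp only runs when the
// lengths already match and never reads past the end of the shorter view.
inline bool operator==(const StringViewSketch& lhs, const StringViewSketch& rhs)
{
    return lhs.size() == rhs.size() && std::memcmp(lhs.data(), rhs.data(), lhs.size()) == 0;
}
```

With the size check in place, views of different lengths compare unequal even when one is a prefix of the other.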