Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[vcpkg] Switch to internal hash algorithms πŸ±β€πŸ’» #7757

Merged
merged 1 commit into from
Aug 26, 2019

Conversation

strega-nil
Copy link
Contributor

@strega-nil strega-nil commented Aug 19, 2019

On non-Windows platforms, there is no standard way to get the hash of an
item -- before this PR, what we did was check for the existence of a few
common utility names (shasum, sha1, sha256, sha512), and then call that
utility on a file we created containing the contents we wish to hash.
This PR adds internal hashers for sha1, sha256, and sha512, and
standardizes the interface to allow anyone to implement hashers in the
future.

These hashers are not extremely optimized, so it's likely that in the
future we could get more optimized, but for now we just call out to
BCryptHasher on Windows, since it's standard and easy to use (and about
2x faster for sha1 and sha256, and 1.5x faster for sha512). However,
they are reasonably fast for being unoptimized. I attempted a few minor
optimizations, which actually made the code slower! So as of right now,
it's implemented as just a basic conversion of the code on Wikipedia to
C++. I have tested these on the standard NIST test vectors (and those
test vectors are located in vcpkg-test/hash.cpp).

@strega-nil
Copy link
Contributor Author

As one can see from these benchmarks, BCryptHasher takes about 50% of the time as my implementation, which to be fair is the simplest implementation possible. Since BCryptHasher is so much faster, and we're guaranteed to have it on Windows, I chose to keep the BCryptHasher code in on Windows -- on macOS and Linux, I deemed it more important to be able to hash without calling out to the system.

The code as seen right now is incomplete; I will attempt to finish it before the end of today.

note: I attempted some minor optimizations, but they resulted in slower code, so I reverted them.

@strega-nil strega-nil marked this pull request as ready for review August 20, 2019 19:18
@strega-nil strega-nil self-assigned this Aug 20, 2019
@strega-nil strega-nil added the info:internal This PR or Issue was filed by the vcpkg team. label Aug 20, 2019
@strega-nil strega-nil changed the title [vcpkg] Don't depend on system for hashing [vcpkg] Switch to internal hash algorithms πŸ±β€πŸ’» Aug 21, 2019
@vicroms
Copy link
Member

vicroms commented Aug 22, 2019

LGTM, but I'll admit that I skimmed through the actual algorithms implementation code.

On non-Windows platforms, there is no standard way to get the hash of an
item -- before this PR, what we did was check for the existence of a few
common utility names (shasum, sha1, sha256, sha512), and then call that
utility on a file we created containing the contents we wish to hash.
This PR adds internal hashers for sha1, sha256, and sha512, and
standardizes the interface to allow anyone to implement hashers in the
future.

These hashers are not extremely optimized, so it's likely that in the
future we could get more optimized, but for now we just call out to
BCryptHasher on Windows, since it's standard and easy to use (and about
2x faster for sha1 and sha256, and 1.5x faster for sha512). However,
they are reasonably fast for being unoptimized. I attempted a few minor
optimizations, which actually made the code slower! So as of right now,
it's implemented as just a basic conversion of the code on Wikipedia to
C++. I have tested these on the standard NIST test vectors (and those
test vectors are located in vcpkg-test/hash.cpp).
@strega-nil
Copy link
Contributor Author

Merging!

@strega-nil strega-nil merged commit 7827239 into microsoft:master Aug 26, 2019
@strega-nil strega-nil deleted the shasum branch August 26, 2019 20:19
@contre
Copy link
Contributor

contre commented Aug 26, 2019

The code comment I left is still unresolved. Not technically a bug since it will fall back to checking every character but your intent for that check is lost.

@strega-nil
Copy link
Contributor Author

@contre what code comment?

@contre
Copy link
Contributor

contre commented Aug 27, 2019

I’m probably using the wrong term as I rarely use GitHub but I left a comment, through the GitHub interface, on the changes for this PR. Specifically it says I started a review on stringview.cpp

@strega-nil
Copy link
Contributor Author

@contre if you start a review, but don't submit, I can't see those comments

{
return other.size() == size() && memcmp(data(), other.data(), size()) == 0;
return lhs.size() == lhs.size() && memcmp(lhs.data(), rhs.data(), lhs.size()) == 0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

== rhs.size() I assume instead

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oof

strega-nil added a commit to strega-nil/vcpkg that referenced this pull request May 5, 2021
On non-Windows platforms, there is no standard way to get the hash of an
item -- before this PR, what we did was check for the existence of a few
common utility names (shasum, sha1, sha256, sha512), and then call that
utility on a file we created containing the contents we wish to hash.
This PR adds internal hashers for sha1, sha256, and sha512, and
standardizes the interface to allow anyone to implement hashers in the
future.

These hashers are not extremely optimized, so it's likely that in the
future we could get more optimized, but for now we just call out to
BCryptHasher on Windows, since it's standard and easy to use (and about
2x faster for sha1 and sha256, and 1.5x faster for sha512). However,
they are reasonably fast for being unoptimized. I attempted a few minor
optimizations, which actually made the code slower! So as of right now,
it's implemented as just a basic conversion of the code on Wikipedia to
C++. I have tested these on the standard NIST test vectors (and those
test vectors are located in vcpkg-test/hash.cpp).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
info:internal This PR or Issue was filed by the vcpkg team.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants