Add content hashing as a versioning strategy #164
Merged
closes #129
This PR adds file contents hashing as an alternative versioning strategy for the `go` and `manifest` sources.

For the manifest source we already track a specific list of source files that make up the dependency 👍.

For the go source it's a little more complicated - see the issue for more details - and licensed will hash all files in the import path directory. I took this approach instead of enumerating the `*Files` properties from the `go list` output so that metadata files (including `LICENSE` files) are included if available.

I'm using a pure Ruby implementation of the xxHash algorithm via the `ruby-xxHash` gem. It's not the most recent version of the algorithm (which is XXH3), but it is still much faster than an MD5 hash and should see no more collisions than MD5. For the size of the content in my tests, content hashing was significantly faster than obtaining the Git SHA.
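
For reviewers, here is a rough sketch of the directory hashing idea described above. It is not the code in this PR: `contents_version` is a hypothetical helper name, it assumes only the files directly inside the import path directory are hashed (no recursion), and it uses `Digest::SHA256` from the standard library as a stand-in for xxHash so the snippet runs without extra gems.

```ruby
require "digest"

# Hypothetical helper, not the PR's implementation.
# Builds a version string from the contents of every regular file
# directly inside `dir` (assumption: subdirectories are skipped).
def contents_version(dir)
  # Stand-in for xxHash so the sketch needs only the standard library.
  digest = Digest::SHA256.new

  # Sort the entries so the result does not depend on directory order.
  paths = Dir.children(dir).sort.map { |name| File.join(dir, name) }

  paths.each do |path|
    next unless File.file?(path)

    # Read as binary so encodings do not affect the hash.
    digest.update(File.binread(path))
  end

  digest.hexdigest
end
```

Because every file in the directory feeds the digest, a change to a `LICENSE` or other metadata file changes the version, which would not be the case if only the files reported by `go list` were hashed.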