sha256 checksums to verify original and converted model data #338
Good idea, but I have one suggestion. How about we put all hashes into a single file? That's the more standard way to do it, and you can then perform a single check using:
on Linux, or on macOS:
That way you can drop the shell script completely.
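The exact commands are missing from this comment. As a rough illustration of what a single-file check does, here is a Python sketch. It reads each `<digest>  <path>` line, hashes the listed file in chunks, and reports OK/FAILED. Files that are absent can optionally be skipped, which is similar in spirit to an "ignore missing files" option. The function names `sha256_of_file`, `parse_sums_line` and `verify` and the `base_dir` parameter are hypothetical and for illustration only. They are not part of the repo.

```python
import hashlib
import os


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so multi-GB model files are never read into memory whole."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def parse_sums_line(line: str) -> tuple[str, str]:
    """Split a '<64 hex digits><space><' ' or '*'><path>' line as written by sha256sum."""
    digest, sep, path = line[:64], line[64:66], line[66:]
    if len(digest) != 64 or sep not in ("  ", " *") or not path:
        raise ValueError(f"malformed checksum line: {line!r}")
    return digest.lower(), path


def verify(sums_path: str, ignore_missing: bool = True,
           base_dir: str = ".") -> dict[str, str]:
    """Check every entry of a SHA256SUMS-style file.

    Returns {listed path: "OK" | "FAILED" | "MISSING"}. Listed paths are resolved
    against base_dir (the current directory by default). With ignore_missing=True,
    absent files are left out of the result instead of being reported.
    """
    results: dict[str, str] = {}
    with open(sums_path, encoding="utf-8") as f:
        for raw in f:
            line = raw.rstrip("\r\n")
            if not line.strip():
                continue  # tolerate blank lines
            expected, path = parse_sums_line(line)
            full = os.path.join(base_dir, path)
            if not os.path.isfile(full):
                if not ignore_missing:
                    results[path] = "MISSING"
                continue
            results[path] = "OK" if sha256_of_file(full) == expected else "FAILED"
    return results
```

Run from the repo root, `verify("SHA256SUMS")` would only check the model files you actually downloaded. That is the point of keeping every hash in one file.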
@prusnak Thx! RTFM on I put the
Great! Can you please sort the file (according to filenames) using
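The sort command in this comment is cut off. The Python sketch below shows the same idea: reorder checksum lines by path while keeping each digest attached to its file. It assumes the standard sha256sum layout (64 hex digits, a space, then a space or `*`, then the path), so the path starts at column 66. `sort_sums_by_filename` is a hypothetical helper name.

```python
PATH_START = 66  # 64 hex digits + ' ' + (' ' or '*'): the path begins here


def sort_sums_by_filename(text: str) -> str:
    """Reorder sha256sum-format lines by path, keeping each digest with its file.

    Python compares strings by code point, which for ASCII paths matches a
    byte-wise sort (as with `sort` under LC_ALL=C), so the order is locale-independent.
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    lines.sort(key=lambda ln: ln[PATH_START:])
    return "".join(ln + "\n" for ln in lines)
```

A byte-wise order puts `models/13B/...` before `models/7B/...`. That is not a "natural" numeric order, but it is stable across machines, which keeps diffs of the checksum file small.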
Good job, thanks a lot :) I personally recommend renaming
The filename Also later we can add more hashes to the file, not strictly related to the models, so naming it
Can you add I had a check failure with models/7B/ggml-model-q4_0.bin. I think a recent commit may have led to some floating-point rounding differences? I have b85058443e89dabdf674d5018d979f0d682977f8413f05b5fd235d36d7a8ff82 for that file.
@sw I just regenerated everything from the *.pth files and now our checksums agree:
Yes, it works for macOS too. I updated my suggestion here to contain the option: #338 (comment)
…fy the downloads. Hashes created using: sha256sum models/*B/*.pth models/*[7136]B/ggml-model-f16.bin* models/*[7136]B/ggml-model-q4_0.bin* > SHA256SUMS
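For anyone regenerating the file without GNU coreutils, here is a rough Python equivalent of the `sha256sum ... > SHA256SUMS` command in this commit message. It is only a sketch. `collect` and `format_sums` are hypothetical names. It also differs from the shell in two ways. First, a pattern that matches nothing is silently dropped, whereas the shell passes it through literally and sha256sum reports an error. Second, matches are sorted by code point rather than by locale.

```python
import glob
import hashlib


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream the file through SHA-256 in fixed-size chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def collect(patterns: list[str]) -> list[str]:
    """Expand shell-style globs (including [7136] character classes), sorted per pattern."""
    paths: list[str] = []
    for pattern in patterns:
        paths.extend(sorted(glob.glob(pattern)))
    return paths


def format_sums(paths: list[str]) -> str:
    """Render '<digest>  <path>' lines, the default text-mode layout of GNU sha256sum."""
    return "".join(f"{sha256_of_file(p)}  {p}\n" for p in paths)
```

Usage mirroring the command above would be to pass the three patterns from the commit message to `collect`, then write `format_sums(...)` to `SHA256SUMS`.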
I went ahead and implemented the suggestions from above and rebased/squashed on top of the current master.
Thanks @gjmulder for computing the hashes. Merged!
Not all of these checksums seem to be correct. Are they calculated with the "v2" new model format after the tokenizer change? (PR: #252, issue: #324) For example, for "models/alpaca-7B/ggml-model-q4_0.bin": v1: 1f582babc2bd56bb63b33141898748657d369fd110c4358b2bc280907882bf13 The SHA256SUMS file has the old v1 hash.
I can confirm
Other files in 7B and 13B are correct. I regenerated the mismatched q4 files with the latest program (make clean; make).
Please open new pull requests if something is wrong.
We should keep only the latest hashes in the SHA256SUMS file, generated by the latest version of the tools in the repo. Introducing any versioning scheme could lead to even more confusion. If you need the older hashes, you can still look at earlier versions of the SHA256SUMS file. Ideally, the same commit that changes the file format will also regenerate the hashes.
Yes, that was why I was delaying merging this pull request. See the model magic and versioning discussion in #352:
Not a developer, so my git-fu is a bit rusty. Hopefully this pull request covers everything?!?
- Add a shadow ./model.sha256 dir containing a dir for each model and a corresponding checklist.sha256 with the sha256 sums of the *.pth, *.bin and *.json files
- Add a script ./model.sha256/chk_sha256sums.sh that walks the user-supplied ./models subdir and runs sha256sum against the above files to diff them against checklist.sha256 for each model
- Update README.md with the corresponding instructions
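To make the per-model approach of the original chk_sha256sums.sh concrete, here is a Python sketch of the same walk-and-diff. It is not the script's actual code, and several parts are assumptions. Checklist entries are taken to be `<digest>  <file name>` lines keyed by bare file name. `PATTERNS` reflects the file types named in the description. `check_models` and `read_checklist` are hypothetical names. A file is reported if its digest differs, if it is listed but absent, or if it is present but unlisted (roughly what a plain diff would flag).

```python
import hashlib
from pathlib import Path

# Assumption: the file types named in the PR description.
PATTERNS = ("*.pth", "*.bin", "*.json")


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream the file through SHA-256 in fixed-size chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def read_checklist(path: Path) -> dict[str, str]:
    """Map file name -> expected digest; assumes '<64 hex>  <name>' lines."""
    expected: dict[str, str] = {}
    for line in path.read_text(encoding="utf-8").splitlines():
        if line.strip():
            expected[line[66:]] = line[:64].lower()
    return expected


def check_models(models_dir: str, shadow_dir: str) -> dict[str, list[str]]:
    """For each model dir that has a checklist, list the file names that disagree."""
    report: dict[str, list[str]] = {}
    for model_dir in sorted(p for p in Path(models_dir).iterdir() if p.is_dir()):
        checklist = Path(shadow_dir) / model_dir.name / "checklist.sha256"
        if not checklist.is_file():
            continue  # no reference sums for this model: nothing to compare
        expected = read_checklist(checklist)
        actual = {
            f.name: sha256_of_file(f)
            for pattern in PATTERNS
            for f in model_dir.glob(pattern)
            if f.is_file()
        }
        names = expected.keys() | actual.keys()
        report[model_dir.name] = sorted(n for n in names if expected.get(n) != actual.get(n))
    return report
```

Compared with a single SHA256SUMS file, this layout needs one checklist per model plus a walker. That extra machinery is what the reviewers suggested dropping.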