Breaking change of models since PR #252 #324

After PR #252, all base models need to be converted again.
For me, this is a big breaking change: the LoRA and/or Alpaca fine-tuned models are not compatible anymore, and reconverting them is not possible.
I see from the PR that the tokenizer scores are written into the model.
Would it make sense to write the tokenizer scores into a separate file, to stay compatible with the (old) models?
The question then arises, if
What do you think?
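(For illustration: a minimal sketch of what the separate-file idea could look like, assuming a sentencepiece tokenizer.model is at hand. The file name and layout here are invented; llama.cpp does not actually do this, it writes the scores into the model file itself.)

```python
# Hypothetical: dump the per-token scores to a side file instead of
# embedding them in the model. File name and layout are invented for
# illustration only.
import struct
import sentencepiece as spm

sp = spm.SentencePieceProcessor(model_file="tokenizer.model")
with open("tokenizer_scores.bin", "wb") as f:
    f.write(struct.pack("<i", sp.vocab_size()))       # token count
    for i in range(sp.vocab_size()):
        f.write(struct.pack("<f", sp.get_score(i)))   # one float per token
```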
Comments
I have the same issue; I cannot convert alpaca-lora models. I had to check out a previous commit, then:
Can you try this convert script? https://gist.github.com/eiz/828bddec6162a023114ce19146cb2b82 (it outputs .tmp files; you can uncomment the os.rename to do it in place if you want, but I didn't want to overwrite your files)
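(For reference, a minimal sketch of what such a conversion has to do; this is not eiz's actual gist. It copies the header, re-emits each length-prefixed vocab entry with a float score appended from tokenizer.model, and passes the tensor data through unchanged. The header layout, a magic value plus seven int32 hyperparameters with n_vocab first, is an assumption based on this thread; verify against your files.)

```python
# Sketch of an old-format -> new-format conversion (illustration only;
# use eiz's gist for the real thing). Assumes the old header is a magic
# value followed by seven int32 hparams, the first being n_vocab.
import struct
import sentencepiece as spm

sp = spm.SentencePieceProcessor(model_file="tokenizer.model")

def convert(path: str) -> None:
    with open(path, "rb") as fin, open(path + ".tmp", "wb") as fout:
        header = fin.read(4 + 7 * 4)   # magic + hparams (assumed layout)
        fout.write(header)
        n_vocab = struct.unpack("<i", header[4:8])[0]
        for i in range(n_vocab):
            (length,) = struct.unpack("<i", fin.read(4))
            fout.write(struct.pack("<i", length))
            fout.write(fin.read(length))                    # token bytes
            fout.write(struct.pack("<f", sp.get_score(i)))  # new score field
        fout.write(fin.read())  # tensor data passes through unchanged

convert("ggml-model-q4_0.bin")
# Like the gist, this writes a .tmp next to the original instead of renaming.
```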
@eiz Okay, thanks. Where do I find the tokenizer file?
If you don't have access to the original LLaMA files, I think someone uploaded it here: https://huggingface.co/decapoda-research/llama-7b-hf/blob/main/tokenizer.model
EDIT: NEVERMIND, I was just executing a blank .py file; I need to sleep. Thank you for your script!
Confirmed it worked for both LLaMA and Alpaca 7B. 🥇
I did the same with no issue.
I can confirm this worked for 13B. Thank you!
@loretoparisi Try
This works like a charm. It would be amazing to create a PR for it.
@eiz Can you make a PR with the script and name it something like
Yes, there are discussions about changing the file format again to make loading faster.
@eiz It seems there is a problem with the Alpaca 13B: after conversion, when loading, it complains about the embedding size:
while the 7B works fine:
@loretoparisi Right now llama.cpp expects the 13B model to be in 2 files; this is a known issue.
That is due to llama.cpp expecting it to be in 2 parts. Alpaca.cpp hardcodes n_parts to 1 to load the 13B weights. Change the 2 at https://github.com/ggerganov/llama.cpp/blob/master/main.cpp#L37 to a 1 and re-make. This will break loading normal 2-part 13B LLaMA weights, however. Edit: Confirmed myself using eiz's very useful conversion script; loads the same as before with that change in main.cpp.
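(For context, the table being referenced in main.cpp looked roughly like this at the time; this is a reconstruction for illustration, so check the linked line in your checkout. It maps the embedding size n_embd to the expected number of model file parts:)

```cpp
#include <map>

// main.cpp (around L37 at the time): expected number of model parts,
// keyed on the embedding size read from the model header.
static const std::map<int, int> LLAMA_N_PARTS = {
    { 4096, 1 },  // 7B
    { 5120, 2 },  // 13B -- change this 2 to a 1 for a single-file alpaca 13B
    { 6656, 4 },  // 30B
    { 8192, 8 },  // 65B
};
```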
This is because in main.cpp the 13B model is assumed to have 2 parts, but it is a one-part file (especially with the Alpaca model). @Green-Sky was faster with his reply.
Here's an updated torrent for the 7B
I extracted the vocabulary from a (pre-PR-reformat) model as JSON, and from the tokenizer.model file from the original LLaMA source, for comparison. So I ask myself: why reformat the model if the tokenization vocabulary is already included?
The tokenizer.model contains scores for each token, most of which are just the negation of the token index (since they're output by the BPE trainer in descending order), so I initially thought that just the index could be used; but some of the frequency scores are actually modified (sequences of whitespace in particular). I had to add the scores for this reason.
Yes, the -1000000000.0 score looks like a sentinel, appearing first at id 259. I guess the sentinel is used during encoding/decoding to clean up whitespace in the text, and the others are used as-is.
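(If you want to verify this yourself, a small sentencepiece snippet, assuming you have the original tokenizer.model, prints every piece whose score deviates from the plain negated-index pattern described above:)

```python
# Print tokenizer.model entries whose score is not simply -(token index),
# e.g. the special tokens, the whitespace pieces, and the -1e9 sentinels.
import sentencepiece as spm

sp = spm.SentencePieceProcessor(model_file="tokenizer.model")
for i in range(sp.vocab_size()):
    score = sp.get_score(i)
    if score != -float(i):
        print(i, repr(sp.id_to_piece(i)), score)
```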
I downloaded the tokenizer.model file. How do I download the model from Git LFS now? I tried
For others who find this issue: the link in #324 (comment) is no longer correct after recent refactoring; you need to change the
And the user does not need to change the code to define the right part count for the model.
How to use