llama 2 conversion script unknown error #28553

Closed
liboliba opened this issue Jan 17, 2024 · 4 comments

@liboliba

System Info

Hi,
I have downloaded the Llama 2 weights and installed the transformers package. I plan to use the model with the transformers package, so I applied the conversion script.

The conversion script does not work:
python src/transformers/models/llama/convert_llama_weights_to_hf.py \
    --input_dir /path/to/downloaded/llama/weights --model_size 7B --output_dir /output/path/tomyfilepath

File "...path/src/transformers/models/llama/convert_llama_weights_to_hf.py", line 126
print(f"Fetching all parameters from the checkpoint at {input_base_path}.")
^
SyntaxError: invalid syntax

On Linux when I do for example:
ls /path/to/downloaded/llama/llama-2-7b-chat
I get:
checklist.chk consolidated.00.pth params.json

I assume I have the correct files. Any advice would be appreciated.
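
For reference, a SyntaxError pointing at an otherwise valid f-string usually means the script was launched with a pre-3.6 interpreter, since f-strings were added in Python 3.6. A minimal check, independent of transformers:

    import sys

    # The conversion script uses f-strings, so a pre-3.6 interpreter fails at
    # the first one with SyntaxError rather than a transformers-related error.
    print(sys.version)
    assert sys.version_info >= (3, 6), "re-run the script with Python 3.6+"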

Who can help?

No response

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

python src/transformers/models/llama/convert_llama_weights_to_hf.py \
    --input_dir /path/to/downloaded/llama/weights --model_size 7B --output_dir /output/path/tomyfilepath

Expected behavior

The expected behavior is that the tokenizer and model are converted so that they are usable with the transformers package.

@amyeroberts
Collaborator

Hi @liboliba, thanks for raising an issue!

So that we can best help you, please make sure to follow the issue template including:

  • The running environment: run transformers-cli env in the terminal and copy-paste the output
  • The full error traceback
  • A minimal code reproducer. Here we don't have access to the weights. Are there weights you could share which reproduce this error?

@liboliba
Author

liboliba commented Jan 17, 2024

Thank you for the advice!
transformers-cli env returns:

  • transformers version: 4.36.2
  • Platform: Linux-3.10.0-1160.36.2.el7.x86_64-x86_64-with-glibc2.17
  • Python version: 3.11.5
  • Huggingface_hub version: 0.19.4
  • Safetensors version: 0.4.1
  • Accelerate version: 0.25.0
  • Accelerate config: not found
  • PyTorch version (GPU?): 2.1.1 (False)
  • Tensorflow version (GPU?): not installed (NA)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using GPU in script?: Yes & No. I am working in an HPC environment; my current compute node has no GPU, but it shares a hard disk with a GPU node. The GPU node has no internet connection and can only load models via the disk it shares with the compute node I am working on now.
  • Using distributed or parallel set-up in script?: No

For the other two bullets, I am less sure how to respond: what I did was download the official Meta Llama 2 weights into a folder, then git clone the transformers source code and run the conversion script. The error I get now is:

python /scratch/ll1d19/transformers/src/transformers/models/llama/convert_llama_weights_to_hf.py --input_dir /scratch/ll1d19/llama/llama/llama-2-7b-chat/ --model_size 7B --output_dir /scratch/ll1d19/hf_llama2/Llama-2-7b-chat-hf/
You are using the default legacy behaviour of the <class 'transformers.models.llama.tokenization_llama.LlamaTokenizer'>. This is expected, and simply means that the legacy (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set legacy=False. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in #24565
Traceback (most recent call last):
  File "/scratch/ll1d19/transformers/src/transformers/models/llama/convert_llama_weights_to_hf.py", line 319, in <module>
    main()
  File "/scratch/ll1d19/transformers/src/transformers/models/llama/convert_llama_weights_to_hf.py", line 307, in main
    write_model(
  File "/scratch/ll1d19/transformers/src/transformers/models/llama/convert_llama_weights_to_hf.py", line 109, in write_model
    tokenizer = tokenizer_class(tokenizer_path)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ll1d19/.conda/envs/myiai/lib/python3.11/site-packages/transformers/models/llama/tokenization_llama_fast.py", line 124, in __init__
    super().__init__(
  File "/home/ll1d19/.conda/envs/myiai/lib/python3.11/site-packages/transformers/tokenization_utils_fast.py", line 117, in __init__
    slow_tokenizer = self.slow_tokenizer_class(*args, **kwargs)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ll1d19/.conda/envs/myiai/lib/python3.11/site-packages/transformers/models/llama/tokenization_llama.py", line 178, in __init__
    self.sp_model = self.get_spm_processor(kwargs.pop("from_slow", False))
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ll1d19/.conda/envs/myiai/lib/python3.11/site-packages/transformers/models/llama/tokenization_llama.py", line 203, in get_spm_processor
    tokenizer.Load(self.vocab_file)
  File "/home/ll1d19/.conda/envs/myiai/lib/python3.11/site-packages/sentencepiece/__init__.py", line 905, in Load
    return self.LoadFromFile(model_file)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ll1d19/.conda/envs/myiai/lib/python3.11/site-packages/sentencepiece/__init__.py", line 310, in LoadFromFile
    return _sentencepiece.SentencePieceProcessor_LoadFromFile(self, arg)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
OSError: Not found: "/scratch/ll1d19/llama/llama/llama-2-7b-chat/tokenizer.model": No such file or directory Error #2

The weights are too large to share (about 13 GB); the JSON file is around 100 bytes.
Any advice would be appreciated!
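
Based on the OSError, the script expects tokenizer.model inside the directory passed as --input_dir. A minimal sketch to check for it (the paths mirror the command above; the note about where the official download places the file is an assumption about the layout, not something verified here):

    from pathlib import Path
    import shutil

    # The OSError above means sentencepiece could not find
    # <input_dir>/tokenizer.model, which the conversion script tries to load.
    input_dir = Path("/scratch/ll1d19/llama/llama/llama-2-7b-chat")
    print(sorted(p.name for p in input_dir.iterdir()))

    # In the official Meta download, tokenizer.model typically sits one level
    # up, next to the per-model folders (assumption about the layout).
    parent_tokenizer = input_dir.parent / "tokenizer.model"
    if parent_tokenizer.exists():
        shutil.copy(parent_tokenizer, input_dir / "tokenizer.model")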

@amyeroberts
Collaborator

Hi @liboliba, thanks for the update!

Based on the error, I'd suggest making sure you have the latest versions of tokenizers and sentencepiece installed in your environment.
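
A quick way to see what is currently installed before upgrading (e.g. with pip install -U tokenizers sentencepiece) - a minimal sketch:

    # Print the installed versions to compare against the latest releases.
    import sentencepiece
    import tokenizers

    print("tokenizers:", tokenizers.__version__)
    print("sentencepiece:", sentencepiece.__version__)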

There's no need to convert the official checkpoints, though - there are many already available on the hub, e.g. here, which you can access provided you've filled out the access form; or meta-llama/Llama-2-70b-hf for Llama 2.
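
For example, once the access form has been accepted, an already-converted checkpoint can be loaded directly from the hub. A minimal sketch, using the 7B chat repo meta-llama/Llama-2-7b-chat-hf as an assumed example (gated repos also require logging in first, e.g. with huggingface-cli login):

    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Downloads the already-converted weights from the Hub, so no local
    # conversion step is needed. Requires accepted access to the gated repo.
    model_id = "meta-llama/Llama-2-7b-chat-hf"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)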


This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.
