
awq int4 to gguf: ModuleNotFoundError: No module named 'awq.apply_awq' #502

Open
LDLINGLINGLING opened this issue Jun 14, 2024 · 5 comments

Comments

@LDLINGLINGLING
Contributor

I want to use AWQ to quantize a model and then use llama.cpp to convert it to GGUF, but when I followed the tutorial I got this error:

Traceback (most recent call last):
  File "/root/ld/ld_project/llama.cpp/convert_minicpm.py", line 2516, in <module>
    main()
  File "/root/ld/ld_project/llama.cpp/convert_minicpm.py", line 2460, in main
    from awq.apply_awq import add_scale_weights  # type: ignore[import-not-found]
ModuleNotFoundError: No module named 'awq.apply_awq'

My AWQ package versions are:
autoawq 0.2.5+cu121
autoawq_kernels 0.0.6
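
For context, autoawq installs a top-level awq package, but it has no apply_awq submodule; that module came from the awq-py helpers that used to ship inside llama.cpp and were later removed (see the reply below pointing to ggml-org/llama.cpp#5768). A minimal diagnostic sketch, assuming it is run in the same environment that produced the traceback:

import importlib.util

# The top-level "awq" package resolves to autoawq's install location.
spec = importlib.util.find_spec("awq")
print(spec.origin if spec else "awq is not installed")

# autoawq provides no awq.apply_awq submodule, so this prints None; the module
# the convert script imports lived in llama.cpp's (since removed) awq-py folder.
print(importlib.util.find_spec("awq.apply_awq"))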

@LDLINGLINGLING
Contributor Author

LDLINGLINGLING commented Jun 18, 2024

import os
import subprocess
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_path = 'mistralai/Mistral-7B-v0.1'
quant_path = 'mistral-awq'
llama_cpp_path = '/workspace/llama.cpp'
quant_config = { "zero_point": True, "q_group_size": 128, "w_bit": 6, "version": "GEMM" }

# Load the fp16 model and tokenizer
model = AutoAWQForCausalLM.from_pretrained(
    model_path, **{"low_cpu_mem_usage": True, "use_cache": False}
)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

# Run the AWQ scale search; export_compatible=True applies the scales to the
# fp16 weights without packing them into low-bit tensors
model.quantize(
    tokenizer,
    quant_config=quant_config,
    export_compatible=True
)

model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)
print(f'Model is quantized and saved at "{quant_path}"')

# GGUF conversion
print('Converting model to GGUF...')
llama_cpp_method = "q4_K_M"
convert_cmd_path = os.path.join(llama_cpp_path, "convert.py")
quantize_cmd_path = os.path.join(llama_cpp_path, "quantize")

# Clone and build llama.cpp if it is not already present
if not os.path.exists(llama_cpp_path):
    cmd = f"git clone https://github.com/ggerganov/llama.cpp.git {llama_cpp_path} && cd {llama_cpp_path} && make LLAMA_CUBLAS=1 LLAMA_CUDA_F16=1"
    subprocess.run([cmd], shell=True, check=True)

# Convert the saved checkpoint to GGUF, then quantize it with llama.cpp
subprocess.run([
    f"python {convert_cmd_path} {quant_path} --outfile {quant_path}/model.gguf"
], shell=True, check=True)

subprocess.run([
    f"{quantize_cmd_path} {quant_path}/model.gguf {quant_path}/model_{llama_cpp_method}.gguf {llama_cpp_method}"
], shell=True, check=True)

This is my code.

@casper-hansen
Owner

Hi @LDLINGLINGLING. The error in your first message seems to come from the llama.cpp package. Have you tried the GGUF export from the AutoAWQ documentation, and did it succeed?

https://casper-hansen.github.io/AutoAWQ/examples/#gguf-export

@LDLINGLINGLING
Contributor Author

I didn't succeed. I followed the instructions in this link, https://casper-hansen.github.io/AutoAWQ/examples/#gguf-export, but the error at the top appeared.

@LDLINGLINGLING
Contributor Author

I now think this operation is pointless. I originally thought that since AWQ has high quantization accuracy, converting to GGUF might preserve that accuracy, but it seems that is not possible.

@hanasay

hanasay commented Jul 10, 2024

I now think this operation is pointless. I originally thought that since AWQ has high quantization accuracy, converting to GGUF might preserve that accuracy, but it seems that is not possible.

Hi @LDLINGLINGLING ~
It is true that --awq-path was removed by llama.cpp! You can refer to this issue:
ggml-org/llama.cpp#5768
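
For anyone landing here later: with --awq-path gone, the part of the earlier script that still matters is export_compatible=True. As far as I understand it, that option applies the AWQ scales to the fp16 weights before saving, so the checkpoint in quant_path is an ordinary Hugging Face model and llama.cpp's stock converter handles it without any AWQ-specific flag. A minimal sketch of that route, reusing the names from the script above and assuming a recent llama.cpp checkout where the converter is convert_hf_to_gguf.py and the quantize binary is built as llama-quantize (older trees ship convert.py and quantize instead):

import os
import subprocess

llama_cpp_path = '/workspace/llama.cpp'   # same hypothetical paths as above
quant_path = 'mistral-awq'                # directory written by save_quantized(...)
llama_cpp_method = 'q4_K_M'

# The AWQ scales are already folded into the fp16 weights, so the stock
# converter is run on the directory as-is; no --awq-path flag is involved.
convert_script = os.path.join(llama_cpp_path, 'convert_hf_to_gguf.py')
subprocess.run(
    f"python {convert_script} {quant_path} --outfile {quant_path}/model-f16.gguf",
    shell=True, check=True
)

# The actual low-bit quantization happens in llama.cpp afterwards.
quantize_bin = os.path.join(llama_cpp_path, 'llama-quantize')
subprocess.run(
    f"{quantize_bin} {quant_path}/model-f16.gguf {quant_path}/model_{llama_cpp_method}.gguf {llama_cpp_method}",
    shell=True, check=True
)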

By the way, I have run into an error that might be similar to this issue, and I hope someone can help me.

I had already converted a Phi-3-mini-128K model to AWQ.
But when I tried to convert the Phi-3 AWQ model to GGUF (with llama.cpp's convert_hf_to_gguf.py), I got the error below.

INFO:hf-to-gguf:Loading model: Phi-3-mini-128k-instruct-AWQ
INFO:gguf.gguf_writer:gguf: This GGUF file is for Little Endian only
INFO:hf-to-gguf:Set model parameters
INFO:hf-to-gguf:Set model tokenizer
INFO:gguf.vocab:Setting special token type bos to 1
INFO:gguf.vocab:Setting special token type eos to 32000
INFO:gguf.vocab:Setting special token type unk to 0
INFO:gguf.vocab:Setting special token type pad to 32000
INFO:gguf.vocab:Setting add_bos_token to False
INFO:gguf.vocab:Setting add_eos_token to False
INFO:gguf.vocab:Setting chat_template to {% for message in messages %}{% if message['role'] == 'system' %}{{'<|system|>
' + message['content'] + '<|end|>
'}}{% elif message['role'] == 'user' %}{{'<|user|>
' + message['content'] + '<|end|>
'}}{% elif message['role'] == 'assistant' %}{{'<|assistant|>
' + message['content'] + '<|end|>
'}}{% endif %}{% endfor %}{% if add_generation_prompt %}{{ '<|assistant|>
' }}{% else %}{{ eos_token }}{% endif %}
INFO:hf-to-gguf:Exporting model...
INFO:hf-to-gguf:gguf: loading model part 'model.safetensors'
INFO:hf-to-gguf:output.weight,             torch.float16 --> F16, shape = {3072, 32064}
INFO:hf-to-gguf:token_embd.weight,         torch.float16 --> F16, shape = {3072, 32064}
INFO:hf-to-gguf:blk.0.attn_norm.weight,    torch.float16 --> F32, shape = {3072}
Traceback (most recent call last):
  File "/home/matt/work/llama.cpp/convert_hf_to_gguf.py", line 3547, in <module>
    main()
  File "/home/matt/work/llama.cpp/convert_hf_to_gguf.py", line 3541, in main
    model_instance.write()
  File "/home/matt/work/llama.cpp/convert_hf_to_gguf.py", line 330, in write
    self.write_tensors()
  File "/home/matt/work/llama.cpp/convert_hf_to_gguf.py", line 267, in write_tensors
    for new_name, data in ((n, d.squeeze().numpy()) for n, d in self.modify_tensors(data_torch, name, bid)):
  File "/home/matt/work/llama.cpp/convert_hf_to_gguf.py", line 234, in modify_tensors
    return [(self.map_tensor_name(name), data_torch)]
  File "/home/matt/work/llama.cpp/convert_hf_to_gguf.py", line 185, in map_tensor_name
    raise ValueError(f"Can not map tensor {name!r}")
ValueError: Can not map tensor 'model.layers.0.mlp.down_proj.qweight'

The error says that the tensor cannot be mapped to a defined layer.
I was thinking: is it possible the error occurs because the layer is stored under a new name, model.layers.0.mlp.down_proj.qweight, rather than under the original name model.layers.0.mlp.down_proj.weight?

If so, how should I modify it?
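
One way to check that theory is to list the tensor names actually stored in the checkpoint. A minimal sketch, assuming the AWQ model folder holds a single model.safetensors file (the path below is just an example):

from safetensors import safe_open

# Hypothetical path: the single-shard checkpoint shown in the log above.
path = "Phi-3-mini-128k-instruct-AWQ/model.safetensors"

with safe_open(path, framework="pt") as f:
    for name in f.keys():
        if "layers.0.mlp.down_proj" in name:
            print(name)  # e.g. ...down_proj.qweight / .qzeros / .scales

If the listed names end in .qweight, .qzeros and .scales, the checkpoint is a fully packed AWQ model, which convert_hf_to_gguf.py has no tensor mapping for; a checkpoint saved with export_compatible=True keeps plain .weight tensors, which is the form the converter expects.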

Sorry for my bad English, but I hope someone can help. ;-;

BR, Matt.
