【bloom】convert_checkpoint.py local variable 'int8_weights' referenced before assignment #741
Labels
Low Precision
Issue about lower bit quantization, including int8, int4, fp8
triaged
Issue has been triaged by maintainers
I followed the README:
# Build model with both INT8 weight-only and INT8 KV cache enabled
python convert_checkpoint.py --model_dir ./bloom/560m/ \
    --dtype float16 \
    --int8_kv_cache \
    --use_weight_only \
    --output_dir ./bloom/560m/trt_ckpt/int8/1-gpu/

trtllm-build --checkpoint_dir ./bloom/560m/trt_ckpt/int8/1-gpu/ \
    --use_gemm_plugin float16 \
    --use_gpt_attention_plugin float16 \
    --output_dir ./bloom/560m/trt_engines/int8/1-gpu/
and my script is:
python convert_checkpoint.py --model_dir ./Bloomz_QA+alpaca_gpt4_zh+lima_V3 \
    --dtype float16 \
    --int8_kv_cache \
    --use_weight_only \
    --output_dir ./Bloomz_QA+alpaca_gpt4_zh+lima_V3/trt_ckpt/int8/1-gpu/

trtllm-build --checkpoint_dir ./Bloomz_QA+alpaca_gpt4_zh+lima_V3/trt_ckpt/int8/1-gpu/ \
    --use_gemm_plugin float16 \
    --use_gpt_attention_plugin float16 \
    --output_dir ./Bloomz_QA+alpaca_gpt4_zh+lima_V3/trt_engines/int8/1-gpu/
and I got:
Traceback (most recent call last):
File "/workspace/TensorRT-LLM/examples/bloom/convert_checkpoint.py", line 899, in
weights = convert_hf_bloom(
File "/workspace/TensorRT-LLM/examples/bloom/convert_checkpoint.py", line 668, in convert_hf_bloom
np.array([1.0 / int8_weights['scale_y_quant_orig']],
UnboundLocalError: local variable 'int8_weights' referenced before assignment
Looking at the code in convert_checkpoint.py: when use_smooth_quant is False, int8_weights is never computed, yet the --int8_kv_cache path still reads int8_weights['scale_y_quant_orig'], which raises the UnboundLocalError.
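A minimal sketch of the control-flow bug (hypothetical function and key names, mirroring the traceback rather than the actual convert_checkpoint.py source): the variable is assigned only on the smooth-quant branch, but the INT8 KV-cache branch reads it unconditionally.

```python
import numpy as np

def convert(use_smooth_quant: bool, int8_kv_cache: bool):
    """Reduced model of the branch structure in convert_hf_bloom (hypothetical)."""
    weights = {}
    if use_smooth_quant:
        # Only this branch defines int8_weights.
        int8_weights = {'scale_y_quant_orig': np.float32(0.5)}
    if int8_kv_cache:
        # With use_smooth_quant=False this line raises
        # UnboundLocalError, matching the reported traceback.
        weights['kv_cache_scaling_factor'] = np.array(
            [1.0 / int8_weights['scale_y_quant_orig']], dtype=np.float32)
    return weights
```

Calling `convert(False, True)` reproduces the crash; a fix would need to compute the KV-cache scaling factor on a path that does not depend on the smooth-quant branch.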