
[bloom] convert_checkpoint.py: local variable 'int8_weights' referenced before assignment #741

Closed
scarydemon2 opened this issue Dec 26, 2023 · 1 comment

Labels: Low Precision (issue about lower-bit quantization, including int8, int4, fp8), triaged (issue has been triaged by maintainers)
scarydemon2 commented Dec 26, 2023

I followed the README:

Build the model with both INT8 weight-only and INT8 KV cache enabled:

python convert_checkpoint.py --model_dir ./bloom/560m/ \
    --dtype float16 \
    --int8_kv_cache \
    --use_weight_only --output_dir ./bloom/560m/trt_ckpt/int8/1-gpu/
trtllm-build --checkpoint_dir ./bloom/560m/trt_ckpt/int8/1-gpu/ \
    --use_gemm_plugin float16 \
    --use_gpt_attention_plugin float16 \
    --output_dir ./bloom/560m/trt_engines/int8/1-gpu/

and my script is:

python convert_checkpoint.py --model_dir ./Bloomz_QA+alpaca_gpt4_zh+lima_V3 \
    --dtype float16 \
    --int8_kv_cache \
    --use_weight_only --output_dir ./Bloomz_QA+alpaca_gpt4_zh+lima_V3/trt_ckpt/int8/1-gpu/
trtllm-build --checkpoint_dir ./Bloomz_QA+alpaca_gpt4_zh+lima_V3/trt_ckpt/int8/1-gpu/ \
    --use_gemm_plugin float16 \
    --use_gpt_attention_plugin float16 \
    --output_dir ./Bloomz_QA+alpaca_gpt4_zh+lima_V3/trt_engines/int8/1-gpu/

and I got:

Traceback (most recent call last):
  File "/workspace/TensorRT-LLM/examples/bloom/convert_checkpoint.py", line 899, in <module>
    weights = convert_hf_bloom(
  File "/workspace/TensorRT-LLM/examples/bloom/convert_checkpoint.py", line 668, in convert_hf_bloom
    np.array([1.0 / int8_weights['scale_y_quant_orig']],
UnboundLocalError: local variable 'int8_weights' referenced before assignment

The code in convert_checkpoint.py shows that when use_smooth_quant is False, int8_weights is never computed, yet the --int8_kv_cache path still reads int8_weights['scale_y_quant_orig'].
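A minimal sketch of the failure mode (the function and values below are illustrative, following the traceback rather than the exact source): int8_weights is bound only on the smooth-quant branch, so the INT8 KV cache branch reads an unbound local when smooth quant is off.

import numpy as np

def convert_layer(use_smooth_quant: bool, int8_kv_cache: bool):
    # Only the smooth-quant path ever assigns int8_weights; the dict
    # value here is a stand-in for the real per-tensor quantization scales.
    if use_smooth_quant:
        int8_weights = {'scale_y_quant_orig': np.float32(0.05)}

    if int8_kv_cache:
        # With use_smooth_quant=False this lookup raises
        # UnboundLocalError, matching the reported traceback.
        return np.array([1.0 / int8_weights['scale_y_quant_orig']],
                        dtype=np.float32)
    return None

# Reproduces: UnboundLocalError: local variable 'int8_weights'
# referenced before assignment
convert_layer(use_smooth_quant=False, int8_kv_cache=True)

The natural shape of a fix is to compute the KV-cache scale on its own path (or guard the lookup on use_smooth_quant) instead of reusing the smooth-quant results.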

scarydemon2 reopened this Dec 26, 2023
nv-guomingz self-assigned this Dec 26, 2023
nv-guomingz added the Low Precision and triaged labels Dec 26, 2023
nv-guomingz (Collaborator) commented Dec 26, 2023

Hi @scarydemon2, thanks for reporting this issue. The fix has been upstreamed to the main branch. Please give it a try.
