System Info
TensorRT-LLM release 0.9.0
Who can help?
@Tracin
Reproduction
In the 0.9.0 release, I try to apply auto parallel + quantization to the Llama 70B model; each of them works fine independently.
I convert the checkpoint using
python3 ./TensorRT-LLM/examples/llama/convert_checkpoint.py --model_dir ./llama-2/llama-2-70b --dtype float16 --output_dir ./model_profile_tmp/ckpt/2 --use_weight_only --weight_only_precision int4
and then I try auto parallel using
trtllm-build --checkpoint_dir ./model_profile_tmp/ckpt/2/ --gemm_plugin float16 --use_custom_all_reduce disable --output_dir ./model_profile_tmp/engine/2/ --workers 8 --max_batch_size 1 --auto_parallel 8 --weight_only_precision int4
Expected behavior
It should build the engine as float16.
Actual behavior
However, it fails with:
concurrent.futures.process._RemoteTraceback:
"""
Traceback (most recent call last):
File "/usr/lib/python3.10/concurrent/futures/process.py", line 246, in _process_worker
r = call_item.fn(*call_item.args, **call_item.kwargs)
File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/commands/build.py", line 291, in build_and_save
engine = build_model(build_config,
File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/commands/build.py", line 284, in build_model
return build(model, build_config)
File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/builder.py", line 669, in build
model = optimize_model(model, use_unfused_qkv_gemm=use_auto_parallel)
File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/modeling_utils.py", line 890, in optimize_model
model = unfuse_qkv_gemm(model)
File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/modeling_utils.py", line 767, in unfuse_qkv_gemm
gemm.weight.value = weight
File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/parameter.py", line 120, in value
assert v.shape == self._shape,
AssertionError: The value updated is not the same shape as the original. Updated: (8192, 5120), original: (8192, 8192)
Additional notes
It looks like auto parallel cannot handle weight-only quantization, which changes the weight dimensions in TRT: the int4 weights are stored packed, so their shape no longer matches the float16 shape that unfuse_qkv_gemm expects.
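For context on the specific numbers in the assertion, here is a minimal sketch (illustrative only, not TensorRT-LLM's actual packing code) of why int4 weight-only packing produces a tensor whose stored shape differs from the float16 shape. It assumes two 4-bit values are packed per byte, which halves the last dimension, and uses a hypothetical fused QKV width of 10240 (8192 for Q plus 1024 each for K/V under GQA); these dimensions are assumptions for illustration, not read from the checkpoint:

```python
import numpy as np

# Hypothetical fused QKV weight: rows = hidden size, cols = fused Q+K+V width.
fp16_weight = np.zeros((8192, 10240), dtype=np.float16)

# int4 weight-only quantization packs two 4-bit values into each int8 byte,
# so the stored tensor has half the columns of the float16 original.
packed_shape = (fp16_weight.shape[0], fp16_weight.shape[1] // 2)
print(packed_shape)  # (8192, 5120) -- the "Updated" shape in the assertion

# unfuse_qkv_gemm, however, compares against the unpacked float16 shape it
# expects for the unfused Q GEMM, e.g. (8192, 8192), so the assert fires.
assert packed_shape != (8192, 8192)
```

This would explain why the build succeeds when either feature is used alone: without auto parallel, unfuse_qkv_gemm is never invoked, and without quantization, the stored shape matches the expected float16 shape.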