I got:
ValueError: You are trying to save a non contiguous tensor: transformer.layers.0.mlp.gate.weights_scaling_factor which is not allowed. It either means you are trying to save tensors which are reference of each other in which case it's recommended to save only the full tensors, and reslice at load time, or simply call .contiguous() on your tensor to pack it before saving.
System Info
GPU : NVIDIA A100 80GB
package version
tensorrt-9.2.0.post12.dev5-cp310-none-linux_x86_64.whl
[TensorRT-LLM] TensorRT-LLM version: 0.8.0
Who can help?
@Tracin @byshiue
Information
Tasks
An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
Reproduction
Installation
python -m pip install tensorrt_llm==0.8.0 --extra-index-url https://pypi.nvidia.com
Create smoothquant checkpoint for LLaMA
python ./examples/llama/convert_checkpoint.py --model_dir ~/Llama-2-13b-chat-hf --output_dir ~/fp16-tp4-sq5 --dtype float16 --tp_size 4 --smoothquant 0.5 --per_token --per_channel --workers 4
Expected behavior
Checkpoint should be created.
actual behavior
Error at line - https://github.com/NVIDIA/TensorRT-LLM/blob/v0.8.0/examples/llama/convert_checkpoint.py#L1502
ValueError: You are trying to save a non contiguous tensor: transformer.layers.0.attention.qkv.weight which is not allowed. It either means you are trying to save tensors which are reference of each other in which case it's recommended to save only the full tensors, and reslice at load time, or simply call .contiguous() on your tensor to pack it before saving.
additional notes
No such error seen on release 0.7.1
My guess is that the function get_tllm_linear_sq_weight returns some non-contiguous tensors.
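A possible workaround, offered only as a minimal sketch and not as the official fix: pack every tensor with .contiguous() before the weights dict is saved, as the error message itself suggests. The helper name pack_weights and the toy weights dict below are hypothetical and stand in for whatever the conversion script actually produces. The transposed view and the column slice are just two common ways a non-contiguous tensor can arise; whether get_tllm_linear_sq_weight produces them this way has not been verified.

```python
import torch


def pack_weights(weights):
    """Return a new dict in which every tensor is contiguous in memory."""
    # .contiguous() returns the tensor itself when it is already contiguous,
    # so only the offending views are copied.
    return {name: tensor.contiguous() for name, tensor in weights.items()}


# Hypothetical stand-in for the converter output: a transposed view and a
# column slice are both non-contiguous views of their parent tensor.
base = torch.arange(12, dtype=torch.float32).reshape(3, 4)
weights = {
    "transformer.layers.0.attention.qkv.weight": base.t(),
    "transformer.layers.0.mlp.gate.weights_scaling_factor": base[:, :2],
}

# List the tensors that would trigger the ValueError when saved.
non_contiguous = [n for n, t in weights.items() if not t.is_contiguous()]

# Pack them; the values are unchanged, only the memory layout differs.
packed = pack_weights(weights)
```

Passing the packed dict to the save call in place of the original one should avoid the ValueError, at the cost of one extra copy for each tensor that was a view.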