[New Model]: MiniCPM-V-2_6-int4 #7727

Closed
tangent2018 opened this issue Aug 21, 2024 · 5 comments · Fixed by #9891
Labels
new model Requests to new models

Comments

@tangent2018

The model to consider.

https://huggingface.co/openbmb/MiniCPM-V-2_6-int4

The closest model vllm already supports.

No response

What's your difficulty of supporting the model you want?

Loading the model weights fails when running MiniCPM-V-2_6-int4.

vllm environment:
docker image: vllm/vllm-openai:v0.5.4
pip install bitsandbytes==0.43.3

Example to reproduce:

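from transformers import AutoTokenizer
from vllm import LLM, SamplingParams

# MODEL_NAME was not defined in the snippet as posted; it is assumed here
# to be the same model ID that is passed to the tokenizer.
MODEL_NAME = "openbmb/MiniCPM-V-2_6-int4"

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME, trust_remote_code=True)
llm = LLM(
    model=MODEL_NAME,
    gpu_memory_utilization=1,
    trust_remote_code=True,
    max_model_len=2048,
    enforce_eager=True,
)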

Output:

[rank0]: Traceback (most recent call last):
[rank0]:   File "/home/tangent/AIChat/engines/OpenBMB/MiniCPM-V/code_vllm/local_try.py", line 13, in <module>
[rank0]:     llm = LLM(
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/vllm/entrypoints/llm.py", line 158, in __init__
[rank0]:     self.llm_engine = LLMEngine.from_engine_args(
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/vllm/engine/llm_engine.py", line 445, in from_engine_args
[rank0]:     engine = cls(
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/vllm/engine/llm_engine.py", line 249, in __init__
[rank0]:     self.model_executor = executor_class(
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/vllm/executor/executor_base.py", line 47, in __init__
[rank0]:     self._init_executor()
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/vllm/executor/gpu_executor.py", line 36, in _init_executor
[rank0]:     self.driver_worker.load_model()
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/vllm/worker/worker.py", line 139, in load_model
[rank0]:     self.model_runner.load_model()
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/vllm/worker/model_runner.py", line 722, in load_model
[rank0]:     self.model = get_model(model_config=self.model_config,
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/model_loader/__init__.py", line 21, in get_model
[rank0]:     return loader.load_model(model_config=model_config,
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/model_loader/loader.py", line 327, in load_model
[rank0]:     model.load_weights(
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/models/minicpmv.py", line 685, in load_weights
[rank0]:     param = params_dict[name]
[rank0]: KeyError: 'llm.layers.0.mlp.down_proj.weight'
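
The last two frames show a plain dictionary lookup, `params_dict[name]`, failing because the checkpoint contains a tensor name that the vLLM model does not define. A minimal sketch of how such mismatches could be listed up front instead of stopping at the first one is below. The helper `find_unmatched_weights` and the sample names are hypothetical, not part of vLLM; in practice the two lists would come from the checkpoint files and from `model.named_parameters()`.

```python
def find_unmatched_weights(checkpoint_names, param_names):
    """Return checkpoint tensor names that have no model parameter of the same name."""
    known = set(param_names)
    return [name for name in checkpoint_names if name not in known]


# Hypothetical names for illustration only; the real model layout may differ.
checkpoint_names = [
    "llm.layers.0.mlp.down_proj.weight",
    "llm.layers.0.mlp.up_proj.weight",
]
param_names = [
    "llm.layers.0.mlp.up_proj.weight",
]

for name in find_unmatched_weights(checkpoint_names, param_names):
    print(f"no matching parameter for checkpoint tensor: {name}")
```

Reporting every unmatched name at once makes it easier to see whether a single layer is affected or the whole naming scheme differs.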
tangent2018 added the "new model" label on Aug 21, 2024
@DarkLight1337
Member

cc @HwwwwwwwH

@seanzhang-zhichen

AssertionError: Attempted to load weight (torch.Size([2479104, 1])) into parameter (torch.Size([4304, 1152]))
Loading safetensors checkpoint shards: 0% Completed | 0/2 [00:01<?, ?it/s]
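
The two shapes in this assertion are consistent with 4-bit packing: 4304 × 1152 = 4,958,208 elements, and half of that is 2,479,104. The sketch below works through that arithmetic. It assumes the checkpoint stores two 4-bit values per byte in a single-column tensor, which is how bitsandbytes 4-bit weights are commonly laid out; `packed_4bit_shape` is a hypothetical helper, not a bitsandbytes or vLLM function.

```python
import math


def packed_4bit_shape(shape):
    """Shape of a weight after packing two 4-bit values into each byte (assumed layout)."""
    numel = math.prod(shape)
    # Round up so that an odd element count still gets a final byte.
    return ((numel + 1) // 2, 1)


print(packed_4bit_shape((4304, 1152)))  # (2479104, 1)
```

If that assumption holds, the loader is trying to copy an already packed tensor into an unquantized parameter, which would explain the size mismatch.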

@Sakura4036

+1

@LDLINGLINGLING

Hello, the model you are using is the bitsandbytes (bnb) int4-quantized model. vLLM does not support this quantized model.
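
One way to catch this before loading is to inspect the model's `config.json`. The sketch below assumes the checkpoint was saved through Hugging Face transformers, which typically writes a `quantization_config` entry with `quant_method` and `load_in_4bit` fields; the actual config of this model has not been checked here, and `is_bitsandbytes_checkpoint` is a hypothetical helper.

```python
def is_bitsandbytes_checkpoint(config: dict) -> bool:
    """Guess from a parsed config.json whether the checkpoint is bnb-quantized."""
    quant = config.get("quantization_config") or {}
    return (
        quant.get("quant_method") == "bitsandbytes"
        or bool(quant.get("load_in_4bit"))
        or bool(quant.get("load_in_8bit"))
    )


# Hypothetical config fragments for illustration.
quantized = {"quantization_config": {"quant_method": "bitsandbytes", "load_in_4bit": True}}
plain = {"hidden_size": 3584}

print(is_bitsandbytes_checkpoint(quantized))  # True
print(is_bitsandbytes_checkpoint(plain))      # False
```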

@Sakura4036

> Hello, the model you are using is the bitsandbytes (bnb) int4-quantized model. vLLM does not support this quantized model.

Oh, I see. Thanks.
