
[VLM] Qwen2.5-VL #12604

Merged
merged 54 commits into from
Feb 5, 2025

Conversation

ywang96
Member

@ywang96 ywang96 commented Jan 31, 2025

FIXES: #12486, #12532

TODO:

To run this model before the transformers 4.49 release, install transformers from source:
pip install git+https://github.com/huggingface/transformers

Co-authored-by: @yixqiao (UC Berkeley), @wulipc (Qwen Team)

Signed-off-by: Roger Wang <ywang@roblox.com>

👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which covers a small, essential subset of CI tests to quickly catch errors. You can run other CI tests on top of it by going to your fastcheck build on the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can do one of these:

  • Add ready label to the PR
  • Enable auto-merge.

🚀

@ywang96 ywang96 mentioned this pull request Jan 31, 2025
@DarkLight1337 DarkLight1337 self-assigned this Jan 31, 2025
@mergify mergify bot added the frontend label Jan 31, 2025
@ywang96 ywang96 mentioned this pull request Jan 31, 2025
@mergify mergify bot added the v1 label Feb 1, 2025
yixqiao and others added 17 commits February 1, 2025 02:27
@mergify mergify bot added the documentation Improvements or additions to documentation label Feb 2, 2025
@kevin-ssy

Can you show your code?

from transformers import AutoProcessor
from vllm import LLM, SamplingParams
from qwen_vl_utils import process_vision_info


class QwenVL_VLLM:
    def __init__(self, llm_name='ckpts/Qwen2.5-VL-72B-Instruct', **llm_args):
        self.llm = LLM(
            model=llm_name,
            limit_mm_per_prompt={"image": 10, "video": 10},
            tensor_parallel_size=8,
            dtype='bfloat16',
            max_num_seqs=5,
            mm_processor_kwargs={
                "min_pixels": 28 * 28,
                "max_pixels": 1280 * 28 * 28,
                "fps": 1,
            },
            # disable_mm_preprocessor_cache=args.disable_mm_preprocessor_cache,
            **llm_args
        )
        self.sample_params = SamplingParams(temperature=0.2, max_tokens=512)
        # default processor
        self.processor = AutoProcessor.from_pretrained(llm_name, max_pixels=854 * 480)
        self.processor.tokenizer.padding_side = "left"

    def get_batch_messages(self, video_paths, queries, duration=1.0):
        # Build one chat message per (video, query) pair.
        messages = [
            [
                {
                    "role": "user",
                    "content": [
                        {
                            "type": "video",
                            "video": video_path,
                            "max_pixels": 360 * 420,
                            "fps": 1.0,
                        },
                        {"type": "text", "text": query},
                    ],
                }
            ] for video_path, query in zip(video_paths, queries)
        ]
        texts = [self.processor.apply_chat_template(
            msg, tokenize=False, add_generation_prompt=True) for msg in messages]
        image_inputs, video_inputs, video_kwargs = process_vision_info(
            messages, return_video_kwargs=True)
        return [{
            "prompt": query,
            "multi_modal_data": {
                "video": {
                    "data": v_input.numpy(),
                    "question": query,
                }
            },
        } for v_input, query in zip(video_inputs, texts)]

    def __call__(self, video_path, query, **kwargs):
        if isinstance(video_path, list) and isinstance(query, list):
            inputs = self.get_batch_messages(video_path, query)
        else:
            raise ValueError("video_path and query must both be lists")
        outputs = self.llm.generate(inputs, sampling_params=self.sample_params)
        return outputs

Sure. There you go!

@DarkLight1337
Member

You should pass a numpy array directly to multi_modal_data.video instead of a nested dictionary. The query is already provided in prompt.
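Something like this (a minimal sketch based on the snippet above; texts, video_inputs and the llm/sample_params objects are the names from that code, minus the self. prefix):

inputs = [{
    "prompt": prompt,
    # Pass the decoded frames (a numpy array) directly as the video value;
    # the question text is already carried by "prompt", so no nested dict.
    "multi_modal_data": {"video": v_input.numpy()},
} for v_input, prompt in zip(video_inputs, texts)]
outputs = llm.generate(inputs, sampling_params=sample_params)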

@kevin-ssy

You should pass a numpy array directly to multi_modal_data.video instead of a nested dictionary. The query is already provided in prompt.

Just fixed it. Brilliant, thanks for your prompt reply!

@yfllllll

yfllllll commented Feb 6, 2025

@rstone3017, have you solved it? I also met this problem.

@xiayq1

xiayq1 commented Feb 7, 2025

Can Qwen2.5-VL-7B run on a V100?

  1. GPU: Tesla V100-SXM2-32GB
  2. Versions: transformers 4.49.0.dev0, vllm 0.7.3.dev3+gc786e75.cu124, flash_attn 2.1.0
  3. Command: vllm serve /Qwen2.5-VL-7B-Instruct --port 8000 --host 0.0.0.0 --dtype float16 --max-model-len 256

I get:
....
e_size":256}, use_cached_outputs=True,
ERROR 02-07 15:16:36 utils.py:608] Cannot use FA version 2 is not supported due to FA3 is only supported on devices with compute capability >= 8 excluding 8.6 and 8.9
ERROR 02-07 15:16:36 engine.py:389]
Traceback (most recent call last):
...
/miniconda3/envs/qwen25vl/lib/python3.10/site-packages/vllm/attention/backends/utils.py", line 611, in flash_attn_version
assert is_fa_version_supported(fa_version)
AssertionError
....
/site-packages/vllm/entrypoints/openai/api_server.py", line 230, in build_async_engine_client_from_engine_args
raise RuntimeError(
RuntimeError: Engine process failed to start. See stack trace for the root cause.

@linchen111

How can I POST with a local image or local video?

@DarkLight1337
Member

DarkLight1337 commented Feb 7, 2025

How can I POST with a local image or local video?

You can set --allowed-local-media-path in vllm serve and pass a file URL starting with file:// in the request.

@DarkLight1337
Member

DarkLight1337 commented Feb 7, 2025

Can Qwen2.5-VL-7B run on a V100? [setup and error log quoted from the comment above]

This should be fixed by #12828, can you try using the latest code?
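For example (a rough sketch of one way to pick up the latest code by building from source; adjust to your own environment):

git clone https://github.com/vllm-project/vllm.git
cd vllm
pip install -e .  # build vLLM from the current main branch, which should include the fix from #12828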

@MotorBottle

I have built vLLM and this branch from source, and I get the following error:

TypeError: Unknown image model type: qwen2_5_vl

I am serving the model as follows:

vllm serve Qwen/Qwen2.5-VL-72B-Instruct --quantization bitsandbytes --load-format bitsandbytes --pipeline_parallel_size 2 --max_model_len 10000

Were you able to run this model with bnb quantization? I tried but failed (#12900). Could you provide any idea or instructions on how to fix this? Appreciated!

@ywang96
Member Author

ywang96 commented Feb 7, 2025

I have built vLLM and this branch from source, and I get the following error:
TypeError: Unknown image model type: qwen2_5_vl
I am serving the model as follows:
vllm serve Qwen/Qwen2.5-VL-72B-Instruct --quantization bitsandbytes --load-format bitsandbytes --pipeline_parallel_size 2 --max_model_len 10000

Were you able to run this model with bnb quantization? I tried but failed (#12900). Could you provide any idea or instructions on how to fix this? Appreciated!

@MotorBottle I don't think this model is supported with bnb yet. See #12604 (comment)

@ransheng11

@yfllllll, have you solved it? I also met this problem.

@Isotr0py
Collaborator

Isotr0py commented Feb 8, 2025

@MotorBottle Can you try #12944? The BNB support for qwen2.5-vl should be added in that PR.
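If you want to try it before it is merged, one possible way (a sketch, assuming you already have a source checkout of vllm-project/vllm; the local branch name is arbitrary):

git fetch origin pull/12944/head:qwen25-vl-bnb
git checkout qwen25-vl-bnb
pip install -e .  # rebuild vLLM from the PR branch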

@hxujal

hxujal commented Feb 8, 2025

How can I POST with a local image or local video?

You can set --allowed-local-media-path in vllm serve and pass a file URL starting with file:// in the request.

Can you give a demo of passing in a local image?

@MotorBottle

@MotorBottle Can you try #12944? The BNB support for qwen2.5-vl should be added in that PR.

Confirmed working with #12944. Qwen2.5-VL-7B-Instruct tested.

@DarkLight1337
Member

DarkLight1337 commented Feb 8, 2025

How can I POST with a local image or local video?

You can set --allowed-local-media-path in vllm serve and pass a file URL starting with file:// in the request.

Can you give a demo of passing in a local image?

vllm serve <model> --allowed-local-media-path /path/to/data
from openai import OpenAI

openai_api_key = "EMPTY"
openai_api_base = "http://localhost:8000/v1"

client = OpenAI(
    api_key=openai_api_key,
    base_url=openai_api_base,
)

model = client.models.list().data[0].id

chat_response = client.chat.completions.create(
    model=model,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "file://path/to/data/path/to/image.jpg",
                    },
                },
                {"type": "text", "text": "What is in this image?"},
            ],
        }
    ],
)

@thiner

thiner commented Feb 8, 2025

I am using the latest vllm v0.7.2 docker image, but it fails to serve the qwen2.5-vl-7b model. The error message:

ValueError: The checkpoint you are trying to load has model type `qwen2_5_vl` but Transformers does not recognize this architecture. This could be because of an issue with the checkpoint, or because your version of Transformers is out of date.

You can update Transformers with the command `pip install --upgrade transformers`. If this does not work, and the checkpoint is very new, then there may not be a release version that supports this model yet. In this case, you can get the most up-to-date code by installing Transformers from source with the command `pip install git+https://github.com/huggingface/transformers.git`

It seems the docker image was not built with the required transformers version.

@DarkLight1337
Member

Yes, you need to manually install transformers from source as they haven't released this model yet.

@xiayq1

xiayq1 commented Feb 8, 2025

I have built vLLM and this branch from source, and I get the following error:
TypeError: Unknown image model type: qwen2_5_vl
I am serving the model as follows:
vllm serve Qwen/Qwen2.5-VL-72B-Instruct --quantization bitsandbytes --load-format bitsandbytes --pipeline_parallel_size 2 --max_model_len 10000

Were you able to run this model with bnb quantization? I tried but failed (#12900). Could you provide any idea or instructions on how to fix this? Appreciated!

Fixed, thanks a lot!

@fearnworks

fearnworks commented Feb 9, 2025

I am hitting this issue when trying to run:

ERROR 02-09 13:48:41 core.py:210]     bin_counts.scatter_add_(1, tokens, torch.ones_like(tokens))
ERROR 02-09 13:48:41 core.py:210] RuntimeError: Expected index [5, 1943] to be smaller than self [4, 152065] apart from dimension 1 and to be smaller size than src [5, 1943]
ERROR 02-09 13:48:41 core.py:210] 
CRITICAL 02-09 13:48:41 core_client.py:158] Got fatal signal from worker processes, shutting down. See stack trace above for root cause issue.

with this script:

uv venv --python 3.12.8
source .venv/bin/activate
uv pip install vllm  # ---> no need to install vLLM from source anymore because of the latest release
uv pip install flash-attn --no-build-isolation  # ---> otherwise it will use xformers; alternatively, use flashinfer via uv pip install flashinfer-python
uv pip install "git+https://github.com/huggingface/transformers"  # ---> this needs to be the last step, at least for now; once transformers releases a new version, a plain uv pip install transformers will do
VLLM_USE_V1=1 vllm serve Qwen/Qwen2.5-VL-7B-Instruct \
    --port 12434 \
    --host 0.0.0.0 \
    --max-model-len 16434 \
    --dtype bfloat16 \
    --served-model-name vision-worker \
    --limit-mm-per-prompt image=1,video=0 

@ZhonghaoLu

Can Qwen2.5-VL-7B run on a V100? [setup and error log quoted from the comment above]

Traceback (most recent call last):
File "", line 198, in _run_module_as_main
File "", line 88, in _run_code
File "/home/lzh/anaconda3/envs/qwen25vl/lib/python3.12/site-packages/vllm/entrypoints/openai/api_server.py", line 911, in
uvloop.run(run_server(args))
File "/home/lzh/anaconda3/envs/qwen25vl/lib/python3.12/site-packages/uvloop/init.py", line 109, in run
return __asyncio.run(
^^^^^^^^^^^^^^
File "/home/lzh/anaconda3/envs/qwen25vl/lib/python3.12/asyncio/runners.py", line 195, in run
return runner.run(main)
^^^^^^^^^^^^^^^^
File "/home/lzh/anaconda3/envs/qwen25vl/lib/python3.12/asyncio/runners.py", line 118, in run
return self._loop.run_until_complete(task)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
File "/home/lzh/anaconda3/envs/qwen25vl/lib/python3.12/site-packages/uvloop/init.py", line 61, in wrapper
return await main
^^^^^^^^^^
File "/home/lzh/anaconda3/envs/qwen25vl/lib/python3.12/site-packages/vllm/entrypoints/openai/api_server.py", line 875, in run_server
async with build_async_engine_client(args) as engine_client:
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/lzh/anaconda3/envs/qwen25vl/lib/python3.12/contextlib.py", line 210, in aenter
return await anext(self.gen)
^^^^^^^^^^^^^^^^^^^^^
File "/home/lzh/anaconda3/envs/qwen25vl/lib/python3.12/site-packages/vllm/entrypoints/openai/api_server.py", line 136, in build_async_engine_client
async with build_async_engine_client_from_engine_args(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/lzh/anaconda3/envs/qwen25vl/lib/python3.12/contextlib.py", line 210, in aenter
return await anext(self.gen)
^^^^^^^^^^^^^^^^^^^^^
File "/home/lzh/anaconda3/envs/qwen25vl/lib/python3.12/site-packages/vllm/entrypoints/openai/api_server.py", line 230, in build_async_engine_client_from_engine_args
raise RuntimeError(
RuntimeError: Engine process failed to start. See stack trace for the root cause.

Could you please look into it? I seem to have encountered a similar error; it is triggered whenever tp > 1.

@ywang96 How can I solve it?

@DarkLight1337
Member

I think this should be fixed by #12828 already, can you pull the latest code and try again?

@ZhonghaoLu

I think this should be fixed by #12828 already, can you pull the latest code and try again?

Yes, I've pulled the latest code and tried it. I don't know why the bug is consistently triggered when tp > 1, but there is no problem deploying the 7B model on a single card.

@DarkLight1337
Member

Can you open a new issue and show your output of collect_env.py?

@jmtatsch

I tried to run inference on Qwen2.5-VL via vLLM 0.7.2 and the current dev transformers, but I get this import error:

ImportError: cannot import name 'Qwen2_5_VLImageProcessor' from 'transformers.models.qwen2_5_vl' (/usr/local/lib/python3.12/dist-packages/transformers/models/qwen2_5_vl/__init__.py). Did you mean: 'Qwen2_5_VLProcessor'?

Am I doing something wrong, or has transformers dev changed again?

@DarkLight1337
Member

Transformers dev has changed. Please update vLLM and also your local version of the HF Hub repo.
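For example (a sketch, assuming the checkpoint was downloaded through the HF Hub cache; the repo id is illustrative):

from huggingface_hub import snapshot_download

# Force a re-download so the cached processor/config files are in sync
# with the renamed classes in current transformers dev.
snapshot_download("Qwen/Qwen2.5-VL-7B-Instruct", force_download=True)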

Successfully merging this pull request may close these issues.

[New Model]: Qwen2.5-VL