
[Evaluation] Failed to reproduce Qwen2-VL-7B MMMU result #628

Open
pufanyi opened this issue Dec 31, 2024 · 3 comments
pufanyi commented Dec 31, 2024

Hello!!! Thank you so much for the great project!!!

However, when I tried to reproduce the Qwen2-VL-7B evaluation on the MMMU validation set, I couldn't align the results: with VLMEvalKit I obtained 50.56, and with lmms-eval I got 51.22, while the official report states 54.1.

Is there a specific pipeline (or prompt) that should be followed for testing, or are other specific parameters required?

Thank you so much for your response!!!

Scripts

For VLMEvalKit:

python run.py --data MMMU_DEV_VAL --model Qwen2-VL-72B-Instruct --verbose

For lmms-eval:

lmms-eval --model=qwen2_vl --model_args=device_map=auto,pretrained=Qwen/Qwen2-VL-7B-Instruct --tasks=mmmu_val --batch_size=1 --log_samples --output_path=./logs/

Environment Details

Python version: 3.11

VLMEvalKit commit: https://github.com/kq-chen/VLMEvalKit/tree/5803732de327d36d3bbfac4da168b6ad7ee60cc0

lmms-eval commit: https://github.com/EvolvingLMMs-Lab/lmms-eval/tree/80391ce3bfb5a19b32e7a19a2d9399e1378ed2dd

torch==2.5.1
torchvision==0.20.1
qwen-vl-utils==0.0.8
transformers==4.47.1
flash-attn==2.7.2.post1

Result Screenshots

VLMEvalKit: [screenshot]

lmms-eval (ignore mmmu_dev_val): [screenshot]

pufanyi closed this as not planned Dec 31, 2024
pufanyi reopened this Dec 31, 2024

pufanyi commented Dec 31, 2024

Sorry, I accidentally clicked submit after editing earlier, so I temporarily closed it just now. Now I have finished editing. Thank you!

xinlyyy commented Jan 2, 2025

> Sorry, I accidentally clicked submit after editing earlier, so I temporarily closed it just now. Now I have finished editing. Thank you!

Which attention mode did you use: flash_attention_2, eager, or sdpa?

pufanyi commented Jan 2, 2025

flash_attention_2
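
For context on this exchange: the attention implementation is chosen when the model is loaded, and flash_attention_2, sdpa, and eager can produce slightly different numerics, which is one possible source of small score differences between harnesses. The sketch below is a hypothetical illustration (not part of either harness's code) of passing this setting through the Hugging Face transformers `from_pretrained` API; the helper `build_model_kwargs` is an invented name for illustration.

```python
# Hypothetical sketch: selecting the attention implementation when loading
# Qwen2-VL-7B-Instruct with Hugging Face transformers. flash_attention_2
# requires the flash-attn package and a supported GPU; sdpa and eager do not.

def build_model_kwargs(attn_implementation: str = "flash_attention_2") -> dict:
    """Keyword arguments for from_pretrained (illustrative helper).

    attn_implementation is the knob discussed in this thread; the other
    values mirror common Qwen2-VL inference settings.
    """
    if attn_implementation not in {"flash_attention_2", "sdpa", "eager"}:
        raise ValueError(f"unknown attention implementation: {attn_implementation}")
    return {
        "torch_dtype": "bfloat16",
        "device_map": "auto",
        "attn_implementation": attn_implementation,
    }

if __name__ == "__main__":
    # Requires transformers, torch, and (for flash_attention_2) flash-attn.
    from transformers import Qwen2VLForConditionalGeneration

    model = Qwen2VLForConditionalGeneration.from_pretrained(
        "Qwen/Qwen2-VL-7B-Instruct", **build_model_kwargs("flash_attention_2")
    )
```

When comparing harnesses, it may be worth confirming both are loading the model with the same attention implementation and dtype before attributing the gap to prompts or pipelines.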
