
[Evaluation] Failed to reproduce Qwen2-VL-7B MMMU result #628

Open
pufanyi opened this issue Dec 31, 2024 · 3 comments
pufanyi commented Dec 31, 2024

Hello!!! Thank you so much for the great project!!!

However, when I tried to reproduce the Qwen2-VL-7B evaluation on the MMMU validation set, I couldn't align the results: with VLMEvalKit I obtained 50.56, and with lmms-eval I got 51.22, while the official report states 54.1.

Is there a specific pipeline (or prompt) that should be followed for testing, or are other specific parameters required?

Thank you so much for your response!!!

Scripts

For VLMEvalKit:

python run.py --data MMMU_DEV_VAL --model Qwen2-VL-72B-Instruct --verbose

For lmms-eval:

lmms-eval --model=qwen2_vl --model_args=device_map=auto,pretrained=Qwen/Qwen2-VL-7B-Instruct --tasks=mmmu_val --batch_size=1 --log_samples --output_path=./logs/

Environment Details

Python version: 3.11

VLMEvalKit commit: https://github.com/kq-chen/VLMEvalKit/tree/5803732de327d36d3bbfac4da168b6ad7ee60cc0

lmms-eval commit: https://github.com/EvolvingLMMs-Lab/lmms-eval/tree/80391ce3bfb5a19b32e7a19a2d9399e1378ed2dd

torch==2.5.1
torchvision==0.20.1
qwen-vl-utils==0.0.8
transformers==4.47.1
flash-attn==2.7.2.post1

Result Screenshots

VLMEvalKit: [screenshot]

lmms-eval (ignore mmmu_dev_val): [screenshot]

pufanyi closed this as not planned Dec 31, 2024
pufanyi reopened this Dec 31, 2024

pufanyi commented Dec 31, 2024

Sorry, I accidentally clicked submit after editing earlier, so I temporarily closed it just now. Now I have finished editing. Thank you!

xinlyyy commented Jan 2, 2025

> Sorry, I accidentally clicked submit after editing earlier, so I temporarily closed it just now. Now I have finished editing. Thank you!

Which attention mode did you use: flash_attention_2, eager, or sdpa?

pufanyi commented Jan 2, 2025

flash_attention_2
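
For context on this exchange: the attention implementation is chosen when the model is loaded, and flash_attention_2, sdpa, and eager can produce slightly different numerics, which is one possible source of small score differences between harnesses. The sketch below is a hypothetical illustration (not part of either harness's code) of passing this setting through the Hugging Face transformers `from_pretrained` API; the helper `build_model_kwargs` is an invented name for illustration.

```python
# Hypothetical sketch: selecting the attention implementation when loading
# Qwen2-VL-7B-Instruct with Hugging Face transformers. flash_attention_2
# requires the flash-attn package and a supported GPU; sdpa and eager do not.

def build_model_kwargs(attn_implementation: str = "flash_attention_2") -> dict:
    """Keyword arguments for from_pretrained (illustrative helper).

    attn_implementation is the knob discussed in this thread; the other
    values mirror common Qwen2-VL inference settings.
    """
    if attn_implementation not in {"flash_attention_2", "sdpa", "eager"}:
        raise ValueError(f"unknown attention implementation: {attn_implementation}")
    return {
        "torch_dtype": "bfloat16",
        "device_map": "auto",
        "attn_implementation": attn_implementation,
    }

if __name__ == "__main__":
    # Requires transformers, torch, and (for flash_attention_2) flash-attn.
    from transformers import Qwen2VLForConditionalGeneration

    model = Qwen2VLForConditionalGeneration.from_pretrained(
        "Qwen/Qwen2-VL-7B-Instruct", **build_model_kwargs("flash_attention_2")
    )
```

When comparing harnesses, it may be worth confirming both are loading the model with the same attention implementation and dtype before attributing the gap to prompts or pipelines.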
