Hello!!! Thank you so much for the great project!!!
However, when I tried to reproduce the Qwen2-VL-7B evaluation on the MMMU validation set, I couldn't match the reported results: using VLMEvalKit I obtained 50.56, and using lmms-eval I got 51.22, while the official report gives 54.1.
I would like to ask whether there is a specific pipeline (or prompt) that should be followed for testing, or whether other specific parameters are required.
Thank you so much for your response!!!
Scripts

For VLMEvalKit:
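(The original script was not preserved here. A minimal sketch of a typical VLMEvalKit invocation for this setup; the `MMMU_DEV_VAL` dataset key and `Qwen2-VL-7B-Instruct` model key are assumptions and may differ at the pinned commit, so check `vlmeval/config.py` for the exact names.)

```bash
# Sketch of a standard VLMEvalKit run via its run.py entry point.
# Dataset/model keys below are assumptions; verify them in vlmeval/config.py.
python run.py \
    --data MMMU_DEV_VAL \
    --model Qwen2-VL-7B-Instruct \
    --verbose
```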
For lmms-eval:
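(Likewise a sketch, not the exact script: the `qwen2_vl` model name and `mmmu_val` task name are assumptions; check the registered models and tasks at the pinned lmms-eval commit. Multi-GPU runs are usually launched via `accelerate launch -m lmms_eval` instead of `python -m`.)

```bash
# Sketch of a typical lmms-eval run; model/task identifiers are assumptions.
python -m lmms_eval \
    --model qwen2_vl \
    --model_args pretrained=Qwen/Qwen2-VL-7B-Instruct \
    --tasks mmmu_val \
    --batch_size 1 \
    --log_samples \
    --output_path ./logs/
```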
Environment Details
- Python version: 3.11
- VLMEvalKit version: https://github.com/kq-chen/VLMEvalKit/tree/5803732de327d36d3bbfac4da168b6ad7ee60cc0
- lmms-eval version: https://github.com/EvolvingLMMs-Lab/lmms-eval/tree/80391ce3bfb5a19b32e7a19a2d9399e1378ed2dd

Result Screenshots
VLMEvalKit: (screenshot)

lmms-eval (ignore `mmmu_dev_val`): (screenshot)