We propose the challenging ChartBench to evaluate the chart comprehension ability of MLLMs.
We improve the Acc+ metric to mitigate random guessing; a minimal sketch of the idea follows below.
We collect a larger set of unlabeled charts to emphasize the MLLM's ability to interpret visual information without the aid of annotated data points.
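To make the Acc+ idea concrete, here is a minimal sketch, not the exact ChartBench formulation: a chart counts toward Acc+ only if every yes/no judgment attached to it is answered correctly, so random guessing on k paired questions passes with probability 2^-k rather than 1/2. The record layout (`chart_id`, `pred`, `label`) is a hypothetical schema for illustration.

```python
# Minimal Acc+-style metric sketch: a chart scores only if ALL of its
# yes/no judgments are answered correctly. Field names are hypothetical.
from collections import defaultdict

def acc_plus(records):
    """records: iterable of dicts with 'chart_id', 'pred', 'label' keys."""
    per_chart = defaultdict(list)
    for r in records:
        per_chart[r["chart_id"]].append(r["pred"] == r["label"])
    # A chart counts only when every judgment on it is correct.
    return sum(all(v) for v in per_chart.values()) / max(len(per_chart), 1)

preds = [
    {"chart_id": "c1", "pred": "yes", "label": "yes"},
    {"chart_id": "c1", "pred": "no",  "label": "yes"},  # one miss sinks c1
    {"chart_id": "c2", "pred": "no",  "label": "no"},
]
print(acc_plus(preds))  # 0.5
```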
- Open source: the SFT internlmv2 checkpoint.
- Open source: all evaluation results.
- Open source: all ChartBench data.
- Open source: the evaluation scripts.
- Open source: the inference scripts.
- Open source: the demo data (10%).
Please follow the steps below to set up the local environment and run the evaluation.
- Complete the basic environment setup.
- Set the prompt style for both Acc+ and NQA tasks in `./Repos/utils.py`.
- Modify the default `CKPT_PATH` in `./Repos/{MODEL_NAME}/infer.py`.
- Reimplement the `load_model` and `model_gen` functions (see the sketch after this list).
- The results are saved in `./Result/raw/{MODEL_NAME}.jsonl` by default.
- Prompt LLMs in `./Stat/gpt_filter.py` to extract numeric values for the NQA task (a simplified stand-in is sketched below).
- Set the parameters in `./Stat/stat_all_metric.py`; the statistical results are saved in `./Stat/Paper_Table`.
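For steps 3–4, a skeleton of `./Repos/{MODEL_NAME}/infer.py` might look like the sketch below. The function names `load_model` and `model_gen` come from the repo; the Hugging Face classes, dtype, and generation arguments are illustrative assumptions for a generic MLLM, not the repo's actual code.

```python
# Illustrative skeleton for ./Repos/{MODEL_NAME}/infer.py; model and processor
# classes are placeholders for your own MLLM.
import torch
from transformers import AutoModelForCausalLM, AutoProcessor

CKPT_PATH = "/path/to/your/checkpoint"  # step 2: point this at your weights

def load_model():
    """Load the model and its processor from CKPT_PATH."""
    processor = AutoProcessor.from_pretrained(CKPT_PATH, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(
        CKPT_PATH, torch_dtype=torch.float16, trust_remote_code=True
    ).eval().cuda()
    return model, processor

def model_gen(model, processor, image, prompt):
    """Run one chart-question pair through the model; return the text answer."""
    inputs = processor(images=image, text=prompt, return_tensors="pt").to("cuda")
    with torch.no_grad():
        out = model.generate(**inputs, max_new_tokens=64, do_sample=False)
    return processor.batch_decode(out, skip_special_tokens=True)[0]
```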
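Steps 5–6 then consume the raw predictions. As a hedged illustration of that data flow (the repo's `gpt_filter.py` prompts an LLM for the extraction; the regex here is a simplified stand-in, and the `id`/`answer` field names are assumptions):

```python
import json
import re

NUM_RE = re.compile(r"-?\d+(?:\.\d+)?")

def read_results(path):
    """Yield one prediction record per line of a raw .jsonl results file."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            yield json.loads(line)

def extract_number(answer):
    """Return the first numeric value in a free-form NQA answer, or None."""
    m = NUM_RE.search(answer.replace(",", ""))  # drop thousands separators
    return float(m.group()) if m else None

for rec in read_results("./Result/raw/MODEL_NAME.jsonl"):
    print(rec.get("id"), extract_number(rec.get("answer", "")))
```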
@article{ChartBench,
  title   = {ChartBench: A Benchmark for Complex Visual Reasoning in Charts},
  author  = {Zhengzhuo Xu and Sinan Du and Yiyan Qi and Chengjin Xu and Chun Yuan and Jian Guo},
  journal = {ArXiv},
  year    = {2023},
  volume  = {abs/2312.15915},
  url     = {https://api.semanticscholar.org/CorpusID:266550948}
}