
[IEEE VIS 2024] LLaVA-Chart: Advancing Multimodal Large Language Models in Chart Question Answering with Visualization-Referenced Instruction Tuning

Paper Link: https://arxiv.org/abs/2407.20174

(Figure: overview of the data generation pipeline)

Release

Data Gallery

(Gallery figures: chart-gallery-1, chart-gallery-2)

Install

  1. Clone this repository and navigate to the ChartQA-MLLM folder:
git clone https://github.com/zengxingchen/ChartQA-MLLM.git
cd ChartQA-MLLM
  2. Install the package:
conda create -n llava-hr python=3.10 -y
conda activate llava-hr
pip install --upgrade pip
pip install -e .
  3. Install additional packages for training:
pip install ninja
pip install flash-attn --no-build-isolation
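
After installation, a quick sanity check (a minimal sketch; it assumes a CUDA-capable GPU is available) is to confirm that PyTorch sees the GPU and that flash-attn imports cleanly:

python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
python -c "import flash_attn; print(flash_attn.__version__)"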

Evaluation

You can run our evaluation with the bash scripts in scripts/llava_hr/*.sh.
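
For example, a run typically looks like the following (the script name is a placeholder; pick whichever script under scripts/llava_hr/ matches your benchmark):

ls scripts/llava_hr/            # list the available evaluation scripts
bash scripts/llava_hr/NAME.sh   # replace NAME with the script for your benchmark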

CLI Inference

Here is the command for chatting with our model without the need for a Gradio interface.

python -m model.llava_hr.serve.cli \
    --model-path ./checkpoints/llava-hr-ChartInstruction \
    --image-file "*.jpg" 
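
To chat about a single chart, replace the glob with the path to that image (the path below is only a placeholder):

python -m model.llava_hr.serve.cli \
    --model-path ./checkpoints/llava-hr-ChartInstruction \
    --image-file ./examples/my_chart.png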

Usage and License Notices:

  • For the base model LLaVA: This project uses certain datasets and checkpoints that are subject to their respective original licenses. Users must comply with all terms and conditions of these original licenses, including but not limited to the OpenAI Terms of Use for the dataset and the specific licenses of the base language models for checkpoints trained on the dataset (e.g., the Llama 2 Community License for LLaMA-2 and Vicuna-v1.5).

Acknowledgement

  • Vicuna: the codebase LLaVA is built upon; LLaVA's base language model is Vicuna-13B.
  • LLaVA: the codebase we build upon. LLaVA was the only project with all of its training code open-sourced when we started this work.
  • LLaVA-HR: the high-resolution model variant we build upon.
  • SemDeDup: the sampling method our data selection module is based on. SemDeDup is designed to sample from hundreds of millions of images.
  • WYTIWYR: part of our chart-classification data is collected from this work.
  • Unichart: part of the existing data was first collected by Unichart.

Contact

If you have any questions about this work, please email Xingchen Zeng at xingchen.zeng@outlook.com.

Citation

@article{zeng2024vis,
  author={Zeng, Xingchen and Lin, Haichuan and Ye, Yilin and Zeng, Wei},
  journal={IEEE Transactions on Visualization and Computer Graphics}, 
  title={Advancing Multimodal Large Language Models in Chart Question Answering with Visualization-Referenced Instruction Tuning}, 
  year={2024},
  pages={1-11},
  doi={10.1109/TVCG.2024.3456159}
}
