Official PyTorch implementation of our ICLR 2025 paper:
Routing Experts: Learning to Route Dynamic Experts in Existing Multi-modal Large Language Models
Qiong Wu1,2, Zhaoxi Ke1,2, Yiyi Zhou1,2, Xiaoshuai Sun1,2, Rongrong Ji1,2,*
1 Key Laboratory of Multimedia Trusted Perception and Efficient Computing, Ministry of Education of China, Xiamen University, 361005, P.R. China
2 Institute of Artificial Intelligence, Xiamen University, 361005, P.R. China
*Corresponding Author.
Abstract: Recently, mixture of experts (MoE) has become a popular paradigm for trading off modal capacity against efficiency in multimodal large language models (MLLMs). Unlike previous efforts, we explore dynamic experts within existing MLLMs and show that a standard MLLM can itself act as a mixture of experts. Achieving this, however, is notoriously challenging: well-trained MLLMs are accustomed to a fixed inference pathway, and a drastic change in how they infer greatly impedes performance. To address these issues, we propose Routing Experts (RoE), a novel dynamic expert routing method for existing MLLMs that achieves example-dependent optimal path routing without obvious structural tweaks. Meanwhile, a new structural sparsity regularization is introduced to encourage well-trained MLLMs to learn more short-cut pathways. In addition, we align the training and inference of MLLMs in terms of network routing. To validate RoE, we apply it to a set of existing MLLMs, including LLaVA-1.5, LLaVA-HR and VILA, and conduct extensive experiments on a wide range of VL benchmarks. The results not only demonstrate the effectiveness of RoE in improving MLLM efficiency, but also show clear advantages over MoE-LLaVA in both performance and speed, e.g., an average performance gain of 3.3% on 5 benchmarks while being 1.61 times faster.
In Proceedings of the International Conference on Learning Representations (ICLR) 2025
This repository contains:
- ✅ Implementation of Routing Experts (RoE)
- ✅ Training/evaluation scripts
Clone the repository and set up the training environment:

```shell
git clone https://github.com/DoubtedSteam/RoE.git
cd RoE/RoE
conda create -n RoE_train python=3.10
conda activate RoE_train
pip install -r requirements.txt
```
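After installation, the environment can be sanity-checked before launching any jobs. The snippet below is a hypothetical helper (not part of this repo); it only assumes that `torch`, `transformers`, and `deepspeed` are among the packages installed by `requirements.txt`.

```python
# Hypothetical sanity check for the RoE_train environment (not part of the repo).
import importlib.util

def check_packages(names):
    """Return the subset of `names` that cannot be imported."""
    return [n for n in names if importlib.util.find_spec(n) is None]

missing = check_packages(["torch", "transformers", "deepspeed"])
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("Environment looks ready.")
```

Running this inside the `RoE_train` environment should report no missing packages; otherwise, re-run `pip install -r requirements.txt`.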
Please download the annotation file of the final instruction-tuning data mixture from LLaVA (llava_v1_5_mix665k.json), and download the images from the constituent datasets:
- COCO: train2017
- GQA: images
- OCR-VQA: download script (we save all files as `.jpg`)
- TextVQA: train_val_images
- VisualGenome: part1, part2
After downloading all of them, organize the data as follows in your data path:

```
├── coco
│   └── train2017
├── gqa
│   └── images
├── ocr_vqa
│   └── images
├── textvqa
│   └── train_images
└── vg
    ├── VG_100K
    └── VG_100K_2
```
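Before training, it is easy to verify that the image folders are in place. The snippet below is a small illustrative helper, not part of the repo; the `"./playground/data"` root is an assumption and should be replaced with your own data path.

```python
# Hypothetical check that the expected image-folder layout exists.
from pathlib import Path

# Sub-directories listed in the README's layout above.
EXPECTED = [
    "coco/train2017",
    "gqa/images",
    "ocr_vqa/images",
    "textvqa/train_images",
    "vg/VG_100K",
    "vg/VG_100K_2",
]

def missing_dirs(root):
    """Return the expected sub-directories that do not exist under `root`."""
    root = Path(root)
    return [d for d in EXPECTED if not (root / d).is_dir()]

print(missing_dirs("./playground/data"))  # path is an assumption; empty list = all present
```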
Then randomly drop half of the SFT data:

```shell
python random_drop.py
```
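The core of this step is plain random subsampling of the annotation list. The sketch below illustrates the idea only; `random_drop.py` in the repo is the actual script and may differ in details such as seeding and file names.

```python
# Illustrative sketch of dropping half of the SFT annotations at random.
# The repo's random_drop.py is the authoritative script.
import random

def drop_half(annotations, seed=0):
    """Return a random half of the instruction-tuning samples."""
    rng = random.Random(seed)  # fixed seed for reproducibility (an assumption)
    return rng.sample(annotations, k=len(annotations) // 2)

# Toy demonstration: a 10-sample list shrinks to 5 samples.
samples = [{"id": i} for i in range(10)]
print(len(drop_half(samples)))  # 5
```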
Training script with DeepSpeed ZeRO-3:

```shell
bash RoE/scripts/v1_5/finetune_RoE.sh
```
Set up a separate environment for evaluation:

```shell
cd RoE/lmms-eval
conda create -n RoE_eval python=3.10
conda activate RoE_eval
pip install -e .
```
Evaluate with lmms-eval:

```shell
bash eval_roe.sh
```
This project was made possible thanks to the following open-source projects/resources: