🔥 [NeurIPS24] ProMaC: Leveraging Hallucinations to Reduce Manual Prompt Dependency in Promptable Segmentation
Code release of paper:
Leveraging Hallucinations to Reduce Manual Prompt Dependency in Promptable Segmentation
Jian Hu, Jiayi Lin, Junchi Yan, Shaogang Gong
Queen Mary University of London, Shanghai Jiao Tong University
- [2024.09.25] ProMaC is accepted to NeurIPS 2024!
- [2024.08.30] Model running instructions with LLaVA1.5 on CAMO and COD10K datasets are released.
- [2024.08.26] Demo of ProMaC is released.
- [2024.08.26] Model running instructions with LLaVA1.5 on the CHAMELEON dataset are released.
Promptable segmentation typically requires instance-specific manual prompts to guide the segmentation of each desired object. To minimize such a need, task-generic promptable segmentation has been introduced, which employs a single task-generic prompt to segment various images of different objects in the same task. Current methods use Multimodal Large Language Models (MLLMs) to reason detailed instance-specific prompts from a task-generic prompt to improve segmentation accuracy. The effectiveness of this segmentation heavily depends on the precision of these derived prompts. However, MLLMs often suffer from hallucinations during reasoning, resulting in inaccurate prompting. While existing methods focus on eliminating hallucinations to improve a model, we argue that MLLM hallucinations can reveal valuable contextual insights when leveraged correctly, as they represent pre-trained large-scale knowledge beyond individual images. In this paper, we utilize hallucinations to mine task-related information from images and verify its accuracy, enhancing the precision of the generated prompts.
A brief introduction to how ProMaC works: we introduce an iterative Prompt-Mask Cycle generation framework (ProMaC) with a prompt generator and a mask generator. The prompt generator uses multi-scale chain-of-thought prompting, initially exploring hallucinations to extract extended contextual knowledge about a test image. These hallucinations are then reduced to formulate precise instance-specific prompts, which direct the mask generator to produce masks consistent with the task semantics via mask semantic alignment. The generated masks iteratively induce the prompt generator to focus more on task-relevant image areas and to reduce irrelevant hallucinations, jointly yielding better prompts and masks.
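The pseudocode below is a minimal sketch of this prompt-mask cycle as described above; the interfaces (`generate_instance_prompts`, `segment_with_prompts`, `align_mask_semantics`) are illustrative placeholders, not the actual APIs in this repository.

```python
# Illustrative sketch of the iterative Prompt-Mask Cycle; all interfaces are placeholders.
def promac_cycle(image, task_prompt, prompt_generator, mask_generator, num_iters=3):
    mask = None  # no mask prior in the first iteration
    for _ in range(num_iters):
        # Prompt generator: multi-scale chain-of-thought reasoning. Hallucinations expose
        # task-related background knowledge, which is then verified against the image
        # to form precise instance-specific prompts.
        instance_prompts = prompt_generator.generate_instance_prompts(
            image, task_prompt, prior_mask=mask
        )
        # Mask generator: segment with the instance-specific prompts, then keep only
        # regions consistent with the task semantics (mask semantic alignment).
        mask = mask_generator.segment_with_prompts(image, instance_prompts)
        mask = mask_generator.align_mask_semantics(mask, task_prompt)
    return mask
```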
- Download the datasets from the following links:
Camouflaged Object Detection Dataset
- Put it in ./data/.
- When playing with LLaVA, this code was implemented with Python 3.8 and PyTorch 2.1.0. We recommend creating a virtualenv environment and installing all the dependencies as follows:
```bash
# create virtual environment
virtualenv ProMaC
source ProMaC/bin/activate

# prepare LLaVA
git clone https://github.com/haotian-liu/LLaVA.git
cd LLaVA
pip install -e .
cd ..

# prepare SAM
pip install git+https://github.com/facebookresearch/segment-anything.git
wget https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth

pip install opencv-python imageio ftfy urllib3==1.26.6
pip install diffusers transformers==4.36.0 accelerate scipy safetensors protobuf
```
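After installation, you can optionally sanity-check the SAM setup with the short sketch below. It uses the official `segment-anything` API, but the image path and box prompt are placeholders and not part of the ProMaC pipeline.

```python
# Minimal SAM sanity check (placeholder image path and box prompt; not part of ProMaC itself).
import cv2
import numpy as np
from segment_anything import sam_model_registry, SamPredictor

sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")  # checkpoint downloaded above
predictor = SamPredictor(sam)

image = cv2.cvtColor(cv2.imread("data/example.jpg"), cv2.COLOR_BGR2RGB)  # placeholder image
predictor.set_image(image)

# Prompt SAM with a rough box; ProMaC derives such instance-specific prompts automatically.
box = np.array([50, 50, 300, 300])  # placeholder box in (x1, y1, x2, y2) format
masks, scores, _ = predictor.predict(box=box, multimask_output=True)
print(masks.shape, scores)
```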
- Our ProMaC is a training-free test-time adaptation approach, so you can try it by running:
```bash
python main.py --config config/CHAMELEON.yaml
```
or
```bash
bash script_llava.sh
```
We also provide a Jupyter notebook demo for visualization.
- Complete the following steps in the shell before opening the Jupyter notebook.
The virtualenv environment named ProMaC needs to be created first, following the Quick Start above.
```bash
pip install notebook
pip install ipykernel ipywidgets
python -m ipykernel install --user --name ProMaC
```
- Open demo.ipynb and select the 'ProMaC' kernel in the running notebook.
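For reference, a visualization cell in the notebook typically overlays the predicted mask on the input image, along the lines of the sketch below; the `image` and `mask` variables are placeholders rather than code copied from demo.ipynb.

```python
# Illustrative mask overlay (placeholder variables; not copied from demo.ipynb).
import matplotlib.pyplot as plt
import numpy as np

def show_mask_overlay(image, mask, alpha=0.5):
    """Overlay a binary mask (H, W) in red on an RGB image (H, W, 3)."""
    overlay = image.astype(np.float32).copy()
    overlay[mask > 0] = (1 - alpha) * overlay[mask > 0] + alpha * np.array([255.0, 0.0, 0.0])
    plt.imshow(overlay.astype(np.uint8))
    plt.axis("off")
    plt.show()
```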
- Update datasets and implementation scripts
- Demo and code
- Keep incorporating more capabilities
If you find our work useful in your research, please consider citing:
```bibtex
@article{hu2024leveraging,
  title={Leveraging Hallucinations to Reduce Manual Prompt Dependency in Promptable Segmentation},
  author={Hu, Jian and Lin, Jiayi and Yan, Junchi and Gong, Shaogang},
  journal={arXiv preprint arXiv:2408.15205},
  year={2024}
}
```