🔥 [NeurIPS24] ProMaC: Leveraging Hallucinations to Reduce Manual Prompt Dependency in Promptable Segmentation
Code release of paper:
Leveraging Hallucinations to Reduce Manual Prompt Dependency in Promptable Segmentation
Jian Hu, Jiayi Lin, Junchi Yan, Shaogang Gong
Queen Mary University of London, Shanghai Jiao Tong University
- [2024.09.25] ProMaC is accepted to NeurIPS 2024!
- [2024.08.30] Model running instructions with LLaVA1.5 on CAMO and COD10K datasets are released.
- [2024.08.26] Demo of ProMaC is released.
- [2024.08.26] Model running instructions with LLaVA1.5 on the CHAMELEON dataset are released.
Promptable segmentation typically requires instance-specific manual prompts to guide the segmentation of each desired object. To minimize such a need, task-generic promptable segmentation has been introduced, which employs a single task-generic prompt to segment various images of different objects in the same task. Current methods use Multimodal Large Language Models (MLLMs) to reason detailed instance-specific prompts from a task-generic prompt to improve segmentation accuracy. The effectiveness of this segmentation heavily depends on the precision of these derived prompts. However, MLLMs often suffer from hallucinations during reasoning, resulting in inaccurate prompting. While existing methods focus on eliminating hallucinations to improve a model, we argue that MLLM hallucinations can reveal valuable contextual insights when leveraged correctly, as they represent pre-trained large-scale knowledge beyond individual images. In this paper, we utilize hallucinations to mine task-related information from images and verify its accuracy, enhancing the precision of the generated prompts.
A brief introduction to how ProMaC works: we introduce an iterative Prompt-Mask Cycle generation framework (ProMaC) with a prompt generator and a mask generator. The prompt generator uses multi-scale chain-of-thought prompting, initially exploring hallucinations to extract extended contextual knowledge about a test image. These hallucinations are then reduced to formulate precise instance-specific prompts, which direct the mask generator to produce masks consistent with the task semantics via mask semantic alignment. The generated masks iteratively induce the prompt generator to focus more on task-relevant image areas and to reduce irrelevant hallucinations, jointly yielding better prompts and masks.
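The pseudocode below is a minimal sketch of this prompt-mask cycle as described above; the interfaces (`generate_instance_prompts`, `segment_with_prompts`, `align_mask_semantics`) are illustrative placeholders, not the actual APIs in this repository.

```python
# Illustrative sketch of the iterative Prompt-Mask Cycle; all interfaces are placeholders.
def promac_cycle(image, task_prompt, prompt_generator, mask_generator, num_iters=3):
    mask = None  # no mask prior in the first iteration
    for _ in range(num_iters):
        # Prompt generator: multi-scale chain-of-thought reasoning. Hallucinations expose
        # task-related background knowledge, which is then verified against the image
        # to form precise instance-specific prompts.
        instance_prompts = prompt_generator.generate_instance_prompts(
            image, task_prompt, prior_mask=mask
        )
        # Mask generator: segment with the instance-specific prompts, then keep only
        # regions consistent with the task semantics (mask semantic alignment).
        mask = mask_generator.segment_with_prompts(image, instance_prompts)
        mask = mask_generator.align_mask_semantics(mask, task_prompt)
    return mask
```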
- Download the datasets from the following links:
Camouflaged Object Detection Dataset
- Put it in ./data/.
- When playing with LLaVA, this code was implemented with Python 3.8 and PyTorch 2.1.0. We recommend creating a virtualenv environment and installing all the dependencies as follows:
```bash
# create virtual environment
virtualenv ProMaC
source ProMaC/bin/activate

# prepare LLaVA
git clone https://github.com/haotian-liu/LLaVA.git
cd LLaVA
pip install -e .
cd ..

# prepare SAM
pip install git+https://github.com/facebookresearch/segment-anything.git
wget https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth

pip install opencv-python imageio ftfy urllib3==1.26.6
pip install diffusers transformers==4.36.0 accelerate scipy safetensors protobuf
```
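After installation, you can optionally sanity-check the SAM setup with the short sketch below. It uses the official `segment-anything` API, but the image path and box prompt are placeholders and not part of the ProMaC pipeline.

```python
# Minimal SAM sanity check (placeholder image path and box prompt; not part of ProMaC itself).
import cv2
import numpy as np
from segment_anything import sam_model_registry, SamPredictor

sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")  # checkpoint downloaded above
predictor = SamPredictor(sam)

image = cv2.cvtColor(cv2.imread("data/example.jpg"), cv2.COLOR_BGR2RGB)  # placeholder image
predictor.set_image(image)

# Prompt SAM with a rough box; ProMaC derives such instance-specific prompts automatically.
box = np.array([50, 50, 300, 300])  # placeholder box in (x1, y1, x2, y2) format
masks, scores, _ = predictor.predict(box=box, multimask_output=True)
print(masks.shape, scores)
```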
- Our ProMaC is a training-free test-time adaptation approach, so you can try it by running:
```bash
python main.py --config config/CHAMELEON.yaml
```
or
```bash
bash script_llava.sh
```
We also provide a Jupyter notebook demo for visualization.
- Complete the following steps in the shell before opening the Jupyter notebook.
The virtualenv environment named ProMaC needs to be created first, following the Quick Start above.
```bash
pip install notebook
pip install ipykernel ipywidgets
python -m ipykernel install --user --name ProMaC
```
- Open demo.ipynb and select the 'ProMaC' kernel in the running notebook.
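For reference, a visualization cell in the notebook typically overlays the predicted mask on the input image, along the lines of the sketch below; the `image` and `mask` variables are placeholders rather than code copied from demo.ipynb.

```python
# Illustrative mask overlay (placeholder variables; not copied from demo.ipynb).
import matplotlib.pyplot as plt
import numpy as np

def show_mask_overlay(image, mask, alpha=0.5):
    """Overlay a binary mask (H, W) in red on an RGB image (H, W, 3)."""
    overlay = image.astype(np.float32).copy()
    overlay[mask > 0] = (1 - alpha) * overlay[mask > 0] + alpha * np.array([255.0, 0.0, 0.0])
    plt.imshow(overlay.astype(np.uint8))
    plt.axis("off")
    plt.show()
```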
- Update datasets and implementation scripts
- Demo and code
- Keep incorporating more capabilities
If you find our work useful in your research, please consider citing:
```bibtex
@article{hu2024leveraging,
  title={Leveraging Hallucinations to Reduce Manual Prompt Dependency in Promptable Segmentation},
  author={Hu, Jian and Lin, Jiayi and Yan, Junchi and Gong, Shaogang},
  journal={arXiv preprint arXiv:2408.15205},
  year={2024}
}
```