# Safety

## Jailbreak

### Attack

| Title | Author | Publish | Year | Link | Source Code |
| --- | --- | --- | --- | --- | --- |
| Visual Adversarial Examples Jailbreak Large Language Models | Xiangyu Qi | AAAI | 2024 | AAAI | github |
| Unveiling the Safety of GPT-4o: An Empirical Study using Jailbreak Attacks | Zonghao Ying | arxiv | 2024 | https://arxiv.org/abs/2406.06302 | https://github.com/ny1024/jailbreak_gpt4o |
| Jailbreak Vision Language Models via Bi-Modal Adversarial Prompt | Zonghao Ying | arxiv | 2024 | https://arxiv.org/abs/2406.04031 | https://github.com/NY1024/BAP-Jailbreak-Vision-Language-Models-via-Bi-Modal-Adversarial-Prompt |
| Jailbreak in pieces: Compositional Adversarial Attacks on Multi-Modal Language Models | Erfan Shayegani | ICLR | 2024 | https://openreview.net/forum?id=plmBsXHxgR | null |
| FigStep: Jailbreaking Large Vision-language Models via Typographic Visual Prompts | Yichen Gong | arxiv | 2023 | https://arxiv.org/abs/2311.05608 | https://github.com/ThuCCSLab/FigStep |
| Images are Achilles' Heel of Alignment: Exploiting Visual Vulnerabilities for Jailbreaking Multimodal Large Language Models | Yifan Li | arxiv | 2024 | https://arxiv.org/abs/2403.09792 | https://github.com/AoiDragon/HADES |
| Red Teaming GPT-4V: Are GPT-4V Safe Against Uni/Multi-Modal Jailbreak Attacks? | Shuo Chen | arxiv | 2024 | https://arxiv.org/abs/2404.03411 | null |
| Cross-Modality Jailbreak and Mismatched Attacks on Medical Multimodal Large Language Models | Xijie Huang | arxiv | 2024 | https://arxiv.org/abs/2405.20775 | https://github.com/dirtycomputer/O2M_attack |
| Visual-RolePlay: Universal Jailbreak Attack on MultiModal Large Language Models via Role-playing Image Character | Siyuan Ma | arxiv | 2024 | https://arxiv.org/abs/2405.20773 | null |

### Defense

| Title | Author | Publish | Year | Link | Source Code |
| --- | --- | --- | --- | --- | --- |
| MLLMGuard: A Multi-dimensional Safety Evaluation Suite for Multimodal Large Language Models | Tianle Gu | arxiv | 2024 | https://arxiv.org/abs/2406.07594 | https://github.com/Carol-gutianle/MLLMGuard |
| Cross-Modal Safety Alignment: Is textual unlearning all you need? | Trishna Chakraborty | arxiv | 2024 | https://arxiv.org/abs/2406.02575 | null |
| Unbridled Icarus: A Survey of the Potential Perils of Image Inputs in Multimodal Large Language Model Security | Yihe Fan | arxiv | 2024 | https://arxiv.org/abs/2404.05264 | null |
| AdaShield: Safeguarding Multimodal Large Language Models from Structure-based Attack via Adaptive Shield Prompting | Yu Wang | arxiv | 2024 | https://arxiv.org/abs/2403.09513 | https://github.com/rain305f/AdaShield |
| Safety Fine-Tuning at (Almost) No Cost: A Baseline for Vision Large Language Models | Yongshuo Zong | arxiv | 2024 | https://arxiv.org/abs/2402.02207 | https://github.com/ys-zong/VLGuard |
| JailGuard: A Universal Detection Framework for LLM Prompt-based Attacks | Xiaoyu Zhang | arxiv | 2024 | https://arxiv.org/abs/2312.10766 | null |

### Benchmark

| Title | Author | Publish | Year | Link | Source Code |
| --- | --- | --- | --- | --- | --- |
| JailBreakV-28K: A Benchmark for Assessing the Robustness of MultiModal Large Language Models against Jailbreak Attacks | Weidi Luo | arxiv | 2024 | https://arxiv.org/abs/2404.03027 | https://github.com/EddyLuo1232/JailBreakV_28K |
| MM-SafetyBench: A Benchmark for Safety Evaluation of Multimodal Large Language Models | Xin Liu | arxiv | 2023 | https://arxiv.org/abs/2311.17600 | https://github.com/isXinLiu/MM-SafetyBench |