## Attack

Title | Author | Venue | Year | Link | Source Code |
---|---|---|---|---|---|
Visual Adversarial Examples Jailbreak Aligned Large Language Models | Xiangyu Qi | AAAI | 2024 | https://arxiv.org/abs/2306.13213 | https://github.com/Unispac/Visual-Adversarial-Examples-Jailbreak-Large-Language-Models
Unveiling the Safety of GPT-4o: An Empirical Study using Jailbreak Attacks | Zonghao Ying | arxiv | 2024 | https://arxiv.org/abs/2406.06302 | https://github.com/ny1024/jailbreak_gpt4o
Jailbreak Vision Language Models via Bi-Modal Adversarial Prompt | Zonghao Ying | arxiv | 2024 | https://arxiv.org/abs/2406.04031 | https://github.com/NY1024/BAP-Jailbreak-Vision-Language-Models-via-Bi-Modal-Adversarial-Prompt
Jailbreak in Pieces: Compositional Adversarial Attacks on Multi-Modal Language Models | Erfan Shayegani | ICLR | 2024 | https://openreview.net/forum?id=plmBsXHxgR | null
FigStep: Jailbreaking Large Vision-language Models via Typographic Visual Prompts | Yichen Gong | arxiv | 2023 | https://arxiv.org/abs/2311.05608 | https://github.com/ThuCCSLab/FigStep
Images are Achilles' Heel of Alignment: Exploiting Visual Vulnerabilities for Jailbreaking Multimodal Large Language Models | Yifan Li | arxiv | 2024 | https://arxiv.org/abs/2403.09792 | https://github.com/AoiDragon/HADES
Red Teaming GPT-4V: Are GPT-4V Safe Against Uni/Multi-Modal Jailbreak Attacks? | Shuo Chen | arxiv | 2024 | https://arxiv.org/abs/2404.03411 | null |
Cross-Modality Jailbreak and Mismatched Attacks on Medical Multimodal Large Language Models | Xijie Huang | arxiv | 2024 | https://arxiv.org/abs/2405.20775 | https://github.com/dirtycomputer/O2M_attack
Visual-RolePlay: Universal Jailbreak Attack on MultiModal Large Language Models via Role-playing Image Character | Siyuan Ma | arxiv | 2024 | https://arxiv.org/abs/2405.20773 | null |
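
## Defense
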
Title | Author | Venue | Year | Link | Source Code |
---|---|---|---|---|---|
MLLMGuard: A Multi-dimensional Safety Evaluation Suite for Multimodal Large Language Models | Tianle Gu | arxiv | 2024 | https://arxiv.org/abs/2406.07594 | https://github.com/Carol-gutianle/MLLMGuard
Cross-Modal Safety Alignment: Is textual unlearning all you need? | Trishna Chakraborty | arxiv | 2024 | https://arxiv.org/abs/2406.02575 | null |
Unbridled Icarus: A Survey of the Potential Perils of Image Inputs in Multimodal Large Language Model Security | Yihe Fan | arxiv | 2024 | https://arxiv.org/abs/2404.05264 | null |
AdaShield: Safeguarding Multimodal Large Language Models from Structure-based Attack via Adaptive Shield Prompting | Yu Wang | arxiv | 2024 | https://arxiv.org/abs/2403.09513 | https://github.com/rain305f/AdaShield
Safety Fine-Tuning at (Almost) No Cost: A Baseline for Vision Large Language Models | Yongshuo Zong | arxiv | 2024 | https://arxiv.org/abs/2402.02207 | https://github.com/ys-zong/VLGuard
JailGuard: A Universal Detection Framework for LLM Prompt-based Attacks | Xiaoyu Zhang | arxiv | 2023 | https://arxiv.org/abs/2312.10766 | null
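
## Benchmark
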
Title | Author | Venue | Year | Link | Source Code |
---|---|---|---|---|---|
JailBreakV-28K: A Benchmark for Assessing the Robustness of MultiModal Large Language Models against Jailbreak Attacks | Weidi Luo | arxiv | 2024 | https://arxiv.org/abs/2404.03027 | https://github.com/EddyLuo1232/JailBreakV_28K
MM-SafetyBench: A Benchmark for Safety Evaluation of Multimodal Large Language Models | Xin Liu | arxiv | 2023 | https://arxiv.org/abs/2311.17600 | https://github.com/isXinLiu/MM-SafetyBench