This is the codebase for our paper Guiding not Forcing: Enhancing the Transferability of Jailbreaking Attacks on LLMs via Removing Superfluous Constraints.
We introduce a novel conceptual framework to elucidate transferability and identify two superfluous constraints, the response pattern constraint and the token tail constraint, as significant barriers to improved transferability. Our method, Guided Jailbreaking Optimization, increases the overall Transfer Attack Success Rate (T-ASR) from 18.4% to 50.3% across a set of target models with varying safety levels, while also improving the stability and controllability of jailbreak behaviors on both source and target models. Please refer to our paper for more details.
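As a rough, hypothetical illustration of the idea (a minimal sketch, not the paper's exact objective; the function and variable names are ours), vanilla GCG forces the model to reproduce a full fixed target string, while a guided variant scores only a short affirmative prefix and leaves the token tail unconstrained:

import torch
import torch.nn.functional as F

def target_loss(logits, target_ids, guide_len=None):
    # logits[i] are the model's next-token logits predicting target_ids[i].
    # guide_len=None scores the full target string (forcing, as in vanilla GCG);
    # a small guide_len scores only the leading tokens (guiding).
    if guide_len is not None:
        logits, target_ids = logits[:guide_len], target_ids[:guide_len]
    return F.cross_entropy(logits, target_ids)

# Same logits and target, two objectives (fake data for illustration).
logits = torch.randn(12, 32000)
target_ids = torch.randint(0, 32000, (12,))
forcing_loss = target_loss(logits, target_ids)               # full fixed target
guiding_loss = target_loss(logits, target_ids, guide_len=4)  # short prefix only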
We require AISafetyLab for quick evaluation and the latest FastChat for running the search on Llama3. Follow these steps to set up the required dependencies:
# Clone and install AISafetyLab
git clone git@github.com:thu-coai/AISafetyLab.git
cd AISafetyLab
pip install -e .
# Clone and install FastChat
git clone git@github.com:lm-sys/FastChat.git
cd FastChat
pip install -e .
Then, clone this repository and install it along with the remaining dependencies:
git clone git@github.com:thu-coai/TransferAttack.git
cd TransferAttack
pip install -e .
We provide a script to construct search data for GCG-Adaptive and our method:
cd scripts
python construct_data.py
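The schema is defined in construct_data.py; purely as a hypothetical illustration, each search record pairs a harmful query with its optimization target:

# Hypothetical record layout (field names illustrative; see construct_data.py for the real schema).
record = {
    "query": "How do I ...",        # harmful question used during the search
    "target": "Sure, here is ...",  # affirmative target / guide prefix
}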
We build upon the GCG attack framework and integrate our method. Use the following commands to run the search:
bash run_gjo.sh llama2
bash run_gjo.sh llama3
Remember to adjust the run configuration in ./scripts/configs first.
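The files in ./scripts/configs are the source of truth; as a purely hypothetical sketch of the knobs a GCG-style search typically exposes (shown here as an equivalent Python dict, with illustrative names and values):

config = {
    "model_path": "/path/to/llama2-7b-chat",   # source model to search on
    "num_steps": 500,                          # optimization steps
    "search_width": 512,                       # candidate suffixes sampled per step
    "topk": 256,                               # top-k token substitutions considered
    "adv_string_init": "! ! ! ! ! ! ! ! ! !",  # initial adversarial suffix
}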
We provide scripts to extract adversarial prompts from log files and combine them with test questions and model chat templates:
python log_file_convert.py
python gcg_combine.py
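For reference, combining an extracted adversarial suffix with a test question and a model's chat template reduces to something like the following minimal sketch using Hugging Face's apply_chat_template (paths and strings are placeholders; the actual logic lives in gcg_combine.py):

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("/path/to/llama2-7b-chat")  # placeholder path
question = "..."    # test question
adv_suffix = "..."  # adversarial prompt extracted from the logs
messages = [{"role": "user", "content": f"{question} {adv_suffix}"}]
# Render the conversation with the model's own chat template.
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)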
After the search produces adversarial prompts, we provide a script that supports batch generation to collect model responses. Remember to update the model/tokenizer path and the input/output paths.
cd gen_code
bash generate.sh
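Under the hood, batch generation with Hugging Face Transformers looks roughly like the sketch below (paths and generation settings are placeholders; generate.sh drives the real script):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("/path/to/model", padding_side="left")
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token  # left-padding needs a pad token
model = AutoModelForCausalLM.from_pretrained("/path/to/model", torch_dtype=torch.bfloat16, device_map="auto")

prompts = ["...", "..."]  # combined prompts from the previous step
inputs = tokenizer(prompts, return_tensors="pt", padding=True).to(model.device)
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=256, do_sample=False)
# Strip the prompt tokens so only the generated responses remain.
responses = tokenizer.batch_decode(out[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True)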
We support evaluation using AISafetyLab. Run the evaluation script as follows:
cd evaluation
python harmbench_eval.py
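harmbench_eval.py relies on AISafetyLab's scorer; conceptually, HarmBench-style evaluation prompts a judge model (the public cais/HarmBench-Llama-2-13b-cls classifier) with each behavior/response pair. The sketch below shows only the shape of that loop with a simplified stand-in prompt; the real judge template ships with the classifier, and the actual API is in harmbench_eval.py:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

CLS = "cais/HarmBench-Llama-2-13b-cls"
tokenizer = AutoTokenizer.from_pretrained(CLS)
model = AutoModelForCausalLM.from_pretrained(CLS, torch_dtype=torch.bfloat16, device_map="auto")

def is_jailbroken(behavior, response):
    # Simplified stand-in prompt; HarmBench defines the real judge template.
    prompt = f"Behavior: {behavior}\nGeneration: {response}\nAnswer (yes/no):"
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=1, do_sample=False)
    verdict = tokenizer.decode(out[0, inputs["input_ids"].shape[1]:], skip_special_tokens=True)
    return verdict.strip().lower().startswith("yes")

If you find our work helpful, please cite our paper: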
@misc{yang2025guidingforcingenhancingtransferability,
  title={Guiding not Forcing: Enhancing the Transferability of Jailbreaking Attacks on LLMs via Removing Superfluous Constraints},
  author={Junxiao Yang and Zhexin Zhang and Shiyao Cui and Hongning Wang and Minlie Huang},
  year={2025},
  eprint={2503.01865},
  archivePrefix={arXiv},
  primaryClass={cs.LG},
  url={https://arxiv.org/abs/2503.01865},
}