SAP

This is the official repository for the paper "Attack Prompt Generation for Red Teaming and Defending Large Language Models", accepted to Findings of EMNLP 2023.

Environment Requirement

The code has been tested with Python 3.8.0. The required packages are:

openai == 0.27.4
backoff == 2.2.1
fire == 0.5.0
transformers == 4.28.1
peft == 0.3.0
datasets == 2.11.0
torch == 1.13.1
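
To check whether an existing environment matches the pinned versions above, a small script along these lines may help. It is only a sketch and not part of this repository: the names REQUIRED, installed_version, and find_mismatches are hypothetical, and the handling of local build suffixes (for example a torch build reported as "1.13.1+cu117") is an assumption about how your packages were installed.

```python
from importlib import metadata

# Pinned versions copied from the package list above.
REQUIRED = {
    "openai": "0.27.4",
    "backoff": "2.2.1",
    "fire": "0.5.0",
    "transformers": "4.28.1",
    "peft": "0.3.0",
    "datasets": "2.11.0",
    "torch": "1.13.1",
}


def installed_version(package):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(required, lookup=installed_version):
    """Return {package: (expected, found)} for every package that differs."""
    mismatches = {}
    for package, expected in required.items():
        found = lookup(package)
        # Assumption: a local build suffix such as "+cu117" is ignored.
        if found is None or found.split("+")[0] != expected:
            mismatches[package] = (expected, found)
    return mismatches


if __name__ == "__main__":
    for package, (expected, found) in find_mismatches(REQUIRED).items():
        print(f"{package}: expected {expected}, found {found or 'not installed'}")
```

If the script prints nothing, every listed package is installed at the pinned version.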

Data

The SAP dataset is located in ./datasets/.

Run the Code

Attack Framework

Run attack.py to generate your own SAP dataset with the Attack Framework. An example command for generating SAP5 can be found in case_generate.sh.

Defense Framework

An example script for a fine-tuning iteration can be found in defense_example.sh.

Acknowledgment

Some parts of this repository are adapted from alpaca-lora; you can find more information at https://github.com/tloen/alpaca-lora. Thanks for the contributions!
