Official code for A General Framework to Enhance Fine-tuning-based LLM Unlearning
Unlearning has been proposed to remove copyrighted and privacy-sensitive data from Large Language Models (LLMs). Existing approaches primarily rely on fine-tuning-based methods, which can be categorized into gradient ascent-based (GA-based) and suppression-based methods. However, they often degrade model utility (the ability to respond to normal prompts). In this work, we aim to develop a general framework that enhances the utility of fine-tuning-based unlearning methods. To achieve this goal, we first investigate the common properties of GA-based and suppression-based methods. We unveil that GA-based methods unlearn by distinguishing the target data (i.e., the data to be removed) and suppressing related generations, which is essentially the same strategy employed by suppression-based methods. Inspired by this finding, we introduce Gated Representation UNlearning (GRUN), which has two components: a soft gate function for distinguishing target data and a suppression module using Representation Fine-tuning (ReFT) that adjusts representations rather than model parameters. Experiments show that GRUN significantly improves both unlearning and utility. Moreover, it is general across fine-tuning-based methods, efficient, and promising for sequential unlearning.
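Informally, the gated intervention on a hidden representation h can be thought of as h' = h + g(h) · Φ(h), where g(h) ∈ [0, 1] is the soft gate score and Φ is the ReFT suppression module: a gate value near 1 on target data activates the suppression, while a value near 0 on normal prompts leaves the representation, and hence model utility, essentially untouched. (This is a schematic reading of the two components, not the exact formulation from the paper.)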
The Python environment can be created from environment.yml.
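For example, assuming conda is used to build the environment:

conda env create -f environment.yml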
We modified the following files to accommodate the output of the gate: ./pyvene_custom/models/intervenable_base.py and ./pyvene_custom/models/interventions.py. Alternatively, you can install the latest pyvene and apply the changes highlighted by the comment "GRUN", which are very short.
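If you take the second route, a quick way to locate the relevant changes (a sketch, assuming pyvene is installed from PyPI and grep is available):

pip install pyvene
grep -rn "GRUN" ./pyvene_custom/models/

The grep lists the GRUN-marked lines in our copies of the two files, which can then be ported into the installed pyvene.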
- The detailed commands for fine-tuning can be found in cmd_finetune.sh.
- The detailed commands for the baselines can be found in cmd_baseline.sh.
- The detailed commands for GRUN can be found in cmd_grun.sh.
- The detailed commands for evaluating the baselines and GRUN can be found in cmd_eval.sh.
- Whether the baseline or GRUN is used is set by BASE_OR_REFT (see the example invocation below).
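A minimal end-to-end example of invoking these scripts (a sketch; all arguments are defined inside the scripts themselves, and whether BASE_OR_REFT is exported or edited in place is an assumption):

bash cmd_finetune.sh   # fine-tune the base model
bash cmd_baseline.sh   # run a fine-tuning-based unlearning baseline
bash cmd_grun.sh       # run GRUN on top of the chosen unlearning objective
bash cmd_eval.sh       # evaluate either the baseline or GRUN, selected via BASE_OR_REFT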
The detailed commands for GRUN and its evaluation on WMDP can be found in cmd_wmdp.sh.
The baseline is unlearned using the original RMU repository with the following commands:
python3 -m rmu.unlearn --model_name meta-llama/Llama-3.1-8B --batch_size 4 --layer_ids 13,14,15 --layer_id 15 --max_num_batches 150 --retain_corpora wikitext,wikitext --forget_corpora bio-forget-corpus,cyber-forget-corpus --steering_coeffs 30,30 --alpha 350,350 --lr 5e-5 --seed 42 --output_dir models/llama3_rmu_param18
python3 -m rmu.unlearn --model_name mistralai/Mistral-7B-v0.1 --batch_size 4 --layer_ids 13,14,15 --layer_id 15 --max_num_batches 150 --retain_corpora wikitext,wikitext --forget_corpora bio-forget-corpus,cyber-forget-corpus --steering_coeffs 27,27 --alpha 1600,1600 --min_len 200 --lr 5e-5 --seed 42 --output_dir models/mistral_rmu_param19
@article{ren2025general,
title={A General Framework to Enhance Fine-tuning-based LLM Unlearning},
author={Ren, Jie and Dai, Zhenwei and Tang, Xianfeng and Liu, Hui and Zeng, Jingying and Li, Zhen and Goutam, Rahul and Wang, Suhang and Xing, Yue and He, Qi},
journal={arXiv preprint arXiv:2502.17823},
year={2025}
}