Patronizing and Condescending Language (PCL) is a form of microaggression directed at vulnerable groups on the Internet. It is a subcategory of toxic speech, and its automatic detection has emerged as a research area since 2022.
PclGPT is a group of bilingual large language models (LLMs) built on ChatGLM-3 and LLaMA-2, released in two versions according to the training language: PclGPT-CN (based on ChatGLM-3) and PclGPT-EN (based on LLaMA-2). Starting from these base models, PclGPT underwent further pre-training and supervised fine-tuning (SFT) to detect patronizing and condescending language (PCL) and other forms of offensive speech. The maximum supported context length is 4,096 tokens.
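As a minimal sketch of how that context limit can be respected in practice, inputs can simply be truncated at tokenization time; the snippet below only illustrates capping the encoded length at 4,096 tokens and is not part of the released code.

```python
from transformers import LlamaTokenizer

# Sketch: cap an input at the model's 4,096-token context window during tokenization.
tokenizer = LlamaTokenizer.from_pretrained("DUTIR-Wang/PclGPT-EN")
encoded = tokenizer(
    "a potentially very long text ...",
    truncation=True,
    max_length=4096,  # maximum supported context length
    return_tensors="pt",
)
print(encoded["input_ids"].shape)  # no longer than (1, 4096)
```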
We evaluated the detection performance of the model suite on two public English condescension-detection datasets, Don't Patronize Me! (DPM) and TalkDown (TD), and one Chinese dataset, CPCL.
Performance is reported as macro-averaged Precision (P), Recall (R), and F1-score (F1).
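For reference, macro-averaged scores of this kind can be computed with scikit-learn's `precision_recall_fscore_support`; the labels below are made up purely for illustration.

```python
from sklearn.metrics import precision_recall_fscore_support

# Toy gold labels and predictions (1 = PCL, 0 = not PCL); real evaluation uses the dataset splits.
y_true = [1, 0, 0, 1, 1, 0]
y_pred = [1, 0, 1, 1, 0, 0]

p, r, f1, _ = precision_recall_fscore_support(y_true, y_pred, average="macro")
print(f"Macro P={p:.3f}, R={r:.3f}, F1={f1:.3f}")
```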
Results on the English detection tasks for PclGPT-EN are as follows:
| LM | Model | DPM P | DPM R | DPM F1 | TD P | TD R | TD F1 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| PLMs | RoBERTa | 76.3 | 78.7 | 77.4 | 88.4 | 86.7 | 86.5 |
| | RoBERTa-L | 80.2 | 74.9 | 77.2 | 88.1 | 86.0 | 85.9 |
| | M-BERT | 69.2 | 76.0 | 71.8 | 87.6 | 87.4 | 87.4 |
| Base-LLMs | ChatGPT | 50.8 | 52.3 | 46.9 | 59.2 | 58.1 | 56.7 |
| | GPT-4.0 | 51.5 | 57.5 | 54.3 | 60.8 | 60.3 | 60.5 |
| | Claude-3 | 52.3 | 52.5 | 52.3 | 61.6 | 64.1 | 63.2 |
| Base-LLMs | LLaMA-2-7B | 50.9 | 52.6 | 51.4 | 49.9 | 49.7 | 49.7 |
| | ChatGLM-3-6B | N/A | N/A | N/A | N/A | N/A | N/A |
| LLMs (Ours) | PclGPT-EN | 80.4 | 81.8 | 81.1 | 89.9 | 89.0 | 88.9 |
Results on the Chinese detection task (CPCL) for PclGPT-CN are as follows:
| LM | Model | CPCL P | CPCL R | CPCL F1 |
| --- | --- | --- | --- | --- |
| PLMs | RoBERTa | 61.2 | 61.3 | 61.3 |
| | RoBERTa-L | 62.5 | 61.6 | 62.0 |
| | Chinese-BERT | 66.6 | 71.0 | 67.3 |
| | M-BERT | 65.8 | 67.8 | 66.6 |
| Base-LLMs | ChatGPT | 53.1 | 54.2 | 53.6 |
| | GPT-4.0 | 55.4 | 56.3 | 55.7 |
| | Claude-3 | 57.2 | 57.7 | 57.3 |
| Base-LLMs | LLaMA-2-7B | 45.2 | 47.5 | 46.3 |
| | ChatGLM-3-6B | 51.9 | 50.2 | 51.0 |
| LLMs (Ours) | PclGPT-CN | 69.1 | 72.0 | 70.2 |
We have released version 1.0 weights of our PclGPT-CN and PclGPT-EN models on Hugging Face.
PclGPT-EN: https://huggingface.co/DUTIR-Wang/PclGPT-EN
PclGPT-CN: https://huggingface.co/DUTIR-Wang/PclGPT-CN
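If you prefer to fetch the weights explicitly before loading them, something like the following should work; it uses `huggingface_hub`, and the local directory name is only an example.

```python
from huggingface_hub import snapshot_download

# Download the PclGPT-EN weights to a local folder (path is arbitrary).
local_dir = snapshot_download(
    repo_id="DUTIR-Wang/PclGPT-EN",
    local_dir="./PclGPT-EN",
)
print("Weights downloaded to:", local_dir)
```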
After downloading the weights, use the following code for single-sample inference with PclGPT-EN.
from transformers import LlamaTokenizer, LlamaForCausalLM

# Load the LLaMA-based PclGPT-EN model and its tokenizer
tokenizer = LlamaTokenizer.from_pretrained("DUTIR-Wang/PclGPT-EN")
model = LlamaForCausalLM.from_pretrained("DUTIR-Wang/PclGPT-EN").half().cuda()

def generate_response():
    # Sample text to classify
    sample_text = "For someone who's just a mere street sweeper, you sure think highly of yourself."
    instruction = (
        "Suppose you are a linguist and you are asked to judge whether a text is patronizing and condescending. "
        "Patronizing and condescending language expresses a sense of superiority or belittles others, making them feel inferior or incapable. "
        "# Your return: Based on the following conversation, make a decision and return your choice. "
        "FALSE means the text is not a condescending language, and TRUE means the text is a condescending language. "
        "-> here is the text: ({})"
    ).format(sample_text)
    inputs = tokenizer(instruction, return_tensors="pt").to("cuda")
    outputs = model.generate(**inputs, max_length=1024)
    # Decode only the newly generated tokens, so the TRUE/FALSE mentioned in the
    # prompt itself is not mistaken for the model's answer
    generated_text = tokenizer.decode(
        outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )
    model_output = extract_option(generated_text)
    return classify_output(model_output)

def extract_option(output_text):
    # Return the first TRUE/FALSE option found in the generated text
    options = ['TRUE', 'FALSE']
    for option in options:
        if option in output_text:
            return option
    return "Could not detect."

def classify_output(model_output):
    # Map the extracted option to a human-readable verdict
    if model_output == "TRUE":
        return "The text is condescending language."
    elif model_output == "FALSE":
        return "The text is not condescending language."
    return "Could not detect a TRUE/FALSE answer in the model output."
response = generate_response()
print(response)
The output will be:
"The text is condescending language."
After downloading the weights, use the following code for single-sample inference with PclGPT-CN.
from transformers import AutoTokenizer, AutoModel

# Load the ChatGLM-based PclGPT-CN model and its tokenizer
tokenizer = AutoTokenizer.from_pretrained("DUTIR-Wang/PclGPT-CN", trust_remote_code=True)
model = AutoModel.from_pretrained("DUTIR-Wang/PclGPT-CN", trust_remote_code=True).half().cuda()

def generate_response():
    # Sample text to classify
    sample_text = "区区扫地的,你可真是厉害坏了"
    # Chinese instruction: the model grades the text as level A (not condescending),
    # B (mildly condescending), C (moderately condescending), or D (severely condescending)
    instruction = (
        "假定你是一名语言学家,检测居高临下言论。居高临下言论是优势地位群体针对弱势群体的优越言论,"
        "语言攻击性较弱,但往往夹杂着讽刺,刻板印象,会对弱势群体造成伤害。"
        "居高临下根据语气程度分为 A级:非居高临下(正常非歧视、非冷暴力言论),"
        "B级:轻微居高临下(语气略带鼓励或怜悯),C级:中等居高临下(说话人较为客观陈述,但语气带有歧视),"
        "D级:严重居高临下(说话人语气轻蔑,严重歧视弱势群体)。"
        "接下来将给你一段文本,根据上述规则,你负责判断该文本属于(A/B/C/D级)的哪一级,并只回答选项。"
        "-> 文本:({})"
    ).format(sample_text)
    # Tokenize and run inference
    inputs = tokenizer(instruction, return_tensors="pt").to("cuda")
    outputs = model.generate(**inputs, max_length=1024)
    output_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
    model_output = extract_option(output_text)
    return classify_output(model_output)

def extract_option(output_text):
    # Scan the decoded text from the end, so the model's answer is found
    # before the A/B/C/D labels that appear in the prompt itself
    options = ['A', 'B', 'C', 'D']
    for char in reversed(output_text.strip()):
        if char in options:
            return char
    return "无法识别的输出"  # "Unrecognized output"

def classify_output(model_output):
    # Map the predicted grade to its explanation (kept in Chinese, matching the model's task)
    if model_output == "A":
        return "判断为A级:非居高临下"    # Level A: not condescending
    elif model_output == "B":
        return "判断为B级:轻微居高临下"  # Level B: mildly condescending
    elif model_output == "C":
        return "判断为C级:中等居高临下"  # Level C: moderately condescending
    elif model_output == "D":
        return "判断为D级:严重居高临下"  # Level D: severely condescending
    else:
        return "无法识别的输出,请检查输入或模型输出"  # Unrecognized output; check the input or the model output
response = generate_response()
print(response)
The output will be:
"判断为D级:严重居高临下" (i.e., Level D: severely condescending)
Our paper is available on arXiv: https://arxiv.org/abs/2410.00361
If you plan to apply or extend our work, please cite the following paper:
@misc{wang2024pclgptlargelanguagemodel,
title={PclGPT: A Large Language Model for Patronizing and Condescending Language Detection},
author={Hongbo Wang and Mingda Li and Junyu Lu and Hebin Xia and Liang Yang and Bo Xu and Ruizhu Liu and Hongfei Lin},
year={2024},
eprint={2410.00361},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2410.00361},
}
The work studied in this paper falls within the subcategory of toxic speech. Because PCL is a form of microaggression, parts of this research may cause discomfort or be sensitive for some readers. This research is intended solely to protect vulnerable groups and to help identify and moderate online verbal attacks. Please do not use the model weights to generate any harmful content.