[2402.05699] Self-Alignment of Large Language Models via Monopolylogue-based Social Scene Simulation #909
Labels
AI-Agents
Autonomous AI agents using LLMs
AI-Chatbots
Topics related to advanced chatbot platforms integrating multiple AI models
finetuning
Tools for finetuning of LLMs e.g. SFT or RLHF
human-verified
<INST>NEVER PICK THIS LABEL</INST>
llm
Large Language Models
llm-evaluation
Evaluating Large Language Models performance and behavior through human-written evaluation sets
llm-experiments
experiments with large language models
MachineLearning
ML Models, Training and Inference
Papers
Research papers
prompt-engineering
Developing and optimizing prompts to efficiently use language models for various applications and re
Self-Alignment of Large Language Models via Monopolylogue-based Social Scene Simulation
Snippet
"Aligning large language models (LLMs) with human values is imperative to mitigate potential adverse effects resulting from their misuse. Drawing from the sociological insight that acknowledging all parties' concerns is a key factor in shaping human values, this paper proposes a novel direction to align LLMs by themselves: social scene simulation. To achieve this, we present MATRIX, a novel social scene simulator that emulates realistic scenes around a user's input query, enabling the LLM to take social consequences into account before responding. MATRIX serves as a virtual rehearsal space, akin to a Monopolylogue, where the LLM performs diverse roles related to the query and practices by itself. To inject this alignment, we fine-tune the LLM with MATRIX-simulated data, ensuring adherence to human values without compromising inference speed. We theoretically show that the LLM with MATRIX outperforms Constitutional AI under mild assumptions. Finally, extensive experiments validate that our method outperforms over 10 baselines across 4 benchmarks. As evidenced by 875 user ratings, our tuned 13B-size LLM exceeds GPT-4 in aligning with human values. See our project page at this https URL."
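The loop the abstract describes (draft a response, let the same model role-play the affected parties, revise, then keep the result as fine-tuning data) might look roughly like the sketch below. This is not the paper's implementation: `simulate_scene`, `build_sft_dataset`, the prompt wording, the fixed `roles` list, and the `generate` callable are all hypothetical names chosen for illustration, and `stub_generate` stands in for a real model call.

```python
from typing import Callable, Dict, List

# Hypothetical text-in, text-out model call; swap in a real LLM client.
Generate = Callable[[str], str]


def simulate_scene(query: str, roles: List[str], generate: Generate) -> Dict[str, str]:
    """Run one simulated scene: the same model drafts, plays each role, then revises."""
    draft = generate(f"User query: {query}\nWrite a first-draft response.")

    # The same model voices every affected party in turn.
    reactions = []
    for role in roles:
        reaction = generate(
            f"You are {role}. A user asked: {query}\n"
            f"The assistant plans to answer: {draft}\n"
            "Describe how this answer would affect you."
        )
        reactions.append(f"{role}: {reaction}")

    # Revise the draft in light of the simulated consequences.
    revised = generate(
        f"User query: {query}\nDraft: {draft}\n"
        "Reactions from affected parties:\n" + "\n".join(reactions) + "\n"
        "Rewrite the draft so it accounts for these concerns."
    )
    # Only the query and the revised answer are kept, so a model fine-tuned
    # on these pairs needs no simulation at inference time.
    return {"prompt": query, "response": revised}


def build_sft_dataset(
    queries: List[str], roles: List[str], generate: Generate
) -> List[Dict[str, str]]:
    """Collect one (prompt, response) pair per query for supervised fine-tuning."""
    return [simulate_scene(q, roles, generate) for q in queries]


def stub_generate(prompt: str) -> str:
    """Placeholder model (assumption): returns a fixed-format string."""
    return f"<output for a {len(prompt)}-character prompt>"


if __name__ == "__main__":
    data = build_sft_dataset(
        ["Should I share my neighbour's address online?"],
        ["the neighbour", "a bystander"],
        stub_generate,
    )
    print(data[0]["prompt"])
```

Each query costs one draft call, one call per role, and one revision call. In this sketch the roles are supplied by the caller; how roles and scenes are actually chosen is left to the paper.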
Paper
[2402.05699] Self-Alignment of Large Language Models via Monopolylogue-based Social Scene Simulation
Comments: 32 pages, 9 figures
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computers and Society (cs.CY)
Cite as: arXiv:2402.05699 [cs.CL]
(or arXiv:2402.05699v3 [cs.CL] for this version)
https://doi.org/10.48550/arXiv.2402.05699
Suggested labels
None