This repository contains the dataset and code of the paper (Under Review):
PersonaGym: Evaluating Persona Agents and LLMs
The personas used in our experiments are located in the personas file, and the current list of static environments is located in the environments file.
# Environment setup

```bash
conda create -n PersonaGym python=3.9 -y
conda activate PersonaGym

# install dependencies
pip install -r requirements.txt
```
Currently, our framework supports the evaluation of any model available through the OpenAI, Anthropic, or TogetherAI APIs.
To start evaluating one or more personas, begin by inputting your OpenAI, Anthropic, and TogetherAI API keys here:
```python
OPENAI_API_KEY = "Insert OpenAI key here"
CLAUDE_API_KEY = "Insert Claude key here"
LLAMA_API_KEY = "Insert Llama key here"
```
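If you prefer not to hardcode keys, one option is to read them from environment variables instead. The sketch below is only an illustration and is not part of the repository's documented workflow; the environment variable names are assumptions that simply mirror the assignments above.

```python
import os

# Hedged sketch: read the API keys from environment variables rather than
# hardcoding them. The variable names mirror the assignments above but are
# otherwise an assumption, not something the repository defines.
OPENAI_API_KEY = os.environ.get("OPENAI_API_KEY", "")
CLAUDE_API_KEY = os.environ.get("CLAUDE_API_KEY", "")
LLAMA_API_KEY = os.environ.get("LLAMA_API_KEY", "")

# Warn early if a key is missing so a failed evaluation run is easier to debug.
for name, key in [("OPENAI_API_KEY", OPENAI_API_KEY),
                  ("CLAUDE_API_KEY", CLAUDE_API_KEY),
                  ("LLAMA_API_KEY", LLAMA_API_KEY)]:
    if not key:
        print(f"Warning: {name} is not set")
```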
Then move to the code directory and run the run.py file. The following flags are available:

- --persona_list: a string list of one or more personas to evaluate.
- --model: the model API name (e.g., meta-llama/Llama-2-70b-chat-hf).
- --model_name: the name used when saving results for the model being evaluated.
- --save_name: a unique name under which to save the score in the scores directory.
- --saved_questions (optional): loads already generated questions from a subdirectory within the questions directory, which allows evaluation progress to be continued.
- --saved_responses (optional): the directory path to already generated persona agent responses.
- --benchmark: enables running on our benchmark; currently, this flag should be set to benchmark-v1.
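For reference, the sketch below shows how this command-line interface could be wired up with argparse. It is a hedged illustration of the flags described above, not the actual implementation of run.py, and details such as defaults, required arguments, and how the persona list string is parsed are assumptions.

```python
import argparse
import ast

# Hedged sketch of the CLI described above; the real run.py may differ.
parser = argparse.ArgumentParser(description="Evaluate persona agents with PersonaGym")
parser.add_argument("--persona_list", type=str, default=None,
                    help='String list of personas, e.g. \'["a high school physics teacher"]\'')
parser.add_argument("--model", type=str, required=True,
                    help="Model API name, e.g. meta-llama/Llama-2-70b-chat-hf")
parser.add_argument("--model_name", type=str, required=True,
                    help="Name used when saving results for the evaluated model")
parser.add_argument("--save_name", type=str, default=None,
                    help="Unique name under which scores are saved in the scores directory")
parser.add_argument("--saved_questions", type=str, default=None,
                    help="Subdirectory within the questions directory holding already generated questions")
parser.add_argument("--saved_responses", type=str, default=None,
                    help="Directory path to already generated persona agent responses")
parser.add_argument("--benchmark", type=str, default=None,
                    help="Set to benchmark-v1 to evaluate on the benchmark")
args = parser.parse_args()

# The persona list is passed as a quoted Python-style list string, so parse it
# with ast.literal_eval (an assumption about the expected format).
personas = ast.literal_eval(args.persona_list) if args.persona_list else []
```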
An example of running the run.py file is included below:

```bash
python run.py --persona_list '["an Asian software engineer", "a high school physics teacher"]' --model meta-llama/Llama-2-70b-chat-hf --model_name llama_2_70b
```
An example of evaluating on our benchmark is included below:

```bash
python run.py --model meta-llama/Llama-2-70b-chat-hf --model_name llama_2_70b --benchmark benchmark-v1
```