PersonaGym

Task

This repository contains the dataset and code of the paper (Under Review):

PersonaGym: Evaluating Persona Agents and LLMs

Personas and Static Environments

The personas used in our experiments are located in the personas file, and the current list of static environments is located in the environments file.
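
The on-disk format of these files is not restated here; as a rough sketch (assuming each file is either a JSON list or a newline-separated list of strings, which may not match the repository's actual layout), they could be loaded like this:

import json
from pathlib import Path

def load_entries(path):
    # Assumed format: either a JSON list or one entry per line.
    text = Path(path).read_text()
    try:
        data = json.loads(text)
        if isinstance(data, list):
            return data
    except json.JSONDecodeError:
        pass
    return [line.strip() for line in text.splitlines() if line.strip()]

personas = load_entries("personas")        # file names taken from the README
environments = load_entries("environments")
print(len(personas), "personas |", len(environments), "environments")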

Setup

# Environment setup
conda create -n PersonaGym python=3.9 -y
conda activate PersonaGym

# install dependencies
pip install -r requirements.txt

Available Models

Currently, our framework supports the evaluation of any model available through the OpenAI, Anthropic, or TogetherAI APIs.
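
How models are dispatched internally is not described here; purely as an illustration (not the repository's actual logic), a harness might route a model name to one of the three supported providers like this:

def guess_provider(model: str) -> str:
    # Heuristic mapping, assumed for illustration only:
    # OpenAI model names start with "gpt", Anthropic with "claude",
    # and anything else (e.g. "meta-llama/...") is sent to TogetherAI.
    name = model.lower()
    if name.startswith("gpt"):
        return "openai"
    if name.startswith("claude"):
        return "anthropic"
    return "together"

print(guess_provider("gpt-4o"))                          # openai
print(guess_provider("claude-3-opus-20240229"))          # anthropic
print(guess_provider("meta-llama/Llama-2-70b-chat-hf"))  # together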

Evaluation

To start the evaluation of one or more personas, begin by inputting your OpenAI, Anthropic, and TogetherAI API keys here:

OPENAI_API_KEY = "Insert OpenAI key here"
CLAUDE_API_KEY = "Insert Claude key here"
LLAMA_API_KEY = "Insert Llama key here"
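
If you would rather not hard-code the keys, one alternative (a sketch only, not necessarily how the repository reads them) is to pull them from environment variables:

import os

# Fall back to empty strings so a missing key fails at request time rather than at import time.
OPENAI_API_KEY = os.environ.get("OPENAI_API_KEY", "")
CLAUDE_API_KEY = os.environ.get("CLAUDE_API_KEY", "")
LLAMA_API_KEY = os.environ.get("LLAMA_API_KEY", "")

for name, key in [("OpenAI", OPENAI_API_KEY), ("Claude", CLAUDE_API_KEY), ("Llama", LLAMA_API_KEY)]:
    if not key:
        print(f"Warning: no {name} API key set")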

Then move to the code directory and run run.py. The available flags are:

- --persona_list takes a string list of one or more personas.
- --model takes the model API name (e.g. meta-llama/Llama-2-70b-chat-hf).
- --model_name sets the name used when saving results for the model being evaluated.
- --save_name lets you specify a unique name under which the scores are saved in the scores directory.
- --saved_questions (optional) resumes an evaluation by loading already generated questions from a subdirectory of the questions directory.
- --saved_responses (optional) is the directory path where already generated persona agent responses are located.
- --benchmark runs evaluation on our benchmark; currently, this flag should be set to benchmark-v1.
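
As an illustration of the interface only (this is not the actual run.py, and the argument types and defaults are assumptions), the flags above correspond to an argument parser along these lines:

import argparse
import json

parser = argparse.ArgumentParser(description="PersonaGym evaluation (illustrative sketch)")
parser.add_argument("--persona_list", type=json.loads, default=None,
                    help='JSON string list of personas, e.g. \'["a high school physics teacher"]\'')
parser.add_argument("--model", type=str, help="model API name, e.g. meta-llama/Llama-2-70b-chat-hf")
parser.add_argument("--model_name", type=str, help="name used when saving this model's results")
parser.add_argument("--save_name", type=str, default=None, help="unique name for the saved scores")
parser.add_argument("--saved_questions", type=str, default=None,
                    help="subdirectory of the questions directory with already generated questions")
parser.add_argument("--saved_responses", type=str, default=None,
                    help="directory containing already generated persona agent responses")
parser.add_argument("--benchmark", type=str, default=None, help="set to benchmark-v1 to run the benchmark")
args = parser.parse_args()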

An example of running the run.py file is included below

python run.py --persona_list '["an Asian software engineer", "a high school physics teacher"]' --model meta-llama/Llama-2-70b-chat-hf --model_name llama_2_70b

An example of evaluating on our benchmark is included below

python run.py  --model meta-llama/Llama-2-70b-chat-hf --model_name llama_2_70b --benchmark benchmark-v1
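
After an evaluation finishes, the scores are saved in the scores directory under the chosen --save_name. Assuming they are stored as JSON (the actual file name and format may differ), a quick way to inspect a result might be:

import json
from pathlib import Path

# "scores/llama_2_70b.json" is a hypothetical path; adjust it to your --save_name.
score_path = Path("scores") / "llama_2_70b.json"
with open(score_path) as f:
    scores = json.load(f)
print(json.dumps(scores, indent=2))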
