SocREval

This is the code repository for the arXiv paper "SocREval: Large Language Models with the Socratic Method for Reference-Free Reasoning Evaluation".

Installing dependencies

Use a virtual environment tool (e.g., Miniconda) to install the packages and run the experiments:
python==3.7.10
pip install -r requirements.txt
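
Because the interpreter version is pinned, it can help to check it before running anything. The following is a minimal sketch, not part of the repository; the helper name `is_expected_python` is hypothetical, and it compares only the major and minor version against the pinned 3.7.10.

```python
import sys

# Major and minor version taken from the pinned python==3.7.10 above.
EXPECTED_PYTHON = (3, 7)


def is_expected_python(version_info=None, expected=EXPECTED_PYTHON):
    """Return True when the major.minor version matches the expected pair.

    Hypothetical helper for illustration; it is not defined in the repository.
    """
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) == tuple(expected)


# Warn rather than fail: other versions may still work, but are untested here.
if not is_expected_python():
    major, minor = sys.version_info[:2]
    print(f"Warning: running Python {major}.{minor}, but this README pins 3.7.10")
```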

Code organization

The code is organized as follows:

Data processing:
- roscoe_data_processing.py (processes the human-judged datasets in ROSCOE for our experiments)
GPT-4 for reference-free reasoning evaluation:
- gpt4_evaluation_gsm8k.py (GPT-4 on GSM8K)
- gpt4_evaluation_esnli.py (GPT-4 on e-SNLI)
- gpt4_evaluation_drop.py (GPT-4 on DROP)
- gpt4_evaluation_cosmos.py (GPT-4 on Cosmos QA)
SocREval for reference-free reasoning evaluation:
- SocREval_gsm8k.py (SocREval on GSM8K)
- SocREval_esnli.py (SocREval on e-SNLI)
- SocREval_drop.py (SocREval on DROP)
- SocREval_cosmos.py (SocREval on Cosmos QA)

Change the working path

Replace /path/to/working/dir with the path to your own working directory.
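
One way to avoid editing the path in several places is to resolve it once. This is a sketch under assumptions: the environment variable `SOCREVAL_WORKING_DIR` and the helper `resolve_working_dir` are hypothetical and are not read by the repository's scripts as shipped.

```python
import os
from pathlib import Path

# Placeholder used throughout this README; replace it with a real directory.
PLACEHOLDER = "/path/to/working/dir"


def resolve_working_dir(env=None, default=PLACEHOLDER):
    """Return the working directory as a Path.

    Reads the hypothetical SOCREVAL_WORKING_DIR variable when it is set and
    falls back to the README placeholder otherwise.
    """
    if env is None:
        env = os.environ
    return Path(env.get("SOCREVAL_WORKING_DIR", default)).expanduser()


def roscoe_dir(working_dir):
    """Return the roscoe/ directory located under the working directory."""
    return Path(working_dir) / "roscoe"
```

For example, `roscoe_dir(resolve_working_dir())` gives the location under which the ROSCOE data is expected.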

Export the OpenAI API key

Before running any experiment that calls the OpenAI API, export your own OpenAI API key:
export OPENAI_API_KEY=$YOUR_OPENAI_API_KEY
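
A script can fail early with a clear message when the key is missing, instead of failing on the first API request. The sketch below is illustrative only; `require_openai_api_key` is a hypothetical helper, and the only thing taken from this README is the variable name OPENAI_API_KEY.

```python
import os


def require_openai_api_key(env=None):
    """Return the OpenAI API key, or raise if it has not been exported.

    Hypothetical helper for illustration; it is not defined in the repository.
    """
    if env is None:
        env = os.environ
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "OPENAI_API_KEY is not set; export it before running the experiments."
        )
    return key
```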

Data preparation

Follow the instructions in the ROSCOE code repository:

- Run download_annotated.sh to obtain the human-judged datasets, including "roscoe/raw", "roscoe/generated", and "roscoe/annotated", and put them all under /path/to/working/dir/
- Run restore_annotated.py to restore the annotated files, and put them under /path/to/working/dir/roscoe/restore_annotated
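
The directory layout described above can be verified before processing. This is a minimal sketch: the helper `missing_roscoe_dirs` is hypothetical, and it checks only that the four directories named above exist, not their contents.

```python
from pathlib import Path

# Directories named in the data preparation steps, relative to the working dir.
EXPECTED_SUBDIRS = (
    "roscoe/raw",
    "roscoe/generated",
    "roscoe/annotated",
    "roscoe/restore_annotated",
)


def missing_roscoe_dirs(working_dir):
    """Return the expected subdirectories that are absent under working_dir."""
    root = Path(working_dir)
    return [sub for sub in EXPECTED_SUBDIRS if not (root / sub).is_dir()]
```

An empty list means all four directories are in place; otherwise the returned names show which step still needs to be run.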

Reproducing experiments

To process the data for our experiments:

python roscoe_data_processing.py

To reproduce the experiments for GPT-4 evaluation:

python gpt4_evaluation_gsm8k.py
python gpt4_evaluation_esnli.py
python gpt4_evaluation_drop.py
python gpt4_evaluation_cosmos.py

To reproduce the experiments for SocREval:

python SocREval_gsm8k.py
python SocREval_esnli.py
python SocREval_drop.py
python SocREval_cosmos.py
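
The eight evaluation commands above follow one naming pattern, so they can also be launched from a small driver. The sketch below is not part of the repository; `build_commands` and `run_all` are hypothetical names, and it assumes the scripts are run from the repository root and take no command-line arguments, as in the commands above.

```python
import subprocess
import sys

# Dataset suffixes and script prefixes taken from the file names in this README.
DATASETS = ("gsm8k", "esnli", "drop", "cosmos")
PREFIXES = {"gpt4": "gpt4_evaluation", "socreval": "SocREval"}


def build_commands(method, python=sys.executable):
    """Return one command (as an argument list) per dataset for the method."""
    prefix = PREFIXES[method]
    return [[python, f"{prefix}_{dataset}.py"] for dataset in DATASETS]


def run_all(method, dry_run=True):
    """Print each command and, unless dry_run is set, execute it in order.

    check=True stops at the first script that exits with an error.
    """
    for cmd in build_commands(method):
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)
```

Calling `run_all("socreval")` only prints the commands; pass `dry_run=False` to execute them, which issues real OpenAI API requests.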

HornHehhf/SocREval