NOTE: This repository is a work in progress. Changes and updates may occur as the project evolves.

SCoRe: Self-Correct via Reinforcement Learning

Minimal implementation of the paper "Training Language Models to Self-Correct via Reinforcement Learning".

Environment Setup

1. Create and Activate Conda Environment

To set up the environment for this project, follow the installation steps for unsloth.

2. Install Dependencies

Install the required packages using the requirements.txt file:

pip install -r requirements.txt

Run SCoRe on Math Problems

python score_math.py

dataset_relabel.py was used to add the final answer pattern to the dataset: 'Final Answer: The final answer is $answer$. I hope it is correct.'
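To illustrate the final answer pattern, here is a minimal sketch of how a solution could be formatted with it and how the answer could be parsed back out. The names `FINAL_ANSWER_RE`, `format_final_answer`, and `extract_final_answer` are hypothetical and not taken from dataset_relabel.py or score_math.py; the regex is an assumption based only on the pattern quoted above.

```python
import re
from typing import Optional

# Assumed regex for the quoted pattern; the repository may parse it differently.
FINAL_ANSWER_RE = re.compile(
    r"Final Answer: The final answer is \$(.*?)\$\. I hope it is correct\.",
    re.DOTALL,
)


def format_final_answer(answer: str) -> str:
    """Wrap an answer string in the final answer pattern (hypothetical helper)."""
    return f"Final Answer: The final answer is ${answer}$. I hope it is correct."


def extract_final_answer(text: str) -> Optional[str]:
    """Return the last answer found in `text`, or None if the pattern is absent."""
    matches = FINAL_ANSWER_RE.findall(text)
    return matches[-1].strip() if matches else None


if __name__ == "__main__":
    solution = "We solve 2x = 4, so x = 2. " + format_final_answer("2")
    print(extract_final_answer(solution))
```

Taking the last match rather than the first is a design choice for self-correction settings, where a response may restate the pattern after revising an earlier attempt.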

TODOs:

  • add eval [ ]
  • create SCoRe Trainer class [ ]
  • cleanup code [ ]
  • run experiments for math [ ]