NOTE: This repository is a work in progress. Changes and updates may occur as the project evolves.

SCoRe: Self-Correct via Reinforcement Learning

Environment Setup

To set up the environment for this project, follow the installation steps for Unsloth.

Install the required packages using the requirements.txt file:

pip install -r requirements.txt

Then run score_math.py:

python score_math.py

dataset_relabel.py was used to add the final-answer pattern to the dataset: 'Final Answer: The final answer is $answer$. I hope it is correct.'
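
The following is a minimal sketch of how such a final-answer pattern could be appended to a solution and parsed back out. It is not taken from dataset_relabel.py: the names FINAL_ANSWER_TEMPLATE, add_final_answer, and extract_final_answer are hypothetical, and the actual script may handle the dataset differently.

```python
import re
from typing import Optional

# Template mirroring the pattern quoted above; {answer} is filled in per example.
FINAL_ANSWER_TEMPLATE = (
    "Final Answer: The final answer is ${answer}$. I hope it is correct."
)

# Regex that recovers the answer between the two dollar signs.
FINAL_ANSWER_RE = re.compile(
    r"Final Answer: The final answer is \$(.*?)\$\. I hope it is correct\."
)


def add_final_answer(solution: str, answer: str) -> str:
    """Append the final-answer line to a solution (hypothetical helper)."""
    return solution.rstrip() + "\n" + FINAL_ANSWER_TEMPLATE.format(answer=answer)


def extract_final_answer(text: str) -> Optional[str]:
    """Return the answer from the final-answer line, or None if it is absent."""
    match = FINAL_ANSWER_RE.search(text)
    return match.group(1) if match else None


if __name__ == "__main__":
    labeled = add_final_answer("2 + 2 = 4", "4")
    print(labeled)
    print(extract_final_answer(labeled))
```

A fixed pattern like this makes it straightforward to compare a model's final answer against the reference with a single regular expression, which is presumably why the dataset was relabeled.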