SCoRe: Self-Correct via Reinforcement Learning

NOTE: This repository is a work in progress. Changes and updates may occur as the project evolves.

Environment Setup

To set up the environment for this project, first follow the setup steps in unsloth.

Then install the required packages from requirements.txt:

pip install -r requirements.txt

Once the packages are installed, run score_math.py:

python score_math.py

dataset_relabel.py was used to add the final answer pattern: 'Final Answer: The final answer is $answer$. I hope it is correct.'
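As a rough illustration of how this pattern can be written and read back, here is a minimal sketch. The helper names `append_final_answer` and `extract_final_answer` are hypothetical and are not taken from dataset_relabel.py or string_matcher.py; the actual scripts may handle the pattern differently.

```python
import re
from typing import Optional

# Hypothetical helpers for illustration only; not the repository's code.
# The regex mirrors the pattern quoted above and tolerates missing "$" signs.
FINAL_ANSWER_RE = re.compile(
    r"Final Answer: The final answer is \$?(.+?)\$?\. I hope it is correct\.",
    re.DOTALL,
)


def append_final_answer(solution: str, answer: str) -> str:
    """Append the final answer pattern to a worked solution."""
    return (
        f"{solution.rstrip()}\n"
        f"Final Answer: The final answer is ${answer}$. I hope it is correct."
    )


def extract_final_answer(text: str) -> Optional[str]:
    """Return the last answer found in the pattern, or None if absent."""
    matches = FINAL_ANSWER_RE.findall(text)
    return matches[-1].strip() if matches else None


if __name__ == "__main__":
    labeled = append_final_answer("We compute 6 * 7.", "42")
    print(labeled)
    print(extract_final_answer(labeled))
```

Taking the last match is a design choice for this sketch: a response that revises itself may state the pattern more than once, and the final statement is the one to compare against the reference.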
