NOTE: This repository is a work in progress. Changes and updates may occur as the project evolves.
Minimal implementation of the paper "Training Language Models to Self-Correct via Reinforcement Learning".
To set up the environment for this project, follow the setup steps for unsloth.
Install the required packages from the requirements.txt file:
pip install -r requirements.txt
python score_math.py
dataset_relabel.py was used to append the final-answer pattern to each solution: 'Final Answer: The final answer is $answer$. I hope it is correct.'
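As a rough illustration of the final-answer pattern above, the sketch below appends it to a solution string and parses the answer back out. The helper names `add_final_answer` and `extract_final_answer` are hypothetical and are not taken from dataset_relabel.py or score_math.py; the actual scripts may handle this differently.

```python
import re
from typing import Optional

# Pattern quoted in this README; "{}" marks where the answer is substituted.
FINAL_ANSWER_TEMPLATE = "Final Answer: The final answer is ${}$. I hope it is correct."

# Regex that mirrors the template and captures the text between the dollar signs.
FINAL_ANSWER_RE = re.compile(
    r"Final Answer: The final answer is \$(.*?)\$\. I hope it is correct\.",
    re.DOTALL,
)


def add_final_answer(solution: str, answer: str) -> str:
    """Append the final-answer line to a solution (hypothetical helper)."""
    return solution.rstrip() + "\n" + FINAL_ANSWER_TEMPLATE.format(answer)


def extract_final_answer(text: str) -> Optional[str]:
    """Return the captured answer, or None if the pattern is absent."""
    match = FINAL_ANSWER_RE.search(text)
    return match.group(1).strip() if match else None


if __name__ == "__main__":
    labeled = add_final_answer("2 + 2 = 4", "4")
    print(labeled)
    print(extract_final_answer(labeled))
```

A fixed pattern like this makes it possible to compare a model's answer against the reference with a simple string match, which is presumably why the dataset was relabeled this way.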
- add eval [ ]
- create SCoRe Trainer class [ ]
- cleanup code [ ]
- run experiments for math [ ]