SCoRe: Training Language Models to Self-Correct via Reinforcement Learning

BY571/SCoRe

NOTE: This repository is a work in progress. Changes and updates may occur as the project evolves.

SCoRe: Self-Correct via Reinforcement Learning

Minimal implementation of the paper "Training Language Models to Self-Correct via Reinforcement Learning".

Environment Setup

1. Create and Activate Conda Environment

To set up the environment for this project, follow the installation steps for unsloth.

2. Install Dependencies

Install the required packages using the requirements.txt file:

pip install -r requirements.txt

Run SCoRe on Math Problems

python score_math.py

dataset_relabel.py was used to append the final-answer pattern to the dataset solutions: 'Final Answer: The final answer is $answer$. I hope it is correct.'
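
As an illustration of the final-answer pattern, here is a minimal sketch of how a solution could be relabeled and how the answer could be parsed back out. It is not the code in dataset_relabel.py. The names append_final_answer, extract_final_answer, FINAL_ANSWER_TEMPLATE and FINAL_ANSWER_RE are hypothetical, and the sketch assumes the answer sits between the two dollar signs exactly as in the pattern above.

```python
import re
from typing import Optional

# Template copied from the README sentence; {answer} marks the placeholder.
FINAL_ANSWER_TEMPLATE = (
    "Final Answer: The final answer is ${answer}$. I hope it is correct."
)

# Regex mirroring the template. Non-greedy so the capture stops at the
# first "$. I hope it is correct." that follows the answer.
FINAL_ANSWER_RE = re.compile(
    r"Final Answer: The final answer is \$(.+?)\$\. I hope it is correct\.",
    re.DOTALL,
)


def append_final_answer(solution: str, answer: str) -> str:
    """Append the final-answer line to a solution string (hypothetical helper)."""
    return solution.rstrip() + "\n" + FINAL_ANSWER_TEMPLATE.format(answer=answer)


def extract_final_answer(text: str) -> Optional[str]:
    """Return the answer inside the pattern, or None if the pattern is absent."""
    match = FINAL_ANSWER_RE.search(text)
    return match.group(1).strip() if match else None


if __name__ == "__main__":
    labeled = append_final_answer("2 + 2 = 4", "4")
    print(labeled)
    print(extract_final_answer(labeled))
```

A fixed pattern like this makes it straightforward to compare a model's first and second attempts against the reference answer with a plain string match. Any stricter equivalence check (for example, normalizing LaTeX) would need additional logic that is not shown here.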

TODOs:

  • add eval [ ]
  • create SCoRe Trainer class [ ]
  • cleanup code [ ]
  • run experiments for math [ ]
