Basic training script for LLaMA #7
Conversation
As I'm working on addressing Carlos' comments and making improvements, I'll also remove the GPL implementation.
@lantiga Updated! I ran the two side by side and got these loss values:
Nano:
iter 0: loss 10.3979, time: 11945.28ms
iter 1: loss 10.5749, time: 5160.31ms
iter 2: loss 8.6790, time: 4872.23ms
iter 3: loss 6.8088, time: 5174.96ms
iter 4: loss 6.8616, time: 5044.30ms
Old:
iter 0: loss 10.3839, time: 13507.87ms
iter 1: loss 10.9712, time: 5206.71ms
iter 2: loss 8.3166, time: 4857.47ms
iter 3: loss 6.7234, time: 5104.52ms
iter 4: loss 6.8219, time: 5036.32ms
There is a small difference between the two runs; I'm not sure yet where it's coming from.
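One way to narrow down where such a small difference comes from is to pin all randomness and compare both implementations on identical inputs. Here is a minimal sketch of that kind of check, assuming both implementations are plain PyTorch `nn.Module`s; `TinyModel` is a hypothetical stand-in so the snippet runs on its own, not a class from this repo:

```python
import torch
import torch.nn as nn

VOCAB = 32000  # assumed vocab size, for illustration only

class TinyModel(nn.Module):
    """Hypothetical stand-in; swap in the real Nano and Old modules."""
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(VOCAB, 64)
        self.head = nn.Linear(64, VOCAB)

    def forward(self, idx):
        return self.head(self.emb(idx))

def first_loss(model_cls, seed=1337):
    torch.manual_seed(seed)  # identical init, batch, and dropout masks per run
    model = model_cls()
    x = torch.randint(0, VOCAB, (4, 128))
    y = torch.randint(0, VOCAB, (4, 128))
    logits = model(x)
    return nn.functional.cross_entropy(logits.view(-1, VOCAB), y.view(-1)).item()

# With identical seeds the two numbers should agree to float precision; a gap
# that only appears after a few iterations points at the optimizer, dropout,
# or data order rather than the forward pass itself.
print(first_loss(TinyModel), first_loss(TinyModel))
```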
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Looks great! Merging
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Adds a basic training script for LLaMA 7B on the Shakespeare dataset.
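For context, such a script typically boils down to a sample-a-batch / forward / backward / step loop. Below is a minimal, self-contained sketch of that shape; the tiny `nn.Sequential` stand-in, batch shapes, random token data, and hyperparameters are illustrative assumptions, not the PR's actual model or values:

```python
import torch
import torch.nn as nn

block_size, batch_size, vocab = 128, 4, 32000

# Stand-in for the repo's LLaMA module so the sketch runs on its own.
model = nn.Sequential(nn.Embedding(vocab, 64), nn.Linear(64, vocab))
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)

# The Shakespeare data would normally be pre-tokenized to a flat array of
# token ids; random ids are used here so the example is self-contained.
data = torch.randint(0, vocab, (100_000,))

def get_batch():
    # Sample random windows; targets are inputs shifted by one token.
    ix = torch.randint(len(data) - block_size - 1, (batch_size,))
    x = torch.stack([data[i : i + block_size] for i in ix])
    y = torch.stack([data[i + 1 : i + 1 + block_size] for i in ix])
    return x, y

for it in range(5):
    x, y = get_batch()
    logits = model(x)
    loss = nn.functional.cross_entropy(logits.view(-1, vocab), y.view(-1))
    optimizer.zero_grad(set_to_none=True)
    loss.backward()
    optimizer.step()
    print(f"iter {it}: loss {loss.item():.4f}")
```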