Add MicroLlama training support #1457
Conversation
It looks like the unit test failed because I did not link hf_config to my Hugging Face repo correctly to get the tokenizer. Let me make an update.
1. Add microllama.yaml for the training configuration.
2. Modify config.py to add the MicroLlama model config (a rough sketch of what such an entry looks like follows below).
3. Modify pretrain.py to import and handle MicroLlama data.
4. Add microllama.py for data loading during training.
5. Fix a bug in prepare_slimpajama.py by adding the is_generator attribute; data processing with the latest litdata library will crash if this attribute is missing.

Data processing: same as TinyLlama, but you don't need the starcoder data.

Training: litgpt pretrain --config config_hub/pretrain/microllama.yaml
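For readers unfamiliar with how litgpt registers architectures, here is a minimal sketch of what the config.py addition in step 2 could look like. It assumes litgpt keeps a module-level `configs` list of plain dicts keyed by model name; the hyperparameter values below are illustrative placeholders, not the actual MicroLlama settings from this PR.

```python
# Illustrative sketch only -- not the exact entry added by this PR.
# Assumes litgpt/config.py collects model definitions in a module-level
# `configs` list of dicts that is later turned into a name -> config lookup.
configs.append(
    dict(
        name="micro-llama-300M",
        # Points litgpt at the Hugging Face repo hosting the tokenizer/weights;
        # the org/name pair is an assumption based on the model discussed here.
        hf_config=dict(org="keeeeenw", name="MicroLlama"),
        # Placeholder hyperparameters -- the real values live in config.py and
        # config_hub/pretrain/microllama.yaml.
        block_size=2048,
        vocab_size=32000,
        n_layer=12,
        n_head=16,
        n_embd=1024,
    )
)
```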
Awesome, this is great! Really appreciate it! And that's a really clean PR. I will test this either tonight or tomorrow before merging.
This looks all good to me. There's one minor suggestion: instead of duplicating most of the code in the MicroLlama data module, what do you think about making a LlamaDataModule base class with a starcoder toggle, from which you can then derive TinyLlama and MicroLlama subclasses? Basically just a minor refactoring to avoid code duplication. Let me know in case you want my help with this.
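To make the suggestion concrete, here is a minimal, self-contained sketch of the base-class idea. It is not code from the PR: the field names and dataset paths are illustrative assumptions and do not necessarily match litgpt.data, and the real modules would build litdata streaming datasets rather than return path strings.

```python
# Hedged sketch of the suggested refactor -- not code from this PR.
from dataclasses import dataclass
from pathlib import Path
from typing import List


@dataclass
class LlamaDataModule:
    """Shared SlimPajama data module; a toggle controls the starcoder mix-in."""

    data_path: Path = Path("data/")
    use_starcoder: bool = True

    def train_sources(self) -> List[str]:
        # Stand-in for building litdata streaming datasets from prepared shards.
        sources = [str(self.data_path / "slimpajama" / "train")]
        if self.use_starcoder:
            sources.append(str(self.data_path / "starcoder"))
        return sources


@dataclass
class TinyLlama(LlamaDataModule):
    use_starcoder: bool = True  # TinyLlama trains on slimpajama + starcoder


@dataclass
class MicroLlama(LlamaDataModule):
    use_starcoder: bool = False  # MicroLlama drops the starcoder portion


if __name__ == "__main__":
    print(TinyLlama().train_sources())   # ['data/slimpajama/train', 'data/starcoder']
    print(MicroLlama().train_sources())  # ['data/slimpajama/train']
```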
Thanks for the review and for updating the doc! Let me refactor the code today. I can run some basic checks on my side, but I will need your help to regression-test TinyLlama.
Thanks @keeeeenw for the PR 👍
Awesome! Thanks for the quick update! I will test this today.
Point no. 5 of the OP brought me here. Both prepare_slimpajama() and prepare_starcoder() crash because the is_generator attribute is missing from their DataRecipe classes.
Yeah, I suspected prepare_starcoder could have the same issue, but I don't have the starcoder dataset locally to test any fixes. Let me add the same change to prepare_starcoder. Please let me know if it works for you.
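For anyone else hitting this crash, the fix discussed above amounts to declaring the is_generator attribute on the data recipe classes. Below is a hedged sketch of the shape of the change, assuming the DataChunkRecipe-based recipes described in this thread; the import path and method body are illustrative, not the exact diff.

```python
# Hedged sketch of the fix described above -- not the exact diff.
# Newer litdata releases check whether a recipe's prepare_item is a generator;
# declaring the attribute explicitly avoids the crash during data processing.
from litdata.processing.data_processor import DataChunkRecipe  # assumed import path


class SlimPajamaDataRecipe(DataChunkRecipe):
    is_generator = True  # the previously missing attribute

    def prepare_item(self, filepath):
        # prepare_item yields tokenized samples one at a time,
        # which is why is_generator must be True.
        yield ...
```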
Thanks for contributing the model and training config!
LGTM, big thanks for this PR!
Great! Thank you all for the review and thoughtful comments. According to https://lightning.ai/blog/contribute-to-lightning/, you will merge my change, right? Please let me know if there is anything else you need from my side!
Yes, this is great and ready to merge! In fact, I will merge it now! Thanks again, we really appreciate the effort!
Hello,
I was previously contacted by your team regarding the addition of MicroLlama support to the latest litgpt codebase. Please review and accept this pull request if you agree with my approach. Otherwise, I welcome any questions and comments.
Additionally, I have not yet tested this change with Lightning AI Studios. I will work on any necessary code adjustments to support the studio and update the documentation as needed.
I made the following changes:
Data Processing:
Same as TinyLlama
Training:
litgpt pretrain --config config_hub/pretrain/microllama.yaml
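Since the training entry point is just the YAML file above, a quick way to sanity-check the new config before launching a long run is to load it and inspect a few fields. This is a generic snippet, assuming PyYAML is installed; the field names it prints are assumptions about what the YAML contains, not guaranteed keys.

```python
# Quick sanity check of the new pretraining config before launching a run.
import yaml

with open("config_hub/pretrain/microllama.yaml") as f:
    cfg = yaml.safe_load(f)

print(cfg.get("model_name"))  # expected to name the MicroLlama model config
print(cfg.get("data"))        # the data module section used for training
```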