Support Phi Models #484
Conversation
Thanks, this looks great! Can you remove the demo? I don't think we need to have a demo for each new model. Otherwise I'm happy to merge this in.
The phi models use a tokenizer,
Turns out there was a small bug in the Hugging Face repo for this tokenizer; I've written a PR addressing this that is now merged in. I'll make this ready for merge again soon.
@neelnanda-io This should be ready to merge in after a review!
Thanks - looks good!
Description
Added support for microsoft/phi model series: phi-1, phi-1.5 and phi-2. Phi-2 has comparable performance to 7B SOTA models but with 2.7B parameters. The licence for phi-2 was recently changed to the MIT licence.
I have not included further unit tests, instead I have added a demo notebook:
demos/phi.ipynb
demonstrating agreement with Hugging Face. On the test prompt provided in the notebook, phi-1 and phi-1.5 match the HF model within a 1e-4 tolerance, whereas phi-2 only matches within 1e-2 across the logits and resid_pre caches. I think this may be due to the #385 rotary embedding issue that also impacts the LLaMA and Pythia models. Happy to add further tests to this pull request if needed though.
Notes
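The notebook's agreement check boils down to an elementwise tolerance comparison between Hugging Face and TransformerLens activations. A minimal sketch of that comparison is below; the helper name and the synthetic values are illustrative stand-ins, not taken from the notebook, which compares real model outputs:

```python
def check_agreement(a, b, atol):
    """Return the max absolute elementwise difference between two
    flat activation vectors, and whether it is within `atol`."""
    diff = max(abs(x - y) for x, y in zip(a, b))
    return diff, diff <= atol

# Synthetic stand-ins for Hugging Face vs TransformerLens logits:
hf_logits = [0.10, 0.20, 0.30]
tl_logits = [x + 5e-5 for x in hf_logits]  # small numerical drift
diff, ok = check_agreement(hf_logits, tl_logits, atol=1e-4)
```

With a drift of 5e-5 the check passes at the 1e-4 tolerance reported for phi-1 and phi-1.5, but would fail a stricter bound, which is the sense in which phi-2 "only gets 1e-2 tolerance".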
Type of change
Checklist: