Support Phi Models #484

Merged

merged 7 commits into TransformerLensOrg:main from the phi branch on Jan 28, 2024
Conversation

@cmathw (Contributor) commented on Jan 19, 2024

Description

Added support for the microsoft/phi model series: phi-1, phi-1.5, and phi-2. Phi-2 achieves performance comparable to 7B SOTA models with only 2.7B parameters. The licence for phi-2 was recently changed to the MIT licence.
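For reference, a minimal usage sketch (assuming the TransformerLens model names mirror the Hugging Face repo ids, e.g. `microsoft/phi-2`; adjust if the registered alias differs):

```python
from transformer_lens import HookedTransformer

# Assumed model name: mirrors the HF repo id.
model = HookedTransformer.from_pretrained("microsoft/phi-2")

logits, cache = model.run_with_cache("def fibonacci(n):")
print(logits.shape)                 # [batch, seq_len, d_vocab]
print(cache["resid_pre", 0].shape)  # [batch, seq_len, d_model]
```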

I have not included further unit tests; instead I have added a demo notebook, demos/phi.ipynb, demonstrating agreement with Hugging Face. On the test prompt provided in the notebook, phi-1 and phi-1.5 match the HF model within 1e-4 tolerance, whereas phi-2 only matches within 1e-2 across the logits and resid_pre caches (a sketch of this kind of check follows below). I think this may be due to the Rotary Embedding issue (#385) that also impacts the LLaMA and Pythia models. Happy to add further tests to this pull request if needed though.
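A sketch of the kind of agreement check the notebook performs; the prompt and the exact comparison here are illustrative, not the notebook's code:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from transformer_lens import HookedTransformer

model_name = "microsoft/phi-1"
prompt = "The quick brown fox"  # illustrative; not the notebook's test prompt

tl_model = HookedTransformer.from_pretrained(model_name)
hf_model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

tokens = tokenizer(prompt, return_tensors="pt").input_ids
with torch.no_grad():
    hf_logits = hf_model(tokens).logits
    tl_logits = tl_model(tokens)

# Per the PR: phi-1 and phi-1.5 agree within 1e-4; phi-2 only within 1e-2.
print(torch.allclose(tl_logits, hf_logits, atol=1e-4))
```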

Notes

  • This model's architecture is very similar to LLaMA's, but it uses tied LayerNorm (i.e. the same LayerNorm for both the attention and MLP blocks) and parallel Attention-MLP (i.e. the MLP reads from resid_pre), like GPT-J; see the sketch below.
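A self-contained sketch of that block structure (illustrative only, not the PR's implementation; the attention here omits the causal mask for brevity):

```python
import torch
import torch.nn as nn

class ParallelPhiStyleBlock(nn.Module):
    """One shared LayerNorm feeds both branches, and both branches read
    from resid_pre (parallel Attention-MLP, GPT-J style)."""

    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        self.ln = nn.LayerNorm(d_model)  # tied: used by attention AND MLP
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.mlp = nn.Sequential(
            nn.Linear(d_model, 4 * d_model),
            nn.GELU(),
            nn.Linear(4 * d_model, d_model),
        )

    def forward(self, resid_pre: torch.Tensor) -> torch.Tensor:
        normed = self.ln(resid_pre)               # single LN for both branches
        attn_out, _ = self.attn(normed, normed, normed)
        mlp_out = self.mlp(normed)                # MLP reads resid_pre, not resid_mid
        return resid_pre + attn_out + mlp_out     # parallel residual update
```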

Type of change

  • New feature (non-breaking change which adds functionality)

Checklist:

  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • I have not rewritten tests relating to key interfaces which would affect backward compatibility

@neelnanda-io (Collaborator) commented:
Thanks, this looks great! Can you remove the demo? I don't think we need to have a demo for each new model. Otherwise I'm happy to merge this in.

@cmathw (Contributor, Author) commented on Jan 21, 2024

The Phi models use the CodeGenTokenizer, which on further inspection doesn't work with existing TL functionality out of the box. I'll convert this PR to a draft and address this.

@cmathw marked this pull request as draft on January 21, 2024 01:41
@cmathw changed the title from "Support Phi Models" to "[Draft] Support Phi Models" on Jan 21, 2024
@cmathw (Contributor, Author) commented on Jan 23, 2024

It turns out there was a small bug in the Hugging Face repo for this tokenizer; I've written a PR addressing it, which is now merged in. I'll make this ready for merge again soon.

@cmathw marked this pull request as ready for review on January 24, 2024 01:40
@cmathw changed the title from "[Draft] Support Phi Models" to "Support Phi Models" on Jan 24, 2024
@cmathw (Contributor, Author) commented on Jan 24, 2024

Initially the CodeGenTokenizer was not working well with existing TL functionality; the main reason was a small typo in the Hugging Face repo, fixed here. Since the initial commit I have also:

  • Removed demos/phi.ipynb.
  • Ensured the tokenizer is loaded as CodeGenTokenizer rather than CodeGenTokenizerFast, since the fast version does not support adding a BOS token. use_fast is an AutoTokenizer kwarg that HuggingFace sets to True by default; in HookedTransformer.py it is now set to False when the model being loaded is one of the Phi models (see the sketch after this list).
  • The activation function is now read from the HF config.
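Roughly the idea behind the use_fast handling (a sketch, not the exact diff in HookedTransformer.py; the phi-name check is illustrative):

```python
from transformers import AutoTokenizer

# Force the slow CodeGenTokenizer for Phi models, since
# CodeGenTokenizerFast does not support adding a BOS token.
model_name = "microsoft/phi-2"
use_fast = not model_name.startswith("microsoft/phi")  # illustrative condition
tokenizer = AutoTokenizer.from_pretrained(model_name, use_fast=use_fast)
```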

@neelnanda-io This should be ready to merge in after a review!

@alan-cooney (Collaborator) left a review comment:

Thanks - looks good!

@alan-cooney merged commit 8a17a76 into TransformerLensOrg:main on Jan 28, 2024
8 checks passed
@cmathw deleted the phi branch on January 28, 2024 14:10