Support Phi Models #484

Merged

merged 7 commits into TransformerLensOrg:main from the phi branch on Jan 28, 2024
Conversation

@cmathw (Contributor) commented on Jan 19, 2024

Description

Added support for the microsoft/phi model series: phi-1, phi-1.5, and phi-2. Phi-2 achieves performance comparable to 7B SOTA models with only 2.7B parameters. The licence for phi-2 was recently changed to the MIT licence.
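For reference, a minimal usage sketch (assuming the TransformerLens model names mirror the Hugging Face repo ids, e.g. `microsoft/phi-2`; adjust if the registered alias differs):

```python
from transformer_lens import HookedTransformer

# Assumed model name: mirrors the HF repo id.
model = HookedTransformer.from_pretrained("microsoft/phi-2")

logits, cache = model.run_with_cache("def fibonacci(n):")
print(logits.shape)                 # [batch, seq_len, d_vocab]
print(cache["resid_pre", 0].shape)  # [batch, seq_len, d_model]
```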

I have not included further unit tests; instead I have added a demo notebook, demos/phi.ipynb, demonstrating agreement with Hugging Face. On the test prompt provided in the notebook, phi-1 and phi-1.5 match the HF model within 1e-4 tolerance, whereas phi-2 only matches within 1e-2 across the logits and resid_pre caches (a sketch of this kind of check follows below). I think this may be due to the Rotary Embedding issue (#385) that also impacts the LLaMA and Pythia models. Happy to add further tests to this pull request if needed though.
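A sketch of the kind of agreement check the notebook performs; the prompt and the exact comparison here are illustrative, not the notebook's code:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from transformer_lens import HookedTransformer

model_name = "microsoft/phi-1"
prompt = "The quick brown fox"  # illustrative; not the notebook's test prompt

tl_model = HookedTransformer.from_pretrained(model_name)
hf_model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

tokens = tokenizer(prompt, return_tensors="pt").input_ids
with torch.no_grad():
    hf_logits = hf_model(tokens).logits
    tl_logits = tl_model(tokens)

# Per the PR: phi-1 and phi-1.5 agree within 1e-4; phi-2 only within 1e-2.
print(torch.allclose(tl_logits, hf_logits, atol=1e-4))
```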

Notes

  • This model's architecture is very similar to LLaMA's, but it uses tied LayerNorm (i.e. the same LayerNorm for both the attention and MLP blocks) and parallel Attention-MLP (i.e. the MLP reads from resid_pre), like GPT-J; see the sketch below.
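A self-contained sketch of that block structure (illustrative only, not the PR's implementation; the attention here omits the causal mask for brevity):

```python
import torch
import torch.nn as nn

class ParallelPhiStyleBlock(nn.Module):
    """One shared LayerNorm feeds both branches, and both branches read
    from resid_pre (parallel Attention-MLP, GPT-J style)."""

    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        self.ln = nn.LayerNorm(d_model)  # tied: used by attention AND MLP
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.mlp = nn.Sequential(
            nn.Linear(d_model, 4 * d_model),
            nn.GELU(),
            nn.Linear(4 * d_model, d_model),
        )

    def forward(self, resid_pre: torch.Tensor) -> torch.Tensor:
        normed = self.ln(resid_pre)               # single LN for both branches
        attn_out, _ = self.attn(normed, normed, normed)
        mlp_out = self.mlp(normed)                # MLP reads resid_pre, not resid_mid
        return resid_pre + attn_out + mlp_out     # parallel residual update
```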

Type of change

  • New feature (non-breaking change which adds functionality)

Checklist:

  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • I have not rewritten tests relating to key interfaces which would affect backward compatibility

@neelnanda-io (Collaborator) commented:
Thanks, this looks great! Can you remove the demo? I don't think we need to have a demo for each new model. Otherwise I'm happy to merge this in.

@cmathw (Contributor, Author) commented on Jan 21, 2024

The Phi models use the CodeGenTokenizer, which on further inspection doesn't work with existing TL functionality out of the box. I'll convert this PR to a draft and address this.

@cmathw marked this pull request as draft on January 21, 2024 01:41
@cmathw changed the title from "Support Phi Models" to "[Draft] Support Phi Models" on Jan 21, 2024
@cmathw (Contributor, Author) commented on Jan 23, 2024

It turns out there was a small bug in the Hugging Face repo for this tokenizer; I've written a PR addressing it, which is now merged in. I'll make this ready for merge again soon.

@cmathw marked this pull request as ready for review on January 24, 2024 01:40
@cmathw changed the title from "[Draft] Support Phi Models" to "Support Phi Models" on Jan 24, 2024
@cmathw (Contributor, Author) commented on Jan 24, 2024

Initially the CodeGenTokenizer was not working well with existing TL functionality; the main reason was a small typo in the Hugging Face repo, fixed here. Since the initial commit I have also:

  • Removed demos/phi.ipynb.
  • Ensured the tokenizer is loaded as CodeGenTokenizer rather than CodeGenTokenizerFast, since the fast version does not support adding a BOS token. use_fast is an AutoTokenizer kwarg that HuggingFace sets to True by default; in HookedTransformer.py it is now set to False when the model being loaded is one of the Phi models (see the sketch after this list).
  • The activation function is now read from the HF config.
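Roughly the idea behind the use_fast handling (a sketch, not the exact diff in HookedTransformer.py; the phi-name check is illustrative):

```python
from transformers import AutoTokenizer

# Force the slow CodeGenTokenizer for Phi models, since
# CodeGenTokenizerFast does not support adding a BOS token.
model_name = "microsoft/phi-2"
use_fast = not model_name.startswith("microsoft/phi")  # illustrative condition
tokenizer = AutoTokenizer.from_pretrained(model_name, use_fast=use_fast)
```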

@neelnanda-io This should be ready to merge in after a review!

@alan-cooney (Collaborator) left a review comment:

Thanks - looks good!

@alan-cooney merged commit 8a17a76 into TransformerLensOrg:main on Jan 28, 2024
8 checks passed
@cmathw deleted the phi branch on January 28, 2024 14:10