
Add Gemma checkpoint support #941

Merged 37 commits into main on Feb 23, 2024
Conversation

@rasbt (Contributor) commented Feb 21, 2024

Adds the new Gemma models by Google.

  • Implement model download
  • Test tokenizer
  • Implement HF checkpoint conversion
    • 2B model
    • 7B model
  • Make sure generate.py produces reasonable outputs
    • 2B model
    • 7B model
  • Add chat template for the -it (aka instruct) versions
  • Update download docs
  • Update finetuning docs
  • Test pretraining
  • Test finetuning
    • Full finetuning
    • LoRA
    • Adapter
  • Add unit tests
  • Update README

Fixes #940
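For the chat-template item above: Gemma's -it checkpoints expect turn delimiters in the prompt. A minimal sketch of that format, assuming the published Gemma instruct prompt style (the function name is illustrative, not an actual helper in this repo):

```python
def gemma_it_prompt(user_message: str) -> str:
    """Wrap a single user turn in Gemma's instruct-tuned prompt format."""
    # Gemma -it models delimit turns with <start_of_turn>/<end_of_turn>
    # and expect generation to continue after the opening model turn.
    return (
        "<start_of_turn>user\n"
        f"{user_message}<end_of_turn>\n"
        "<start_of_turn>model\n"
    )
```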

@rasbt (Contributor, Author) commented Feb 21, 2024

  1. The 2B model loads OK; the 7B doesn't yet. There seems to be a size mismatch.
  2. The 2B model does produce garbage outputs in generate.py though, perhaps due to the missing layernorm?
  3. I am using GeLU for now, since GeGLU (the version I am using) also causes size issues, and HF and Keras use GeLU according to our discussion in Support Gemma #940.
  4. Is the multi-query attention setting correct? In config.json, they set "num_key_value_heads" equal to the number of heads; does that correspond to n_query_groups=1 or n_query_groups=n_heads?
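On point 3, for reference: a gated-GELU (GeGLU) MLP has two parallel input projections instead of one, which is a common source of the kind of size mismatch mentioned above. A generic PyTorch sketch (module and attribute names are illustrative, not lit_gpt's actual implementation):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GeGLUMLP(nn.Module):
    """Gated GELU MLP: proj(GELU(gate(x)) * up(x))."""

    def __init__(self, n_embd: int, intermediate_size: int) -> None:
        super().__init__()
        # Two parallel input projections: this is why checkpoint shapes
        # differ from a plain GeLU MLP with a single fc layer.
        self.fc_gate = nn.Linear(n_embd, intermediate_size, bias=False)
        self.fc_up = nn.Linear(n_embd, intermediate_size, bias=False)
        self.proj = nn.Linear(intermediate_size, n_embd, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.proj(F.gelu(self.fc_gate(x)) * self.fc_up(x))
```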

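On point 4: in the HF config, num_key_value_heads maps directly onto an n_query_groups-style setting, so a value of 1 means multi-query attention while a value equal to the number of attention heads means ordinary multi-head attention. A small sketch of that mapping (the function name is illustrative):

```python
def to_n_query_groups(num_attention_heads: int, num_key_value_heads: int) -> int:
    """Map HF's num_key_value_heads onto an n_query_groups-style setting.

    n_query_groups == 1                    -> multi-query attention (MQA)
    n_query_groups == num_attention_heads  -> standard multi-head attention
    anything in between                    -> grouped-query attention (GQA)
    """
    if num_attention_heads % num_key_value_heads != 0:
        raise ValueError("attention heads must divide evenly into key/value groups")
    return num_key_value_heads
```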
Review threads: lit_gpt/model.py (outdated, resolved); lit_gpt/config.py (resolved)
@rasbt (Contributor, Author) commented Feb 21, 2024

Btw @carmocca or @Andrei-Aksionov, please feel free to continue this PR if you have the time and interest. I may have to pause for now due to another project that is due soon. I thought this was a simpler port, something I could do in ~1 h, but it appears I got a bit stuck and have to take a break for now.

@Andrei-Aksionov (Contributor) commented:

I cannot make changes to others' PRs, so I guess Carlos is the only one :).

thought that's something I could do in ~1 h

So many of us have been in this situation before ...

@rasbt (Contributor, Author) commented Feb 21, 2024

Thanks for the updates so far and getting the 2B to work!

Review threads: lit_gpt/model.py (two threads, outdated, resolved)
@Andrei-Aksionov (Contributor) commented Feb 22, 2024

@rasbt Do we really need all of these steps?

- Update finetuning docs
- Test pretraining
- Test finetuning
   - Full finetuning
   - LoRA
   - Adapter

I did a sanity check with LoRA fine-tuning and hit no errors. There shouldn't be any, since we didn't change anything related to training, and we already have tests that check whether LoRA and Adapter are applicable to the model.
The fine-tuning docs also aren't affected by a new model.
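The applicability argument can be illustrated with a minimal LoRA-style layer: the low-rank update wraps any linear layer without caring which base model it came from. A generic PyTorch sketch (not lit_gpt's actual lora.py):

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen linear layer plus a trainable low-rank update."""

    def __init__(self, in_features: int, out_features: int,
                 r: int = 8, alpha: int = 16) -> None:
        super().__init__()
        self.base = nn.Linear(in_features, out_features, bias=False)
        self.base.weight.requires_grad_(False)  # base weights stay frozen
        self.lora_a = nn.Parameter(torch.randn(r, in_features) * 0.01)
        # Zero init for B: the update is a no-op before training starts.
        self.lora_b = nn.Parameter(torch.zeros(out_features, r))
        self.scaling = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + (x @ self.lora_a.T @ self.lora_b.T) * self.scaling
```

Because lora_b starts at zero, a freshly wrapped model is numerically identical to the base model, which is exactly what a sanity check like the one above should show.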

Review thread: lit_gpt/rmsnorm.py (outdated, resolved)
@Andrei-Aksionov changed the title from "[WIP] Add Gemma checkpoint support" to "Add Gemma checkpoint support" on Feb 22, 2024
@carmocca (Contributor) left a review comment:

Excellent job!

Review threads: lit_gpt/adapter_v2.py (two threads: resolved; outdated, resolved); lit_gpt/config.py (two threads, outdated, resolved); lit_gpt/rmsnorm.py (outdated, resolved); scripts/convert_hf_checkpoint.py (outdated, resolved); tests/test_convert_lit_checkpoint.py (resolved); .github/workflows/cpu-tests.yml (resolved); lit_gpt/lora.py (outdated, resolved); lit_gpt/model.py (outdated, resolved)
Andrei-Aksionov and others added 4 commits February 23, 2024 21:51
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
@carmocca carmocca enabled auto-merge (squash) February 23, 2024 18:52
@carmocca carmocca merged commit 7c15749 into main Feb 23, 2024
8 checks passed
@carmocca carmocca deleted the gemma branch February 23, 2024 19:12
rasbt added a commit that referenced this pull request Mar 18, 2024
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Andrei-Aksionov <aksionau.andrei@gmail.com>
Linked issue: Support Gemma (#940)
3 participants