
Add Gemma checkpoint support #941

Merged 37 commits into main on Feb 23, 2024
Conversation

@rasbt (Contributor) commented Feb 21, 2024

Adds the new Gemma models by Google.

  • Implement model download
  • Test tokenizer
  • Implement HF checkpoint conversion
    • 2B model
    • 7B model
  • Make sure generate.py produces reasonable outputs
    • 2B model
    • 7B model
  • Add chat template for the -it (aka instruct) versions
  • Update download docs
  • Update finetuning docs
  • Test pretraining
  • Test finetuning
    • Full finetuning
    • LoRA
    • Adapter
  • Add unit tests
  • Update README

Fixes #940
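For the chat-template item above: Gemma's -it checkpoints expect turn delimiters in the prompt. A minimal sketch of that format, assuming the published Gemma instruct prompt style (the function name is illustrative, not an actual helper in this repo):

```python
def gemma_it_prompt(user_message: str) -> str:
    """Wrap a single user turn in Gemma's instruct-tuned prompt format."""
    # Gemma -it models delimit turns with <start_of_turn>/<end_of_turn>
    # and expect generation to continue after the opening model turn.
    return (
        "<start_of_turn>user\n"
        f"{user_message}<end_of_turn>\n"
        "<start_of_turn>model\n"
    )
```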

@rasbt (Contributor, Author) commented Feb 21, 2024

  1. The 2B model loads OK; the 7B doesn't yet. There seems to be a size mismatch.
  2. The 2B model does produce garbage outputs in generate.py though, perhaps due to the missing layernorm?
  3. I am using GeLU for now, since GeGLU (the version I am using) also causes size issues, and HF and Keras use GeLU according to our discussion in Support Gemma #940.
  4. Is the multi-query attention setting correct? In config.json, they set "num_key_value_heads" equal to the number of heads; does that correspond to n_query_groups=1 or n_query_groups=n_heads?
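On point 3, for reference: a gated-GELU (GeGLU) MLP has two parallel input projections instead of one, which is a common source of the kind of size mismatch mentioned above. A generic PyTorch sketch (module and attribute names are illustrative, not lit_gpt's actual implementation):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GeGLUMLP(nn.Module):
    """Gated GELU MLP: proj(GELU(gate(x)) * up(x))."""

    def __init__(self, n_embd: int, intermediate_size: int) -> None:
        super().__init__()
        # Two parallel input projections: this is why checkpoint shapes
        # differ from a plain GeLU MLP with a single fc layer.
        self.fc_gate = nn.Linear(n_embd, intermediate_size, bias=False)
        self.fc_up = nn.Linear(n_embd, intermediate_size, bias=False)
        self.proj = nn.Linear(intermediate_size, n_embd, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.proj(F.gelu(self.fc_gate(x)) * self.fc_up(x))
```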

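On point 4: in the HF config, num_key_value_heads maps directly onto an n_query_groups-style setting, so a value of 1 means multi-query attention while a value equal to the number of attention heads means ordinary multi-head attention. A small sketch of that mapping (the function name is illustrative):

```python
def to_n_query_groups(num_attention_heads: int, num_key_value_heads: int) -> int:
    """Map HF's num_key_value_heads onto an n_query_groups-style setting.

    n_query_groups == 1                    -> multi-query attention (MQA)
    n_query_groups == num_attention_heads  -> standard multi-head attention
    anything in between                    -> grouped-query attention (GQA)
    """
    if num_attention_heads % num_key_value_heads != 0:
        raise ValueError("attention heads must divide evenly into key/value groups")
    return num_key_value_heads
```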
Review threads: lit_gpt/model.py (outdated, resolved); lit_gpt/config.py (resolved)
@rasbt (Contributor, Author) commented Feb 21, 2024

Btw @carmocca or @Andrei-Aksionov, please feel free to continue this PR if you have the time and interest. I may have to pause for now due to another project that is due soon. I thought this was a simpler port, something I could do in ~1 h, but it appears I got a bit stuck and have to take a break for now.

@Andrei-Aksionov (Contributor) commented:

I cannot make changes to others' PRs, so I guess Carlos is the only one :).

thought that's something I could do in ~1 h

So many of us have been in this situation before ...

@rasbt (Contributor, Author) commented Feb 21, 2024

Thanks for the updates so far and getting the 2B to work!

Review threads: lit_gpt/model.py (two threads, outdated, resolved)
@Andrei-Aksionov (Contributor) commented Feb 22, 2024

@rasbt Do we really need all of these steps?

- Update finetuning docs
- Test pretraining
- Test finetuning
   - Full finetuning
   - LoRA
   - Adapter

I did a sanity check with LoRA fine-tuning and hit no errors. There shouldn't be any, since we didn't change anything related to training, and we already have tests that check whether LoRA and Adapter are applicable to the model.
The fine-tuning docs also aren't affected by a new model.
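The applicability argument can be illustrated with a minimal LoRA-style layer: the low-rank update wraps any linear layer without caring which base model it came from. A generic PyTorch sketch (not lit_gpt's actual lora.py):

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen linear layer plus a trainable low-rank update."""

    def __init__(self, in_features: int, out_features: int,
                 r: int = 8, alpha: int = 16) -> None:
        super().__init__()
        self.base = nn.Linear(in_features, out_features, bias=False)
        self.base.weight.requires_grad_(False)  # base weights stay frozen
        self.lora_a = nn.Parameter(torch.randn(r, in_features) * 0.01)
        # Zero init for B: the update is a no-op before training starts.
        self.lora_b = nn.Parameter(torch.zeros(out_features, r))
        self.scaling = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + (x @ self.lora_a.T @ self.lora_b.T) * self.scaling
```

Because lora_b starts at zero, a freshly wrapped model is numerically identical to the base model, which is exactly what a sanity check like the one above should show.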

Review thread: lit_gpt/rmsnorm.py (outdated, resolved)
@Andrei-Aksionov changed the title from "[WIP] Add Gemma checkpoint support" to "Add Gemma checkpoint support" on Feb 22, 2024
@carmocca (Contributor) left a review comment:

Excellent job!

Review threads: lit_gpt/adapter_v2.py (two threads: resolved; outdated, resolved); lit_gpt/config.py (two threads, outdated, resolved); lit_gpt/rmsnorm.py (outdated, resolved); scripts/convert_hf_checkpoint.py (outdated, resolved); tests/test_convert_lit_checkpoint.py (resolved); .github/workflows/cpu-tests.yml (resolved); lit_gpt/lora.py (outdated, resolved); lit_gpt/model.py (outdated, resolved)
Andrei-Aksionov and others added 4 commits February 23, 2024 21:51
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
@carmocca carmocca enabled auto-merge (squash) February 23, 2024 18:52
@carmocca carmocca merged commit 7c15749 into main Feb 23, 2024
8 checks passed
@carmocca carmocca deleted the gemma branch February 23, 2024 19:12
rasbt added a commit that referenced this pull request Mar 18, 2024
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Andrei-Aksionov <aksionau.andrei@gmail.com>
Linked issue: Support Gemma (#940)
3 participants