Add lightweight tests for LoRA #8708

Closed
ngxson opened this issue Jul 26, 2024 · 8 comments
Labels: enhancement (New feature or request), stale

Comments

ngxson (Collaborator) commented Jul 26, 2024

Ref: #8687 (comment)
(cc @ggerganov)

TODO:

  • Train some adapters based on stories15M and stories15M_MOE
  • Test with llama-cli -m base_model.gguf --lora lora_adapter.gguf
  • Test merging using llama-export-lora, then re-run with the merged.gguf to verify it outputs the same thing as above (a sketch of this check appears below)

Optionally: make some small stories models with different architectures, for example gemma, phi,...
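
For reference, a minimal sketch of how this check could be scripted, assuming llama-cli and llama-export-lora are on PATH; the file names base_model.gguf, lora_adapter.gguf, and merged.gguf are placeholders, and the flag spellings should be double-checked against each tool's --help:

```python
import subprocess

PROMPT = "Once upon a time"

def generate(model: str, lora: str | None = None) -> str:
    """Run llama-cli greedily (--temp 0) and return the generated text."""
    cmd = ["llama-cli", "-m", model, "-p", PROMPT, "-n", "64", "--temp", "0"]
    if lora is not None:
        cmd += ["--lora", lora]
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout

# Generate with the adapter hot-swapped onto the base model.
with_adapter = generate("base_model.gguf", lora="lora_adapter.gguf")

# Merge the adapter into the base, then generate from the merged model.
subprocess.run(["llama-export-lora", "-m", "base_model.gguf",
                "--lora", "lora_adapter.gguf", "-o", "merged.gguf"], check=True)
merged = generate("merged.gguf")

# With greedy sampling and an overfitted adapter, the two runs should agree.
assert with_adapter == merged, "hot-swapped and merged outputs differ"
```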

ngxson added the enhancement label on Jul 26, 2024
ngxson (Collaborator, Author) commented Jul 26, 2024

I'll need time to train some adapters for testing (maybe I'll extend this to architectures other than llama), so I created this TODO for tracking.

ltoniazzi (Contributor) commented
Hi! I'd be happy to help with this issue. Is there anything I can start working on, like writing the test script or training the GGUF adapters?

ngxson (Collaborator, Author) commented Aug 5, 2024

Part of this task will be done with #8857, where I will do a simple LoRA hotswap test.

The training part will be a bit tricky, so it would be nice if you could help. What's missing now is testing with other architectures like gemma, phi3, etc. This requires the following steps:

  1. Take the same architecture as the base model (the HF transformers model)
  2. Replace all tensors with smaller ones, initialized randomly
  3. Train the model on a bedtime-story dataset (we expect to overfit the model)
  4. Make a LoRA adapter, then overfit the adapter on a Shakespeare dataset

The goal is to have overfitted models smaller than 50 MB, which will be useful in CI tests.

Because the models are overfitted, we expect them to output the same thing every time. If they don't, we have a problem in the code. (A rough sketch of the training steps follows.)
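
For illustration, here is a hedged PEFT sketch of steps 1-4, assuming a Llama-family architecture; the model id, the shrunken dimensions, the target modules, and the (omitted) training loops are placeholders rather than fixed choices:

```python
from transformers import AutoConfig, AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Steps 1-2: same architecture as the base model, but with tiny tensors
# initialized at random (the model id below is just an example of an
# ungated HF checkpoint; any base arch works the same way).
config = AutoConfig.from_pretrained("TinyLlama/TinyLlama-1.1B-Chat-v1.0")
config.hidden_size = 128
config.intermediate_size = 256
config.num_hidden_layers = 4
config.num_attention_heads = 4
config.num_key_value_heads = 4
model = AutoModelForCausalLM.from_config(config)  # random weights, no download

# Step 3: overfit the tiny model on a bedtime-story corpus
# (any standard causal-LM training loop; omitted here).

# Step 4: wrap it in a LoRA adapter and overfit that on a Shakespeare snippet.
lora_config = LoraConfig(r=8, lora_alpha=16, task_type="CAUSAL_LM",
                         target_modules=["q_proj", "k_proj", "v_proj", "o_proj"])
peft_model = get_peft_model(model, lora_config)
# ...train, then save the adapter for conversion to GGUF:
peft_model.save_pretrained("lora_adapter")
```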

ltoniazzi (Contributor) commented Aug 6, 2024

@ngxson I'm almost there with Gemma-2. Namely:

  • Replaced matrices in gemma-2-2b with small ones
  • Trained a LoRA adapter on shakespeare.txt
  • Converted both the small gemma and its adapter to GGUF
  • Ran llama-cli with the --lora flag.

The only issue is that in the last step the output does not seem to reflect the adapter (though initial debugging suggests llama-cli is loading the adapter successfully). I need a bit more time to figure this out.

In the meantime, how should I organise this code and these files?
Should I put the model-shrinking conversion scripts in a GitHub repo and upload the GGUF files to my Hugging Face account?

ngxson (Collaborator, Author) commented Aug 6, 2024

> The only issue is that in the last step the output does not seem to reflect the adapter (though initial debugging suggests llama-cli is loading the adapter successfully). I need a bit more time to figure this out.

Sounds great, thanks.

Btw I forgot to mention: don't use the stock shakespeare.txt, because it's big and it's impossible to overfit the whole dataset with a small model. You can use just a small part of it, for example: https://huggingface.co/ggml-org/stories15M_MOE/blob/main/data.txt

Also you can take the data parsing code from https://huggingface.co/ggml-org/stories15M_MOE/blob/main/finetune.ipynb

This makes the chance of generating words like "thy" or "thou" higher.

> In the meantime, how should I organise this code and these files?

I don't have a clear idea for now, but I think we can start by adding a script under scripts/test_lora.sh; ideally we will do an e2e test (a rough mock-up follows the list):

  1. git clone --depth 1 the model from Hugging Face
  2. Convert from PEFT --> GGUF
  3. Generate with / without the LoRA adapter via llama-cli (only output the text for now; we can add something like assert later on)

You can put the model on your HF account for now; we will see later if we can move it to ggml-org.
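
To make the plan concrete, a rough Python mock-up of that flow (the real thing would live in scripts/test_lora.sh); the repo URL, directory layout, and file names are placeholders, and the converter flags should be checked against each script's --help:

```python
import subprocess

def run(*cmd: str) -> None:
    subprocess.run(cmd, check=True)

# 1. Shallow-clone the tiny model + adapter from Hugging Face.
run("git", "clone", "--depth", "1",
    "https://huggingface.co/<user>/<tiny-model-repo>", "model")

# 2. Convert the base model and the PEFT adapter to GGUF using the
#    converters that ship with llama.cpp.
run("python", "convert_hf_to_gguf.py", "model", "--outfile", "base.gguf")
run("python", "convert_lora_to_gguf.py", "model/adapter",
    "--base", "model", "--outfile", "adapter.gguf")

# 3. Generate with / without the adapter; just print for now, assert later.
for extra in ([], ["--lora", "adapter.gguf"]):
    out = subprocess.run(
        ["llama-cli", "-m", "base.gguf", "-p", "Once upon a time",
         "-n", "32", "--temp", "0", *extra],
        capture_output=True, text=True, check=True).stdout
    print(out)
```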

ltoniazzi (Contributor) commented Aug 8, 2024

@ngxson Quick update:

  • Made gemma-2 small
  • Trained it to reproduce one paragraph from Shakespeare
  • Trained a LoRA on top to reproduce lyrics from a song (training on CPU/f32)

Inference:

  • Running in torch: both base and base+adapter reproduce the exact text they were trained on ✅

  • Running the base in llama.cpp: converting the base to GGUF/f32 reproduces something close to the exact text, but a little messy ✅/❌

  • Running the LoRA in llama.cpp: converting the LoRA to GGUF/f32, then llama-cli -m base.gguf --lora adapter.gguf, returns gibberish! ❌

As many things could be going wrong, I wanted to check first that the layer weights in the safetensors file match those in the GGUF. I am using this code to print weights, but the printed weights are of order 1e-20, which makes me think something that is not a float is being cast to a float.

Question: how do I print out a GGML tensor's weights?

ngxson (Collaborator, Author) commented Aug 8, 2024

Nice, thanks for the info.

> Question: how do I print out a GGML tensor's weights?

You can maybe have a look at gguf_dump.py and modify the Python code to print tensor data.
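
For example, a minimal sketch using the gguf-py package that gguf_dump.py is built on (the file name is a placeholder):

```python
from gguf import GGUFReader

reader = GGUFReader("adapter.gguf")
for tensor in reader.tensors:
    # tensor.data is a numpy view with the dtype stored in the file,
    # so printing it avoids any accidental casting.
    print(tensor.name, tensor.shape, tensor.data.flatten()[:5])
```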

Another option is to use transformers to load gguf: https://huggingface.co/docs/transformers/en/gguf

You can then use PyTorch to compare the tensors.
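
Something along these lines, assuming the shrunken base model lives under path/to/hf_model and was converted to base.gguf (both placeholders); note that transformers dequantizes GGUF weights back to fp32 on load:

```python
import torch
from transformers import AutoModelForCausalLM

hf_model = AutoModelForCausalLM.from_pretrained("path/to/hf_model")
gguf_model = AutoModelForCausalLM.from_pretrained("path/to/hf_model",
                                                  gguf_file="base.gguf")

# Compare one weight; the tensor name follows the usual Llama-style layout.
name = "model.layers.0.self_attn.q_proj.weight"
a = dict(hf_model.named_parameters())[name]
b = dict(gguf_model.named_parameters())[name]
print(torch.max(torch.abs(a - b)))  # should be ~0 for an f32 GGUF
```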

This issue was closed because it has been inactive for 14 days since being marked as stale.
