Add lightweight tests for LoRA #8708

Closed
ngxson opened this issue Jul 26, 2024 · 8 comments
Labels: enhancement (New feature or request), stale

Comments

ngxson (Collaborator) commented Jul 26, 2024

Ref: #8687 (comment)
(cc @ggerganov)

TODO:

  • Train some adapters based on stories15M and stories15M_MOE
  • Test with llama-cli -m base_model.gguf --lora lora_adapter.gguf
  • Test merging using llama-export-lora, then re-run with the merged.gguf to verify it outputs the same thing as above (a sketch of this check appears below)

Optionally: make some small stories models with different architectures, for example gemma, phi,...
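
For reference, a minimal sketch of how this check could be scripted, assuming llama-cli and llama-export-lora are on PATH; the file names base_model.gguf, lora_adapter.gguf, and merged.gguf are placeholders, and the flag spellings should be double-checked against each tool's --help:

```python
import subprocess

PROMPT = "Once upon a time"

def generate(model: str, lora: str | None = None) -> str:
    """Run llama-cli greedily (--temp 0) and return the generated text."""
    cmd = ["llama-cli", "-m", model, "-p", PROMPT, "-n", "64", "--temp", "0"]
    if lora is not None:
        cmd += ["--lora", lora]
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout

# Generate with the adapter hot-swapped onto the base model.
with_adapter = generate("base_model.gguf", lora="lora_adapter.gguf")

# Merge the adapter into the base, then generate from the merged model.
subprocess.run(["llama-export-lora", "-m", "base_model.gguf",
                "--lora", "lora_adapter.gguf", "-o", "merged.gguf"], check=True)
merged = generate("merged.gguf")

# With greedy sampling and an overfitted adapter, the two runs should agree.
assert with_adapter == merged, "hot-swapped and merged outputs differ"
```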

ngxson added the enhancement label on Jul 26, 2024
ngxson (Collaborator, Author) commented Jul 26, 2024

I'll need time to train some adapters for testing (maybe I'll extend this to architectures other than llama), so I created this TODO for tracking.

ltoniazzi (Contributor) commented
Hi! I'd be happy to help with this issue. Is there anything I can start working on, like writing the test script or training the GGUF adapters?

ngxson (Collaborator, Author) commented Aug 5, 2024

Part of this task will be done with #8857, where I will do a simple LoRA hotswap test.

The training part will be a bit tricky, so it would be nice if you could help. What's missing now is testing with other architectures like gemma, phi3, etc. This requires the following steps:

  1. Take the same architecture as the base model (the HF transformers model)
  2. Replace all tensors with smaller ones, initialized randomly
  3. Train the model on a bedtime-story dataset (we expect to overfit the model)
  4. Make a LoRA adapter, then overfit the adapter on a Shakespeare dataset

The goal is to have overfitted models smaller than 50 MB, which will be useful in CI tests.

Because the models are overfitted, we expect them to output the same thing every time. If they don't, we have a problem in the code. (A rough sketch of the training steps follows.)
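
For illustration, here is a hedged PEFT sketch of steps 1-4, assuming a Llama-family architecture; the model id, the shrunken dimensions, the target modules, and the (omitted) training loops are placeholders rather than fixed choices:

```python
from transformers import AutoConfig, AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Steps 1-2: same architecture as the base model, but with tiny tensors
# initialized at random (the model id below is just an example of an
# ungated HF checkpoint; any base arch works the same way).
config = AutoConfig.from_pretrained("TinyLlama/TinyLlama-1.1B-Chat-v1.0")
config.hidden_size = 128
config.intermediate_size = 256
config.num_hidden_layers = 4
config.num_attention_heads = 4
config.num_key_value_heads = 4
model = AutoModelForCausalLM.from_config(config)  # random weights, no download

# Step 3: overfit the tiny model on a bedtime-story corpus
# (any standard causal-LM training loop; omitted here).

# Step 4: wrap it in a LoRA adapter and overfit that on a Shakespeare snippet.
lora_config = LoraConfig(r=8, lora_alpha=16, task_type="CAUSAL_LM",
                         target_modules=["q_proj", "k_proj", "v_proj", "o_proj"])
peft_model = get_peft_model(model, lora_config)
# ...train, then save the adapter for conversion to GGUF:
peft_model.save_pretrained("lora_adapter")
```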

ltoniazzi (Contributor) commented Aug 6, 2024

@ngxson I'm almost there with Gemma-2. Namely:

  • Replaced matrices in gemma-2-2b with small ones
  • Trained a LoRA adapter on shakespeare.txt
  • Converted both the small gemma and its adapter to GGUF
  • Ran llama-cli with the --lora flag.

The only issue is that in the last step the output does not seem to reflect the adapter (though initial debugging suggests llama-cli is loading the adapter successfully). I need a bit more time to figure this out.

In the meantime, how should I organise this code and these files?
Should I put the model-shrinking conversion scripts in a GitHub repo and upload the GGUF files to my Hugging Face account?

ngxson (Collaborator, Author) commented Aug 6, 2024

> The only issue is that in the last step the output does not seem to reflect the adapter (though initial debugging suggests llama-cli is loading the adapter successfully). I need a bit more time to figure this out.

Sounds great, thanks.

Btw I forgot to mention: don't use the stock shakespeare.txt, because it's big and it's impossible to overfit the whole dataset with a small model. You can use just a small part of it, for example: https://huggingface.co/ggml-org/stories15M_MOE/blob/main/data.txt

Also you can take the data parsing code from https://huggingface.co/ggml-org/stories15M_MOE/blob/main/finetune.ipynb

This makes the chance of generating words like "thy" or "thou" higher.

> In the meantime, how should I organise this code and these files?

I don't have a clear idea for now, but I think we can start by adding a script under scripts/test_lora.sh; ideally we will do an e2e test (a rough mock-up follows the list):

  1. git clone --depth 1 the model from Hugging Face
  2. Convert from PEFT --> GGUF
  3. Generate with / without the LoRA adapter via llama-cli (only output the text for now; we can add something like assert later on)

You can put the model on your HF account for now; we will see later if we can move it to ggml-org.
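
To make the plan concrete, a rough Python mock-up of that flow (the real thing would live in scripts/test_lora.sh); the repo URL, directory layout, and file names are placeholders, and the converter flags should be checked against each script's --help:

```python
import subprocess

def run(*cmd: str) -> None:
    subprocess.run(cmd, check=True)

# 1. Shallow-clone the tiny model + adapter from Hugging Face.
run("git", "clone", "--depth", "1",
    "https://huggingface.co/<user>/<tiny-model-repo>", "model")

# 2. Convert the base model and the PEFT adapter to GGUF using the
#    converters that ship with llama.cpp.
run("python", "convert_hf_to_gguf.py", "model", "--outfile", "base.gguf")
run("python", "convert_lora_to_gguf.py", "model/adapter",
    "--base", "model", "--outfile", "adapter.gguf")

# 3. Generate with / without the adapter; just print for now, assert later.
for extra in ([], ["--lora", "adapter.gguf"]):
    out = subprocess.run(
        ["llama-cli", "-m", "base.gguf", "-p", "Once upon a time",
         "-n", "32", "--temp", "0", *extra],
        capture_output=True, text=True, check=True).stdout
    print(out)
```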

ltoniazzi (Contributor) commented Aug 8, 2024

@ngxson Quick update:

  • Made gemma-2 small
  • Trained it to reproduce one paragraph from Shakespeare
  • Trained a LoRA on top to reproduce lyrics from a song (training on CPU/f32)

Inference:

  • Running in torch: both base and base+adapter reproduce the exact text they were trained on ✅

  • Running the base in llama.cpp: converting the base to GGUF/f32 reproduces something close to the exact text, but a little messy ✅/❌

  • Running the LoRA in llama.cpp: converting the LoRA to GGUF/f32, then llama-cli -m base.gguf --lora adapter.gguf, returns gibberish! ❌

As many things could be going wrong, I wanted to check first that the layer weights in the safetensors file match those in the GGUF. I am using this code to print weights, but the printed weights are of order 1e-20, which makes me think something that is not a float is being cast to a float.

Question: how do I print out a GGML tensor's weights?

ngxson (Collaborator, Author) commented Aug 8, 2024

Nice, thanks for the info.

> Question: how do I print out a GGML tensor's weights?

You can maybe have a look at gguf_dump.py and modify the Python code to print tensor data.
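
For example, a minimal sketch using the gguf-py package that gguf_dump.py is built on (the file name is a placeholder):

```python
from gguf import GGUFReader

reader = GGUFReader("adapter.gguf")
for tensor in reader.tensors:
    # tensor.data is a numpy view with the dtype stored in the file,
    # so printing it avoids any accidental casting.
    print(tensor.name, tensor.shape, tensor.data.flatten()[:5])
```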

Another option is to use transformers to load gguf: https://huggingface.co/docs/transformers/en/gguf

You can then use PyTorch to compare the tensors.
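
Something along these lines, assuming the shrunken base model lives under path/to/hf_model and was converted to base.gguf (both placeholders); note that transformers dequantizes GGUF weights back to fp32 on load:

```python
import torch
from transformers import AutoModelForCausalLM

hf_model = AutoModelForCausalLM.from_pretrained("path/to/hf_model")
gguf_model = AutoModelForCausalLM.from_pretrained("path/to/hf_model",
                                                  gguf_file="base.gguf")

# Compare one weight; the tensor name follows the usual Llama-style layout.
name = "model.layers.0.self_attn.q_proj.weight"
a = dict(hf_model.named_parameters())[name]
b = dict(gguf_model.named_parameters())[name]
print(torch.max(torch.abs(a - b)))  # should be ~0 for an f32 GGUF
```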

This issue was closed because it has been inactive for 14 days since being marked as stale.
