Support GQA export, better run.c, Support tinyllama-1.1B #410
base: master
Conversation
Backported from karpathy/llama2.c#410
Current chat schemas in run.c are based on Llama 2:

```c
// render user/system prompts into the Llama 2 Chat schema
if (pos == 0 && system_prompt[0] != '\0') {
    char system_template[] = "[INST] <<SYS>>\n%s\n<</SYS>>\n\n%s [/INST]";
    sprintf(rendered_prompt, system_template, system_prompt, user_prompt);
} else {
    char user_template[] = "[INST] %s [/INST]";
    sprintf(rendered_prompt, user_template, user_prompt);
}
```

But you may want to use tinyllama's ones instead:
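For illustration only, here is a hedged sketch of rendering the Zephyr-style template documented for TinyLlama-1.1B-Chat v1.0 (the `<|system|>` / `<|user|>` / `<|assistant|>` markers and `</s>` separators come from that model card, not from this PR, and earlier TinyLlama chat checkpoints used different formats):

```c
// Sketch only (assumes the TinyLlama-1.1B-Chat v1.0 Zephyr-style template);
// mirrors the structure of the Llama 2 snippet above.
if (pos == 0 && system_prompt[0] != '\0') {
    char system_template[] = "<|system|>\n%s</s>\n<|user|>\n%s</s>\n<|assistant|>\n";
    sprintf(rendered_prompt, system_template, system_prompt, user_prompt);
} else {
    char user_template[] = "<|user|>\n%s</s>\n<|assistant|>\n";
    sprintf(rendered_prompt, user_template, user_prompt);
}
```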
In general, chat templates should be tied to the loaded pre-trained model, so maybe they should be a configuration parameter in the .bin file.
```diff
@@ -368,11 +368,12 @@ def load_hf_model(model_path):
     config.dim = hf_model.config.hidden_size
     config.n_layers = hf_model.config.num_hidden_layers
     config.n_heads = hf_model.config.num_attention_heads
-    config.n_kv_heads = hf_model.config.num_attention_heads
+    config.n_kv_heads = hf_model.config.num_key_value_heads
```
?
For an MHA model, the number of KV heads equals the number of query heads.
However, for GQA models such as Llama 2 70B and TinyLlama 1.1B, the number of KV heads and query heads differ.
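To make the distinction concrete, here is a minimal sketch (not the PR's exact code) of how GQA maps onto a per-head attention loop in the spirit of run.c: several query heads share one KV head, so query head `h` reads from KV head `h / kv_mul`. The function name and layout comment are assumptions for illustration.

```c
#include <math.h>

// Sketch: compute raw attention scores with grouped-query attention.
// Assumed key_cache layout: [timestep][n_kv_heads * head_size].
void gqa_scores(const float *q_all, const float *key_cache, float *att,
                int pos, int dim, int n_heads, int n_kv_heads) {
    int head_size = dim / n_heads;               // per-head dimension
    int kv_dim = (dim * n_kv_heads) / n_heads;   // KV width per timestep
    int kv_mul = n_heads / n_kv_heads;           // query heads per KV head (1 for MHA)
    for (int h = 0; h < n_heads; h++) {
        const float *q = q_all + h * head_size;  // query vector for head h
        for (int t = 0; t <= pos; t++) {
            // key vector of the KV head that serves query head h, at timestep t
            const float *k = key_cache + t * kv_dim + (h / kv_mul) * head_size;
            float score = 0.0f;
            for (int i = 0; i < head_size; i++) { score += q[i] * k[i]; }
            att[h * (pos + 1) + t] = score / sqrtf((float)head_size);
        }
    }
}
```

With `n_kv_heads == n_heads`, `kv_mul` is 1 and this degenerates to ordinary MHA indexing.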
```diff
@@ -451,7 +455,12 @@ void safe_printf(char *piece) {

 int str_lookup(char *str, TokenIndex *sorted_vocab, int vocab_size) {
     // efficiently find the perfect match for str in vocab, return its index or -1 if not found
     TokenIndex tok = { .str = str }; // acts as the key to search for
+    char *input = "<0x0A>";
```
Why is this delta here?
I'm not sure whether I converted the tokenizer correctly. After converting the tinyllama-1.1B tokenizer, run.c outputs `<0x0A>` instead of `\n`. I'm trying to figure out how to convert the tokenizer better so that these lines can be removed.
Besides, I noticed that our run.c cannot deal with `\n` in the input (for the tinystories 260K/15M/110M models); it treats it as the two characters `\` and `n`.
In llama.cpp, they hardcode the conversion of `\\n` to `\n`.
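For context, here is a minimal sketch of that kind of hardcoded handling (rewriting the literal two-character sequence `\` `n` in the prompt into a real newline before tokenization). The function name is hypothetical and this is neither llama.cpp's nor this PR's code.

```c
#include <string.h>

// Sketch only: collapse the escape sequence '\' 'n' into a real newline, in place.
void unescape_newlines(char *s) {
    char *src = s, *dst = s;
    while (*src) {
        if (src[0] == '\\' && src[1] == 'n') {
            *dst++ = '\n';      // "\n" typed as two characters becomes one newline byte
            src += 2;
        } else {
            *dst++ = *src++;    // copy everything else unchanged
        }
    }
    *dst = '\0';
}
```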
This is cool, I wasn't aware of the TinyLlama 1.1B run. Sounds very nice and useful for this repo to support.
There aren't notable architectural changes.
Add support for tinyllama-1.1B
Add support for converting GQA models (learned from ggerganov/llama.cpp#3364)
Better run.c