Added the fact that llama.cpp supports Mistral AI release 0.1 #3362

Merged

paschembri
Contributor

Mistral AI v0.1 model works out of the box once converted with the script convert.py.

@slaren
Collaborator

slaren commented Sep 27, 2023

Are there any details available about this model? All I could find about this release is a link to a torrent.

@paschembri
Contributor Author

Inspecting the tokenizer model, there is evidence indicating a training dataset of 8T tokens (/mnt/test/datasets/tokenizer_training/8T_train_data/shuffled.txt).

The convert script worked and I am currently evaluating the model...

@slaren
Collaborator

slaren commented Sep 27, 2023

F16 ppl looks good for a 7B model.
Final estimate: PPL = 5.6918 +/- 0.03191
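
For reference, perplexity is the exponential of the mean negative log-likelihood per token, so lower is better. The sketch below only illustrates that formula; the function name and the toy input are made up for this example and are not taken from llama.cpp's perplexity tool.

```python
import math


def perplexity(token_logprobs):
    """Return exp(-mean(log p)) for a list of natural-log token probabilities."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(mean_nll)


# Toy input: every token is predicted with probability 1/4.
uniform = [math.log(0.25)] * 8
print(round(perplexity(uniform), 6))
```

A model that spreads its probability evenly over four candidates at every step has a perplexity of 4, which is a handy way to read a figure like the one above.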

Some generation [I believe the meaning of life is] simple, just like that “Life” episode of “The Twilight Zone”, with William Shatner. But sometimes it’s also easy to forget, and in those times a reminder from something or someone else can be very welcome.

Sometimes those reminders are small things that don’t seem important at all, but later on become more meaningful than one would have thought. Other times they are big, life-altering events. But as long as you get to see the beauty in them when they happen, they will help you live your best possible life.

Here is a list of reminders that I’ve had. Some may seem silly or irrelevant, but they’re all important and meaningful for me. These are my 50 things that make life worth living:

  1. To be able to feel the love of your family and your friends
  2. To love someone so much that their happiness means more than your own
  3. The simple joy of being happy
  4. Being able to live without fear in your mind, heart or soul
  5. A beautiful song that makes you want to cry because it is so gorgeous and inspiring
  6. A really good book that gets inside your head and refuses to let go
  7. Dancing on a rainy night
  8. To be able to say “I’m sorry” when you’ve done something wrong, even if no one else will know
  9. Watching the sunset with someone who understands why sunsets are so important
  10. A beautiful sunrise that reminds you of how much better your life could get right now
  11. The feeling of finally falling asleep after a bad day or week
  12. Falling in love for the first time, and realizing that this is what they really meant when they said “love at first sight”
  13. Hugging someone who needs it most
  14. Holding your child’s hand as they walk across the street for the first time without fear or apprehension
  15. Feeling like you’re part of something bigger than yourself
  16. Knowing that no matter what happens in your life, there will always be people who care about you and support you
  17. Watching someone else be happy when it seems impossible for them to find happiness anywhere else
  18. A song or poem that gives hope where there was none before
  19. Knowing that even though life isn’t fair sometimes, at least some part of it is working out okay for me right now (at least most of the time)
  20. Making someone laugh when they need it most—whether because you made them laugh or just by being there with them through their tears and pain
  21. Realizing that even though things might not go according to plan sometimes, life still has its moments of beauty and joy
  22. Feeling like anything is possible if only we believe hard enough in ourselves and what we can do together with others around us—no matter how small our dreams may seem at first glance [end of text]

slaren
slaren previously approved these changes Sep 27, 2023
@slaren
Collaborator

slaren commented Sep 27, 2023

params.json:

{
    "dim": 4096,
    "n_layers": 32,
    "head_dim": 128,
    "hidden_dim": 14336,
    "n_heads": 32,
    "n_kv_heads": 8,
    "norm_eps": 1e-05,
    "sliding_window": 4096,
    "vocab_size": 32000
}

Looks like it uses sliding_window as the context length. convert.py may need to be updated. This may also be the first 7B model to use GQA.
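
To make the GQA remark concrete, here is a rough sketch that derives the query-to-KV head ratio and the KV cache size from the parameter file shown above. It assumes an f16 cache (2 bytes per value) and one K and one V vector per KV head in every layer; `gqa_summary` is a hypothetical helper, not part of convert.py.

```python
import json

# Values copied from the parameter file quoted in this comment.
PARAMS_JSON = """
{
    "dim": 4096,
    "n_layers": 32,
    "head_dim": 128,
    "hidden_dim": 14336,
    "n_heads": 32,
    "n_kv_heads": 8,
    "norm_eps": 1e-05,
    "sliding_window": 4096,
    "vocab_size": 32000
}
"""


def gqa_summary(params, bytes_per_value=2):
    """Derive the GQA group size and per-token KV cache size (f16 assumed)."""
    group_size = params["n_heads"] // params["n_kv_heads"]
    # One K and one V vector per KV head, per layer.
    kv_values_per_token = (
        2 * params["n_layers"] * params["n_kv_heads"] * params["head_dim"]
    )
    return {
        "query_heads_per_kv_head": group_size,
        "kv_bytes_per_token": kv_values_per_token * bytes_per_value,
    }


params = json.loads(PARAMS_JSON)
summary = gqa_summary(params)
print(summary)
# KV cache for sliding_window tokens, in MiB.
print(summary["kv_bytes_per_token"] * params["sliding_window"] / 2**20)
```

Under these assumptions, four query heads share each KV head, so the cache is a quarter of what the same model would need with 32 KV heads.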

@jxy
Contributor

jxy commented Sep 27, 2023

Does sliding window attention actually work here, or does it really only work with a context length of 4096 in llama.cpp? What happens if we set the context length to 8192?
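
As background for this question, sliding window attention restricts each position to a fixed number of the most recent positions. The sketch below shows the general masking idea only; it is not llama.cpp code, and the boundary convention (the window counts the current token) is an assumption.

```python
def sliding_window_mask(n_tokens, window):
    """mask[i][j] is True when query position i may attend to key position j."""
    return [
        [j <= i and i - j < window for j in range(n_tokens)]
        for i in range(n_tokens)
    ]


# Small example: 6 tokens, window of 3.
for row in sliding_window_mask(6, 3):
    print("".join("x" if allowed else "." for allowed in row))
```

When the window is at least as long as the sequence, this reduces to an ordinary causal mask, so the two only differ once the context grows past the window size.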

@paschembri
Contributor Author

I did test before they released the model card on HF.

I'll try that

@TheBloke
Contributor

TheBloke commented Sep 27, 2023

Currently convert.py is failing for me on the vocab - doesn't like that it's adding tokens 0, 1 and 2 in added_tokens.json. Haven't got as far as actually reading the model files

If anyone has converted this successfully, how did you make the fp16?

Oh never mind, I just deleted the added_tokens.json duh :)

@paschembri
Contributor Author

Setting the context size to 8k actually works.

I got the model (a q6_K version) to perform a summary and the results are promising

@slaren
Collaborator

slaren commented Sep 27, 2023

@TheBloke I just converted from the pth file in the torrent. There is no added_tokens.json there.

@TheBloke
Contributor

TheBloke commented Sep 27, 2023

Ah OK fair enough, I've been using the official release from https://huggingface.co/mistralai/Mistral-7B-v0.1, which is in HF format. They added an added_tokens.json, but I don't think they quite understand what it's for, because it contains the special tokens, which are already listed in tokenizer.json and tokenizer.model.

Anyway my quants are up here and seem to work fine: https://huggingface.co/TheBloke/Mistral-7B-v0.1-GGUF
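
The added_tokens.json overlap described above can be checked with a few lines. This is a sketch under two assumptions: that the file maps token strings to ids, and that the base vocabulary holds 32000 entries as in the parameter file earlier in the thread. The token strings and the helper name are illustrative; the thread only says that ids 0, 1 and 2 were added.

```python
def redundant_added_tokens(added_tokens, base_vocab_size):
    """Return added tokens whose ids already fall inside the base vocabulary."""
    return {
        token: token_id
        for token, token_id in added_tokens.items()
        if token_id < base_vocab_size
    }


# Illustrative content, not copied from the actual release.
added_tokens = {"<unk>": 0, "<s>": 1, "</s>": 2}
print(redundant_added_tokens(added_tokens, base_vocab_size=32000))
```

If every entry comes back as redundant, the file adds nothing new, which is consistent with the conversion working once it was deleted.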

@TheBloke
Contributor

Actually no my quants don't work fine! I needed that permute fix. Re-making now

@paschembri
Contributor Author

You have to tell us how you can upload this fast to HF. For me it took forever!

@TheBloke
Contributor

TheBloke commented Sep 27, 2023

OK all my quants are remade and re-uploaded and are working fine now.

system_info: n_threads = 15 / 30 | AVX = 1 | AVX2 = 1 | AVX512 = 1 | AVX512_VBMI = 1 | AVX512_VNNI = 1 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 |
sampling: repeat_last_n = 64, repeat_penalty = 1.100000, presence_penalty = 0.000000, frequency_penalty = 0.000000, top_k = 40, tfs_z = 1.000000, top_p = 0.950000, typical_p = 1.000000, temp = 0.800000, mirostat = 0, mirostat_lr = 0.100000, mirostat_ent = 5.000000
generate: n_ctx = 4096, n_batch = 512, n_predict = -1, n_keep = 0


 The quick brown fox jumped over the lazy dog.

If you’re a writer who’s been looking for a place to publish, you may have seen this sentence somewhere in the fine print of an online magazine’s submission guidelines. It may also be found on writing sites as an example of how to use the various characters (letters and punctuation) available on your keyboard.

This classic example is sometimes called the “typewriter test.”  But nowadays, it’s a bit of a misnomer. The sentence looks like gibberish even if you copy-and-paste it into an email and send it to yourself.

The problem lies with the letter “J,” which is often mistakenly identified as a lowercase “L” by software programs, including Microsoft Word (which tends to have issues with all of the letters that look like each other).  The issue is not limited to just lowercase J and L; capital I and lower case l are also prone to being confused.

There’s an easy fix, though: simply replace the uppercase J in “dog” with a lowercase j (or vice versa) and you can test that your email program is picking up all 26 letters.

Here’s another example of what we’re talking about: [end of text]

> You have to tell us how you can upload this fast to HF. For me it took forever!

10Gbit internet! :) I don't always have it, sadly, but when only making GGUFs for a repo I use a Lambda Labs instance with a beautiful 10Gbit network - my record speed transferring to HF is 950MB/s 🤣

@slaren slaren dismissed their stale review September 27, 2023 16:49

Considering that sliding window attention is not implemented, this shouldn't be added yet.

@netrunnereve
Collaborator

> Are there any details available about this model? All I could find about this release is a link to a torrent.

They just produced a press release. It's a 7B model that apparently performs like LLaMA 2 13B and is under an Apache 2 license.

@paschembri
Contributor Author

paschembri commented Sep 27, 2023

They released the instruct model. I tried quantizing it, but all I got was gibberish... I'll try again (with the fix you mentioned).

EDIT: that was it (the fix)

@TheBloke
Contributor

TheBloke commented Sep 27, 2023

Yeah Instruct is working well for me. Q5_K_M:

system_info: n_threads = 15 / 30 | AVX = 1 | AVX2 = 1 | AVX512 = 1 | AVX512_VBMI = 1 | AVX512_VNNI = 1 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 |
sampling: repeat_last_n = 64, repeat_penalty = 1.100000, presence_penalty = 0.000000, frequency_penalty = 0.000000, top_k = 40, tfs_z = 1.000000, top_p = 0.950000, typical_p = 1.000000, temp = 0.800000, mirostat = 0, mirostat_lr = 0.100000, mirostat_ent = 5.000000
generate: n_ctx = 4096, n_batch = 512, n_predict = -1, n_keep = 0


 <s>[INST]Write a story about llamas [/INST] Once upon a time, high in the Andes Mountains of Peru, there lived a herd of llamas. They roamed freely on the vast green meadows, grazing on the lush grasses that grew there. The llamas were a happy and contented herd, enjoying their simple life in the mountains.

Despite their peaceful nature, however, the llamas were not without their challenges. For one thing, they had to contend with the many predators that lived in the Andes, including mountain lions, coyotes, and eagles. The llamas had to be always alert, ready to defend themselves and their young from harm.

In addition to predators, the llamas also had to deal with harsh weather conditions. The Andes Mountains can be cold and windy, especially at high altitudes. During the winter months, the llamas would huddle together for warmth, seeking shelter in the rocky crevices that offered protection from the elements.

Despite these challenges, the llama herd thrived. They were well adapted to life in the mountains, with strong legs and thick fleece that kept them warm in the cold. And they had each other for company, forming close bonds with their fellow llamas that helped them through the tough times.

As the years passed, the llama herd continued to grow and prosper. They were a proud and majestic sight to behold, roaming freely across the green meadows of the Andes Mountains. And so they lived, happy and contented, enjoying their simple life in the mountains. [end of text]
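
The prompt in the log above wraps the user message in [INST] and [/INST] markers. A minimal sketch of building such a prompt follows. The spacing mirrors the log rather than any official template, the leading `<s>` is left out on the assumption that the tokenizer adds the BOS token itself, and `format_instruct_prompt` is a hypothetical helper.

```python
def format_instruct_prompt(user_message):
    """Wrap a single user message in [INST] ... [/INST] markers."""
    return f"[INST]{user_message.strip()} [/INST]"


print(format_instruct_prompt("Write a story about llamas"))
```

For anything beyond a quick test, the exact template should be checked against the model card.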

@Dampfinchen

Does GQA work with it?

Owner

@ggerganov ggerganov left a comment

Sliding window will be tracked here: #3377

@ggerganov ggerganov merged commit 4aea3b8 into ggerganov:master Sep 28, 2023
joelkuiper added a commit to vortext/llama.cpp that referenced this pull request Oct 2, 2023
…example

* 'master' of github.com:ggerganov/llama.cpp:
  ggml-cuda : perform cublas mat mul of quantized types as f16 (ggerganov#3412)
  llama.cpp : add documentation about rope_freq_base and scale values (ggerganov#3401)
  train : fix KQ_pos allocation (ggerganov#3392)
  llama : quantize up to 31% faster on Linux and Windows with mmap (ggerganov#3206)
  readme : update hot topics + model links (ggerganov#3399)
  readme : add link to grammars app (ggerganov#3388)
  swift : fix build on xcode 15 (ggerganov#3387)
  build : enable more non-default compiler warnings (ggerganov#3200)
  ggml_tensor: update the structure comments. (ggerganov#3283)
  ggml : release the requested thread pool resource (ggerganov#3292)
  llama.cpp : split llama_context_params into model and context params (ggerganov#3301)
  ci : multithreaded builds (ggerganov#3311)
  train : finetune LORA (ggerganov#2632)
  gguf : basic type checking in gguf_get_* (ggerganov#3346)
  gguf : make token scores and types optional (ggerganov#3347)
  ci : disable freeBSD builds due to lack of VMs (ggerganov#3381)
  llama : custom attention mask + parallel decoding + no context swaps (ggerganov#3228)
  docs : mark code as Bash (ggerganov#3375)
  readme : add Mistral AI release 0.1 (ggerganov#3362)
  ggml-cuda : perform cublas fp16 matrix multiplication as fp16 (ggerganov#3370)
yusiwen pushed a commit to yusiwen/llama.cpp that referenced this pull request Oct 7, 2023
7 participants