Add proper instructions for using Alpaca models #382

Closed
ggerganov opened this issue Mar 22, 2023 · 22 comments
Labels: documentation, good first issue, help wanted, high priority, 🦙. llama

Comments

@ggerganov
Owner

ggerganov commented Mar 22, 2023

So I am looking at https://github.com/antimatter15/alpaca.cpp and I see they are already running 30B Alpaca models, while we are struggling to run 7B due to the recent tokenizer updates.

I also see that the models are now even floating on Hugging Face - I guess license issues are no longer a problem?

We should add detailed instructions for obtaining the Alpaca models, plus a temporary explanation of how to use the following script to make the models compatible with the latest master:

#324 (comment)

The bigger issue is that people keep producing ggml models in the old format instead of migrating to the latest llama.cpp changes, which is why we now need this extra conversion step. It's best to figure out the steps for generating the Alpaca models and to generate them in the correct format directly.

Edit: just don't post direct links to the models!

@ggerganov added the documentation, help wanted, good first issue, high priority, and 🦙. llama labels Mar 22, 2023
@madmads11

madmads11 commented Mar 22, 2023

Here is what I did to run Alpaca 30B on my system with llama.cpp. I assume it works with Alpaca 13B as well.

  1. Downloaded and built llama.cpp from scratch, since the latest version is required to specify that the model is in one file with the new --n_parts 1 parameter.
  2. Downloaded this 30b alpaca model https://huggingface.co/Pi3141/alpaca-30B-ggml/tree/main (If you check the model card, you can find links to other alpaca model sizes)
  3. Named the file ggml-alpaca-30b-q4.bin and placed it in /models/Alpaca/30b inside llama.cpp
  4. Downloaded the script mentioned here: Breaking change of models since PR #252 #324 (comment)
  5. Named it convert.py and placed it in the root folder of llama.cpp.
  6. Downloaded the tokenizer mentioned here: Breaking change of models since PR #252 #324 (comment)
  7. Placed the tokenizer.model file in /models
  8. Ran python convert.py models/Alpaca/30b models/tokenizer.model in the command prompt from the base folder of llama.cpp. (I got an error saying the sentencepiece module was missing, so I ran pip install sentencepiece and then re-ran python convert.py models/Alpaca/30b models/tokenizer.model, which worked. You may or may not encounter this error.)
  9. The 30b folder now contains ggml-alpaca-30b-q4.bin and ggml-alpaca-30b-q4.bin.tmp. I renamed ggml-alpaca-30b-q4.bin to ggml-alpaca-30b-q4.bin.old to keep it as a backup, and ggml-alpaca-30b-q4.bin.tmp to ggml-alpaca-30b-q4.bin.
  10. Now I can run llama.cpp with ./main -m ./models/Alpaca/30b/ggml-alpaca-30b-q4.bin --color -f ./prompts/alpaca.txt -ins --n_parts 1.
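Step 9 can also be scripted. The sketch below is a hypothetical helper (`swap_converted_model` is not part of llama.cpp) and assumes, as described above, that the converter leaves its output next to the original file as `<name>.tmp`. It keeps the original as `<name>.old` and moves the converted file into place.

```python
from pathlib import Path


def swap_converted_model(model_path: Path) -> Path:
    """Back up the original model as *.old and move the converted *.tmp into its place.

    Hypothetical helper mirroring step 9: model.bin -> model.bin.old,
    model.bin.tmp -> model.bin. Returns the path of the backup.
    """
    tmp_path = model_path.with_name(model_path.name + ".tmp")
    old_path = model_path.with_name(model_path.name + ".old")
    if not tmp_path.exists():
        raise FileNotFoundError(f"converted file not found: {tmp_path}")
    # Path.replace overwrites an existing destination (e.g. a stale .old backup).
    model_path.replace(old_path)
    tmp_path.replace(model_path)
    return old_path
```

Run from the llama.cpp root, a call such as `swap_converted_model(Path("models/Alpaca/30b/ggml-alpaca-30b-q4.bin"))` would perform the two renames from step 9.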

Maybe this can be of temporary help to anybody else eager to set it up. Please correct me if I've made any mistakes; I wrote this retroactively from memory.

@Puncia

Puncia commented Mar 22, 2023

Can confirm the above works for the 13B model too.

@lolxdmainkaisemaanlu

The above instructions work for me too for the 13B model! Thank you!

@Green-Sky
Collaborator

Green-Sky commented Mar 22, 2023

Checksum for the converted (ggmf v1) Pi3141 alpaca-30B-ggml:

$ sha256sum ggml-model-q4_0.bin
969652d32ce186ca3c93217ece8311ebe81f15939aa66a6fe162a08dd893faf8  ggml-model-q4_0.bin
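On systems without `sha256sum` (for example a stock Windows command prompt), a few lines of Python can compute the same digest. This is a minimal sketch using only `hashlib`; `sha256_of` and `matches_expected` are illustrative names, and the expected value is the checksum posted above.

```python
import hashlib
from pathlib import Path

# Checksum posted above for the converted (ggmf v1) Pi3141 alpaca-30B-ggml file.
EXPECTED_SHA256 = "969652d32ce186ca3c93217ece8311ebe81f15939aa66a6fe162a08dd893faf8"


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in 1 MiB chunks so a multi-GB model never sits in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def matches_expected(path: Path, expected: str = EXPECTED_SHA256) -> bool:
    # hexdigest() is lowercase, so normalise the expected value the same way.
    return sha256_of(path) == expected.strip().lower()
```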

@anzz1
Contributor

anzz1 commented Mar 23, 2023

all of them (7B/13B/30B/65B*)
4-bit quantized q4_0 (RTN) and GPTQ
new tokenizer format
*no alpaca-65B though, as it would take a very long time
does not include batteries
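For readers wondering what "q4_0 (RTN)" means: round-to-nearest quantization splits the weights into small blocks and stores one float scale per block plus a 4-bit integer per weight. The sketch below is modeled loosely on ggml's early q4_0 (blocks of 32, scale = max|x| / 7, codes offset by 8); it omits the packing of two codes per byte and is not the ggml source. GPTQ, by contrast, uses calibration data to choose quantized values that minimize each layer's output error, rather than rounding every weight independently.

```python
import math

QK = 32  # weights per block (assumed to match early ggml q4_0)


def round_half_away(v: float) -> int:
    # C's roundf() rounds halves away from zero; Python's round() does not.
    return int(math.copysign(math.floor(abs(v) + 0.5), v))


def quantize_block(xs):
    """Quantize one block to (scale, codes) with every code in [0, 15]."""
    if len(xs) != QK:
        raise ValueError(f"expected {QK} values, got {len(xs)}")
    amax = max(abs(x) for x in xs)
    d = amax / 7.0                    # maps [-amax, amax] onto [-7, 7]
    inv_d = 1.0 / d if d else 0.0     # an all-zero block quantizes to zeros
    codes = [round_half_away(x * inv_d) + 8 for x in xs]  # shift to unsigned 4-bit
    return d, codes


def dequantize_block(d, codes):
    return [(q - 8) * d for q in codes]
```

The reconstruction error per weight is at most half a step (d / 2), which is where the quality loss of plain 4-bit RTN comes from.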

https://btcache.me/torrent/E5322AB4676E24632A907FD9846234BB40265C4F
https://torrage.info/torrent.php?h=e5322ab4676e24632a907fd9846234bb40265c4f

single command option:

aria2c --summary-interval=0 --bt-max-peers=0 http://taco.cab/ggml/ggml-q4.torrent

as usual, the alpaca and gptq models need the --n_parts 1 option

palpaca-7B

hope that helps 👍

@Green-Sky
Collaborator

@anzz1 you did not specify which models your links are for. Also, please provide checksums :)

@Green-Sky
Collaborator

me: i should try and debug all those crashes
me: > help me write a song about llama.cpp (c++ api for facebooks llm)
llama.cpp:

A llama is an animal that's so strange,
It can do things we only imagine.
LLamaCPP is the code that gives it its brawn,
Allowing us to use it like a clown.

The api has commands we can use,
To take advantage of this llama abuse.
It's an interface that let's us be boss,
If you know the right way to make your call.

(the 30B alpaca lora finetune by pi)

@anzz1
Contributor

anzz1 commented Mar 23, 2023

i linked the checksums here #374 (comment)

@anzz1
Contributor

anzz1 commented Mar 24, 2023

@anzz1 Thank you for the download. Did you see the latest fix to GPTQ conversion?

Yes.

@anzz1 Any chance you could re convert them using the changes from this: #423

No need.

@madmads11

I just saw the updated README stating that you cannot link to model downloads anywhere in this repository. Would instructions like mine, where step 2 links to a model download on HF, violate that rule going forward? I assume the existing instructions are okay because they were written before the rule, but what about future ones?

@Green-Sky
Collaborator

the ones you linked are sadly mixed, and not "pure" lora models. so i would assume no.
you could just say "pi3141 alpaca 30B" model, and it would be fine i guess.

@madmads11

the ones you linked are sadly mixed, and not "pure" lora models. so i would assume no. you could just say "pi3141 alpaca 30B" model, and it would be fine i guess.

Interesting, I didn't realize it was mixed. Can you explain what that means in this context?

@Green-Sky
Collaborator

Green-Sky commented Mar 24, 2023

"mixed" -> "merged"
If you look at this for example https://huggingface.co/tloen/alpaca-lora-7b/tree/main , those are only the lora weights.
I think (I need to actually read the paper) those are either not directly derived from llama, or derived enough to count as remixing/fair use or something.

edit: you can clearly see this from the file size.
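The file-size argument can be made concrete. A LoRA adapter stores two thin matrices per adapted weight (B of shape d × r and A of shape r × d), and "merging" bakes them into the base weight as W + (alpha / r) · B A, which yields a full-size model again. The configuration below is an assumption for illustration only: rank 8 applied to the query and value projections of all 32 layers of a 7B LLaMA (n_embd = 4096, as in the 7B load log later in this thread). `lora_param_count` and `merge_lora` are hypothetical helpers.

```python
def lora_param_count(n_layers: int, d_model: int, rank: int, matrices_per_layer: int) -> int:
    """Parameters in a LoRA adapter for square d_model x d_model projections."""
    # Each adapted matrix gets A (rank x d_model) and B (d_model x rank).
    return n_layers * matrices_per_layer * 2 * rank * d_model


def merge_lora(W, A, B, alpha: float, rank: int):
    """Return W + (alpha / rank) * (B @ A) using plain nested lists."""
    scale = alpha / rank
    return [
        [W[i][j] + scale * sum(B[i][k] * A[k][j] for k in range(rank))
         for j in range(len(W[0]))]
        for i in range(len(W))
    ]


# Assumed config: rank 8 on q_proj and v_proj of a 7B LLaMA (32 layers, n_embd 4096).
adapter_params = lora_param_count(n_layers=32, d_model=4096, rank=8, matrices_per_layer=2)
print(f"{adapter_params:,} adapter params, ~{adapter_params * 4 / 1e6:.1f} MB as fp32")
```

Under these assumptions the adapter is about 4.2M parameters (roughly 16.8 MB in fp32), while the merged 7B q4_0 file in the load log below is about 4 GB.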

@ghost

ghost commented Mar 25, 2023

Worked for me, thanks @anzz1! The AI is running kind of slow though. I'm on Windows with a 5950X and 80+ GB of RAM... but the writing speed is like GPT-4 under max load x) Any params I forgot to set?
Edit: I tried changing the -t value to 32; nothing changes, the prompt is still slow AF

@paniphons

paniphons commented Mar 28, 2023

@anzz1, did you actually try those models? Using the latest master (4b8efff) on Windows to load alpaca-13B-ggml GPTQ from your torrent, it just starts spitting out C# code as soon as I launch it.

C:\_downloads\ggml-q4\models\alpaca-13B-ggml>main.exe -m ggml-model-gptq4.bin --interactive --color --n_parts 1
main: seed = 1679990008
llama_model_load: loading model from 'ggml-model-gptq4.bin' - please wait ...
llama_model_load: n_vocab = 32000
llama_model_load: n_ctx   = 512
llama_model_load: n_embd  = 5120
llama_model_load: n_mult  = 256
llama_model_load: n_head  = 40
llama_model_load: n_layer = 40
llama_model_load: n_rot   = 128
llama_model_load: f16     = 4
llama_model_load: n_ff    = 13824
llama_model_load: n_parts = 1
llama_model_load: type    = 2
llama_model_load: ggml ctx size = 10101.68 MB
llama_model_load: mem required  = 12149.68 MB (+ 1608.00 MB per state)
llama_model_load: loading model part 1/1 from 'ggml-model-gptq4.bin'
llama_model_load: ............................................. done
llama_model_load: model size =  9701.58 MB / num tensors = 363
llama_init_from_file: kv self size  =  400.00 MB

system_info: n_threads = 4 / 8 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 0 | NEON = 0 | ARM_FMA = 0 | F16C = 0 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 0 | VSX = 0 |
main: interactive mode on.
sampling: temp = 0.800000, top_k = 40, top_p = 0.950000, repeat_last_n = 64, repeat_penalty = 1.100000
generate: n_ctx = 512, n_batch = 8, n_predict = 128, n_keep = 0


== Running in interactive mode. ==
 - Press Ctrl+C to interject at any time.
 - Press Return to return control to LLaMa.
 - If you want to submit another line, end your input in '\'.

 using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;

namespace _14.Bucket_Sort
{
    class Program
    {
        static void Main(string[] args)
        {
            var input = Console.ReadLine();
            int n = int.Parse(input);

Then I pressed Ctrl+C to interrupt it, thinking it could be a minor bug, and asked it "who is Kanye West". Response until I closed the program:

                  What did he do?
                ;
                arr[i] = long.Parse(line);
            }
            Array.Sort(arr);
            for (int i = 0; i < n;

I also downloaded the non-GPTQ version; it has the same issue, spitting out C++ code:

>main.exe -m ggml-model-q4_0.bin --interactive --color --n_parts 1
main: seed = 1679992628
llama_model_load: loading model from 'ggml-model-q4_0.bin' - please wait ...
llama_model_load: n_vocab = 32000
llama_model_load: n_ctx   = 512
llama_model_load: n_embd  = 5120
llama_model_load: n_mult  = 256
llama_model_load: n_head  = 40
llama_model_load: n_layer = 40
llama_model_load: n_rot   = 128
llama_model_load: f16     = 2
llama_model_load: n_ff    = 13824
llama_model_load: n_parts = 1
llama_model_load: type    = 2
llama_model_load: ggml ctx size = 8159.49 MB
llama_model_load: mem required  = 10207.49 MB (+ 1608.00 MB per state)
llama_model_load: loading model part 1/1 from 'ggml-model-q4_0.bin'
llama_model_load: ............................................. done
llama_model_load: model size =  7759.39 MB / num tensors = 363
llama_init_from_file: kv self size  =  400.00 MB

system_info: n_threads = 4 / 8 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 0 | NEON = 0 | ARM_FMA = 0 | F16C = 0 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 0 | VSX = 0 |
main: interactive mode on.
sampling: temp = 0.800000, top_k = 40, top_p = 0.950000, repeat_last_n = 64, repeat_penalty = 1.100000
generate: n_ctx = 512, n_batch = 8, n_predict = 128, n_keep = 0


== Running in interactive mode. ==
 - Press Ctrl+C to interject at any time.
 - Press Return to return control to LLaMa.
 - If you want to submit another line, end your input in '\'.


#include "pch.h"
#include "Scenario1_LaunchUri.xaml.h"

using namespace SDKTemplate

@maria-mh07

Hi! I'm on Windows, using master 5a5f8b1

  • I downloaded 13b and 30b alpaca models as mentioned by @madmads11 and @Puncia
  • Ran python convert-unversioned-ggml-to-ggml.py models\Alpaca\13B models/LLaMA/tokenizer.model and python convert-unversioned-ggml-to-ggml.py models\Alpaca\30B models/LLaMA/tokenizer.model
  • I can run llama.cpp with bin\Release\main.exe -m models\Alpaca\13B\ggml-alpaca-13b-q4_0.bin --n_parts 1 --color -f prompts\alpaca.txt -ins -t 6 or bin\Release\main.exe -m models\Alpaca\30B\ggml-alpaca-30b-q4_0.bin --n_parts 1 --color -f prompts\alpaca.txt -ins -t 6 but it doesn't work well

[Screenshot: output of bin\Release\main.exe -m models\Alpaca\13B\ggml-alpaca-13b-q4_0.bin --n_parts 1 --color -f prompts\alpaca.txt -ins]

Does this happen to everyone or just me?

@morpheus2448

morpheus2448 commented Mar 29, 2023

I edited this whole thing because it was basically incorrect.

@maria-mh07 It's working more or less as you should expect.

@paniphons You need to provide a prompt from the command line with --prompt or using -f and point to a file.
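As a sketch of the -f route: the file just needs to contain the prompt text. Alpaca models were fine-tuned on the Stanford Alpaca instruction template (reproduced below for the variant without an extra input section), so wrapping the question in it should make the model answer rather than free-run into unrelated text such as the C# code above. `write_alpaca_prompt` and the file name are hypothetical, and the prompts/alpaca.txt shipped with llama.cpp may be worded differently.

```python
from pathlib import Path

# Stanford Alpaca instruction template, variant without a "### Input:" section.
ALPACA_TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Response:"
)


def write_alpaca_prompt(instruction: str, path: Path) -> Path:
    """Write a ready-to-use prompt file for main's -f option."""
    path.write_text(ALPACA_TEMPLATE.format(instruction=instruction), encoding="utf-8")
    return path
```

For example, `write_alpaca_prompt("Who is Kanye West?", Path("kanye.txt"))` followed by `main.exe -m ggml-model-q4_0.bin --n_parts 1 -f kanye.txt`.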

@LitenBuzzTh

What do the other parameters do? It's a bit confusing:
repeat_last_n
repeat_penalty
top_k
top_p
temp
seed
threads

@j-f1
Collaborator

j-f1 commented Mar 31, 2023

I explained a bunch of them in #559 (comment).
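To give a rough picture of how the sampling parameters interact, here is a simplified pure-Python sketch of one sampling step. It follows the general recipe (repeat penalty over the last repeat_last_n tokens, temperature scaling, top-k, then top-p) but is an illustration, not llama.cpp's code; `sample_next_token` is a hypothetical name. seed corresponds to seeding the random generator, while threads only controls how many CPU threads do the math and is not a sampling parameter.

```python
import math
import random


def sample_next_token(logits, last_tokens, temp=0.8, top_k=40, top_p=0.95,
                      repeat_penalty=1.1, rng=None):
    """Choose the next token id from raw logits (a list indexed by token id).

    last_tokens plays the role of the last repeat_last_n generated tokens.
    """
    rng = rng or random.Random()
    logits = list(logits)
    # repeat_penalty: push already-seen tokens toward "less likely".
    for tok in set(last_tokens):
        if logits[tok] > 0:
            logits[tok] /= repeat_penalty
        else:
            logits[tok] *= repeat_penalty
    # temp: lower values sharpen the distribution, higher values flatten it.
    ranked = sorted(((l / temp, i) for i, l in enumerate(logits)), reverse=True)
    # top_k: keep only the k best candidates.
    ranked = ranked[:top_k]
    # Softmax over the survivors (subtract the max for numerical stability).
    best = ranked[0][0]
    exps = [math.exp(s - best) for s, _ in ranked]
    total = sum(exps)
    probs = [e / total for e in exps]
    # top_p: keep the smallest prefix whose cumulative probability reaches top_p.
    cutoff, cumulative = len(probs), 0.0
    for n, p in enumerate(probs, start=1):
        cumulative += p
        if cumulative >= top_p:
            cutoff = n
            break
    ids = [i for _, i in ranked[:cutoff]]
    return rng.choices(ids, weights=probs[:cutoff], k=1)[0]
```

Passing `rng=random.Random(1679990008)` (the seed printed by main in the logs above) makes the choice reproducible, which is what fixing --seed is for.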

@robin-coac

Hi @madmads11 @j-f1
Just yesterday, this migration script was added: migrate-ggml-2023-03-30-pr613.py.
So, on top of @madmads11's instructions, I used that script to generate the final .bin file to work with.

Details:

I only started using llama.cpp today to run the Alpaca model (until now I was using antimatter15's alpaca.cpp).

The same model, converted and loaded in llama.cpp, runs much more slowly than it does in alpaca.cpp.

How I started the model:

  • ./main -m ./models/alpaca-7b-migrated.bin -ins --n_parts 1

The logs:

main: seed = 1680346670
llama_model_load: loading model from './models/alpaca-7b-migrated.bin' - please wait ...
llama_model_load: n_vocab = 32000
llama_model_load: n_ctx   = 512
llama_model_load: n_embd  = 4096
llama_model_load: n_mult  = 256
llama_model_load: n_head  = 32
llama_model_load: n_layer = 32
llama_model_load: n_rot   = 128
llama_model_load: f16     = 2
llama_model_load: n_ff    = 11008
llama_model_load: n_parts = 1
llama_model_load: type    = 1
llama_model_load: ggml map size = 4017.70 MB
llama_model_load: ggml ctx size =  81.25 KB
llama_model_load: mem required  = 5809.78 MB (+ 1026.00 MB per state)
llama_model_load: loading tensors from './models/alpaca-7b-migrated.bin'
llama_model_load: model size =  4017.27 MB / num tensors = 291
llama_init_from_file: kv self size  =  256.00 MB

system_info: n_threads = 16 / 16 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 | 
main: interactive mode on.
Reverse prompt: '### Instruction:

'
sampling: temp = 0.800000, top_k = 40, top_p = 0.950000, repeat_last_n = 64, repeat_penalty = 1.100000
generate: n_ctx = 512, n_batch = 8, n_predict = 128, n_keep = 2

Additionally, I tried this .bin file, which is already migrated for llama.cpp: https://huggingface.co/Pi3141/alpaca-lora-7B-ggml/blob/main/ggml-model-q4_1.bin. Even with this one, the model runs slowly in llama.cpp.

One thing I noticed: when loading this second model variant, one line of the load log differs from the log above:
llama_model_load: f16 = 3 (instead of f16 = 2).
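The f16 field in the load log is really the file's weight-type code, which is why it changes with the quantization format: the second file is a q4_1 file. The sketch below maps the codes seen in this thread to names and pulls the value out of a pasted log. The mapping is believed to match early-2023 llama.cpp but should be treated as an assumption (code 4, seen with the GPTQ file earlier in the thread, especially), and `ftype_from_log` is a hypothetical helper.

```python
import re

# Weight-type codes printed as "f16 = N" by the early llama.cpp loader
# (assumed mapping; code 4 is the least certain).
FTYPE_NAMES = {
    0: "all F32",
    1: "mostly F16",
    2: "mostly Q4_0",
    3: "mostly Q4_1",
    4: "mostly Q4_1, some F16 (GPTQ conversions)",
}

_F16_LINE = re.compile(r"llama_model_load: f16\s*=\s*(\d+)")


def ftype_from_log(log_text: str) -> int:
    """Extract the f16/ftype code from a llama_model_load log."""
    match = _F16_LINE.search(log_text)
    if match is None:
        raise ValueError("no 'llama_model_load: f16 = N' line found")
    return int(match.group(1))


def describe_ftype(code: int) -> str:
    return FTYPE_NAMES.get(code, f"unknown ftype {code}")
```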

@sachinspanicker

Following the steps above, I get this error when running convert.py:

% python3 convert.py models/alpaca/13B models/tokenizer.model
converting models/alpaca/13B/ggml-model-q4_0.bin
Traceback (most recent call last):
  File "/Users/FD00199/llama.cpp/convert.py", line 96, in <module>
    main()
  File "/Users/FD00199/llama.cpp/convert.py", line 93, in main
    convert_one_file(file, tokenizer)
  File "/Users/FD00199/llama.cpp/convert.py", line 78, in convert_one_file
    write_header(f_out, read_header(f_in))
  File "/Users/FD00199/llama.cpp/convert.py", line 27, in write_header
    raise Exception('Invalid file magic. Must be an old style ggml file.')
Exception: Invalid file magic. Must be an old style ggml file.
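The traceback says convert.py only accepts the old unversioned format, so the input file is apparently already in a newer one. A quick way to check is to read the 4-byte magic at the start of the file. This sketch assumes the three magics used by llama.cpp in early 2023, stored as little-endian uint32: 'ggml' (unversioned), 'ggmf' (versioned, what convert.py produces) and 'ggjt' (versioned, presumably what migrate-ggml-2023-03-30-pr613.py produces). `identify_model_file` is a hypothetical helper.

```python
import struct
from pathlib import Path

# File magics as little-endian uint32 values (assumed, early-2023 llama.cpp).
MAGIC_GGML = 0x67676D6C  # 'ggml': old unversioned format, the only one convert.py accepts
MAGIC_GGMF = 0x67676D66  # 'ggmf': versioned format produced by convert.py
MAGIC_GGJT = 0x67676A74  # 'ggjt': versioned, mmap-friendly format
NAMES = {MAGIC_GGML: "ggml", MAGIC_GGMF: "ggmf", MAGIC_GGJT: "ggjt"}


def identify_model_file(path: Path):
    """Return (format, version) where version is None for unversioned files."""
    with open(path, "rb") as f:
        header = f.read(8)
    if len(header) < 4:
        raise ValueError("file too short to be a ggml model")
    (magic,) = struct.unpack("<I", header[:4])
    if magic not in NAMES:
        raise ValueError(f"unrecognised magic 0x{magic:08x}")
    if magic == MAGIC_GGML:
        return NAMES[magic], None  # unversioned files carry no version field
    if len(header) < 8:
        raise ValueError("versioned file is missing its version field")
    (version,) = struct.unpack("<I", header[4:8])
    return NAMES[magic], version
```

If the result is anything other than `("ggml", None)`, convert.py is the wrong tool for that file.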

tslmy added a commit to tslmy/llama.cpp that referenced this issue Jul 2, 2023
The original file name, `ggml-alpaca-7b-q4.bin`, implied the first-generation GGML. After the breaking changes (mentioned in ggerganov#382), `llama.cpp` requires GGML V3 now. Those model files are named `*ggmlv3*.bin`. We should change the example to an actually working model file, so that this thing is more likely to run out-of-the-box for more people, and less people would waste time downloading the old Alpaca model.
ggerganov pushed a commit that referenced this issue Jul 6, 2023
YellowRoseCx added a commit to YellowRoseCx/koboldcpp-rocm that referenced this issue Jul 10, 2023

    Merge branch 'master' into concedo_experimental

    # Conflicts:
    #	README.md

commit 2741ffb
Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com>
Date:   Sat Jul 1 17:07:42 2023 -0500

    move hipblas definitions to header files

commit bf49a93
Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com>
Date:   Sat Jul 1 16:38:50 2023 -0500

    move HIPBLAS definitions into ggml-cuda.h

commit 540f4e0
Merge: 2c3b46f eda663f
Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com>
Date:   Sat Jul 1 14:58:32 2023 -0500

    Merge remote-tracking branch 'upstream/concedo'

commit 0bc2cdf
Author: Johannes Gäßler <johannesg@5d6.de>
Date:   Sat Jul 1 21:49:44 2023 +0200

    Better CUDA synchronization logic (ggerganov#2057)

commit befb3a3
Author: Johannes Gäßler <johannesg@5d6.de>
Date:   Sat Jul 1 21:47:26 2023 +0200

    Test-based VRAM scratch size + context adjustment (ggerganov#2056)

commit b213227
Author: Daniel Drake <drake@endlessos.org>
Date:   Sat Jul 1 20:31:44 2023 +0200

    cmake : don't force -mcpu=native on aarch64 (ggerganov#2063)

    It's currently not possible to cross-compile llama.cpp for aarch64
    because CMakeLists.txt forces -mcpu=native for that target.

    -mcpu=native doesn't make sense if your build host is not the
    target architecture, and clang rejects it for that reason, aborting the
    build. This can be easily reproduced using the current Android NDK to build
    for aarch64 on an x86_64 host.

    If there is not a specific CPU-tuning target for aarch64 then -mcpu
    should be omitted completely. I think that makes sense, there is not
    enough variance in the aarch64 instruction set to warrant a fixed -mcpu
    optimization at this point. And if someone is building natively and wishes
    to enable any possible optimizations for the host device, then there is
    already the LLAMA_NATIVE option available.

    Fixes LostRuins#495.

commit 2f8cd97
Author: Aaron Miller <apage43@ninjawhale.com>
Date:   Sat Jul 1 11:14:59 2023 -0700

    metal : release buffers when freeing metal context (ggerganov#2062)

commit 471aab6
Author: Judd <foldl@users.noreply.github.com>
Date:   Sun Jul 2 01:00:25 2023 +0800

    convert : add support of baichuan-7b (ggerganov#2055)

    Co-authored-by: Judd <foldl@boxvest.com>

commit 463f2f4
Author: Georgi Gerganov <ggerganov@gmail.com>
Date:   Sat Jul 1 19:05:09 2023 +0300

    llama : fix return value of llama_load_session_file_internal (ggerganov#2022)

commit cb44dbc
Author: Rand Xie <randxiexyy29@gmail.com>
Date:   Sun Jul 2 00:02:58 2023 +0800

    llama : catch llama_load_session_file_internal exceptions (ggerganov#2022)

    * convert checks in llama_load_session_file to throw and handle them

    * make llama_load_session_file_internal static

    * address feedbacks to avoid using exceptions

commit 79f634a
Author: Georgi Gerganov <ggerganov@gmail.com>
Date:   Sat Jul 1 18:46:00 2023 +0300

    embd-input : fix returning ptr to temporary

commit 04606a1
Author: Georgi Gerganov <ggerganov@gmail.com>
Date:   Sat Jul 1 18:45:44 2023 +0300

    train : fix compile warning

commit b1ca8f3
Author: Qingyou Meng <meng.qingyou@gmail.com>
Date:   Sat Jul 1 23:42:43 2023 +0800

    ggml : disable GGML_TASK_INIT and GGML_TASK_FINALIZE by default (ggerganov#1995)

    Will not be scheduled unless explicitly enabled.

commit 2c3b46f
Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com>
Date:   Thu Jun 29 18:43:43 2023 -0500

    changes to fix build

commit c9e1103
Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com>
Date:   Thu Jun 29 18:20:07 2023 -0500

    Update ggml_v2-cuda-legacy.cu for ROCM

commit b858fc5
Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com>
Date:   Thu Jun 29 17:49:39 2023 -0500

    changes to work with upstream

commit 69a0c25
Merge: 096f0b0 1347d3a
Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com>
Date:   Thu Jun 29 16:59:06 2023 -0500

    Merge remote-tracking branch 'upstream/concedo'

commit 096f0b0
Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com>
Date:   Wed Jun 28 15:27:02 2023 -0500

    revert unnecessary hipblas conditionals

commit d81e81a
Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com>
Date:   Wed Jun 28 14:48:23 2023 -0500

    Update Makefile hipblas nvcc correction

commit 2579ecf
Merge: abed427 d2034ce
Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com>
Date:   Sun Jun 25 17:50:04 2023 -0500

    Merge branch 'LostRuins:concedo' into main

commit abed427
Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com>
Date:   Sat Jun 24 19:16:30 2023 -0500

    reorganize If statements to include proper headers

commit 06c3bf0
Merge: ea6d320 8342fe8
Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com>
Date:   Sat Jun 24 16:57:20 2023 -0500

    Merge branch 'LostRuins:concedo' into main

commit ea6d320
Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com>
Date:   Fri Jun 23 01:53:28 2023 -0500

    Update README.md

commit 4d56ad8
Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com>
Date:   Thu Jun 22 16:19:43 2023 -0500

    Update README.md

commit 21f9308
Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com>
Date:   Thu Jun 22 15:42:05 2023 -0500

    kquants_iter for hipblas and add gfx803

commit b6ff890
Merge: eb094f0 e6ddb15
Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com>
Date:   Thu Jun 22 12:42:09 2023 -0500

    Merge branch 'LostRuins:concedo' into main

commit eb094f0
Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com>
Date:   Wed Jun 21 23:59:18 2023 -0500

    lowvram parameter description

commit 3a5dfeb
Merge: 665cc11 b1f00fa
Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com>
Date:   Wed Jun 21 16:53:03 2023 -0500

    Merge branch 'LostRuins:concedo' into koboldcpp-rocm

commit 665cc11
Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com>
Date:   Wed Jun 21 01:13:19 2023 -0500

    add lowvram parameter

commit 222cbbb
Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com>
Date:   Tue Jun 20 19:03:28 2023 -0500

    add additional hipblas conditions for cublas

commit e1f9581
Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com>
Date:   Tue Jun 20 16:51:59 2023 -0500

    Add hip def for cuda v2

commit 3bff5c0
Merge: a7e74b3 266d47a
Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com>
Date:   Tue Jun 20 13:38:06 2023 -0500

    Merge branch 'LostRuins:concedo' into koboldcpp-rocm

commit a7e74b3
Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com>
Date:   Mon Jun 19 22:04:18 2023 -0500

    Update README.md

commit 5e99b3c
Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com>
Date:   Mon Jun 19 22:03:42 2023 -0500

    Update Makefile

commit 9190b17
Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com>
Date:   Mon Jun 19 21:47:10 2023 -0500

    Update README.md

commit 2780ea2
Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com>
Date:   Sun Jun 18 15:48:00 2023 -0500

    Update Makefile

commit 04a3e64
Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com>
Date:   Sun Jun 18 14:33:39 2023 -0500

    remove extra line

commit cccbca9
Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com>
Date:   Sun Jun 18 14:31:17 2023 -0500

    attempt adding ROCM hipblas

commit a44a1d4
Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com>
Date:   Sun Jun 18 14:31:01 2023 -0500

    attempt adding ROCM hipblas

commit b088184
Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com>
Date:   Sun Jun 18 14:30:54 2023 -0500

    attempt adding ROCM hipblas
@Wataru3355

Wataru3355 commented Oct 11, 2023

Here is what I did to run Alpaca 30b on my system with llama.cpp. I would assume it would work with Alpaca 13b as well.

  1. Downloaded and built llama.cpp from scratch, because only the latest version has the new --n_parts 1 parameter, which specifies that the model is in a single file.
  2. Downloaded this 30b Alpaca model: https://huggingface.co/Pi3141/alpaca-30B-ggml/tree/main (the model card links to the other Alpaca model sizes).
  3. Named the file ggml-alpaca-30b-q4.bin and placed it in models/Alpaca/30b inside the llama.cpp folder.
  4. Downloaded the script mentioned in a comment on #324 (Breaking change of models since PR #252).
  5. Named it convert.py and placed it in the root folder of llama.cpp.
  6. Downloaded the tokenizer mentioned in a comment on #324 (Breaking change of models since PR #252).
  7. Placed the tokenizer.model file in models/.
  8. From the base folder of llama.cpp, ran python convert.py models/Alpaca/30b models/tokenizer.model in the command prompt. (I got a message that the sentencepiece module was missing, so I ran pip install sentencepiece and then re-ran the same command, which worked. You may or may not hit this error.)
  9. The 30b folder now contained both ggml-alpaca-30b-q4.bin and ggml-alpaca-30b-q4.bin.tmp. I renamed ggml-alpaca-30b-q4.bin to ggml-alpaca-30b-q4.bin.old to keep it as a backup, and ggml-alpaca-30b-q4.bin.tmp to ggml-alpaca-30b-q4.bin.
  10. Now I can run llama.cpp with ./main -m ./models/Alpaca/30b/ggml-alpaca-30b-q4.bin --color -f ./prompts/alpaca.txt -ins --n_parts 1
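The file layout from steps 3 to 8 and the rename in step 9 can be sketched in Python as a preflight check plus a swap helper. This is a minimal sketch, not part of llama.cpp: missing_inputs and swap_in_converted are hypothetical names, the paths are the ones used in the steps, and it does not run convert.py itself. On case-sensitive filesystems (e.g. Linux), models/Alpaca and models/alpaca are different folders, so the path must match exactly.

```python
import importlib.util
import os
from pathlib import Path
from typing import List, Sequence


def missing_inputs(
    root: Path,
    model_rel: str,
    tokenizer_rel: str = "models/tokenizer.model",
    check_modules: Sequence[str] = ("sentencepiece",),
) -> List[str]:
    """List what is still missing before running convert.py (steps 3, 5, 7, 8)."""
    missing = []
    # Step 5: the conversion script sits in the llama.cpp root folder.
    if not (root / "convert.py").is_file():
        missing.append("convert.py")
    # Step 3: the downloaded old-style model file.
    if not (root / model_rel).is_file():
        missing.append(model_rel)
    # Step 7: the tokenizer passed as the second argument to convert.py.
    if not (root / tokenizer_rel).is_file():
        missing.append(tokenizer_rel)
    # Step 8: Python modules the script needs (sentencepiece, per the step above).
    for name in check_modules:
        if importlib.util.find_spec(name) is None:
            missing.append(f"python module: {name}")
    return missing


def swap_in_converted(model_path: Path) -> Path:
    """Step 9: keep the original as <name>.old and move <name>.tmp into its place."""
    tmp = model_path.with_name(model_path.name + ".tmp")
    backup = model_path.with_name(model_path.name + ".old")
    if not tmp.is_file():
        raise FileNotFoundError(f"no converted file at {tmp}")
    os.replace(model_path, backup)  # original -> .old backup
    os.replace(tmp, model_path)     # converted .tmp -> original name
    return backup
```

Usage: from the llama.cpp folder, missing_inputs(Path("."), "models/Alpaca/30b/ggml-alpaca-30b-q4.bin") returning an empty list means everything is in place; once convert.py has written the .tmp file, swap_in_converted(Path("models/Alpaca/30b/ggml-alpaca-30b-q4.bin")) performs the rename from step 9.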

Maybe this can be of temporary help to anybody else eager to set it up. Please correct me if I've made any mistakes; I wrote this retroactively from memory.

I get this error when running convert.py:

    % python3 convert.py models/alpaca/13B models/tokenizer.model
    converting models/alpaca/13B/ggml-model-q4_0.bin
    Traceback (most recent call last):
      File "/Users/FD00199/llama.cpp/convert.py", line 96, in <module>
        main()
      File "/Users/FD00199/llama.cpp/convert.py", line 93, in main
        convert_one_file(file, tokenizer)
      File "/Users/FD00199/llama.cpp/convert.py", line 78, in convert_one_file
        write_header(f_out, read_header(f_in))
      File "/Users/FD00199/llama.cpp/convert.py", line 27, in write_header
        raise Exception('Invalid file magic. Must be an old style ggml file.')
    Exception: Invalid file magic. Must be an old style ggml file.

If you have the ggml-model-q4_1.bin version, you get this error; with ggml-model-q4_0.bin you don't.
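The "Invalid file magic" error means the script read the first bytes of the .bin and did not find the old, unversioned ggml header. A minimal sketch for checking which header a file actually has before converting it is below. The magic values are assumptions taken from llama.cpp's sources of that period ('ggml' unversioned, 'ggmf' and 'ggjt' as newer versioned formats), each stored as a little-endian uint32 at offset 0; file_magic and is_old_style are hypothetical helpers, and whether the script's check reduces to exactly this comparison is also an assumption.

```python
import struct
from pathlib import Path
from typing import Optional

# Assumed magic values (llama.cpp sources, mid-2023). Each is written as a
# little-endian uint32 at offset 0, so 'ggml' appears on disk as b"lmgg".
KNOWN_MAGICS = {
    0x67676D6C: "ggml",  # unversioned, the "old style" the script expects
    0x67676D66: "ggmf",  # versioned
    0x67676A74: "ggjt",  # versioned, mmap-friendly layout
}


def file_magic(path: Path) -> Optional[str]:
    """Return 'ggml', 'ggmf' or 'ggjt', or None for short or unknown files."""
    with open(path, "rb") as f:
        head = f.read(4)
    if len(head) < 4:
        return None
    (magic,) = struct.unpack("<I", head)
    return KNOWN_MAGICS.get(magic)


def is_old_style(path: Path) -> bool:
    """True if the file has the unversioned header the conversion script accepts."""
    return file_magic(path) == "ggml"
```

For example, if file_magic(Path("models/alpaca/13B/ggml-model-q4_1.bin")) returns 'ggmf' or 'ggjt', that file is presumably already in a newer format and should not be passed to the conversion script.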

YuMJie pushed a commit to YuMJie/powerinfer that referenced this issue Oct 25, 2024
The original file name, `ggml-alpaca-7b-q4.bin`, implied the first-generation GGML. After the breaking changes (mentioned in ggerganov/llama.cpp#382), `llama.cpp` now requires GGML V3. Those model files are named `*ggmlv3*.bin`. We should change the example to an actually working model file, so that this thing is more likely to run out-of-the-box for more people, and fewer people waste time downloading the old Alpaca model.
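To check whether a downloaded file is actually in the newer format this commit refers to, one can read its header instead of trusting the file name. This sketch assumes, as in llama.cpp's loader of that period, that a versioned file stores its magic followed by a little-endian uint32 version, and that the `*ggmlv3*.bin` files are the 'ggjt' container with version 3; ggjt_version is a hypothetical helper, not a llama.cpp API.

```python
import struct
from pathlib import Path
from typing import Optional

GGJT_MAGIC = 0x67676A74  # 'ggjt'; assumed value from llama.cpp sources of that period


def ggjt_version(path: Path) -> Optional[int]:
    """Return the GGJT version number (e.g. 3), or None if the file is not GGJT."""
    with open(path, "rb") as f:
        header = f.read(8)  # uint32 magic + uint32 version
    if len(header) < 8:
        return None
    magic, version = struct.unpack("<II", header)
    if magic != GGJT_MAGIC:
        return None
    return version
```

Under these assumptions, ggjt_version(path) == 3 indicates a file in the layout the commit calls GGML V3, while an old `ggml-alpaca-7b-q4.bin` would return None.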