forked from LostRuins/koboldcpp
b1584 #19
Merged
Conversation
get the correct n_orig_ctx in metal
* lookahead : init
* lookahead : generate and store n-grams
* lookahead : use a loop instead of recursion to generate n-grams
* lookahead : initial working implementation
* lookahead : filter repeating n-grams
* lookahead : use deterministic init
* lookahead : add to Makefile
* lookahead : fix a bug in the seq_id of the lookahead tokens
* lookahead : add comments

Co-authored-by: slaren <slarengh@gmail.com>
* copy to llama.cpp as subdir
* attempt enabling metal, fails
* ggml metal compiles!
* Update README.md
* initial conversion to new format, utf8 errors?
* bug fixes, but now has an invalid memory access :(
* added O3, now has insufficient memory access
* begin sync with master
* update to match latest code, new errors
* fixed it!
* fix for loop conditionals, increase result size
* fix current workflow errors
* attempt a llama.swiftui workflow
* Update .github/workflows/build.yml

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
…g.cmake (#3970)

* Split CPP generation from build-info query
* Remove blank lines
* Add BUILD_SHARED_LIBS option
…l offload checks in llama.cpp (#4240)

* ggml : use blas even if src0 is not F32
* llama : use n_threads_batch only when n_tokens >= 32

ggml-ci

* llama : revert n_threads_batch logic

ggml-ci
* fix oai proxy

fix generation not stopped while the bot stops talking in chat mode
fix possible `slot_id` not existing in the response for CORS (and preflight)

* oai proxy: workaround for some clients (such as Chatbox)
* use stop as separator to replace hardcoded `\n`
Typical sampling was broken: after copying new_candidates into candidates, the "sorted" bool is left at "true", but the new data is no longer sorted by probability. The patch sets "sorted" to false after the copy.

Test: generating with temp=0.0001 (approx. argmax) should produce the same sequence at typical >= 1.0 and at typical = 0.9999 (approx. disabled, but the typical-sampling codepath is entered).
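A minimal, self-contained sketch of the described fix, using a simplified stand-in for llama.cpp's llama_token_data_array (the struct and helper here are illustrative; the surrounding typical-sampling logic is omitted):

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Simplified stand-in for llama_token_data / llama_token_data_array.
struct token_data { int id; float logit; float p; };
struct token_data_array {
    token_data * data;
    size_t       size;
    bool         sorted;   // whether "data" is sorted by descending probability
};

// After replacing the contents with a re-ordered subset (as typical sampling
// does), the sorted flag must be cleared; otherwise later samplers that rely
// on descending-probability order operate on wrongly ordered data.
static void replace_candidates(token_data_array * candidates,
                               const std::vector<token_data> & new_candidates) {
    std::copy(new_candidates.begin(), new_candidates.end(), candidates->data);
    candidates->size   = new_candidates.size();
    candidates->sorted = false;   // <- the one-line change the patch adds
}
```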
* llama: fix alignment of general.name in print meta

This commit fixes the alignment of the general.name field in the llm_load_print_meta function. Currently the output looks like this:

```console
llm_load_print_meta: model ftype    = mostly Q4_0
llm_load_print_meta: model params   = 13.02 B
llm_load_print_meta: model size     = 6.86 GiB (4.53 BPW)
llm_load_print_meta: general.name = LLaMA v2
```

And with this commit it looks like this:

```console
llm_load_print_meta: model ftype    = mostly Q4_0
llm_load_print_meta: model params   = 13.02 B
llm_load_print_meta: model size     = 6.86 GiB (4.53 BPW)
llm_load_print_meta: general.name   = LLaMA v2
```

* llama: fix alignment of special tokens

Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>
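For illustration, this kind of alignment is usually achieved by padding the key to a fixed width; a minimal C++ sketch of the idea (the width and helper are hypothetical, not the exact code in llm_load_print_meta):

```cpp
#include <cstdio>

// Hypothetical helper: pad every key to the same width so the '=' signs line up.
static void print_meta(const char * key, const char * value) {
    std::printf("llm_load_print_meta: %-14s = %s\n", key, value);
}

int main() {
    print_meta("model ftype",  "mostly Q4_0");
    print_meta("model params", "13.02 B");
    print_meta("general.name", "LLaMA v2");   // now aligned with the fields above
    return 0;
}
```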
llama.cpp uses GitHub Actions, not GitLab Actions.
docs: update how to run
* fix: readme
* chore: resolve comments
* chore: resolve comments
* main : Call llama_log_set to use LOG_TEE
* tabs to spaces
* ShareGPT4 compatibility (vision encoder only loading)

Load only a CLIP vision encoder (as supplied by ShareGPT finetunes). Corrects the argument parsing for --img_mean and --img_std (which were previously accessed but never parsed). Defines defaults for img_mean and img_std equal to the llava 1.5 CLIP encoder, so you do not have to provide them.

* Update convert-image-encoder-to-gguf.py
* cmake : fix joining of REAL_GIT_DIR
* fix includes with help from include-what-you-use
* make : remove unneeded deps and add test-rope target
* fix C includes in C++ source files
* Revert "fix includes with help from include-what-you-use"

This reverts commit 635e9fa.
Co-authored-by: Will Findley <findley@gmail.com>
* add multiprompt support
* cleanup
* more cleanup
* remove atomicity of id_gen, and change lock_guard to unique_lock on completion requests (see the sketch below)
* remove all references to mutex_multitasks
* Update examples/server/server.cpp
* Update examples/server/server.cpp
* Update examples/server/server.cpp
* Update examples/server/server.cpp
* change to set

Co-authored-by: Jared Van Bortel <cebtenzzre@gmail.com>
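On the lock_guard → unique_lock change: std::condition_variable::wait() only accepts a std::unique_lock, since it must unlock the mutex while sleeping and re-lock it on wake-up, which std::lock_guard cannot do. A generic sketch of that wait-for-completion pattern (illustrative only, not the actual server.cpp code):

```cpp
#include <condition_variable>
#include <mutex>
#include <string>

struct completion_state {
    std::mutex              mtx;
    std::condition_variable cv;
    bool                    done = false;
    std::string             result;
};

// Waiting side: condition_variable::wait() needs a std::unique_lock because it
// releases the mutex while blocked and re-acquires it before returning.
std::string wait_for_completion(completion_state & st) {
    std::unique_lock<std::mutex> lock(st.mtx);
    st.cv.wait(lock, [&] { return st.done; });
    return st.result;
}

// Producing side: a plain scoped lock is sufficient here.
void publish_completion(completion_state & st, std::string text) {
    {
        std::lock_guard<std::mutex> lock(st.mtx);
        st.result = std::move(text);
        st.done   = true;
    }
    st.cv.notify_all();
}
```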
* add --log-disable to disable logging to file in the server example
* typo fix
Nexesenex merged commit a1bc245 into Nexesenex:master_experimental on Dec 1, 2023. 68 of 79 checks passed.
Nexesenex pushed a commit that referenced this pull request on Dec 22, 2024:
GGML_OP_RESHAPE, GGML_OP_VIEW, GGML_OP_PERMUTE, and GGML_OP_TRANSPOSE, along with GGML_OP_NONE, are all no-ops: nothing actually happens when they are evaluated. But ggml still places a thread-synchronization barrier after them, which wastes time. The waste is not too bad for large models, where computation is long compared to the time spent on thread synchronization, but for small models skipping those unnecessary waits makes a significant difference. E.g., for the 99M TriLM model, TG-500 goes up from 1240 t/s to 1426 t/s.

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
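A minimal sketch of the idea described above, assuming a simplified compute loop; the op names match ggml's enum, but the helper and loop are illustrative, not the actual ggml scheduler code:

```cpp
// Illustrative subset of ggml's op enum (values here are placeholders).
enum ggml_op_kind {
    GGML_OP_NONE, GGML_OP_RESHAPE, GGML_OP_VIEW,
    GGML_OP_PERMUTE, GGML_OP_TRANSPOSE, GGML_OP_MUL_MAT /* ... */
};

// These ops only describe a different view of the same data, so no thread
// computes anything for them and no synchronization is needed afterwards.
static bool op_is_noop(ggml_op_kind op) {
    return op == GGML_OP_NONE    || op == GGML_OP_RESHAPE ||
           op == GGML_OP_VIEW    || op == GGML_OP_PERMUTE ||
           op == GGML_OP_TRANSPOSE;
}

void compute_graph(const ggml_op_kind * nodes, int n_nodes) {
    for (int i = 0; i < n_nodes; ++i) {
        if (op_is_noop(nodes[i])) {
            continue;            // skip both the (empty) work and the barrier
        }
        // ... run the op across the thread pool ...
        // barrier();            // only synchronize after ops that wrote data
    }
}
```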