Concedo experimental #18

Merged

Conversation

Nexesenex
Owner

No description provided.

jammm and others added 30 commits November 20, 2023 17:02
* Update README.md

* Update README.md

Co-authored-by: Jared Van Bortel <cebtenzzre@gmail.com>

---------

Co-authored-by: Jared Van Bortel <cebtenzzre@gmail.com>
Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>
* llama : keep track of used KV cells + better KV cache management

* llama : zero KV cache used upon clear

ggml-ci

* llama : allow exporting a view of the KV cache (ggerganov#4180)

* Allow exporting a view of the KV cache

* Allow dumping the sequences per cell in common

* Track max contiguous cells value and position as well

* Fix max contiguous empty cells index calculation

Make dump functions handle lengths or sequence counts > 10 better

* Fix off by one error in dump_kv_cache_view

* Add doc comments for KV cache view functions

Eliminate cell sequence struct; use llama_seq_id directly

Minor cleanups

* common : add -dkvc arg for enabling kv cache dumps

---------

Co-authored-by: Kerfuffle <44031344+KerfuffleV2@users.noreply.github.com>
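For reference, a minimal sketch of how the KV cache view added in ggerganov#4180 can be inspected from application code. The helper names below reflect llama.h and common.h around the time of these commits, but treat the exact signatures as approximate:

```cpp
// Sketch: build a view of the KV cache, refresh it after decoding, and dump
// occupancy. Tracking up to 4 sequences per cell is an arbitrary choice here.
#include "llama.h"
#include "common.h"   // dump_kv_cache_view / dump_kv_cache_view_seqs

void inspect_kv_cache(llama_context * ctx) {
    llama_kv_cache_view view = llama_kv_cache_view_init(ctx, 4);

    llama_kv_cache_view_update(ctx, &view);   // sync the view with the current cache state

    dump_kv_cache_view(view, 80);        // summary: used cells, token count, max contiguous run
    dump_kv_cache_view_seqs(view, 40);   // per-cell sequence ids (the -dkvc style dump)

    llama_kv_cache_view_free(&view);
}
```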
…4133)

* Fix incorrect format strings and uninitialized variables.

* Address comments

* Add the missing include statement
* Update README.md to use PATH for Windows ROCm

* Update README.md

* Update README.md
llama_token_eos(const struct llama_model *) is currently being passed a struct llama_context variable as its argument; pass the model instead.
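A hedged before/after of the call-site fix this refers to; llama_get_model() is the llama.h accessor that maps a context back to its model:

```cpp
// Sketch of the corrected usage: pass the model, not the context.
// llama_token eos = llama_token_eos(ctx);                // wrong argument type
llama_token eos = llama_token_eos(llama_get_model(ctx));  // fixed
```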
* ggml-cuda : support stablelm rope

* remove unused freq_base kernel parameter

* add n_dims parameter to llm_build_k_shift, default to n_rot via overload

* llama : fix llm_build_k_shift args

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* Add openai-compatible POST /v1/chat/completions API endpoint to server example

* fix code style

* Update server README.md

* Improve server README.md

* Fix server.cpp code style according to review

* server : some style changes

* server : indentation

* server : enable special tokens during tokenization by default

* server : minor code style

* server : change random string generator

* straightforward /v1/models endpoint

---------

Co-authored-by: kir-gadjello <111190790+kir-gadjello@users.noreply.github.com>
Co-authored-by: Tobi Lütke <tobi@Tobis-MacBook-Pro.local>
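As a usage illustration of the new endpoint, a minimal client sketch in C++ using cpp-httplib (the same header-only HTTP library the server example is built on); the host, port, and message payload here are assumptions:

```cpp
// Sketch: POST a chat request to the server example's OpenAI-compatible endpoint.
#include "httplib.h"
#include <iostream>
#include <string>

int main() {
    httplib::Client cli("http://localhost:8080");   // assumed server address

    const std::string body = R"({
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user",   "content": "Hello!"}
        ]
    })";

    auto res = cli.Post("/v1/chat/completions", body, "application/json");
    if (res && res->status == 200) {
        std::cout << res->body << std::endl;   // OpenAI-style JSON response
    } else {
        std::cerr << "request failed" << std::endl;
    }
    return 0;
}
```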
* reserve space for codepoints

* improvement for the appended 0
* Use mmap in torch load, prefer .bin files when loading

* Revert .bin > .safetensors preference
get the correct n_orig_ctx in metal
* lookahead : init

* lookahead : generate and store n-grams

* lookahead : use loop instead recursion to generate n-grams

* lookahead : initial working implementation

* lookahead : filter repeating n-grams

* lookahead : use deterministic init

* lookahead : add to Makefile

* lookahead : fix a bug in the seq_id of the lookahead tokens

* lookahead : add comments

---------

Co-authored-by: slaren <slarengh@gmail.com>
# Conflicts:
#	Makefile
#	README.md
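The following is not the actual examples/lookahead code, only a hypothetical, self-contained sketch of the n-gram bookkeeping the commits above describe: collect fixed-size n-grams from the generated tokens and filter exact repeats.

```cpp
// Hypothetical illustration only: store fixed-size n-grams of generated tokens
// and drop exact duplicates, mirroring the "filter repeating n-grams" step.
// The n-gram size N and the token typedef are assumptions of this sketch.
#include <cstdint>
#include <set>
#include <vector>

using llama_token = int32_t;   // matches the typedef in llama.h

std::set<std::vector<llama_token>> collect_ngrams(const std::vector<llama_token> & tokens, size_t N = 3) {
    std::set<std::vector<llama_token>> pool;   // set membership filters repeats
    if (tokens.size() < N) {
        return pool;
    }
    for (size_t i = 0; i + N <= tokens.size(); ++i) {
        pool.insert(std::vector<llama_token>(tokens.begin() + i, tokens.begin() + i + N));
    }
    return pool;
}
```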
* copy to llama.cpp as subdir

* attempt enabling metal, fails

* ggml metal compiles!

* Update README.md

* initial conversion to new format, utf8 errors?

* bug fixes, but now has an invalid memory access :(

* added O3, now has insufficient memory access

* begin sync with master

* update to match latest code, new errors

* fixed it!

* fix for loop conditionals, increase result size

* fix current workflow errors

* attempt a llama.swiftui workflow

* Update .github/workflows/build.yml

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
kasumi-1 and others added 14 commits November 27, 2023 19:39
…g.cmake (ggerganov#3970)

* Split CPP generation from build-info query

* Remove blank lines

* Add BUILD_SHARED_LIBS option
…l offload checks in llama.cpp (ggerganov#4240)

* ggml : use blas even if src0 is not F32

* llama : use n_threads_batch only when n_tokens >= 32

ggml-ci

* llama : revert n_threads_batch logic

ggml-ci
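As an illustration of the idea behind "use BLAS even if src0 is not F32" (not the ggml implementation itself): dequantize the non-F32 operand to F32 once, then hand the product to sgemm. The to_float callback is a stand-in for the per-type conversion routine.

```cpp
// Illustrative sketch: convert a quantized left operand to F32, then compute
// dst (m x n) = src0 (m x k) * src1^T with src1 stored row-major as (n x k).
#include <cblas.h>
#include <cstdint>
#include <vector>

void matmul_via_blas(const void * src0_q, int m, int k,
                     const float * src1, int n, float * dst,
                     void (*to_float)(const void * q, float * out, int64_t nelements)) {
    std::vector<float> src0_f32((size_t) m * k);
    to_float(src0_q, src0_f32.data(), (int64_t) m * k);   // dequantize once, up front

    cblas_sgemm(CblasRowMajor, CblasNoTrans, CblasTrans,
                m, n, k,
                1.0f, src0_f32.data(), k,
                      src1,            k,
                0.0f, dst,             n);
}
```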
…gmentation causing issues in some scenarios.
# Conflicts:
#	.github/workflows/build.yml
#	CMakeLists.txt
#	README.md
#	scripts/build-info.cmake
Squashed commits:

[cdb7426] fixed chub ai imports
@Nexesenex Nexesenex marked this pull request as ready for review December 1, 2023 07:20
@Nexesenex Nexesenex merged commit fdeb516 into Nexesenex:concedo_exp_llamaster_up Dec 1, 2023
0 of 7 checks passed