[pull] master from ggerganov:master #132
Closed
* convert_hf : fix Gemma v1 conversion
* convert_hf : allow renaming tokens, but with a warning
* convert_hf : fix Gemma v1 not setting BOS and EOS tokens
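A minimal sketch, assuming gguf-py's GGUFWriter API, of explicitly writing BOS/EOS token IDs during conversion. The IDs below match Gemma v1's HF config (bos=2, eos=1), but a real converter reads them from the tokenizer config rather than hardcoding them:

```python
# Sketch: write BOS/EOS token IDs into a GGUF file with gguf-py.
from gguf import GGUFWriter

writer = GGUFWriter("model.gguf", arch="gemma")
writer.add_bos_token_id(2)  # illustrative; a converter should read this from the tokenizer config
writer.add_eos_token_id(1)
writer.write_header_to_file()
writer.write_kv_data_to_file()
writer.close()
```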
* gguf-py : fix some metadata name extraction edge cases
* convert_lora : use the lora dir for the model card path
* gguf-py : more metadata edge-case fixes
  Multiple finetune versions are now joined together, and the removal of the basename annotation on trailing versions is more robust.
* gguf-py : add more name metadata extraction tests
* convert_lora : fix default filename
  The default filename was previously hardcoded.
* convert_hf : Model.fname_out can no longer be None
* gguf-py : do not use title case for the naming convention
  Some models use acronyms in lowercase, which can't be title-cased like other words, so it's best to simply use the same case as in the original model name. Note that the size label still has an uppercased suffix to make it distinguishable from the context size of a finetune.
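A hypothetical sketch of the naming idea: keep each word's original case (so lowercase acronyms survive) and only uppercase the unit suffix of the size label so it stays distinguishable. The function names here are illustrative, not gguf-py's actual API:

```python
import re

def normalize_size_label(label: str) -> str:
    # "124m" -> "124M", "8x22b" -> "8x22B"; other text is left untouched
    return re.sub(r"(\d+(?:\.\d+)?)([kmbt])\b", lambda m: m[1] + m[2].upper(), label)

def build_name(base: str, size_label: str, finetunes: list[str]) -> str:
    # The base name's case is preserved verbatim, and multiple
    # finetune parts are joined rather than dropped.
    parts = [base, normalize_size_label(size_label), *finetunes]
    return "-".join(p for p in parts if p)

print(build_name("gpt2", "124m", ["instruct", "v1.5"]))
# -> gpt2-124M-instruct-v1.5
```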
Changes:
- Move each example into its own function. This makes the code much easier to read and understand.
- Make it easy to run only one test by commenting out function calls in main().
- Make the output easy to parse by indenting the output for each example.
- Add a shebang and the +x bit to make it clear the script is executable.
- Make the host configurable via --host, with a default of 127.0.0.1:8080.
- Make the code look up the registered tool in the tools list instead of hardcoding the returned values. This makes the code more copy-pastable (a minimal sketch of the pattern follows this list).
- Add error checking, so that the program exits with status 1 if the LLM didn't return the expected values. This is very useful for checking correctness.

Testing:
- Tested with Mistral-7B-Instruct-v0.3 in F16 and Q5_K_M, and Meta-Llama-3-8B-Instruct in F16 and Q5_K_M.
- I did not observe a single failure with Mistral-7B-Instruct-v0.3.
- Llama-3 failed about a third of the time in example_concurrent: it returned only one call instead of 3, even for F16.

Potential follow-ups:
- The prompt encoding is not fixed yet. Surprisingly, it mostly works even though the prompt encoding is not optimized for the model.
- Add chained answer and response.

Test-only change.
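The sketch referenced above, assuming a llama.cpp server reachable over HTTP. The endpoint path, payload shape, and tool are all illustrative placeholders, not the actual test script:

```python
#!/usr/bin/env python3
# Sketch: configurable --host, a tool registry consulted by name instead of
# hardcoded return values, and a non-zero exit when expectations aren't met.
import argparse
import json
import sys
import urllib.request

def get_current_weather(location: str) -> str:
    return f"Sunny in {location}"  # stub result for the test

TOOLS = {"get_current_weather": get_current_weather}  # the registry

def main() -> None:
    parser = argparse.ArgumentParser()
    parser.add_argument("--host", default="127.0.0.1:8080")
    args = parser.parse_args()

    body = json.dumps({"prompt": "What is the weather in Paris?"}).encode()
    req = urllib.request.Request(f"http://{args.host}/completion", data=body,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        call = json.loads(resp.read())  # assume {"name": ..., "arguments": ...}

    func = TOOLS.get(call.get("name", ""))
    if func is None:
        print(f"  unexpected tool call: {call}", file=sys.stderr)
        sys.exit(1)  # fail loudly so correctness regressions are visible
    print(" ", func(**call.get("arguments", {})))

if __name__ == "__main__":
    main()
```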
When generation ends, `completion_loop()` should return NULL, not the empty string.
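`completion_loop` below only borrows its name from the commit message; this sketch just illustrates the contract: return None (the Python analogue of NULL) rather than "" at the end of generation, because an empty string is a valid, empty token:

```python
def completion_loop(tokens):
    """Return the next generated piece, or None once generation ends."""
    try:
        return next(tokens)
    except StopIteration:
        return None  # not "": callers must be able to tell "done" from "empty token"

stream = iter(["Hello", "", " world"])
while (piece := completion_loop(stream)) is not None:
    print(repr(piece))  # the "" in the middle does not terminate the loop
```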
* models : remove duplicated gpt-2 vocab
* models : remove old stablelm vocab
* tests : re-enable MPT tokenizer tests
* tests : re-enable DeepSeek tokenizer tests
* cmake : sort
ggml-ci
* Superfluous parens in conditionals were removed.
* Unused args in functions were removed.
* Replaced unused `idx` var with `_`
* Initialized file_format and format_version attributes
* Renamed constant to capitals
* Prevented redefinition of the `f` var

Signed-off-by: Jiri Podivin <jpodivin@redhat.com>
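An illustrative before/after for the cleanups listed above; the class and names are made up for the example, not taken from the actual diff:

```python
SUPPORTED_VERSION = 3  # constant renamed to capitals

class Loader:
    def __init__(self) -> None:
        # attributes initialized up front instead of appearing mid-method
        self.file_format = None
        self.format_version = None

    def parse(self, lines: list[str]) -> int:
        count = 0
        # superfluous parens removed: `if (line.strip()):` -> `if line.strip():`
        for _, line in enumerate(lines):  # unused `idx` replaced with `_`
            if line.strip():
                count += 1
        return count

print(Loader().parse(["a", "", "b"]))  # -> 2
```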
* Adding SmolLM Pre Tokenizer
* Update convert_hf_to_gguf_update.py
  Co-authored-by: compilade <git@compilade.net>
* Update src/llama.cpp
  Co-authored-by: compilade <git@compilade.net>
* handle regex
* removed .inp and .out ggufs
---------
Co-authored-by: compilade <git@compilade.net>
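A sketch of the mechanism the convert scripts use to recognize a pre-tokenizer: hash how the HF tokenizer encodes a fixed probe string, then map known hashes to a name like "smollm". The probe text and hash below are placeholders, not the real values from the script:

```python
import hashlib

CHK_TXT = "..."  # the real script uses a long, carefully chosen probe string

KNOWN_PRE_TOKENIZERS = {
    # sha256 of str(tokenizer.encode(CHK_TXT)) -> pre-tokenizer name
    "0123abcd": "smollm",  # placeholder hash, for illustration only
}

def detect_pre_tokenizer(tokenizer) -> str:
    chkhsh = hashlib.sha256(str(tokenizer.encode(CHK_TXT)).encode()).hexdigest()
    try:
        return KNOWN_PRE_TOKENIZERS[chkhsh]
    except KeyError:
        raise NotImplementedError(f"unrecognized pre-tokenizer (chkhsh={chkhsh})")
```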
* llama : fix codeshell support
* llama : move codeshell after smollm to respect the enum order
See Commits and Changes for more details.
Created by pull[bot]
Can you help keep this open source service alive? 💖 Please sponsor : )