[pull] master from ggerganov:master #27

pull · 2024-01-22T13:38:01Z

See Commits and Changes for more details.


* Add Q3_K_XS, an intermediate size between Q2_K and Q3_K_S
* Q3_K_XS: quantize the first 1/8 of ffn_down layers with Q4_K

Together with an importance matrix, this brings the perplexity of LLaMA-v2-70B below that of the former Q2_K, with an 800 MB smaller quantized model.

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
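
The per-layer rule in the second bullet can be sketched as below. This is a hypothetical illustration, not llama.cpp's quantization code: the function names are invented, and the fallback type for the remaining ffn_down layers (`Q3_K`) is an assumption chosen only to show the split.

```python
# Hypothetical sketch of a Q3_K_XS-style type choice for ffn_down tensors.
# Assumption: layers in the first 1/8 of the model get Q4_K; every other
# ffn_down layer gets a lower-bit fallback (Q3_K here, for illustration).

def ffn_down_type(layer_index: int, n_layers: int, fallback: str = "Q3_K") -> str:
    """Return the quantization type name for one layer's ffn_down tensor."""
    if not 0 <= layer_index < n_layers:
        raise ValueError("layer_index out of range")
    # Integer division: "first 1/8 of the layers", rounded down.
    return "Q4_K" if layer_index < n_layers // 8 else fallback


def ffn_down_plan(n_layers: int) -> list:
    """Type names for every layer, in layer order."""
    return [ffn_down_type(i, n_layers) for i in range(n_layers)]


if __name__ == "__main__":
    plan = ffn_down_plan(80)  # LLaMA-v2-70B has 80 transformer layers
    print(plan.count("Q4_K"), plan.count("Q3_K"))  # prints: 10 70
```

For an 80-layer model this spends the extra bits on layers 0 through 9 only, which is why the overall file stays smaller than Q3_K_S.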

This commit adds `--sample-start` and `--include-sample-start` to the values printed by the main function in finetune.cpp. Although users set these explicitly on the command line, printing them is useful in case one forgets to set them; otherwise it is possible to go through the whole training process before realizing that the values are not what one expected.

Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>

* llama : support StableLM 2 1.6B
* convert : fix Qwen's set_vocab wrongly naming all special tokens [PAD{id}]
* convert : refactor Qwen's set_vocab to use it for StableLM 2 too
* nix : add tiktoken to llama-python-extra
* convert : use the presence of tokenizer.json to determine the StableLM tokenizer loader

  This is a less arbitrary heuristic than checking the vocab size.
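
The file-presence heuristic from the last bullet can be sketched as follows. The loader labels and the function name are hypothetical; the mapping (tokenizer.json present means a Hugging Face tokenizer file, absent means the Qwen-style vocab path that the commit reuses for StableLM 2) is an assumption drawn from the bullets above, not a copy of the convert script.

```python
from pathlib import Path

# Hypothetical loader labels; the real convert script uses its own methods.
HF_TOKENIZER_JSON = "hf_tokenizer_json"
QWEN_STYLE_TIKTOKEN = "qwen_style_tiktoken"


def choose_stablelm_vocab_loader(model_dir: Path) -> str:
    """Pick a vocab loader from the files present, not from the vocab size."""
    # Assumption: tokenizer.json present -> Hugging Face tokenizer file;
    # absent -> Qwen-style (tiktoken) vocab, reused for StableLM 2.
    if (Path(model_dir) / "tokenizer.json").is_file():
        return HF_TOKENIZER_JSON
    return QWEN_STYLE_TIKTOKEN


if __name__ == "__main__":
    import sys

    target = Path(sys.argv[1]) if len(sys.argv) > 1 else Path(".")
    print(choose_stablelm_vocab_loader(target))
```

Keying on a file that either exists or does not avoids tying the converter to a particular vocab size that a future checkpoint might change.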

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>

* MobileVLM native implementation
* delete the depthwise_conv_2d and permute_cpy code, replace both with existing functions, optimize the ldp definition, and support the LLAMA_PERF option for CMake
* move android script to example/llava directory
* Fix the editor config checks

Co-authored-by: Chenxiaotao03 <chenxiaotao03@meituan.com>

* make the GGML_TASK_INIT phase able to run multithreaded
* multithreaded dequantize in mul_mat when using a BLAS library
* minor fixes
* update outdated comment
* fix coding style
* simplify code

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
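
Multithreaded dequantization before a BLAS matmul amounts to giving each thread a disjoint range of rows to expand into floats. The sketch below shows one common way to split that work (ceiling division into contiguous row ranges); it is an assumption for illustration rather than the commit's code, and the `(scale, ints)` row format is a hypothetical stand-in for ggml's quantized blocks.

```python
from concurrent.futures import ThreadPoolExecutor


def row_range(n_rows: int, n_threads: int, thread_id: int) -> tuple:
    """Contiguous [start, end) slice of rows owned by one thread."""
    rows_per_thread = (n_rows + n_threads - 1) // n_threads  # ceiling division
    start = min(rows_per_thread * thread_id, n_rows)
    end = min(start + rows_per_thread, n_rows)
    return start, end


def dequantize_row(scale: float, qvalues: list) -> list:
    # Toy format (hypothetical): one float scale times small integers.
    return [scale * q for q in qvalues]


def dequantize_parallel(rows: list, n_threads: int) -> list:
    """Dequantize all rows, each thread handling its own row range."""
    out = [None] * len(rows)

    def work(thread_id: int) -> None:
        start, end = row_range(len(rows), n_threads, thread_id)
        for r in range(start, end):
            scale, qvalues = rows[r]
            out[r] = dequantize_row(scale, qvalues)  # disjoint slots, no lock needed

    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        list(pool.map(work, range(n_threads)))  # list() surfaces worker exceptions
    return out
```

Because the ranges never overlap, threads write to separate output slots without locking. Python's GIL means this sketch illustrates the partitioning rather than a real speedup; in C the same split lets each core dequantize its rows independently.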

* kl-divergence: be able to save all logits to a file
* Add ability to compute KL-divergence

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
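
Once the logits of a reference model have been saved, comparing a quantized model against them reduces to a per-token KL-divergence between the two softmax distributions, averaged over tokens. The sketch below assumes the saved full-precision logits act as the reference distribution P and the quantized model's logits as Q; the function names are hypothetical and the file format of the saved logits is not modeled.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax of a list of floats."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_sum_exp for x in logits]


def kl_divergence(base_logits, test_logits):
    """KL(P || Q) for one token: P from base_logits, Q from test_logits."""
    if len(base_logits) != len(test_logits):
        raise ValueError("logit vectors must have the same vocabulary size")
    log_p = log_softmax(base_logits)
    log_q = log_softmax(test_logits)
    # sum_i p_i * (log p_i - log q_i)
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))


def mean_kl_divergence(base_rows, test_rows):
    """Average per-token KL over a sequence of saved logit rows."""
    if len(base_rows) != len(test_rows) or not base_rows:
        raise ValueError("need the same, non-zero number of rows")
    total = sum(kl_divergence(b, t) for b, t in zip(base_rows, test_rows))
    return total / len(base_rows)
```

For example, base logits `[0, 0]` give P = (0.5, 0.5) and test logits `[0, ln 3]` give Q = (0.25, 0.75), so the divergence is 0.5 ln 2 + 0.5 ln(2/3) = 0.5 ln(4/3), about 0.144. Identical logits give exactly 0.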

simonJJJ and others added 15 commits January 22, 2024 09:33

llama : add more qwen2 models (#5071)

3466c6e

ci : fix Windows CI by updating Intel SDE version (#5053)

5774493

imatrix : keep intermediate imatrix results (#5077)

15bceec

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>

workflows: nix-ci: rebuild on flake.lock updates

f7276f7

workflows: nix-build-aarch64: rate limit

f4dd059

workflows: nix-ci: drop the redundant "paths" filter

fe8b3c0

nix: refactor the cleanSource rules

7251870

nix: add a comment about makeScope

5e97ec9

nix: add a comment on the many nixpkgs-with-cuda instances

28603cd

flake.nix: add a comment about flakes vs nix

b2d80e1

pull bot added the ⤵️ pull label Jan 22, 2024

ikawrakow and others added 2 commits January 22, 2024 16:10

KL-divergence (#5076)

6f9939d


llama : fix not enough space in buffer with Qwen (#5086)

011e8ec

teleprint-me closed this Jan 23, 2024


pull bot commented Jan 22, 2024