[pull] master from ggerganov:master #135

pull · 2024-07-27T11:07:08Z

See Commits and Changes for more details.


* cann: fix multi-npu exec error
* cann: update comment for ggml_backend_cann_supports_buft

This commit adds a `--no-warmup` option for llama-cli. The motivation is that it can be convenient to skip the warmup `llama_decode` call when debugging.

Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>

* llama : model-based max number of graph nodes (ggml-ci)
* llama : disable 405B max_nodes path due to lack of complaints (ggml-ci)

* Add llama 3.1 rope scaling factors to llama conversion and inference

  This commit generates the rope factors on conversion and adds them to the resulting model as a tensor. At inference time, these factors are passed to the `ggml_rope_ext` rope operation, improving results for context windows above 8192.
* Update convert_hf_to_gguf.py (Co-authored-by: compilade <git@compilade.net>)
* address comments
* address comments
* Update src/llama.cpp (Co-authored-by: compilade <git@compilade.net>)
* Update convert_hf_to_gguf.py (Co-authored-by: compilade <git@compilade.net>)

---------

Co-authored-by: compilade <git@compilade.net>
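To make the "generate the rope factors on conversion" step concrete, here is a minimal C++ sketch of a per-frequency factor computation. It assumes the piecewise wavelength-based scheme commonly described for Llama 3.1; the struct `RopeScalingParams`, the function `compute_rope_factors`, and every default value below are illustrative assumptions and are not taken from this pull request. The actual conversion code lives in `convert_hf_to_gguf.py` and may differ.

```cpp
#include <cmath>
#include <vector>

// Hypothetical parameter bundle. All names and defaults are assumptions
// chosen for illustration, not values read from this PR.
struct RopeScalingParams {
    float base             = 500000.0f; // assumed rope theta
    int   head_dim         = 128;       // assumed per-head dimension
    float factor           = 8.0f;      // assumed scaling factor
    float low_freq_factor  = 1.0f;
    float high_freq_factor = 4.0f;
    float old_context_len  = 8192.0f;   // original training context
};

// Returns one factor per rotary frequency (head_dim / 2 entries).
// Short wavelengths are left alone (factor 1), long wavelengths get the
// full factor, and the band in between is interpolated smoothly.
std::vector<float> compute_rope_factors(const RopeScalingParams & p) {
    const float pi = 3.14159265358979323846f;
    const float low_freq_wavelen  = p.old_context_len / p.low_freq_factor;
    const float high_freq_wavelen = p.old_context_len / p.high_freq_factor;

    std::vector<float> factors;
    factors.reserve(p.head_dim / 2);
    for (int i = 0; i < p.head_dim; i += 2) {
        const float freq    = 1.0f / std::pow(p.base, (float) i / p.head_dim);
        const float wavelen = 2.0f * pi / freq;
        if (wavelen < high_freq_wavelen) {
            factors.push_back(1.0f);
        } else if (wavelen > low_freq_wavelen) {
            factors.push_back(p.factor);
        } else {
            const float smooth = (p.old_context_len / wavelen - p.low_freq_factor)
                               / (p.high_freq_factor - p.low_freq_factor);
            factors.push_back(1.0f / ((1.0f - smooth) / p.factor + smooth));
        }
    }
    return factors;
}
```

In a real pipeline the resulting vector would be stored in the model file as a tensor and handed to the rope operation at inference time, as the commit message describes; how that hand-off is wired up is not shown here.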

This commit removes an `UNUSED` macro call that is not needed: the variable `n0` is used in the code, so it will not produce a warning.

Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>
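For readers unfamiliar with the pattern, the sketch below shows why such a marker is redundant on a variable that is actually read. `EXAMPLE_UNUSED`, `sum_first`, and `first_or_zero` are hypothetical names invented for this illustration; the void-cast definition is the conventional form of such macros and is assumed here rather than quoted from ggml.

```cpp
#include <cstddef>

// Hypothetical stand-in for an "unused" helper macro: casting to void
// counts as a use, which silences unused-variable/parameter warnings.
#define EXAMPLE_UNUSED(x) (void)(x)

// n0 is read in the loop bound, so the compiler never warns about it
// and no marker is needed.
float sum_first(const float * data, std::size_t n0) {
    float total = 0.0f;
    for (std::size_t i = 0; i < n0; ++i) {
        total += data[i];
    }
    return total;
}

// `reserved` is never read, so here the marker does serve a purpose.
float first_or_zero(const float * data, std::size_t n, int reserved) {
    EXAMPLE_UNUSED(reserved);
    return n > 0 ? data[0] : 0.0f;
}
```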

…ggml/885)

…893)

This prevents invalid frees when destroying a partially initialized `vk_buffer_struct`. For example, this could happen in `ggml_vk_create_buffer` when running out of device memory.

Co-authored-by: Tony Wasserka <neobrain@users.noreply.github.com>
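The general idea of making destruction safe for a half-built object can be sketched without any Vulkan dependency. Everything below (`FakeDevice`, `FakeBuffer`, `try_create_buffer`, the handle type) is a hypothetical mock written for illustration; it is not the code from the Vulkan backend, and the real fix may guard on different fields.

```cpp
#include <cstdint>
#include <stdexcept>

using FakeHandle = std::uint64_t;          // hypothetical handle type
constexpr FakeHandle NULL_HANDLE = 0;

// Mock device that counts releases so invalid frees become observable.
struct FakeDevice {
    std::uint64_t memory_left = 0;
    int buffers_destroyed = 0;
    int memory_frees      = 0;
    int invalid_frees     = 0;
    FakeHandle next       = 1;

    FakeHandle create_buffer() { return next++; }

    FakeHandle allocate(std::uint64_t size) {
        if (size > memory_left) {
            throw std::runtime_error("out of device memory");
        }
        memory_left -= size;
        return next++;
    }

    void destroy_buffer(FakeHandle h) {
        if (h == NULL_HANDLE) { ++invalid_frees; } else { ++buffers_destroyed; }
    }

    // Only counts the call; the mock does not return memory to the pool.
    void free_memory(FakeHandle h) {
        if (h == NULL_HANDLE) { ++invalid_frees; } else { ++memory_frees; }
    }
};

// Handles default to NULL_HANDLE, and the destructor releases only the
// resources that were actually acquired.
struct FakeBuffer {
    FakeDevice *  device = nullptr;
    FakeHandle    buffer = NULL_HANDLE;
    FakeHandle    memory = NULL_HANDLE;
    std::uint64_t size   = 0;

    ~FakeBuffer() {
        if (device == nullptr) { return; }
        if (memory != NULL_HANDLE) { device->free_memory(memory); }
        if (buffer != NULL_HANDLE) { device->destroy_buffer(buffer); }
    }
};

// Returns false if the memory allocation fails. Either way the local
// FakeBuffer is destroyed on scope exit, possibly partially initialized.
bool try_create_buffer(FakeDevice & dev, std::uint64_t size) {
    FakeBuffer buf;
    buf.device = &dev;
    buf.buffer = dev.create_buffer();
    try {
        buf.memory = dev.allocate(size);
        buf.size   = size;
    } catch (const std::runtime_error &) {
        return false;
    }
    return true;
}
```

With a device that has 16 bytes left, requesting 64 bytes fails after the buffer handle is created; the destructor then destroys the buffer handle but skips the memory free, so the mock's `invalid_frees` counter stays at zero.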

…ml/895)

* Add support for float16 tensors in 1d pooling operations
* Add support for float16 input tensors in 2d pooling operations
* code cleanup: remove unnecessary casting during srow ptr initialization

---------

Co-authored-by: vanaka11 <vanaka1189@gmail.com>

Apply a loop tiling technique to the generic path, which provides performance upside for ISAs with enough registers to take advantage of it. Also helps the compiler optimize this path.
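As a rough illustration of loop tiling on a scalar path, the sketch below computes several output columns per pass so that each loaded `A` value is reused across a small block of accumulators that can stay in registers. The function names, the row-major layout, and the tile width of 4 are assumptions made for this example; this is not the code from the ggml change.

```cpp
#include <cstddef>
#include <vector>

// C[i][j] = sum over l of A[i][l] * B[j][l].
// A is m x k, B is n x k, C is m x n, all row-major.
std::vector<float> matmul_naive(const std::vector<float> & A,
                                const std::vector<float> & B,
                                std::size_t m, std::size_t n, std::size_t k) {
    std::vector<float> C(m * n, 0.0f);
    for (std::size_t i = 0; i < m; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            float acc = 0.0f;
            for (std::size_t l = 0; l < k; ++l) {
                acc += A[i * k + l] * B[j * k + l];
            }
            C[i * n + j] = acc;
        }
    }
    return C;
}

// Same result, but TILE output columns are accumulated together.
// Each A value is loaded once per tile instead of once per column.
std::vector<float> matmul_tiled(const std::vector<float> & A,
                                const std::vector<float> & B,
                                std::size_t m, std::size_t n, std::size_t k) {
    constexpr std::size_t TILE = 4; // assumed tile width
    std::vector<float> C(m * n, 0.0f);
    for (std::size_t i = 0; i < m; ++i) {
        std::size_t j = 0;
        for (; j + TILE <= n; j += TILE) {
            float acc[TILE] = {0.0f, 0.0f, 0.0f, 0.0f};
            for (std::size_t l = 0; l < k; ++l) {
                const float a = A[i * k + l];
                for (std::size_t t = 0; t < TILE; ++t) {
                    acc[t] += a * B[(j + t) * k + l];
                }
            }
            for (std::size_t t = 0; t < TILE; ++t) {
                C[i * n + j + t] = acc[t];
            }
        }
        // Remainder columns that do not fill a whole tile.
        for (; j < n; ++j) {
            float acc = 0.0f;
            for (std::size_t l = 0; l < k; ++l) {
                acc += A[i * k + l] * B[j * k + l];
            }
            C[i * n + j] = acc;
        }
    }
    return C;
}
```

The fixed-size inner loop over `t` is easy for a compiler to unroll, which fits the commit's remark that tiling also helps the compiler optimize this path. Whether a given tile width pays off depends on how many registers the target ISA offers.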

ggml-ci

wangshuai09 and others added 2 commits July 27, 2024 16:36

cann: Fix Multi-NPU execution error (#8710)

bfb4c74

* cann: fix multi-npu exec error
* cann: update comment for ggml_backend_cann_supports_buft

common : add --no-warmup option for main/llama-cli (#8712)

9d03d08

This commit adds a `--no-warmup` option for llama-cli. The motivation is that it can be convenient to skip the warmup `llama_decode` call when debugging.

Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>

github-actions bot added the ggml label Jul 27, 2024

ggerganov and others added 2 commits July 27, 2024 14:59

llama : add function for model-based max number of graph nodes (#8622)

92090ec

* llama : model-based max number of graph nodes (ggml-ci)
* llama : disable 405B max_nodes path due to lack of complaints (ggml-ci)

github-actions bot added the python label Jul 27, 2024

pull bot added ⤵️ pull and removed python ggml labels Jul 27, 2024

danbev and others added 7 commits July 27, 2024 17:43

ggml : remove unnecessary UNUSED macro call (ggml/880)

c12b6e8

This commit removes an `UNUSED` macro call that is not needed: the variable `n0` is used in the code, so it will not produce a warning.

Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>

cmake : only enable GGML_NATIVE and x86 flags if not crosscompiling (…

d2b851b

…ggml/885)

ggml : loop tiling optimizations for scalar path (ggml/898)

a05ca93

Apply a loop tiling technique to the generic path, which provides performance upside for ISAs with enough registers to take advantage of it. Also helps the compiler optimize this path.

sync : ggml

ae7985c

ggml-ci

ggml : add missing semicolon (#0)

345c8c0

ggml-ci

github-actions bot added python ggml Vulkan script labels Jul 27, 2024

ggerganov added 2 commits July 27, 2024 18:07

scripts : sync ggml-aarch64 sources

56f20aa

scripts : sync vulkan-shaders (#0)

5e2727f

teleprint-me closed this Jul 27, 2024
