forked from ggerganov/llama.cpp
[pull] master from ggerganov:master #141
Closed
Conversation
Optimize Vulkan backend for better CPU performance and less GPU synchronization overhead (#8943)

* Optimize Vulkan backend for better CPU performance and less GPU synchronization overhead.

  - Allocation overhead for the temporary std::vectors was easily detectable with a sampling profiler and simple to remove.
  - ggml_vk_sync_buffer introduces a full pipeline sync, which has a significant cost on the GPU side, sometimes larger than the actual kernel execution. Adding only barriers for shader reads/writes and transfers seems to be sufficient, judging by the code, which either launches compute kernels or copies tensors.

* Fix small typo

Co-authored-by: 0cc4m <picard12@live.de>
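As a rough illustration of the barrier change described in that commit (a minimal sketch, not the actual llama.cpp patch; the helper names below are hypothetical, and it assumes the Vulkan-Hpp C++ bindings):

```cpp
#include <vulkan/vulkan.hpp>

// Before: a barrier over all pipeline stages and all memory accesses, which
// effectively drains the GPU queue at every sync point.
static void sync_buffer_full(vk::CommandBuffer cmd) {
    vk::MemoryBarrier barrier{
        vk::AccessFlagBits::eMemoryRead | vk::AccessFlagBits::eMemoryWrite,
        vk::AccessFlagBits::eMemoryRead | vk::AccessFlagBits::eMemoryWrite };
    cmd.pipelineBarrier(
        vk::PipelineStageFlagBits::eAllCommands,
        vk::PipelineStageFlagBits::eAllCommands,
        {}, barrier, nullptr, nullptr);
}

// After: only order compute-shader reads/writes and transfer reads/writes,
// which is enough when the queue only launches compute kernels or copies tensors.
static void sync_buffer_scoped(vk::CommandBuffer cmd) {
    vk::MemoryBarrier barrier{
        vk::AccessFlagBits::eShaderRead | vk::AccessFlagBits::eShaderWrite |
            vk::AccessFlagBits::eTransferRead | vk::AccessFlagBits::eTransferWrite,
        vk::AccessFlagBits::eShaderRead | vk::AccessFlagBits::eShaderWrite |
            vk::AccessFlagBits::eTransferRead | vk::AccessFlagBits::eTransferWrite };
    cmd.pipelineBarrier(
        vk::PipelineStageFlagBits::eComputeShader | vk::PipelineStageFlagBits::eTransfer,
        vk::PipelineStageFlagBits::eComputeShader | vk::PipelineStageFlagBits::eTransfer,
        {}, barrier, nullptr, nullptr);
}
```

Narrowing the stage and access masks this way still orders shader and transfer work correctly, but avoids stalling every pipeline stage at each sync point.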
…8956) Co-authored-by: Stanisław Szymczyk <sszymczy@gmail.com>
Co-authored-by: Neo Zhang <>
github-actions bot added the documentation (Improvements or additions to documentation), ggml, Vulkan, and SYCL labels on Aug 11, 2024
* gguf-py : Numpy dequantization for most types
* gguf-py : Numpy dequantization for grid-based i-quants
pull bot added the ⤵️ pull label and removed the documentation (Improvements or additions to documentation), python, ggml, Vulkan, and SYCL labels on Aug 11, 2024
github-actions bot added the documentation (Improvements or additions to documentation), examples, python, server, ggml, Vulkan, and SYCL labels on Aug 12, 2024
* py : fix requirements check '==' -> '~='
* cont : fix the fix
* ci : run on all requirements.txt
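For reference, the difference between the two pip version operators in a requirements file (the package line below is only illustrative, not taken from the actual diff):

```
# exact pin: only this specific version satisfies the requirement
numpy==1.26.4

# compatible release (PEP 440 "~="): accepts 1.26.4 and any later 1.26.x patch release
numpy~=1.26.4
```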
Fixes: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=70724

In order to access the above bug, you need to log in using one of the emails in https://github.com/google/oss-fuzz/blob/master/projects/llamacpp/project.yaml#L3-L5

Signed-off-by: David Korczynski <david@adalogics.com>
Fixes: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=70680

Signed-off-by: David Korczynski <david@adalogics.com>
* readme: introduce gpustack

  GPUStack is an open-source GPU cluster manager for running large language models, which uses llama.cpp as the backend.

* readme: introduce gguf-parser

  GGUF Parser is a tool to review/check the GGUF file and estimate the memory usage without downloading the whole model.

Signed-off-by: thxCode <thxcode0824@gmail.com>
See Commits and Changes for more details.
Created by pull[bot]
Can you help keep this open source service alive? 💖 Please sponsor : )