Releases: l3utterfly/llama.cpp
b4519
b4393
vulkan: multi-row k quants (#10846)
* multi row k quant shaders
* better row selection
* more row choices
* readjust row selection
* rm_kq=2 by default
b4302
ggml: load all backends from a user-provided search path (#10699)
* feat: load all backends from a user-provided search path
* fix: Windows search path
* refactor: rename `ggml_backend_load_all_in_search_path` to `ggml_backend_load_all_from_path`
* refactor: rename `search_path` to `dir_path`
* fix: change `NULL` to `nullptr`

Co-authored-by: Diego Devesa <slarengh@gmail.com>
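As a rough sketch of how the renamed entry point could be used from application code (the `./backends` directory is a placeholder, and the registry-enumeration calls `ggml_backend_reg_count`, `ggml_backend_reg_get`, and `ggml_backend_reg_name` are assumed to be available from `ggml-backend.h`):

```c
#include <stdio.h>
#include "ggml-backend.h"

int main(void) {
    // Load every backend shared library found in the given directory
    // (e.g. CUDA, Vulkan, CPU) instead of relying only on the default
    // search locations. The path here is purely illustrative.
    ggml_backend_load_all_from_path("./backends");

    // Enumerate the registry to confirm which backends were loaded.
    for (size_t i = 0; i < ggml_backend_reg_count(); ++i) {
        ggml_backend_reg_t reg = ggml_backend_reg_get(i);
        printf("backend: %s\n", ggml_backend_reg_name(reg));
    }
    return 0;
}
```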
b4219
sycl : Reroute permuted mul_mats through oneMKL (#10408)
This PR fixes the failing MUL_MAT tests for the sycl backend.
b4200
ci : faster CUDA toolkit installation method and use ccache (#10537)
* ci : faster CUDA toolkit installation method and use ccache
* remove fetch-depth
* only pack CUDA runtime on master
b4098
vulkan: Optimize some mat-vec mul quant shaders (#10296)
Compute two result elements per workgroup (for Q{4,5}_{0,1}). This reuses the B loads across the rows and also shares some addressing calculations. It required partially unrolling the loop by hand, since the compiler is less willing to unroll outer loops. Bounds-checking is added on the last iteration of the loop, which was at least partly broken before. The Q4_K shader is also optimized to vectorize most loads and reduce the number of bit-twiddling instructions.
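The change itself lives in the Vulkan GLSL compute shaders; purely as an illustration of the idea (two output rows per iteration so the B/x loads and addressing are shared), here is a plain C sketch with hypothetical names (`matvec_two_rows`, `A`, `x`, `y`) and none of the real shader's quantization details:

```c
#include <stddef.h>

// Sketch only: mat-vec multiply y = A * x computing two result rows at once.
// Each element of x is loaded a single time and reused for both rows,
// halving the vector loads and sharing the row-offset arithmetic.
static void matvec_two_rows(const float * A, const float * x, float * y,
                            int n_rows, int n_cols) {
    for (int r = 0; r + 1 < n_rows; r += 2) {
        const float * row0 = A + (size_t)(r + 0) * n_cols;
        const float * row1 = A + (size_t)(r + 1) * n_cols;
        float acc0 = 0.0f, acc1 = 0.0f;
        for (int c = 0; c < n_cols; ++c) {
            const float xc = x[c];   // one load, reused for both rows
            acc0 += row0[c] * xc;
            acc1 += row1[c] * xc;
        }
        y[r + 0] = acc0;
        y[r + 1] = acc1;
    }
    // Bounds handling for an odd row count, analogous to the shader's
    // bounds check on the last iteration.
    if (n_rows % 2) {
        const float * row = A + (size_t)(n_rows - 1) * n_cols;
        float acc = 0.0f;
        for (int c = 0; c < n_cols; ++c) {
            acc += row[c] * x[c];
        }
        y[n_rows - 1] = acc;
    }
}
```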
b4033
ggml : fix arch check in bf16_to_fp32 (#10164)
b3982
sync : ggml
b3902
cmake : do not build common library by default when standalone (#9804)
Layla v3.3.0
llama.cpp used in the Layla v3.3.0 release