Swiftui metal update #1

Closed · wants to merge 274 commits into from

274 commits
569550d
readme : add link to grammars app (#3388)
a10y Sep 29, 2023
0a4a4a0
readme : update hot topics + model links (#3399)
BarfingLemurs Sep 29, 2023
2777a84
llama : quantize up to 31% faster on Linux and Windows with mmap (#3206)
cebtenzzre Sep 29, 2023
bc34dd4
train : fix KQ_pos allocation (#3392)
ggerganov Sep 29, 2023
40e07a6
llama.cpp : add documentation about rope_freq_base and scale values (…
slaren Sep 29, 2023
f5ef5cf
ggml-cuda : perform cublas mat mul of quantized types as f16 (#3412)
slaren Sep 30, 2023
c97f01c
infill : add new example + extend server API (#3296)
vvhg1 Oct 2, 2023
ea55295
docker : ignore Git files (#3314)
kevinji Oct 2, 2023
095231d
cmake : fix transient definitions in find pkg (#3411)
bandoti Oct 2, 2023
a847676
metal : set log callback before initializing (#3427)
phronmophobic Oct 2, 2023
a03ce38
finetune : fix #3404 (#3437)
xaedes Oct 2, 2023
9476b01
cmake : make CUDA flags more similar to the Makefile (#3420)
cebtenzzre Oct 2, 2023
0fe3210
gguf : general usability improvements (#3409)
cebtenzzre Oct 2, 2023
29a404a
gguf : add BERT, MPT, and GPT-J arch info (#3408)
cebtenzzre Oct 2, 2023
665018c
CLBlast: Add broadcast support for matrix multiplication (#3402)
shibe2 Oct 2, 2023
e78f0b0
cmake : increase minimum version for add_link_options (#3444)
cebtenzzre Oct 2, 2023
1c84003
convert : fix vocab size when not defined in hparams (#3421)
cebtenzzre Oct 2, 2023
ff5a3f0
Work on the BPE tokenizer (#3252)
goerch Oct 3, 2023
017efe8
cmake : make LLAMA_NATIVE flag actually use the instructions supporte…
netrunnereve Oct 3, 2023
f56e1ba
metal : alibi for arbitrary number of heads (#3426)
li-plus Oct 3, 2023
48be797
llama : expose model's rope_freq_scale in the API (#3418)
grencez Oct 3, 2023
ac2219f
llama : fix session saving/loading (#3400)
ggerganov Oct 3, 2023
8186242
main : consistent prefix/suffix coloring (#3425)
h-h-h-h Oct 3, 2023
79f34ab
ggml : add RISC-V Vector Support for K-Quants and improved the existi…
Tameem-10xE Oct 3, 2023
f72f8f2
finetune : readme fix typo (#3465)
iammerrick Oct 4, 2023
f93af02
sync : ggml (conv 1d + 2d updates, UB fixes) (#3468)
ggerganov Oct 4, 2023
f8c90cd
llm : add Refact model (#3329)
ds5t5 Oct 4, 2023
0d152b3
ggml : fix build after #3329
ggerganov Oct 4, 2023
beabc8c
readme : add project status link
ggerganov Oct 4, 2023
019ba1d
convert : fix Baichuan2 models by using vocab size in config.json (#3…
KerfuffleV2 Oct 4, 2023
0745384
ci : add swift build via xcodebuild (#3482)
jhen0409 Oct 5, 2023
8f3a642
swift : disable ACCELERATE_NEW_LAPACK (#3481)
jhen0409 Oct 5, 2023
e8b8d32
server : fix incorrect num_tokens_predicted (#3480)
jhen0409 Oct 5, 2023
e2583cb
CLBlast: Fix handling of on-device tensor data
shibe2 Oct 5, 2023
acec9ea
common : process escape sequences in reverse prompts (#3461)
staviq Oct 5, 2023
45eba93
build : use std::make_tuple() for compatibility with older GCC versio…
kenvix Oct 5, 2023
48edda3
convert : update Falcon script for new HF config (#3448)
cebtenzzre Oct 5, 2023
5e97a60
Merge branch 'master' into swiftui_metal
bachittle Oct 5, 2023
ae6beb4
initial conversion to new format, utf8 errors?
bachittle Oct 6, 2023
090383b
bug fixes, but now has an invalid memory access :(
bachittle Oct 6, 2023
04b2f43
ci : fix xcodebuild destinations (#3491)
jhen0409 Oct 6, 2023
16820a5
llama : correct hparams comparison (#3446)
l3utterfly Oct 6, 2023
97af49f
server : reuse llama_sample_token common util (#3494)
jhen0409 Oct 6, 2023
a8777ad
parallel : add option to load external prompt file (#3416)
pudepiedj Oct 6, 2023
0c731ca
prompts : fix editorconfig checks after #3416
ggerganov Oct 6, 2023
9ca79d5
kv cache slot search improvements (#3493)
KerfuffleV2 Oct 6, 2023
cb13d73
server : docs fix default values and add n_probs (#3506)
Mihaiii Oct 6, 2023
1faaae8
readme : update models, cuda + ppl instructions (#3510)
BarfingLemurs Oct 6, 2023
3a716b4
Fix for #3454 (#3455)
goerch Oct 7, 2023
0e797c2
llm : support Adept Persimmon 8B (#3410)
phillip-kravtsov Oct 7, 2023
c26765a
metal : support default.metallib load & reuse code for swift package …
jhen0409 Oct 7, 2023
f1782c6
quantize : fail fast on write errors (#3521)
cebtenzzre Oct 7, 2023
c47066d
py : change version of numpy requirement to 1.24.4 (#3515)
lyjia Oct 7, 2023
4d03833
gguf.py : fix CI for publishing GGUF package (#3532)
monatis Oct 7, 2023
a16e89c
Fix trying to strip newline from empty prompt and cfg prompt file con…
KerfuffleV2 Oct 7, 2023
63d3b06
llama : fix missing break in Persimmon arch case statements (#3535)
KerfuffleV2 Oct 8, 2023
b0ec521
metal : support MTLGPUFamily < Apple7, formatting, style (#3524)
ggerganov Oct 8, 2023
7d8b249
zig : fix build by introducing train.cpp (#3539)
robertluo Oct 8, 2023
94e502d
ci : enable on obj-c changes + fix metal build (#3540)
ggerganov Oct 8, 2023
a1202a3
k-quants : fix comments about block sizing (#3499)
jrudolph Oct 8, 2023
9c38d18
api_like_OAI.py : simplify function (#2796)
arcrank Oct 8, 2023
8e6716a
api_like_OAI.py : compat with Microsoft Guidance (#2746)
ryderwishart Oct 8, 2023
eee42c6
ci : add Zig CI/CD and fix build (#2996)
kassane Oct 8, 2023
db3abcc
sync : ggml (ggml-backend) (#3548)
ggerganov Oct 8, 2023
dcc09d2
metal : do not use mul_mm kernels when ne00 < 64 (#3542)
ggerganov Oct 9, 2023
fcca0a7
refact : fix convert script + zero out KV cache to avoid nans (#3523)
ggerganov Oct 9, 2023
95bd60a
ggml-alloc : fix assert in debug builds (#3555)
slaren Oct 9, 2023
11ea5c7
infill. : fix tokenization (#3508)
vvhg1 Oct 10, 2023
f5f9121
llm : add MPT support (#3417)
jploski Oct 10, 2023
0aa6595
swift : improvements and fixes (#3564)
jhen0409 Oct 10, 2023
02d2875
llm : add bloom models (#3553)
xingchensong Oct 10, 2023
c5b4936
readme : add bloom (#3570)
xingchensong Oct 10, 2023
233fc1c
Minor improvements in GPT2 tokenizer (#3567)
goerch Oct 10, 2023
9f6ede1
Add MPT model to supported models in README.md (#3574)
Galunid Oct 10, 2023
24ba3d8
examples : add batched.swift + improve CI for swift (#3562)
zshannon Oct 11, 2023
8c70a5f
batched : add bench tool (#3545)
ggerganov Oct 11, 2023
70c29da
common : fix mirostat state when using multiple sequences (#3543)
KerfuffleV2 Oct 11, 2023
a8bdd65
server : add parameter -tb N, --threads-batch N (#3584)
m18coppola Oct 11, 2023
b8fe4b5
main : fix session loading bug (#3400)
ggerganov Oct 11, 2023
57dd55e
server : fix kv cache management (#3588)
ggerganov Oct 12, 2023
6b3ae4d
prompts : add mnemonics.txt
ggerganov Oct 12, 2023
b016596
server : add completion mode (no chat) (#3582)
akx Oct 12, 2023
1a8c879
ci : check if there is enough VRAM (#3596)
ggerganov Oct 12, 2023
f3040be
typo : it is `--n-gpu-layers` not `--gpu-layers` (#3592)
ianscrivener Oct 12, 2023
d28e572
cmake : fix add_compile_options on macOS
ggerganov Oct 12, 2023
9e24cc6
docs : fix typo GOMP_CPU_AFFINITY (#3597)
maekawatoshiki Oct 12, 2023
370359e
examples: support LLaVA v1.5 (multimodal model) (#3436)
monatis Oct 12, 2023
1e0e873
CLBlast: Fix matrix-vector multiplication (#3544)
shibe2 Oct 12, 2023
424b638
ggml : add context enumeration functions (#3605)
slaren Oct 13, 2023
2a4bcba
llama : remove n_threads from llama_decode_internal (#3614)
danbev Oct 13, 2023
11dc109
Honor -ngl option for Cuda offloading in llava (#3621)
monatis Oct 14, 2023
11bff29
MPT : support GQA for replit-code-v1.5 (#3627)
cebtenzzre Oct 15, 2023
940efa9
llava : fix tokenization to not add bos between image embeddings and …
ggerganov Oct 16, 2023
281ef73
k-quants : fix quantization ranges (#3646)
ggerganov Oct 17, 2023
1a15955
tokenizer : special token handling (#3538)
staviq Oct 17, 2023
5fe268a
readme : add Aquila2 links (#3610)
ftgreat Oct 17, 2023
1142013
save-load-state : fix example + add ci test (#3655)
ggerganov Oct 17, 2023
3ad1e3f
server : documentation of JSON return value of /completion endpoint (…
coezbek Oct 17, 2023
e74c705
editorconfig : remove trailing spaces
ggerganov Oct 17, 2023
a5e8c1d
train-text-from-scratch : fix assert failure in ggml-alloc (#3618)
slaren Oct 17, 2023
40e5ce0
CLBlast: Fix temporary buffer size for f16 conversion (wsize)
shibe2 Oct 11, 2023
8402566
readme : update hot-topics & models, detail windows release in usage …
BarfingLemurs Oct 17, 2023
e1675d1
llama : avoid fprintf in favor of LLAMA_LOG (#3538)
ggerganov Oct 17, 2023
cb33f43
fix embeddings when using CUDA (#3657)
slaren Oct 17, 2023
1117d06
opencl : fix element-wise multiplication (#3656)
shibe2 Oct 18, 2023
c67fe68
metal : implement q5_0 and q5_1 kernels (#3648)
jhen0409 Oct 18, 2023
0e89203
speculative : add tree-based sampling example (#3624)
ggerganov Oct 18, 2023
4e82b2e
speculative : bug fixes
ggerganov Oct 18, 2023
004797f
readme : update hot topics
ggerganov Oct 18, 2023
60abea9
llava : avoid segfault in case of non-existent mmproj file (#3674)
monatis Oct 19, 2023
f3b25e4
multimodal : add BakLLaVA conversion support (#3682)
monatis Oct 19, 2023
e78f3ef
convert : restore compat with old Falcon models (#3680)
cebtenzzre Oct 20, 2023
f439e50
ggml : fix rope + llama minor optimizations (#3560)
GermanAizek Oct 20, 2023
a0edf73
server : fix uninitialized sampling context (close #3685)
ggerganov Oct 20, 2023
8cf19d6
gguf : support big endian platform (#3552)
chenqiny Oct 20, 2023
d1031cf
sampling : refactor init to use llama_sampling_params (#3696)
ggerganov Oct 20, 2023
465219b
CLBlast: Add outer loops over src0 for broadcasting in mulmat
shibe2 Oct 12, 2023
22c69a2
batched : add len CLI argument
ggerganov Oct 22, 2023
d3956ae
main : escape prompt for cfg_negative_prompt and consecutive inputs i…
vvhg1 Oct 22, 2023
a5e7dbd
llama : validate special token ids are in range when loading GGUF mod…
KerfuffleV2 Oct 22, 2023
5a42a5f
readme : remove unsupported node.js library (#3703)
ianscrivener Oct 22, 2023
9e70cc0
Add test for MPT tokenization (#3728)
goerch Oct 22, 2023
438c2ca
server : parallel decoding and multimodal (#3677)
ggerganov Oct 22, 2023
96981f3
make : add optional CUDA_NATIVE_ARCH (#2482)
awhill19 Oct 22, 2023
6336701
Fix baichuan convert script not detecting model (#3739)
Galunid Oct 23, 2023
5be6c80
llama : remove token functions with `context` args in favor of `model…
MarcusDunn Oct 23, 2023
69a6735
Update special token handling in conversion scripts for gpt2 derived …
Galunid Oct 23, 2023
9d02956
issues : separate bug and enhancement template + no default title (#3…
monatis Oct 23, 2023
e393259
Revert "make : add optional CUDA_NATIVE_ARCH (#2482)"
ggerganov Oct 23, 2023
469c9ad
metal : handle ggml_scale for n%4 != 0 (close #3754)
ggerganov Oct 24, 2023
daab3d7
Add more tokenizer tests (#3742)
Galunid Oct 24, 2023
2b4ea35
cuda : add batched cuBLAS GEMM for faster attention (#3749)
ggerganov Oct 24, 2023
abd21fc
cmake : add missed dependencies (#3763)
kingsidelee Oct 24, 2023
b2f7e04
sync : ggml (conv ops + cuda MSVC fixes) (#3765)
ggerganov Oct 24, 2023
1717521
server : do not block system prompt update (#3767)
ggerganov Oct 24, 2023
ad93962
server : add parameter -tb N, --threads-batch N (#3584) (#3768)
cebtenzzre Oct 24, 2023
cc44877
log : disable pid in log filenames
ggerganov Oct 25, 2023
6961c4b
batched-bench : print params at start
ggerganov Oct 25, 2023
34b2a5e
server : do not release slot on image input (#3798)
ggerganov Oct 26, 2023
2f9ec7e
cuda : improve text-generation and batched decoding performance (#3776)
ggerganov Oct 27, 2023
c8d6a1f
simple : fix batch handling (#3803)
tterrasson Oct 27, 2023
6d459cb
llama : correctly report GGUFv3 format (#3818)
cebtenzzre Oct 27, 2023
41aee4d
speculative : ensure draft and target model vocab matches (#3812)
KerfuffleV2 Oct 27, 2023
fdee152
starcoder : add GPU offloading (#3827)
ggerganov Oct 28, 2023
1774611
common : print that one line of the syntax help *also* to standard ou…
HenkPoley Oct 28, 2023
ee1a0ec
llama : add option for greedy sampling with probs (#3813)
ggerganov Oct 28, 2023
bd6d9e2
llama : allow quantizing k-quants to fall back when tensor size incom…
KerfuffleV2 Oct 28, 2023
8a2f2fe
convert : ignore tokens if their IDs are within [0, vocab_size) (#3831)
ggerganov Oct 28, 2023
ba231e8
issues : change label from bug to bug-unconfirmed (#3748)
ggerganov Oct 28, 2023
82a6646
metal : try cwd for ggml-metal.metal if bundle lookup fails (#3793)
akx Oct 28, 2023
ff3bad8
flake : update flake.lock for newer transformers version + provide ex…
Green-Sky Oct 28, 2023
d69d777
ggml : quantization refactoring (#3833)
ggerganov Oct 29, 2023
71a09da
llama : fix kv shift bug (#3835)
ggerganov Oct 29, 2023
2046eb4
make : remove unnecessary dependency on build-info.h (#3842)
cebtenzzre Oct 29, 2023
6e08281
Extend llama_kv_cache_seq_rm to allow matching any sequence (#3843)
KerfuffleV2 Oct 29, 2023
207b519
ggml : move FP16 <-> FP32 code to ggml-impl.h (#3861)
ggerganov Oct 30, 2023
07178c9
flake.nix: fix for rocm 5.7 (#3853)
Tungsten842 Oct 31, 2023
238657d
samplers : Min-P sampler implementation [alternative to Top P/Top K] …
kalomaze Oct 31, 2023
71e3718
llama : refactor graph build code (#3837)
ggerganov Nov 1, 2023
ca190bc
server : re-enable completion and embedded at the same time (#3876)
a-h Nov 1, 2023
f0e2093
scripts : add server-llm.sh (#3868)
ggerganov Nov 1, 2023
73bdcb3
finetune : add -ngl parameter (#3762)
AndrewGodfrey Nov 1, 2023
9a3b4f6
ggml : fix UNUSED macro (#3762)
ggerganov Nov 1, 2023
e75dfdd
sampling : null grammar field after reset (#3885)
l3utterfly Nov 1, 2023
a2758d0
log : make generating separate log files optional (#3787)
staviq Nov 1, 2023
0e40806
common : allow caller to handle help/argument exceptions (#3715)
bandoti Nov 1, 2023
5033796
llm : add llm_build_context (#3881)
ggerganov Nov 1, 2023
ff8f9a8
common : minor (#3715)
ggerganov Nov 1, 2023
e16b9fa
metal : multi-simd softmax (#3710)
ggerganov Nov 1, 2023
523e49b
llm : fix falcon norm after refactoring (#3837)
ggerganov Nov 1, 2023
c43c2da
llm : fix llm_build_kqv taking unused tensor (benign, #3837)
ggerganov Nov 1, 2023
898aeca
llama : implement YaRN RoPE scaling (#2268)
cebtenzzre Nov 1, 2023
d02e98c
ggml-cuda : compute ptrs for cublasGemmBatchedEx in a kernel (#3891)
slaren Nov 1, 2023
0eb332a
llama : fix llama_context_default_params after #2268 (#3893)
cebtenzzre Nov 1, 2023
2fffa0d
cuda : fix RoPE after #2268 (#3897)
cebtenzzre Nov 2, 2023
183b3fa
metal : fix build errors and kernel sig after #2268 (#3898)
ggerganov Nov 2, 2023
4d719a6
cuda : check if this fixes Pascal card regression (#3882)
ggerganov Nov 2, 2023
b12fa0d
build : link against build info instead of compiling against it (#3879)
cebtenzzre Nov 2, 2023
1efae9b
llm : prevent from 1-D tensors being GPU split (#3697)
ggerganov Nov 2, 2023
2756c4f
gguf : remove special-case code for GGUFv1 (#3901)
ggerganov Nov 2, 2023
21958bb
cmake : disable LLAMA_NATIVE by default (#3906)
slaren Nov 2, 2023
4ff1046
gguf : print error for GGUFv1 files (#3908)
ggerganov Nov 2, 2023
d606905
cuda : use CUDA memory pool with async memory allocation/deallocation…
young-developer Nov 2, 2023
c7743fe
cuda : fix const ptrs warning causing ROCm build issues (#3913)
ggerganov Nov 2, 2023
224e7d5
readme : add notice about #3912
ggerganov Nov 2, 2023
51b2fc1
cmake : fix relative path to git submodule index (#3915)
abetlen Nov 2, 2023
629f917
cuda : add ROCM aliases for CUDA pool stuff (#3918)
KerfuffleV2 Nov 2, 2023
3fdbe6b
llama : change yarn_ext_factor placeholder to -1 (#3922)
cebtenzzre Nov 3, 2023
0581602
common : YAYF (yet another YARN fix) (#3925)
ggerganov Nov 3, 2023
8f961ab
speculative : change default p_accept to 0.5 + CLI args (#3919)
ggerganov Nov 3, 2023
abb77e7
ggml-cuda : move row numbers to x grid dim in mmv kernels (#3921)
slaren Nov 3, 2023
5ba3746
ggml-metal: fix yarn rope (#3937)
jxy Nov 3, 2023
d9b33fe
metal : round up to 16 to fix MTLDebugComputeCommandEncoder assertion…
psugihara Nov 3, 2023
f28af0d
gguf-py: Support 01.AI Yi models (#3943)
KerfuffleV2 Nov 4, 2023
48ade94
cuda : revert CUDA pool stuff (#3944)
slaren Nov 5, 2023
a7fac01
ci : use intel sde when ci cpu doesn't support avx512 (#3949)
netrunnereve Nov 5, 2023
c41ea36
cmake : MSVC instruction detection (fixed up #809) (#3923)
netrunnereve Nov 5, 2023
3d48f42
llama : mark LLM_ARCH_STARCODER as full offload supported (#3945)
wsxiaoys Nov 5, 2023
132d25b
cuda : fix disabling device with --tensor-split 1,0 (#3951)
cebtenzzre Nov 5, 2023
bb60fd0
server : fix typo for --alias shortcut from -m to -a (#3958)
RoyalHeart Nov 5, 2023
d9ccce2
Allow common process_escapes to handle \x sequences (#3928)
KerfuffleV2 Nov 5, 2023
2833a6f
ggml-cuda : fix f16 mul mat (#3961)
slaren Nov 5, 2023
381efbf
llava : expose as a shared library for downstream projects (#3613)
damian0815 Nov 6, 2023
46876d2
cuda : supports running on CPU for GGML_USE_CUBLAS=ON build (#3946)
wsxiaoys Nov 7, 2023
54b4df8
Use params when loading models in llava-cli (#3976)
tejom Nov 7, 2023
e9c1cec
ggml : fix backward rope after YaRN (#3974)
xaedes Nov 7, 2023
413503d
make : do not add linker flags when compiling static llava lib (#3977)
ggerganov Nov 7, 2023
0a7c980
gguf : track writer state, free unneeded tensors, cleanup (#3871)
cebtenzzre Nov 7, 2023
875fb42
ggml-alloc : fix backend assignments of views (#3982)
slaren Nov 8, 2023
57ad015
server : add min_p param (#3877)
Mihaiii Nov 9, 2023
a75fa57
scripts: Generalize convert scripts (#3838)
Galunid Nov 9, 2023
df9d129
Unbreak persimmon after #3837 (#4010)
Galunid Nov 10, 2023
4a4fd3e
server : allow continue edit on completion mode (#3950)
jhen0409 Nov 10, 2023
34b0a08
gguf-py: Refactor and allow reading/modifying existing GGUF files (#3…
KerfuffleV2 Nov 11, 2023
d96ca7d
server : fix crash when prompt exceeds context size (#3996)
z80maniac Nov 11, 2023
e86fc56
Fix gguf-convert-endian script (#4037)
monatis Nov 11, 2023
532dd74
Fix some documentation typos/grammar mistakes (#4032)
richardkiss Nov 12, 2023
21fd874
gguf-py: gguf_writer: Use bytearray to build metadata (#4051)
KerfuffleV2 Nov 12, 2023
bb50a79
Add ReLU and SQR CUDA ops to (partially) fix Persimmon offloading (#4…
KerfuffleV2 Nov 13, 2023
4760e7c
sync : ggml (backend v2) (#3912)
ggerganov Nov 13, 2023
c049b37
readme : update hot topics
ggerganov Nov 13, 2023
3d68f36
ggml : sync (im2col, GPU conv, 32-bit arm compat) (#4060)
ggerganov Nov 13, 2023
bd90eca
llava : fix regression for square images in #3613 (#4056)
monatis Nov 13, 2023
b46d12f
convert.py: also look for plain model.safetensors (#4043)
afrideva Nov 14, 2023
36eed0c
stablelm : StableLM support (#3586)
Galunid Nov 14, 2023
6bb4908
Fix MacOS Sonoma model quantization (#4052)
TortoiseHam Nov 14, 2023
1cf2850
ggml-cuda : increase max graph size (#4084)
slaren Nov 15, 2023
a6fc554
llama : restore prefix space in llama tokenizer (#4081)
cebtenzzre Nov 15, 2023
8da4627
gguf : fix potential infinite loops while parsing (#4100)
texmex76 Nov 16, 2023
91f6499
Respect tokenizer.ggml.add_bos_token value when tokenizing (#4040)
KerfuffleV2 Nov 17, 2023
4f447a4
llama : fix data units (#4101)
ggerganov Nov 17, 2023
b83e149
cuda : get_row_rounding F32 (#4095)
AndrewGodfrey Nov 17, 2023
947f64f
finetune : zero the loraB initial vectors (#4082)
AndrewGodfrey Nov 17, 2023
3e916a0
finetune : speed-up ggml_compute_forward_out_prod_f32 via BLAS (#4079)
gwjr Nov 17, 2023
e85bb1a
llama : add functions to get the model's metadata (#4013)
slaren Nov 17, 2023
ba4cf5c
train : move number of gpu layers argument parsing to common/train.cp…
jpodivin Nov 17, 2023
f7d5e97
py : remove superfluous import statements (#4076)
jpodivin Nov 17, 2023
c7cce12
llava : fix compilation warning that fread return value is not used (…
huawei-lin Nov 17, 2023
9e87ef6
common : improve yaml log escaping (#4080)
joennlae Nov 17, 2023
11173c9
py : Falcon HF compatibility (#4104)
cmp-nct Nov 17, 2023
2ab0707
convert : use 'model' value if it exists. This allows karpathy/tinyll…
dmahurin Nov 17, 2023
2fa02b4
examples : add tokenize (#4039)
zakkor Nov 17, 2023
5ad387e
tokenize : fix trailing whitespace
ggerganov Nov 17, 2023
8e93610
build : support ppc64le build for make and CMake (#3963)
bufferoverflow Nov 17, 2023
bbecf3f
llama : increase max nodes (#4115)
slaren Nov 17, 2023
cd61854
added O3, now has insufficient memory access
bachittle Nov 18, 2023
f510cc1
Merge branch 'master' into swiftui_metal_update
bachittle Nov 18, 2023
ce31d95
begin sync with master
bachittle Nov 18, 2023
a22264a
update to match latest code, new errors
bachittle Nov 22, 2023
f002a2e
fixed it!
bachittle Nov 22, 2023
3 changes: 3 additions & 0 deletions .dockerignore
@@ -1,6 +1,9 @@
*.o
*.a
.cache/
+.git/
+.github/
+.gitignore
.vs/
.vscode/
.DS_Store
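The three new .dockerignore entries keep Git metadata out of the Docker build context, so the context upload shrinks and changes to Git state no longer invalidate image build caches. A quick local sanity check (a sketch; the Dockerfile path is illustrative, not taken from this PR):

    # Rough check of what the new entries save; .devops/main.Dockerfile is an
    # assumed path for illustration.
    du -sh .git .github                        # weight that no longer ships in the context
    docker build -f .devops/main.Dockerfile .  # the reported build-context size should drop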
.github/ISSUE_TEMPLATE/bug.md
@@ -1,8 +1,7 @@
---
-name: Issue and enhancement template
-about: Used to report issues and request enhancements for llama.cpp
-title: "[User] Insert summary of your issue or enhancement.."
-labels: ''
+name: Bug template
+about: Used to report bugs in llama.cpp
+labels: ["bug-unconfirmed"]
assignees: ''

---
@@ -46,7 +45,7 @@ $ g++ --version

# Failure Information (for bugs)

-Please help provide information about the failure if this is a bug. If it is not a bug, please remove the rest of this template.
+Please help provide information about the failure / bug.

# Steps to Reproduce

28 changes: 28 additions & 0 deletions .github/ISSUE_TEMPLATE/enhancement.md
@@ -0,0 +1,28 @@
---
name: Enhancement template
about: Used to request enhancements for llama.cpp
labels: ["enhancement"]
assignees: ''

---

# Prerequisites

Please answer the following questions for yourself before submitting an issue.

- [ ] I am running the latest code. Development is very rapid so there are no tagged versions as of now.
- [ ] I carefully followed the [README.md](https://github.com/ggerganov/llama.cpp/blob/master/README.md).
- [ ] I [searched using keywords relevant to my issue](https://docs.github.com/en/issues/tracking-your-work-with-issues/filtering-and-searching-issues-and-pull-requests) to make sure that I am creating a new issue that is not already open (or closed).
- [ ] I reviewed the [Discussions](https://github.com/ggerganov/llama.cpp/discussions), and have a new bug or useful enhancement to share.

# Feature Description

Please provide a detailed written description of what you were trying to do, and what you expected `llama.cpp` to do as an enhancement.

# Motivation

Please provide a detailed written description of reasons why this feature is necessary and how it is useful to `llama.cpp` users.

# Possible Implementation

If you have an idea as to how it can be implemented, please write a detailed description. Feel free to give links to external sources or share visuals that might be helpful to understand the details better.
123 changes: 82 additions & 41 deletions .github/workflows/build.yml
@@ -10,10 +10,10 @@ on:
push:
  branches:
    - master
-  paths: ['.github/workflows/**', '**/CMakeLists.txt', '**/Makefile', '**/*.h', '**/*.hpp', '**/*.c', '**/*.cpp', '**/*.cu']
+  paths: ['.github/workflows/**', '**/CMakeLists.txt', '**/Makefile', '**/*.h', '**/*.hpp', '**/*.c', '**/*.cpp', '**/*.cu', '**/*.swift', '**/*.m']
pull_request:
  types: [opened, synchronize, reopened]
-  paths: ['**/CMakeLists.txt', '**/Makefile', '**/*.h', '**/*.hpp', '**/*.c', '**/*.cpp', '**/*.cu']
+  paths: ['**/CMakeLists.txt', '**/Makefile', '**/*.h', '**/*.hpp', '**/*.c', '**/*.cpp', '**/*.cu', '**/*.swift', '**/*.m']

env:
BRANCH_NAME: ${{ github.head_ref || github.ref_name }}
@@ -38,13 +38,13 @@ jobs:
- name: Build
  id: make_build
  run: |
-    CC=gcc-8 make
+    CC=gcc-8 make -j $(nproc)

- name: Test
  id: make_test
  run: |
-    CC=gcc-8 make tests
-    make test
+    CC=gcc-8 make tests -j $(nproc)
+    make test -j $(nproc)

ubuntu-latest-cmake:
runs-on: ubuntu-latest
@@ -66,7 +66,7 @@ jobs:
mkdir build
cd build
cmake ..
-cmake --build . --config Release
+cmake --build . --config Release -j $(nproc)

- name: Test
id: cmake_test
@@ -101,7 +101,7 @@ jobs:
mkdir build
cd build
cmake .. -DLLAMA_SANITIZE_${{ matrix.sanitizer }}=ON -DCMAKE_BUILD_TYPE=${{ matrix.build_type }}
-cmake --build . --config ${{ matrix.build_type }}
+cmake --build . --config ${{ matrix.build_type }} -j $(nproc)

- name: Test
id: cmake_test
@@ -135,7 +135,7 @@ jobs:
mkdir build
cd build
cmake -DLLAMA_MPI=ON ..
-cmake --build . --config Release
+cmake --build . --config Release -j $(nproc)

- name: Test
id: cmake_test
@@ -160,13 +160,13 @@
- name: Build
  id: make_build
  run: |
-    make
+    make -j $(sysctl -n hw.logicalcpu)

- name: Test
  id: make_test
  run: |
-    make tests
-    make test
+    make tests -j $(sysctl -n hw.logicalcpu)
+    make test -j $(sysctl -n hw.logicalcpu)

macOS-latest-cmake:
runs-on: macos-latest
@@ -188,8 +188,8 @@ jobs:
sysctl -a
mkdir build
cd build
-cmake -DLLAMA_AVX2=OFF -DLLAMA_FMA=OFF ..
-cmake --build . --config Release
+cmake ..
+cmake --build . --config Release -j $(sysctl -n hw.logicalcpu)

- name: Test
id: cmake_test
@@ -223,7 +223,7 @@ jobs:
-DLLAMA_BUILD_SERVER=OFF \
-DCMAKE_SYSTEM_NAME=iOS \
-DCMAKE_OSX_DEPLOYMENT_TARGET=14.0
-cmake --build . --config Release
+cmake --build . --config Release -j $(sysctl -n hw.logicalcpu)

macOS-latest-cmake-tvos:
runs-on: macos-latest
@@ -251,7 +251,35 @@
-DLLAMA_BUILD_SERVER=OFF \
-DCMAKE_SYSTEM_NAME=tvOS \
-DCMAKE_OSX_DEPLOYMENT_TARGET=14.0
-cmake --build . --config Release
+cmake --build . --config Release -j $(sysctl -n hw.logicalcpu)

+macOS-latest-swift:
+  runs-on: macos-latest
+
+  strategy:
+    matrix:
+      destination: ['generic/platform=macOS', 'generic/platform=iOS', 'generic/platform=tvOS']
+
+  steps:
+    - name: Clone
+      id: checkout
+      uses: actions/checkout@v1
+
+    - name: Dependencies
+      id: depends
+      continue-on-error: true
+      run: |
+        brew update
+
+    - name: xcodebuild for swift package
+      id: xcodebuild
+      run: |
+        xcodebuild -scheme llama -destination "${{ matrix.destination }}"
+
+    - name: Build Swift Example
+      id: make_build_swift_example
+      run: |
+        make swift

windows-latest-cmake:
runs-on: windows-latest
@@ -260,22 +288,23 @@
OPENBLAS_VERSION: 0.3.23
OPENCL_VERSION: 2023.04.17
CLBLAST_VERSION: 1.6.0
+SDE_VERSION: 9.21.1-2023-04-24

strategy:
  matrix:
    include:
      - build: 'noavx'
-       defines: '-DLLAMA_BUILD_SERVER=ON -DLLAMA_AVX=OFF -DLLAMA_AVX2=OFF -DLLAMA_FMA=OFF -DBUILD_SHARED_LIBS=ON'
+       defines: '-DLLAMA_NATIVE=OFF -DLLAMA_BUILD_SERVER=ON -DLLAMA_AVX=OFF -DLLAMA_AVX2=OFF -DLLAMA_FMA=OFF -DBUILD_SHARED_LIBS=ON'
      - build: 'avx2'
-       defines: '-DLLAMA_BUILD_SERVER=ON -DBUILD_SHARED_LIBS=ON'
+       defines: '-DLLAMA_NATIVE=OFF -DLLAMA_BUILD_SERVER=ON -DBUILD_SHARED_LIBS=ON'
      - build: 'avx'
-       defines: '-DLLAMA_BUILD_SERVER=ON -DLLAMA_AVX2=OFF -DBUILD_SHARED_LIBS=ON'
+       defines: '-DLLAMA_NATIVE=OFF -DLLAMA_BUILD_SERVER=ON -DLLAMA_AVX2=OFF -DBUILD_SHARED_LIBS=ON'
      - build: 'avx512'
-       defines: '-DLLAMA_BUILD_SERVER=ON -DLLAMA_AVX512=ON -DBUILD_SHARED_LIBS=ON'
+       defines: '-DLLAMA_NATIVE=OFF -DLLAMA_BUILD_SERVER=ON -DLLAMA_AVX512=ON -DBUILD_SHARED_LIBS=ON'
      - build: 'clblast'
-       defines: '-DLLAMA_BUILD_SERVER=ON -DLLAMA_CLBLAST=ON -DBUILD_SHARED_LIBS=ON -DCMAKE_PREFIX_PATH="$env:RUNNER_TEMP/clblast"'
+       defines: '-DLLAMA_NATIVE=OFF -DLLAMA_BUILD_SERVER=ON -DLLAMA_CLBLAST=ON -DBUILD_SHARED_LIBS=ON -DCMAKE_PREFIX_PATH="$env:RUNNER_TEMP/clblast"'
      - build: 'openblas'
-       defines: '-DLLAMA_BUILD_SERVER=ON -DLLAMA_BLAS=ON -DBUILD_SHARED_LIBS=ON -DLLAMA_BLAS_VENDOR=OpenBLAS -DBLAS_INCLUDE_DIRS="$env:RUNNER_TEMP/openblas/include" -DBLAS_LIBRARIES="$env:RUNNER_TEMP/openblas/lib/openblas.lib"'
+       defines: '-DLLAMA_NATIVE=OFF -DLLAMA_BUILD_SERVER=ON -DLLAMA_BLAS=ON -DBUILD_SHARED_LIBS=ON -DLLAMA_BLAS_VENDOR=OpenBLAS -DBLAS_INCLUDE_DIRS="$env:RUNNER_TEMP/openblas/include" -DBLAS_LIBRARIES="$env:RUNNER_TEMP/openblas/lib/openblas.lib"'

steps:
- name: Clone
@@ -324,7 +353,7 @@ jobs:
mkdir build
cd build
cmake .. ${{ matrix.defines }}
-cmake --build . --config Release
+cmake --build . --config Release -j ${env:NUMBER_OF_PROCESSORS}

- name: Add clblast.dll
id: add_clblast_dll
Expand Down Expand Up @@ -355,11 +384,23 @@ jobs:

- name: Test
  id: cmake_test
-  if: ${{ matrix.build != 'clblast' && (matrix.build != 'avx512' || env.HAS_AVX512F == '1') }} # Test AVX-512 only when possible
+  if: ${{ matrix.build != 'clblast' && (matrix.build != 'avx512' || env.HAS_AVX512F == '1') }} # not all machines have native AVX-512
  run: |
    cd build
    ctest -C Release --verbose --timeout 900

+- name: Test (Intel SDE)
+  id: cmake_test_sde
+  if: ${{ matrix.build == 'avx512' && env.HAS_AVX512F == '0' }} # use Intel SDE for AVX-512 emulation
+  run: |
+    curl.exe -o $env:RUNNER_TEMP/sde.tar.xz -L "https://downloadmirror.intel.com/777395/sde-external-${env:SDE_VERSION}-win.tar.xz"
+    # for some weird reason windows tar doesn't like sde tar.xz
+    7z x "-o${env:RUNNER_TEMP}" $env:RUNNER_TEMP/sde.tar.xz
+    7z x "-o${env:RUNNER_TEMP}" $env:RUNNER_TEMP/sde.tar
+    $sde = $(join-path $env:RUNNER_TEMP sde-external-${env:SDE_VERSION}-win/sde.exe)
+    cd build
+    & $sde -future -- ctest -C Release --verbose --timeout 900

- name: Determine tag name
id: tag
shell: bash
@@ -414,8 +455,8 @@ jobs:
run: |
  mkdir build
  cd build
-  cmake .. -DLLAMA_BUILD_SERVER=ON -DLLAMA_CUBLAS=ON -DBUILD_SHARED_LIBS=ON
-  cmake --build . --config Release
+  cmake .. -DLLAMA_NATIVE=OFF -DLLAMA_BUILD_SERVER=ON -DLLAMA_CUBLAS=ON -DBUILD_SHARED_LIBS=ON
+  cmake --build . --config Release -j ${env:NUMBER_OF_PROCESSORS}

- name: Determine tag name
id: tag
@@ -457,22 +498,22 @@ jobs:
path: |
  cudart-llama-bin-win-cu${{ matrix.cuda }}-x64.zip

-freeBSD-latest:
-  runs-on: macos-12
-  steps:
-    - name: Clone
-      uses: actions/checkout@v3
-
-    - name: Build
-      uses: cross-platform-actions/action@v0.19.0
-      with:
-        operating_system: freebsd
-        version: '13.2'
-        hypervisor: 'qemu'
-        run: |
-          sudo pkg update
-          sudo pkg install -y gmake automake autoconf pkgconf llvm15 clinfo clover opencl clblast openblas
-          gmake CC=/usr/local/bin/clang15 CXX=/usr/local/bin/clang++15
+# freeBSD-latest:
+#   runs-on: macos-12
+#   steps:
+#     - name: Clone
+#       uses: actions/checkout@v3
+#
+#     - name: Build
+#       uses: cross-platform-actions/action@v0.19.0
+#       with:
+#         operating_system: freebsd
+#         version: '13.2'
+#         hypervisor: 'qemu'
+#         run: |
+#           sudo pkg update
+#           sudo pkg install -y gmake automake autoconf pkgconf llvm15 clinfo clover opencl clblast openblas
+#           gmake CC=/usr/local/bin/clang15 CXX=/usr/local/bin/clang++15 -j `sysctl -n hw.ncpu`

release:
if: ${{ ( github.event_name == 'push' && github.ref == 'refs/heads/master' ) || github.event.inputs.create_release == 'true' }}
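Two of the additions above are straightforward to reproduce outside CI. A minimal sketch, assuming a macOS checkout for the Swift steps and an unpacked Intel SDE with the sde binary on PATH for the AVX-512 step (both are assumptions; the workflow itself downloads the pinned SDE_VERSION):

    # Swift package build, same scheme and destination values as the new macOS-latest-swift job
    xcodebuild -scheme llama -destination 'generic/platform=macOS'
    make swift    # builds the Swift example target added by this PR

    # Run the test suite on a machine without native AVX-512, emulated through Intel SDE
    cd build
    sde -future -- ctest -C Release --verbose --timeout 900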
3 changes: 2 additions & 1 deletion .github/workflows/gguf-publish.yml
@@ -36,8 +36,9 @@ jobs:
poetry install

- name: Build package
-  run: poetry build
+  run: cd gguf-py && poetry build
- name: Publish package
  uses: pypa/gh-action-pypi-publish@release/v1
  with:
    password: ${{ secrets.PYPI_API_TOKEN }}
+    packages-dir: gguf-py/dist
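The fix runs poetry from the gguf-py subdirectory, where the package's pyproject lives, and points the publish action at gguf-py/dist accordingly. A local dry run of the build half (a sketch; the upload itself stays in CI behind PYPI_API_TOKEN):

    cd gguf-py
    poetry install
    poetry build    # writes the sdist and wheel into gguf-py/dist
    ls dist/        # the artifacts the workflow hands to pypa/gh-action-pypi-publish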
25 changes: 25 additions & 0 deletions .github/workflows/zig-build.yml
@@ -0,0 +1,25 @@
name: Zig CI

on:
pull_request:
push:
branches:
- master

jobs:
build:
strategy:
fail-fast: false
matrix:
runs-on: [ubuntu-latest, macos-latest, windows-latest]
runs-on: ${{ matrix.runs-on }}
steps:
- uses: actions/checkout@v3
with:
submodules: recursive
fetch-depth: 0
- uses: goto-bus-stop/setup-zig@v2
with:
version: 0.11.0
- name: Build Summary
run: zig build --summary all -freference-trace
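The workflow pins Zig 0.11.0 and delegates everything to the build script, so the job can be reproduced locally with the same one-liner (assuming zig 0.11.0 on PATH):

    zig version                                # expect 0.11.0, matching CI
    zig build --summary all -freference-trace  # same flags as the Build Summary step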
16 changes: 14 additions & 2 deletions .gitignore
@@ -10,9 +10,12 @@
*.gcno
*.gcda
*.dot
*.bat
*.metallib
.DS_Store
.build/
.cache/
.ccls-cache/
.direnv/
.envrc
.swiftpm
@@ -40,21 +43,29 @@ models-mnt
/embedding
/gguf
/gguf-llama-simple
/infill
/libllama.so
/llama-bench
/llava-cli
/main
/metal
/perplexity
/q8dot
/quantize
/quantize-stats
/result
/save-load-state
/server
/simple
/batched
/batched-bench
/export-lora
/finetune
/speculative
/parallel
/train-text-from-scratch
/vdot
build-info.h
/common/build-info.cpp
arm_neon.h
compile_commands.json
CMakeSettings.json
@@ -85,4 +96,5 @@ tests/test-quantize-perf
tests/test-sampling
tests/test-tokenizer-0-llama
tests/test-tokenizer-0-falcon
-tests/test-tokenizer-1
+tests/test-tokenizer-1-llama
+tests/test-tokenizer-1-bpe